Ganesan Senthilvel: March 2014

Saturday, March 22, 2014

Apache Shark

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users.

Shark uses the powerful Apache Spark engine to speed up computations. It runs Hive queries up to 100x faster in memory, or 10x faster on disk.

Shark reuses the Hive frontend and metastore, giving you full compatibility with existing Hive data, queries, and UDFs. You simply install it alongside Hive. Because it runs on Spark, Shark can call complex analytics functions such as machine learning right from SQL.

Unlike other interactive SQL engines, Shark supports mid-query fault tolerance, letting it scale to large jobs. In terms of scalability, Shark uses the same engine for both short and long queries.
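To make the idea of mid-query fault tolerance concrete, here is a minimal sketch in plain Python. It is not Shark's or Spark's code, and this post does not describe how Shark implements recovery. The names `run_query` and `flaky_sum` are hypothetical. The sketch only illustrates the general principle: when one piece of a query fails, only that piece is rerun, and the rest of the work is kept.

```python
def run_query(partitions, task, max_attempts=3):
    """Apply `task` to every partition, retrying only the ones that fail."""
    results = []
    attempts_per_partition = []
    for index, partition in enumerate(partitions):
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(task(index, partition))
                attempts_per_partition.append(attempt)
                break  # this partition is done, move on to the next one
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up only after the last allowed attempt
    return results, attempts_per_partition


failed_once = set()


def flaky_sum(index, partition):
    # Simulate a worker that dies the first time it handles partition 1.
    if index == 1 and index not in failed_once:
        failed_once.add(index)
        raise RuntimeError("simulated worker failure")
    return sum(partition)


partial_sums, attempts = run_query([[1, 2], [3, 4], [5, 6]], flaky_sum)
total = sum(partial_sums)
```

In this toy run only the second partition is attempted twice; the other two are computed once and their results are reused, so the failure does not restart the whole query.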

Saturday, March 8, 2014

Apache Spark

Apache Spark, an in-memory data-processing framework, has graduated to a top-level Apache project, an important step for Spark’s stability as it increasingly replaces MapReduce in next-generation big data applications. Spark has an advanced DAG (directed acyclic graph) execution engine that supports acyclic data flow and in-memory computing.

It is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and it promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms. Spark can run on Hadoop 2's YARN cluster manager and can read any existing Hadoop data.
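The "load once, query repeatedly" pattern described above can be sketched in plain Python. This is a single-machine toy, not Spark's API. The class `InMemoryDataset` and the function `load_log_lines` are hypothetical names invented for illustration, and the sample log lines are made up. The point is only that storage is read once and every later query works on the copy held in memory.

```python
from functools import reduce


class InMemoryDataset:
    """Toy stand-in for a dataset pinned in memory (hypothetical, not Spark's API)."""

    def __init__(self, loader):
        self._loader = loader   # callable that reads the records from storage
        self._records = None    # filled on first use, then reused
        self.load_count = 0     # how many times storage was actually read

    def records(self):
        if self._records is None:
            self._records = list(self._loader())
            self.load_count += 1
        return self._records

    def query(self, predicate, mapper, combine, initial):
        """Filter, map and fold the cached records in one pass."""
        selected = (mapper(r) for r in self.records() if predicate(r))
        return reduce(combine, selected, initial)


def load_log_lines():
    # Assumption: a real job would read these from HDFS or another store.
    return [
        "INFO start",
        "ERROR disk full",
        "WARN slow task",
        "ERROR timeout",
    ]


logs = InMemoryDataset(load_log_lines)

# Two different queries over the same cached data.
error_count = logs.query(
    lambda line: line.startswith("ERROR"), lambda line: 1, lambda a, b: a + b, 0
)
longest_line = logs.query(lambda line: True, len, max, 0)
```

Both queries run against the same in-memory list, and `logs.load_count` stays at 1. Iterative workloads such as machine learning training benefit from this pattern because they pass over the same data many times.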

Apache Spark™ is a fast and general engine for large-scale data processing. The Apache Software Foundation announced recently that Spark has graduated from the Apache Incubator to become a top-level Apache project, signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles. This is a major step for the community and we are very proud to share this news with users as we complete Spark’s move to Apache.

Saturday, March 1, 2014

QlikView

A recent CITO Research report explains exactly what big data is, why it matters to you, and how to put it to work for your business. You’ll also see how big data is being used to win elections, reduce crime, and literally change the world. Make sure your business isn’t left behind.

With QlikView, we can run reports and create dashboards quickly to detect market changes and product sales in real time. This allows our salespeople to immediately respond to new opportunities and improve business performance.

Big Data’s value can be unleashed for business users by condensing it and intelligently presenting only what is relevant and contextual to the problem at hand, whether for an executive who wants summary data across the company’s product lines or for a manager who wants more detail, but only for the areas that he or she oversees. IT professionals are challenged not only with providing the infrastructure but also with helping to give meaning to the Big Data.