Saturday, March 22, 2014

Apache Shark

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users.

Shark uses the powerful Apache Spark engine to speed up computations. It runs Hive queries up to 100x faster in memory, or 10x faster on disk.

Shark reuses the Hive frontend and metastore, giving you full compatibility with existing Hive data, queries, and UDFs. You simply install it alongside Hive. By running on Spark, Shark can call complex analytics functions like machine learning right from SQL.

Unlike other interactive SQL engines, Shark supports mid-query fault tolerance, letting it scale to large jobs. It also uses the same engine for both short and long queries, so you do not need a separate system for each.
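
To make "mid-query fault tolerance" a bit more concrete, here is a toy sketch in Python. It is not Shark or Spark code, and the names (`run_query`, `fail_once`, `partitions`) are made up for illustration. It only assumes the general idea that a query is split into per-partition tasks, and that when a worker fails, just the lost task is rerun instead of the whole query.

```python
# Toy model of mid-query fault tolerance. This is NOT Shark or Spark code;
# all names here are hypothetical and exist only for illustration.
from typing import Callable, Dict, List, Set


def run_query(partitions: List[List[int]],
              task: Callable[[List[int]], int],
              fail_once: Set[int]) -> Dict[str, int]:
    """Run `task` on each partition, retrying only the partitions that fail."""
    results: Dict[int, int] = {}
    attempts = 0
    pending = list(range(len(partitions)))
    already_failed: Set[int] = set()

    while pending:
        idx = pending.pop(0)
        attempts += 1
        # Simulate a worker dying the first time it handles this partition.
        if idx in fail_once and idx not in already_failed:
            already_failed.add(idx)
            pending.append(idx)  # reschedule just this partition
            continue
        results[idx] = task(partitions[idx])

    return {"total": sum(results.values()), "attempts": attempts}


partitions = [[1, 2], [3, 4], [5, 6]]
outcome = run_query(partitions, sum, fail_once={1})
print(outcome)  # {'total': 21, 'attempts': 4}
```

The failed partition costs one extra task attempt (4 instead of 3), while the work already finished on the other partitions is kept. An engine without this property would have to restart the query from the beginning, which gets more expensive as jobs grow.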

