Ganesan Senthilvel: Databricks Cloud

Saturday, March 7, 2015

Databricks Cloud

Databricks (founded in 2013 by the creators of Spark; recent venture capital funding of $47 million) aims to become the platform for Big Data by leveraging Apache Spark. The idea is to simplify big data processing and free users to focus on turning data into value.

Databricks provides a cloud-hosted platform based on Spark that lets companies implement their entire big data pipeline (data ingestion, data transformation, interactive processing, and data products) in one environment. It supports interactive visualization, machine learning, graph processing, and building and running data products.

Spark is a parallel execution engine that improves on Hadoop MapReduce in three dimensions:

It is optimized to work efficiently with data stored both in memory and on disk.
It provides a more powerful and flexible API than MapReduce, which makes it much easier for developers to write sophisticated applications.
It unifies a variety of computation models, including streaming, interactive queries, machine learning, and graph processing.
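
To make the API comparison a little more concrete, here is a minimal sketch in plain Python of the three phases that a MapReduce-style word count spells out. It is not Spark or Hadoop code, and it runs on a single machine. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values collected for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, for illustration only.
sample = ["spark runs in memory", "spark runs on disk"]
counts = word_count(sample)
print(counts)
```

In Spark's RDD API the same computation is typically written as one short chain of flatMap, map, and reduceByKey calls, with the engine handling the grouping step. This is one way to read the "more powerful and flexible API" point above.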

Databricks Cloud is a Big Data cloud computing solution with three main parts:

Databricks Platform
Spark
Databricks Workspace

Although Databricks Cloud will initially run on Amazon Web Services, the company has said it will eventually make it compatible with Google Compute Engine and Microsoft Azure.