Sunday, November 26, 2017

Spark for Azure HDInsight


As indicated in early July 2017, Microsoft now officially leverages Apache Spark in Azure HDInsight. Earlier reference: http://www.zdnet.com/article/spark-comes-to-azure-hdinsight/

Last week, Microsoft announced the Azure Databricks service, new Cosmos DB features, enterprise AI capabilities and more at its annual Connect(); event in New York.

Microsoft is getting the Apache Spark religion, introducing a new cloud service in preview, called Azure Databricks. This is noteworthy for a number of reasons.

First, the service was developed jointly by Microsoft and Databricks (the company whose founders are Spark's very creators), to deliver this Spark-based Big Data analytics service as a first-party Azure offering, and not a mere partner service on the Azure Marketplace.

Second, the service works independently of Databricks' own cloud service for Spark and of Azure HDInsight, Microsoft's own Big Data as a Service platform, on which Spark also runs.

Azure Databricks has nonetheless been designed from the ground up to take advantage of, and be fully optimized for, various Azure services, including blob storage, Data Lake Store, virtual networking, Azure Active Directory and Azure Container Service.

While Azure Databricks, like HDInsight, is still based on the creation of a dedicated cluster, with the number and type of nodes (servers) determined by the customer, it nonetheless has built-in auto-scaling and auto-termination, to grow the cluster as necessary and shut it down once it's no longer needed.
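
To make the cluster settings described above a little more concrete, here is a minimal sketch of what such a cluster definition might look like. The field names (autoscale, min_workers, max_workers, autotermination_minutes, node_type_id) are modeled on the Databricks Clusters REST API and should be treated as assumptions here; the helper build_cluster_spec, the cluster name and the node type are purely illustrative, and a real request would need further fields (such as the Spark runtime version) plus authentication.

```python
import json


def build_cluster_spec(name, node_type, min_workers, max_workers, idle_minutes):
    """Build a cluster definition with auto-scaling and auto-termination.

    Hypothetical helper: it only assembles a dictionary and does not
    call any Azure or Databricks service.
    """
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    if idle_minutes <= 0:
        raise ValueError("idle_minutes must be positive")
    return {
        "cluster_name": name,
        # The type of node (server) is chosen by the customer.
        "node_type_id": node_type,
        # Auto-scaling: the cluster may grow or shrink within these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Auto-termination: shut the cluster down after this much idle time.
        "autotermination_minutes": idle_minutes,
    }


# Illustrative values only.
spec = build_cluster_spec("demo-cluster", "Standard_DS3_v2", 2, 8, 30)
print(json.dumps(spec, indent=2))
```

The customer still picks the node type and the worker bounds; the service then decides how many workers to run within those bounds and stops the cluster once it has been idle for the configured time.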
