Ganesan Senthilvel: November 2017

Sunday, November 26, 2017

Spark for Azure HDInsight

As noted in early July 2017, Microsoft now officially supports Apache Spark in Azure HDInsight. Earlier ref: http://www.zdnet.com/article/spark-comes-to-azure-hdinsight/

Last week, Microsoft announced the Azure Databricks service, new Cosmos DB features, enterprise AI capabilities and more at its annual Connect(); event in New York.

Microsoft is getting the Apache Spark religion, introducing a new cloud service in preview, called Azure Databricks. This is noteworthy for a number of reasons.

First, the service was developed jointly by Microsoft and Databricks (the company whose founders are Spark's very creators), to deliver this Spark-based Big Data analytics service as a first-party Azure offering, and not a mere partner service on the Azure Marketplace.

Second, the service works independently of Databricks' own cloud service for Spark and of Azure HDInsight, Microsoft's own Big Data as a Service platform, on which Spark also runs.

Azure Databricks has nonetheless been designed from the ground up to take advantage of, and be fully optimized for, various Azure services, including blob storage, Data Lake Store, virtual networking, Azure Active Directory and Azure Container Service.

While Azure Databricks, like HDInsight, is still based on the creation of a dedicated cluster, with the number and type of nodes (servers) determined by the customer, it nonetheless has built-in auto-scaling and auto-termination, to grow the cluster as necessary and shut it down once it's no longer needed.
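
To make the auto-scaling and auto-termination idea concrete, here is a minimal sketch of a cluster definition. The field names (autoscale, min_workers, max_workers, autotermination_minutes) are modelled on the Databricks cluster settings as I understand them; they are not taken from the announcement, so check them against the current documentation. The helper name and the placeholder values are hypothetical.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Return a cluster definition that scales between two sizes and
    shuts itself down after a period of inactivity.

    Hypothetical helper; field names are assumptions based on the
    Databricks cluster settings, not something stated in the post.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        # Placeholders: the customer still picks the runtime and node type.
        "spark_version": "<runtime-version>",
        "node_type_id": "<azure-vm-size>",
        # Auto-scaling: the service grows or shrinks within this range.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Auto-termination: stop the cluster after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_spec("nightly-etl", 2, 8, 30), indent=2))
```

The customer still chooses the node type and the size range; the service only moves the cluster within that range and stops it when idle.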

Sunday, November 19, 2017

ElasticSearch 6

In the middle of last week, ElasticSearch 6 reached GA (General Availability), with tech upgrades such as:

migration assistant
resiliency
efficiency
scalability
security
index sorting.

Ref: https://www.elastic.co/blog/elasticsearch-6-0-0-released
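
Of the items in the list, index sorting is the easiest to show in a few lines. The sketch below builds the request body for creating an index whose segments are sorted on one field. The settings keys sort.field and sort.order are the ones I believe 6.0 uses; the helper name, the field name "timestamp" and the mapping type "doc" are hypothetical choices for illustration.

```python
import json


def build_sorted_index_body(field, order="desc", field_type="date"):
    """Return an index-creation body with index sorting enabled.

    Hypothetical helper. The sort must be declared when the index is
    created, and the sort field must be present in the mapping.
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "settings": {
            "index": {
                "sort.field": field,  # field used to sort documents on disk
                "sort.order": order,
            }
        },
        "mappings": {
            # "doc" is an assumed mapping type name for a 6.x index.
            "doc": {"properties": {field: {"type": field_type}}}
        },
    }


if __name__ == "__main__":
    # The body would be sent with a PUT request to the new index name.
    print(json.dumps(build_sorted_index_body("timestamp"), indent=2))
```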

Saturday, November 11, 2017

Kinesis Analytics

Kinesis Analytics now gives you the option to preprocess your data with AWS Lambda. This gives you a great deal of flexibility in defining what data gets analyzed by your Kinesis Analytics application. You can also define how that data is structured before it is queried by your SQL.

Kinesis Analytics continuously reads data from your Kinesis stream or Kinesis Firehose delivery stream. For each batch of records that it retrieves, the Lambda processor subsystem manages how the batch gets passed to your Lambda function. Your function receives a list of records as input. Within your function, you iterate through the list and apply your business logic to accomplish your preprocessing requirements (such as data transformation).

The input model to your preprocessing function varies slightly, depending on whether the data was received from a stream or a delivery stream.
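
A preprocessing function of this kind can be sketched as below. It assumes each incoming record carries a recordId and base64-encoded data, and that the function must return every record with a result of Ok, Dropped or ProcessingFailed; that is my reading of the preprocessing contract and should be verified against the AWS documentation. The JSON payload and its "ticker" field are made up for illustration. The sketch only touches fields common to both input models, so it does not depend on whether the source is a stream or a delivery stream.

```python
import base64
import json


def _reply(record, result, data=None):
    """Build one output record; unchanged data is echoed back as received."""
    return {
        "recordId": record["recordId"],
        "result": result,
        "data": record["data"] if data is None else data,
    }


def lambda_handler(event, context):
    """Iterate through the batch and transform, drop, or fail each record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Bad base64 or bad JSON: report the record as failed.
            output.append(_reply(record, "ProcessingFailed"))
            continue

        # Hypothetical business rule: keep only objects that have a ticker.
        if not isinstance(payload, dict) or "ticker" not in payload:
            output.append(_reply(record, "Dropped"))
            continue

        # Hypothetical transformation: normalize the ticker symbol.
        payload["ticker"] = str(payload["ticker"]).upper()
        encoded = base64.b64encode(json.dumps(payload).encode("utf-8"))
        output.append(_reply(record, "Ok", encoded.decode("utf-8")))

    # Every input record is answered exactly once, matched by recordId.
    return {"records": output}
```

Returning a status per record, rather than raising on the first bad one, lets the rest of the batch flow on to the SQL stage.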