Thursday, July 24, 2014

Knox Gateway

This week, the Apache Knox community announced the release of Apache Knox Gateway (Incubator) 0.3.0.

The Apache Knox Gateway is a REST API Gateway for Hadoop with a focus on enterprise security integration.  It provides a simple and extensible model for securing access to Hadoop core and ecosystem REST APIs.

Apache Knox provides pluggable authentication to LDAP and trusted identity providers, as well as service-level authorization and more. The diagram below shows how Apache Knox fits into a Hadoop cluster deployment.

Highlights of the recent release:

  • LDAP authentication for REST calls to Hadoop
  • Secure Hadoop cluster (i.e. Kerberos) integration
  • HBase integration (non-Kerberos)
  • Hive JDBC integration (non-Kerberos)
  • Simple ACL based Service Level Authorization
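
To make the idea of a REST call through the gateway concrete, here is a minimal sketch of how a client might assemble a WebHDFS request routed through Knox with HTTP Basic credentials that the gateway checks against LDAP. The host name, cluster name ("sandbox"), credentials and the helper name knox_webhdfs_request are all placeholders chosen for illustration; your deployment's gateway path and topology name may differ. The request is only built, not sent.

```python
import base64
import urllib.request


def knox_webhdfs_request(host, port, cluster, hdfs_path, user, password,
                         gateway_path="gateway", op="LISTSTATUS"):
    """Build (but do not send) a WebHDFS request routed through a Knox gateway.

    All argument values are supplied by the caller; nothing here is specific
    to a particular deployment.
    """
    # Assumed URL layout: https://host:port/<gateway path>/<cluster>/webhdfs/v1<path>
    url = "https://{}:{}/{}/{}/webhdfs/v1{}?op={}".format(
        host, port, gateway_path, cluster, hdfs_path, op)

    # HTTP Basic credentials; the gateway is expected to validate them via LDAP.
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))

    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Placeholder values for illustration only.
req = knox_webhdfs_request("knox.example.com", 8443, "sandbox", "/tmp",
                           "guest", "guest-password")
print(req.full_url)
```

Passing the returned object to urllib.request.urlopen would issue the call; the client only ever talks to the gateway, never to the Hadoop services behind it.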

Sunday, July 20, 2014

Hortonworks Security

In mid-May 2014, Hortonworks, the leading provider of enterprise Apache Hadoop, acquired XA Secure, a leading data security company, to accelerate its delivery of a holistic and centralized approach to Hadoop security.

As a result of this acquisition, Hortonworks published its security roadmap, covering the earlier, current and future states. The relevant security capabilities are listed below:

Earlier State
  • Kerberos Authentication
  • HBase, Hive & HDFS authorization
  • Wire Encryption for HDFS, Shuffle & JDBC
  • Basic audit in HDFS & MR
  • ACLs for HDFS
  • Knox: Hadoop REST API Security
  • SQL-style Hive Authorization
  • Expanded Wire Encryption for HiveServer2 & WebHDFS

Current State
  • Centralized Security Administration for HDFS, HBase & Hive
  • Centralized Audit Reporting
  • Delegated Policy Administration

Future State
  • Encryption in HDFS, Hive & HBase
  • Centralized security administration for all Hadoop components
  • Expand audit to cover more operations and provide audit correlation
  • Offer additional SSO integration choices
  • Tag-based global policies

Thursday, July 10, 2014

DataStax Spark

Apache Spark is a project designed to accelerate Hadoop and other big data applications through the use of an in-memory, clustered data engine. It is a paradigm shift from the disk-based MapReduce process.

Spark is a fast and powerful engine for processing Hadoop data. It runs in Hadoop clusters through Hadoop YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both general data processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Speed is key! Leveraging an efficient in-memory storage format, an optimized execution engine and a cache-conscious memory layout, Apache Spark can run up to 100x faster than Hadoop MapReduce for in-memory workloads. The performance metric is based on a word count program running on both platforms.

DataStax Apache Spark support means certified Spark software now ships with DSE 4.5 and is supported by DataStax. DSE 4.5 (released on July 3) provides high-availability features for Spark that ensure resilience and failover.
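
For readers unfamiliar with the word count program used in the comparison above, here is a plain-Python sketch of its three logical stages. It is not Spark code and says nothing about performance; the comments name the PySpark RDD operations (flatMap, map, reduceByKey) that play the corresponding roles. The function name word_count and the sample lines are made up for illustration.

```python
from collections import defaultdict


def word_count(lines):
    """Count word occurrences, mirroring the classic word count stages."""
    # Stage 1 (flatMap in PySpark): split every line into individual words.
    words = (word for line in lines for word in line.split())

    # Stage 2 (map in PySpark): pair each word with an initial count of 1.
    pairs = ((word, 1) for word in words)

    # Stage 3 (reduceByKey in PySpark): sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Illustrative input only.
sample = ["spark is fast", "hadoop is reliable", "spark is in memory"]
result = word_count(sample)
print(result)
```

In Spark the same stages run in parallel across the cluster and intermediate results can stay in memory, which is where the speed difference over disk-based MapReduce comes from.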

Saturday, July 5, 2014


Dear Friends/Followers,

Social media is the social interaction among people in which they create, share or exchange information and ideas in virtual communities and networks.

Travel not for the destination, but for the joy of the journey.

In my last 3 years of Social Media (CodeProject, Blogger) journey, I reached the benchmark of 50K+ points & 45K hits with all your support and guidance.


I appreciate your time and energy in feeding the consistency within me. I will continue to (l)earn the industry trend/reputation.