Friday, September 26, 2014

Apache Argus

Security on Hadoop has been catch-as-catch-can for much of the product's lifetime, but a new Apache project that entered incubation earlier this year -- Apache Argus -- addresses it in a consistent manner.

With the delivery of YARN, which lets Hadoop run multiple workloads against shared data sets within a single cluster, the need for a centralized approach to security policy definition and coordinated enforcement has grown.

Argus is intended to deliver this comprehensive approach to central security policy administration across the core enterprise security requirements of authentication, authorization, accounting and data protection. It already provides baseline features for coordinated enforcement across batch, interactive SQL and real-time workloads in Hadoop. Hortonworks plans to use the platform's extensible architecture to apply policies consistently to Hadoop ecosystem components beyond HDFS, Hive and HBase, including Storm, Solr, Spark and more. If that plan is realized, it will be a major step forward for the Hadoop ecosystem: a comprehensive approach to security that is entirely open source.

Argus did not start as a community initiative; it's the open-sourced version of a commercial product, XA Secure, that Hortonworks acquired and transformed into an Apache-hosted project. The idea, as Hortonworks explained earlier this year, is to provide a centralized way to define and enforce security policy across Hadoop and all its components. This includes access controls down to the folder and file level in HDFS, and to the table and column level in Hive and HBase. But don't expect automatic Argus integration -- this project has a long road ahead for Hortonworks and everyone else contributing to the Hadoop ecosystem.
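To make the idea of one central policy store concrete, here is a minimal Python sketch, assuming a simple rule model: an HDFS rule matches a folder and every file beneath it, and a Hive-style rule matches a `db.table` name with an optional column allow-list. The `Policy` and `PolicyStore` classes and the `is_allowed` method are hypothetical illustrations, not Argus's actual API or policy format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    """One access rule in a central store (hypothetical model, not the Argus API)."""
    service: str                               # e.g. "hdfs" or "hive" (illustrative)
    resource: str                              # HDFS folder path, or "db.table"
    users: FrozenSet[str]
    actions: FrozenSet[str]
    columns: Optional[FrozenSet[str]] = None   # None means every column


class PolicyStore:
    """Hypothetical central store; each component's enforcement hook would ask it."""

    def __init__(self):
        self._policies = []

    def add(self, policy):
        self._policies.append(policy)

    def is_allowed(self, service, resource, user, action, column=None):
        for p in self._policies:
            if p.service != service or user not in p.users or action not in p.actions:
                continue
            if service == "hdfs":
                # A folder-level rule covers the folder and every file beneath it.
                prefix = p.resource.rstrip("/") + "/"
                if resource == p.resource or resource.startswith(prefix):
                    return True
            elif resource == p.resource:
                # Table-style resources: optionally restricted to listed columns.
                if p.columns is None or column in p.columns:
                    return True
        return False  # default deny when no rule matches


store = PolicyStore()
store.add(Policy("hdfs", "/data/finance", frozenset({"alice"}), frozenset({"read"})))
store.add(Policy("hive", "sales.orders", frozenset({"bob"}), frozenset({"select"}),
                 frozenset({"order_id", "amount"})))

print(store.is_allowed("hdfs", "/data/finance/q3.csv", "alice", "read"))           # True
print(store.is_allowed("hdfs", "/data/financeX/a.csv", "alice", "read"))           # False
print(store.is_allowed("hive", "sales.orders", "bob", "select", "amount"))         # True
print(store.is_allowed("hive", "sales.orders", "bob", "select", "card_number"))    # False
```

Keeping every rule in one store is the point of the design: HDFS and Hive enforce the same policies without each maintaining its own copy.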

In May, Hortonworks acquired XA Secure and promised to contribute its technology to the Apache Software Foundation. In June, Hortonworks made it available for anyone to download and use from its website, and the technology now officially lives on as Apache Argus, an incubator project within the ASF.

Monday, September 22, 2014

Cassandra Summit 2014

Cassandra Summit 2014 was the single largest gathering of Cassandra users on the planet.
It took place on September 10-12, 2014, in San Francisco, California, USA.

At the conference, attendees could learn how some of the world's most successful companies are transforming their businesses with Apache Cassandra™, which DataStax describes as the world's fastest and most scalable distributed database management system. From best practices and how-tos to expert panels and case studies, participants had ample opportunity to learn new ways of running their businesses.

As part of the summit, Big Data professionals benefited from:

  • 60+ sessions featuring groundbreaking use cases from some of the world’s hottest companies, such as Google, FedEx, Sony, Netflix, Safeway, Neiman Marcus, and eBay.
  • Expert tips for succeeding in today's digital economy
  • Specialized training to grow your career
  • Networking with the world's Apache Cassandra experts

Summit materials are available online for sessions up to 2013.

Monday, September 8, 2014

DataStax: $106 million

DataStax, the company that delivers Apache Cassandra™ to the enterprise, announced that it has secured $106 million in Series E financing. This amount, together with the $84 million investment in previous rounds, brings the total invested to date to $190 million.

According to DataStax, the $106 million round reflects growing global enterprise demand for distributed database management systems.

In just under a year, DataStax has experienced extremely rapid growth and now has:

  • More than 350 employees (100 percent increase since December 2013) in six global locations: Santa Clara, Austin, London, Paris, Sydney and Tokyo.
  • Customers that include 25 percent of the Fortune 100.
  • A rapidly expanding global sales force that has grown the customer base in over 50 countries and accounts for revenue growth of more than 125 percent year over year.
  • A powerful and experienced executive team that is focused on scaling the company internationally. Key hires in the past year include known industry veterans Dennis Wolf as CFO, John Schweitzer as SVP of Field Operations, Tony Kavanagh as CMO and Clint Smith as General Counsel.

With this new round of funding, DataStax expects to accelerate growth and deliver sustainable long-term value and success to customers and partners.

Saturday, September 6, 2014

Facebook Flux

In a similar vein to Google's Cloud Dataflow, Facebook has developed a data flow architecture called Flux, which it uses within the Facebook messaging system. Flux avoids cascading effects by preventing nested updates. Simply put, data in Flux flows in a single direction, so additional actions aren't triggered until the data layer has completely finished processing the current one.
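The no-nested-updates rule can be sketched as a dispatcher that refuses to start a new action while one is still being processed. This is a hypothetical Python sketch loosely modeled on the guard in Facebook's JavaScript `Dispatcher`; the class, method names and action format here are illustrative assumptions, not Facebook's code.

```python
class Dispatcher:
    """Minimal Flux-style dispatcher (hypothetical Python sketch)."""

    def __init__(self):
        self._callbacks = []
        self._dispatching = False

    def register(self, callback):
        self._callbacks.append(callback)

    def dispatch(self, action):
        # Refuse nested dispatches: a new action may only start once every
        # registered callback has finished handling the current one.
        if self._dispatching:
            raise RuntimeError("Cannot dispatch in the middle of a dispatch.")
        self._dispatching = True
        try:
            for callback in self._callbacks:
                callback(action)
        finally:
            self._dispatching = False  # reset even if a callback raised


dispatcher = Dispatcher()
log = []
dispatcher.register(lambda action: log.append(action["type"]))
dispatcher.dispatch({"type": "message_sent"})


def cascading_store(action):
    # A store that tries to trigger a follow-up update mid-dispatch.
    dispatcher.dispatch({"type": "follow_up"})


dispatcher.register(cascading_store)
errors = []
try:
    dispatcher.dispatch({"type": "message_read"})
except RuntimeError as err:
    errors.append(str(err))

print(log, errors)
```

The cascading update is rejected instead of silently re-entering the stores, which is what keeps the flow single-directional.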

Flux is the application architecture that Facebook uses for building client-side web applications. It complements React's composable view components by utilizing a unidirectional data flow. It's more of a pattern than a formal framework, and you can start using Flux immediately without a lot of new code.
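The unidirectional cycle (action, dispatcher, store, view) can be illustrated with a small Python sketch, assuming a hypothetical `MessageStore` and a `render` function standing in for a React view. The view only reads store state; the state changes solely when an action is dispatched.

```python
class MessageStore:
    """Hypothetical store: owns state and changes it only in response to actions."""

    def __init__(self):
        self._messages = []
        self._listeners = []

    def handle(self, action):
        if action["type"] == "add_message":
            self._messages.append(action["text"])
            for listener in self._listeners:
                listener()  # notify views that state changed

    def subscribe(self, listener):
        self._listeners.append(listener)

    def messages(self):
        return tuple(self._messages)  # read-only snapshot for views


class Dispatcher:
    """Routes every action to every registered store."""

    def __init__(self):
        self._stores = []

    def register(self, store):
        self._stores.append(store)

    def dispatch(self, action):
        for store in self._stores:
            store.handle(action)


def render(store):
    """Stand-in for a React view: output derived purely from store state."""
    return " | ".join(store.messages())


dispatcher = Dispatcher()
store = MessageStore()
dispatcher.register(store)
rendered = []
store.subscribe(lambda: rendered.append(render(store)))

# User interactions become actions; the view never edits the store directly.
dispatcher.dispatch({"type": "add_message", "text": "hi"})
dispatcher.dispatch({"type": "add_message", "text": "bye"})
print(rendered)  # ['hi', 'hi | bye']
```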

FlumeJava, from which Cloud Dataflow evolved, was likewise designed to make it easy to create efficient parallel pipelines. At FlumeJava’s core are “a couple of classes that represent immutable parallel collections, each supporting a modest number of operations for processing them in parallel. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies.”
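To show what "an immutable parallel collection with a modest number of operations" looks like in practice, here is a hypothetical Python sketch loosely modeled on FlumeJava's `parallelDo`, `groupByKey` and `combineValues`. The `PCollection` class and its snake_case methods are illustrative assumptions, not FlumeJava's Java API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class PCollection:
    """Immutable collection; every operation returns a new PCollection (sketch)."""

    def __init__(self, items):
        self._items = tuple(items)  # a tuple cannot be modified in place

    def parallel_do(self, fn):
        # Apply fn to each element independently (hence parallelizable);
        # fn returns an iterable, so one input may yield zero or more outputs.
        with ThreadPoolExecutor() as pool:
            chunks = list(pool.map(lambda x: list(fn(x)), self._items))
        return PCollection(out for chunk in chunks for out in chunk)

    def group_by_key(self):
        # Elements must be (key, value) pairs; produces (key, values) pairs.
        groups = defaultdict(list)
        for key, value in self._items:
            groups[key].append(value)
        return PCollection((k, tuple(vs)) for k, vs in groups.items())

    def combine_values(self, combine):
        # Reduce each group's values with an associative binary function.
        return PCollection((k, reduce(combine, vs)) for k, vs in self._items)

    def to_list(self):
        return list(self._items)


lines = PCollection(["the quick fox", "the lazy dog"])
counts = (lines
          .parallel_do(lambda line: ((word, 1) for word in line.split()))
          .group_by_key()
          .combine_values(lambda a, b: a + b))
print(sorted(counts.to_list()))
# [('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```

Unlike this eager sketch, FlumeJava defers evaluation: it records these calls as an execution plan and optimizes it into a pipeline of MapReduce jobs before running anything. Python threads also give no CPU parallelism here because of the GIL; the executor only illustrates that each element is processed independently.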