Saturday, October 25, 2014

Apache Tez


The Apache Tez project is aimed at building an application framework that allows a complex directed acyclic graph (DAG) of tasks to be used for processing data. It is currently built atop Apache Hadoop YARN.

The two main design themes for Tez are:

1. Empowering end users by:

  • Expressive dataflow definition APIs
  • Flexible Input-Processor-Output runtime model
  • Data type agnostic
  • Simplifying deployment


2. Execution Performance

  • Performance gains over MapReduce
  • Optimal resource management
  • Plan reconfiguration at runtime
  • Dynamic physical data flow decisions


By allowing projects like Apache Hive to run a complex DAG of tasks, Tez makes it possible to process data in a single Tez job that earlier took multiple MR jobs, as shown in the attached image.
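To make the idea concrete, here is a minimal sketch in plain Python of running several dependent stages as one DAG. It does not use the Tez API: `run_dag`, the vertex names, and the sample records are all hypothetical, chosen only for illustration. With MapReduce, stages like these would typically be chained as separate jobs, with intermediate results written out between them. Here each vertex simply hands its output to its downstream vertices within a single run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(vertices, edges, source):
    """Run each vertex once, in dependency order (illustrative, not Tez).

    vertices: dict of name -> function taking a list of upstream outputs
    edges:    list of (upstream, downstream) name pairs
    source:   input handed to vertices that have no upstream vertex
    """
    upstream = {name: [] for name in vertices}
    for src, dst in edges:
        upstream[dst].append(src)

    outputs = {}
    # static_order() yields every vertex after all of its predecessors.
    for name in TopologicalSorter(upstream).static_order():
        inputs = [outputs[u] for u in upstream[name]] or [source]
        outputs[name] = vertices[name](inputs)
    return outputs


def scan(inputs):
    # Keep only records whose value is greater than 1.
    return [(k, v) for k, v in inputs[0] if v > 1]


def sum_by_key(inputs):
    totals = {}
    for k, v in inputs[0]:
        totals[k] = totals.get(k, 0) + v
    return totals


def count_by_key(inputs):
    counts = {}
    for k, _ in inputs[0]:
        counts[k] = counts.get(k, 0) + 1
    return counts


def average(inputs):
    # Inputs arrive in the order the edges were declared.
    totals, counts = inputs
    return {k: totals[k] / counts[k] for k in totals}


# Hypothetical sample data and a diamond-shaped DAG.
records = [("a", 3), ("b", 5), ("a", 4), ("c", 1)]
vertices = {
    "scan": scan,
    "sum_by_key": sum_by_key,
    "count_by_key": count_by_key,
    "average": average,
}
edges = [
    ("scan", "sum_by_key"),
    ("scan", "count_by_key"),
    ("sum_by_key", "average"),
    ("count_by_key", "average"),
]

result = run_dag(vertices, edges, records)["average"]
print(result)
```

The diamond shape (one scan feeding two aggregations that are then combined) is the kind of plan that does not fit a single map-then-reduce pass, which is why it makes a reasonable toy example of what a DAG framework expresses directly.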

Friday, October 17, 2014

Cloudera Enterprise 5.2


On Tuesday, Cloudera launched the latest version of its big data enterprise software, Cloudera Enterprise 5.2, with a bevy of features aimed at improving analytics and integration.

With Cloudera Enterprise 5.2, the focus is on building products that deliver on the promise of the enterprise data hub. In particular, new capabilities make the technology more accessible to users who are not data scientists and also raise the level of security, addressing two hurdles that can stand in the way of Hadoop adoption.

The software company provides an enterprise version of Apache Hadoop, which is widely used for big data analytics processing. In this release, Cloudera improved security, cloud management, and its analytics database, Impala, now at version 2.0.

According to Cloudera, the latest release of its flagship software better integrates with databases, data warehouses and common enterprise applications.

The big picture for Cloudera is to integrate well with enterprises that are building an analytics fabric designed to crunch data. Cloudera Enterprise 5.2 is compliant with PCI security certifications, so it can be used to crunch sensitive data.