Saturday, October 25, 2014

Apache Tez

The Apache Tez project aims to build an application framework that lets data-processing applications be expressed as a complex directed acyclic graph (DAG) of tasks. It is currently built atop Apache Hadoop YARN.

The two main design themes for Tez are:

1. Empowering end users by:

  • Expressive dataflow definition APIs
  • Flexible Input-Processor-Output runtime model
  • Data type agnostic
  • Simplifying deployment

2. Execution performance:

  • Performance gains over MapReduce
  • Optimal resource management
  • Plan reconfiguration at runtime
  • Dynamic physical data flow decisions

By allowing projects like Apache Hive to run a complex DAG of tasks, Tez can process data in a single Tez job that previously required multiple MapReduce jobs, as shown in the attached image.
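
To make the idea of a DAG of tasks more concrete, here is a minimal sketch in plain Python. It does not use the Tez API; the vertex names (scan_orders, scan_users, join, aggregate, sort), the sample data, and the run_dag helper are all hypothetical and only illustrate how a Hive-style query could be expressed as one graph whose stages hand results directly to their downstream stages.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages of a Hive-style query (illustration only, not Tez code).
def scan_orders():
    return [("alice", 30), ("bob", 20), ("alice", 5)]

def scan_users():
    return {"alice": "US", "bob": "DE"}

def join(orders, users):
    # Replace each user name with that user's country.
    return [(users[name], amount) for name, amount in orders]

def aggregate(joined):
    # Sum the amounts per country.
    totals = {}
    for country, amount in joined:
        totals[country] = totals.get(country, 0) + amount
    return totals

def sort_desc(totals):
    # Order countries by total amount, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Each vertex maps to (names of upstream vertices, function to run).
DAG = {
    "scan_orders": ([], scan_orders),
    "scan_users": ([], scan_users),
    "join": (["scan_orders", "scan_users"], join),
    "aggregate": (["join"], aggregate),
    "sort": (["aggregate"], sort_desc),
}

def run_dag(dag):
    """Run every vertex once, after all of its upstream vertices."""
    graph = {name: deps for name, (deps, _) in dag.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = dag[name]
        # Outputs of upstream vertices become the inputs of this vertex.
        results[name] = fn(*(results[d] for d in deps))
    return results

results = run_dag(DAG)
print(results["sort"])  # [('US', 35), ('DE', 20)]
```

In this toy model the whole query is one graph, so intermediate results flow from vertex to vertex within a single run. A real Tez job is distributed and scheduled on YARN, which this sketch does not attempt to model.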
