The Apache Tez project aims to build an application framework that lets a complex directed acyclic graph (DAG) of tasks process data. It is currently built atop Apache Hadoop YARN.
The two main design themes for Tez are:
1. Empowering end users by:
 - Expressive dataflow definition APIs
 - Flexible Input-Processor-Output runtime model
 - Data type agnostic
 - Simplifying deployment
 
2. Execution Performance
 - Performance gains over MapReduce
 - Optimal resource management
 - Plan reconfiguration at runtime
 - Dynamic physical data flow decisions
 
By allowing projects like Apache Hive to run a complex DAG of tasks, Tez can process in a single Tez job data that previously required multiple MapReduce (MR) jobs, as shown in the attached image.
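
To make the single-job idea concrete, here is a minimal toy model in plain Python. It does not use the Tez API. The vertex names are hypothetical, and it assumes, as a simplification, that every vertex reading from another vertex would need its own MR job with intermediate output written to storage in between.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical vertices for a join + aggregate + sort query.
# Each key maps a vertex to the set of vertices it reads from.
dag = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}


def execution_order(graph):
    """Return one valid order in which the vertices can run."""
    # static_order() raises graphlib.CycleError if the graph has a cycle,
    # so a successful result also confirms the plan is acyclic.
    return list(TopologicalSorter(graph).static_order())


def dependent_stages(graph):
    """Count vertices that consume the output of another vertex."""
    return sum(1 for inputs in graph.values() if inputs)


order = execution_order(dag)
stages = dependent_stages(dag)

print("One DAG, vertices in a valid run order:", order)
print("Separate jobs under the one-job-per-stage assumption:", stages)
```

Under that assumption the three dependent stages (join, aggregate, sort) would be three chained jobs, whereas the DAG form describes all five vertices as one unit of work that a framework can schedule together.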
