Dear readers, I recently received a few requests to share some 'Useful Tips' on the Big Data ecosystem.
A tip is a small piece or part fitted to the end of an object. Let me share a series of Big Data tips during this quarter, Q3 2015.
For those interested in the history, the 'super base class' of the Big Data ecosystem is a pair of Google white papers.
The first, presented in 2003 by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, describes a pragmatic, scalable, distributed file system optimized for storing enormous datasets, called the "Google File System", or GFS. In addition to simple storage, GFS was built to support large-scale, data-intensive, distributed processing applications.
The following year, 2004, another paper, titled "MapReduce: Simplified Data Processing on Large Clusters", was presented by Jeffrey Dean and Sanjay Ghemawat. It defined a programming model and an accompanying framework that provided automatic parallelization, fault tolerance, and the scale to process hundreds of terabytes of data in a single job over thousands of machines.
When paired, these two systems could be used to build large data processing clusters on relatively inexpensive, commodity machines. The Google white papers were an industry breakthrough and directly inspired the development of HDFS and Hadoop MapReduce, respectively.
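To make the MapReduce programming model a little more concrete, here is a minimal word-count sketch in plain Python. It is only an illustration that runs in a single process. It is not Google's or Hadoop's actual API, and the names map_fn, reduce_fn, and run_job are my own, chosen for this example. The real frameworks spread the map and reduce steps across many machines and handle failures for you.

```python
from collections import defaultdict


def map_fn(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: combine all the counts emitted for one word."""
    return word, sum(counts)


def run_job(documents):
    """Toy single-process driver: map, group by key (the 'shuffle'), reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


# Hypothetical input, made up for this illustration.
word_counts = run_job(["big data tips", "big data ecosystem"])
print(word_counts)
```

The developer writes only the map and reduce functions. Grouping the values by key, and in a real cluster the distribution and retry logic, is the framework's job.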
References to Google's white papers are available at:
Stay tuned for more tips and tricks to (l)earn more.