Sunday, July 12, 2015

HDFS Design


By design, HDFS uses a much larger block size than most other file systems. The default is 128 MB, and some deployments go as high as 1 GB. Files in HDFS are write-once.
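As a rough sketch of how the block size comes into play, the Hadoop Java client API lets you override it per file at write time. The path, sizes, and values below are made up for illustration and assume a configured Hadoop client on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        long blockSize = 256L * 1024 * 1024;   // 256 MB, overriding the 128 MB default
        short replication = 3;                 // typical replication factor
        int bufferSize = 4096;

        // Create a file with an explicit block size (hypothetical path).
        Path path = new Path("/tmp/example.dat");
        try (FSDataOutputStream out =
                 fs.create(path, true, bufferSize, replication, blockSize)) {
            out.writeBytes("hello hdfs");
        }
        // Files are write-once: existing bytes cannot be modified in place.
    }
}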

There are three daemons that make up a standard HDFS cluster.
  1. NameNode - 1 per cluster.  Centralized metadata server that provides a global picture of the filesystem's state.
  2. Secondary NameNode - 1 per cluster.  Performs checkpointing of the NameNode's transaction log.
  3. DataNode - Many per cluster.  Stores block data (contents of files).

The NameNode stores its filesystem metadata on local disk in a few different files, the two most important of which are fsimage and edits.  Fsimage contains a complete snapshot of the filesystem metadata, including a serialized form of all directory and file inodes in the filesystem.  The edits file (journal) contains only the incremental modifications made to the metadata, acting as a write-ahead log.
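The split between a full snapshot and a write-ahead log is a general pattern. The toy Java sketch below is not Hadoop code, just an illustration of why replaying a small edits journal over the last fsimage is enough to rebuild the current state:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the fsimage + edits idea (illustration only, not Hadoop's implementation).
class ToyNameNodeMetadata {
    private final Map<String, String> fsimage = new HashMap<>(); // last full snapshot
    private final List<String> edits = new ArrayList<>();        // incremental journal

    // Each metadata change is appended to the journal first (write-ahead log).
    void recordMkdir(String path) {
        edits.add("MKDIR " + path);
    }

    // Current state = snapshot + replayed journal.
    Map<String, String> currentState() {
        Map<String, String> state = new HashMap<>(fsimage);
        for (String op : edits) {
            state.put(op.substring("MKDIR ".length()), "dir");
        }
        return state;
    }

    // Checkpointing folds the journal into a fresh snapshot and truncates it.
    void checkpoint() {
        fsimage.putAll(currentState());
        edits.clear();
    }
}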

Despite its name, the Secondary NameNode is not a backup or hot standby for the NameNode; it offloads work by running the checkpointing process, in which it merges the updates from the edits file into the fsimage file and sends the result back to the primary.  Checkpointing is triggered by elapsed time (default 60 minutes) and/or by the size or transaction count of the edits file.
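A minimal sketch of tuning the checkpoint triggers, assuming the Hadoop 2.x property names dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns; the values shown are the commonly cited defaults:

import org.apache.hadoop.conf.Configuration;

public class CheckpointConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Trigger a checkpoint after this many seconds have elapsed (60 minutes).
        conf.setLong("dfs.namenode.checkpoint.period", 3600);

        // ...or after this many uncheckpointed transactions accumulate in edits.
        conf.setLong("dfs.namenode.checkpoint.txns", 1_000_000);

        System.out.println("period=" + conf.get("dfs.namenode.checkpoint.period")
            + "s, txns=" + conf.get("dfs.namenode.checkpoint.txns"));
    }
}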

The daemon responsible for storing and retrieving block data (the chunks of a file) is called the DataNode.  DataNodes regularly report their status to the NameNode via heartbeats, by default every 3 seconds, and send a block report (a list of all usable blocks on the DataNode's disks) to the NameNode, by default every 60 minutes.
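These intervals are also configuration-driven. The sketch below uses the usual Hadoop property names (dfs.heartbeat.interval in seconds, dfs.blockreport.intervalMsec in milliseconds), with values matching the defaults discussed above:

import org.apache.hadoop.conf.Configuration;

public class DataNodeIntervals {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Heartbeat interval in seconds (default 3).
        conf.setLong("dfs.heartbeat.interval", 3);

        // Block report interval in milliseconds (60 minutes here).
        conf.setLong("dfs.blockreport.intervalMsec", 60L * 60 * 1000);

        System.out.println("heartbeat=" + conf.get("dfs.heartbeat.interval") + "s, "
            + "blockreport=" + conf.get("dfs.blockreport.intervalMsec") + "ms");
    }
}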

The three key daemons of the HDFS architecture are represented in the attached diagram.
