Sunday, July 19, 2015

Name Node


The NameNode is the metadata manager that sits between the requesting clients and the DataNodes, which hold the actual file content.

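On startup, each DataNode sends a block report to the NameNode. After that, it sends a heartbeat every 3 seconds and a fresh block report every hour (the interval is configurable, and the default differs between Hadoop versions). From these messages the NameNode keeps track of every data change. At any given time, the NameNode has a complete view of all DataNodes in the cluster, their current health, and what blocks they have available. The file-to-block mapping is stored on the NameNode's disk; the mapping from blocks to the DataNodes that hold them is not persisted, and is rebuilt in memory from the block reports.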
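The bookkeeping described above can be pictured with a small sketch. This is not HDFS code: the class names `ClusterView` and `DataNodeInfo`, the method names, and the `STALE_AFTER_S` timeout are all made up for illustration. It only shows how heartbeats and block reports could feed an in-memory view of which nodes are alive and which blocks they hold.

```python
import time
from dataclasses import dataclass, field

# Assumption for illustration only: how long a node may stay silent
# before this sketch treats it as dead. Not an HDFS default.
STALE_AFTER_S = 10 * 60


@dataclass
class DataNodeInfo:
    last_heartbeat: float = 0.0
    blocks: set = field(default_factory=set)


class ClusterView:
    """In-memory view of DataNodes, built from heartbeats and block reports."""

    def __init__(self):
        self.nodes = {}

    def heartbeat(self, node_id, now=None):
        # Record the time of the latest heartbeat from this node.
        now = time.time() if now is None else now
        self.nodes.setdefault(node_id, DataNodeInfo()).last_heartbeat = now

    def block_report(self, node_id, block_ids, now=None):
        # A block report replaces whatever we believed the node was holding.
        self.heartbeat(node_id, now)
        self.nodes[node_id].blocks = set(block_ids)

    def live_nodes(self, now=None):
        # A node is live if it has been heard from recently enough.
        now = time.time() if now is None else now
        return sorted(
            node_id
            for node_id, info in self.nodes.items()
            if now - info.last_heartbeat <= STALE_AFTER_S
        )

    def locations(self, block_id, now=None):
        # Live nodes that reported holding the given block.
        return [n for n in self.live_nodes(now) if block_id in self.nodes[n].blocks]
```

Note that nothing here is written to disk: if the process restarts, the view is empty until the nodes report in again, which matches the idea that block locations live only in memory.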

The NameNode does not send requests to DataNodes directly. Instead, it uses its replies to heartbeats to send instructions to the DataNodes. The instructions include commands to replicate blocks to other nodes, remove local block replicas, re-register and send an immediate block report, and shut down the node. The NameNode stores its filesystem metadata on local filesystem disks, in two key files: (1) fsimage and (2) edits.
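The "instructions ride on heartbeat replies" pattern can be sketched as a per-node queue that is drained whenever that node checks in. The `Command` enum and the `CommandQueue` class below are hypothetical names chosen for this example; the four command kinds are the ones listed in the paragraph above, and the payload format is invented.

```python
from collections import defaultdict, deque
from enum import Enum


class Command(Enum):
    REPLICATE = "replicate"      # copy a block to other nodes
    INVALIDATE = "invalidate"    # remove a local block replica
    REGISTER = "register"        # re-register and send an immediate block report
    SHUTDOWN = "shutdown"        # shut down the node


class CommandQueue:
    """Holds instructions until the target node's next heartbeat arrives."""

    def __init__(self):
        self._pending = defaultdict(deque)

    def schedule(self, node_id, command, payload=None):
        # Nothing is sent here; the command just waits for the node to call in.
        self._pending[node_id].append((command, payload))

    def on_heartbeat(self, node_id):
        # The heartbeat reply carries every command queued for this node,
        # in the order they were scheduled, and the queue is then emptied.
        return list(self._pending.pop(node_id, deque()))
```

A usage example: after `queue.schedule("dn1", Command.REPLICATE, {"block": "blk_1", "targets": ["dn2"]})`, the next `queue.on_heartbeat("dn1")` returns that command, and a second call returns an empty list. The design means the server never has to open a connection to a node; it only answers calls the nodes make anyway.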

The fsimage file contains a complete snapshot of the filesystem metadata at the inode level. An inode is an internal representation of a file or directory's metadata and contains such information as the file's replication level, modification and access times, access permissions, block size, and the blocks a file is made up of. Because the fsimage does not record which DataNodes hold each block, a DataNode can change its hostname or IP address without invalidating the stored metadata.
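As a rough picture of what such a record holds, here is a sketch of an inode as a plain data class. The field names and the `INode` class are illustrative guesses based on the list above, not the actual on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class INode:
    name: str
    is_directory: bool
    permissions: int                 # POSIX-style mode bits, e.g. 0o644
    modification_time: float         # seconds since the epoch
    access_time: float               # seconds since the epoch
    replication: int = 0             # meaningful for files only
    block_size: int = 0              # bytes; meaningful for files only
    block_ids: List[str] = field(default_factory=list)
    # Deliberately no DataNode hostnames or IP addresses: where each block
    # lives is learned from block reports, not stored with the inode.


report = INode(
    name="report.csv",
    is_directory=False,
    permissions=0o644,
    modification_time=1437264000.0,
    access_time=1437264000.0,
    replication=3,
    block_size=128 * 1024**2,
    block_ids=["blk_1", "blk_2"],
)
```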

The edits file (journal) contains only the incremental modifications made to the metadata. It is a write-ahead log, which reduces I/O to sequential, append-only operations; this avoids costly seeks and yields better overall performance.
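A toy version of the snapshot-plus-journal idea is sketched below. The JSON-lines format, the operation names (`create`, `set_replication`, `delete`), and the functions `append_edit` and `replay` are all assumptions made for this example; the real edits file is a binary format with many more operation types.

```python
import copy
import json
import os


def append_edit(journal_path, op):
    """Append one metadata operation to the journal and force it to disk."""
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(json.dumps(op) + "\n")
        journal.flush()
        os.fsync(journal.fileno())


def replay(snapshot, journal_path):
    """Apply journaled operations, in order, on top of a copy of the snapshot."""
    namespace = copy.deepcopy(snapshot)
    if not os.path.exists(journal_path):
        return namespace
    with open(journal_path, encoding="utf-8") as journal:
        for line in journal:
            op = json.loads(line)
            if op["op"] == "create":
                namespace[op["path"]] = {"replication": op["replication"]}
            elif op["op"] == "set_replication":
                namespace[op["path"]]["replication"] = op["replication"]
            elif op["op"] == "delete":
                namespace.pop(op["path"], None)
    return namespace
```

Writes only ever touch the end of the journal, so each change costs one sequential append rather than a rewrite of the whole snapshot. The price is paid at load time, when every journaled operation has to be applied again in order.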

On NameNode startup, the fsimage file is loaded into RAM and any changes in the edits file are replayed, bringing the in-memory view of the filesystem up to date. NameNode filesystem metadata is served entirely from RAM. This makes it fast, but limits the amount of metadata a single machine can handle. As a rule of thumb, 1 million blocks occupy about 1 GB of heap.
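The heap rule of thumb turns into a quick back-of-the-envelope calculation. The function name `estimate_heap_gb` and the 128 MiB block size are assumptions for this sketch (block size is configurable), and the result is only as good as the rule itself. It also assumes every file fills its blocks, so many small files would need far more heap than this suggests.

```python
BLOCKS_PER_GB_HEAP = 1_000_000   # rule of thumb quoted above


def estimate_heap_gb(total_bytes, block_size_bytes=128 * 1024**2):
    """Rough heap estimate (GB) for total_bytes of data stored in full blocks."""
    blocks = -(-total_bytes // block_size_bytes)   # ceiling division
    return blocks / BLOCKS_PER_GB_HEAP


# 1 PiB of data in 128 MiB blocks is 2**50 / 2**27 = 8,388,608 blocks,
# so the estimate comes out at a little over 8 GB of heap.
one_pib = 1024**5
heap_gb = estimate_heap_gb(one_pib)
```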

In this way, the NameNode handles the filesystem metadata operations for client requests, while the data itself stays distributed across the DataNodes.
