Whenever a user stores a file in HDFS, the file is first broken down into data blocks, and three replicas of each block (the default) are stored on slave nodes (DataNodes) throughout the Hadoop cluster. That’s a lot of data blocks to keep track of. The NameNode acts as the address book for HDFS because it knows not only which blocks make up individual files but also where each of these blocks and their replicas are stored. As you might expect, knowing exactly where the bodies are buried makes the NameNode a critical component in a Hadoop cluster. If the NameNode is not available, applications cannot access any data stored in HDFS.
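The address-book role can be illustrated with a small Python model. This is a toy sketch, not HDFS code: the 128 MB block size is an assumption (a common default in recent Hadoop versions, not stated above), the class and method names are hypothetical, and the round-robin replica placement is a simplification of the rack-aware placement HDFS actually uses.

```python
import math
from dataclasses import dataclass, field

# Assumed for illustration: 128 MB block size; 3 replicas as described above.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


@dataclass
class AddressBook:
    """Toy model of the two lookups the NameNode answers."""
    file_to_blocks: dict = field(default_factory=dict)   # file path -> [block ids]
    block_locations: dict = field(default_factory=dict)  # block id -> [DataNode names]
    next_block_id: int = 0

    def add_file(self, path, size_bytes, datanodes):
        """Split a file into blocks and record REPLICATION locations per block."""
        if len(datanodes) < REPLICATION:
            raise ValueError("need at least REPLICATION distinct DataNodes")
        num_blocks = math.ceil(size_bytes / BLOCK_SIZE)
        blocks = []
        for _ in range(num_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Naive round-robin placement (hypothetical; not the HDFS policy).
            replicas = [datanodes[(block_id + r) % len(datanodes)]
                        for r in range(REPLICATION)]
            self.block_locations[block_id] = replicas
            blocks.append(block_id)
        self.file_to_blocks[path] = blocks
        return blocks

    def locate(self, path):
        """Answer a client: which blocks make up the file, and where they live."""
        return {b: self.block_locations[b] for b in self.file_to_blocks[path]}
```

For a 300 MB file on four DataNodes, `add_file` yields three blocks (128 MB, 128 MB and 44 MB), and `locate` returns three DataNode names for each of them.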

If we take a look at the adjoining figure, we can see that the NameNode daemon runs on a master node server. All mapping information related to the data blocks and their corresponding files is stored in a file named fsimage. HDFS is a journaling file system, which means that every change to this metadata is logged in an edit journal that tracks events since the last checkpoint, that is, the last time the edit log was merged with fsimage. In HDFS, the edit journal is maintained in a file named edits that’s stored on the NameNode.
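To make the fsimage/edits relationship concrete, here is a minimal Python sketch that treats fsimage as a snapshot dictionary and edits as an append-only list. The operation names ("add", "delete"), the class name, and the in-place `checkpoint` method are illustrative assumptions; the real files use Hadoop’s own on-disk formats.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceStore:
    """Toy model: fsimage is a snapshot, edits is a journal of changes since it."""
    fsimage: dict = field(default_factory=dict)  # file path -> list of block ids
    edits: list = field(default_factory=list)    # ("add" | "delete", path, blocks)

    def log(self, op, path, blocks=None):
        # Journaling: record the change instead of rewriting the whole snapshot.
        self.edits.append((op, path, blocks))

    def replay(self):
        """Return the current namespace: the snapshot with the journal applied."""
        state = dict(self.fsimage)
        for op, path, blocks in self.edits:
            if op == "add":
                state[path] = list(blocks)
            elif op == "delete":
                state.pop(path, None)
        return state

    def checkpoint(self):
        """Merge the edit log into fsimage and start a fresh, empty journal."""
        self.fsimage = self.replay()
        self.edits.clear()
```

Journaling keeps each change cheap (one appended record), while periodic checkpoints keep the journal, and therefore the replay work, short.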

NameNode Start-up:

To understand how the NameNode does its work, it helps to look carefully at how it starts up. Because the purpose of the NameNode is to tell applications which data blocks they need to process and to keep track of exactly where those blocks are stored, it needs all of the block locations and block-to-file mappings to be available in RAM.

Immediately after it starts up, the NameNode takes the following steps to load all the information it needs:

1. The NameNode loads the fsimage file into memory.

2. The NameNode loads the edits file and replays the journaled changes to update the block metadata that’s already in memory.

3. The DataNode daemons send block reports to the NameNode.

Each slave node (DataNode) sends a block report that lists all of the data blocks stored on it and describes the health of each one.
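The three start-up steps can be sketched as a single Python function. The input shapes (plain dictionaries and tuples) and the function name are assumptions chosen for illustration. In this sketch the file-to-block mapping comes from fsimage plus the replayed edits, while block locations are built only from the DataNode block reports.

```python
def start_namenode(fsimage, edits, block_reports):
    """Hypothetical sketch of the NameNode start-up sequence.

    fsimage:       dict mapping file path -> list of block ids (the snapshot)
    edits:         list of ("add" | "delete", path, blocks) journal entries
    block_reports: dict mapping DataNode name -> list of block ids it holds
    """
    # Step 1: load the fsimage snapshot into memory.
    file_to_blocks = {path: list(blocks) for path, blocks in fsimage.items()}

    # Step 2: replay the journaled changes on top of the in-memory snapshot.
    for op, path, blocks in edits:
        if op == "add":
            file_to_blocks[path] = list(blocks)
        elif op == "delete":
            file_to_blocks.pop(path, None)

    # Step 3: build the block -> DataNodes map from the block reports.
    block_locations = {}
    for datanode, blocks in block_reports.items():
        for block_id in blocks:
            block_locations.setdefault(block_id, []).append(datanode)

    return file_to_blocks, block_locations
```

Once all three steps have run, both lookups from the address-book picture (file to blocks, block to DataNodes) can be answered from memory.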

After the start-up process completes, the NameNode has a complete view of all the data stored in HDFS, and it’s ready to receive application requests from Hadoop clients. As data files are added or removed in response to client requests, the changes are written to the slave nodes’ disk volumes, journal updates are written to the edits file, and the changes are reflected in the block locations and metadata stored in the NameNode’s memory.
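A minimal sketch of the NameNode-side bookkeeping, in the order the paragraph lists it: each change is appended to an edits journal and then applied to the in-memory metadata. The DataNode disk writes are not modeled, and the JSON-lines format, the class name, and the journal path are hypothetical; HDFS uses its own edit-log format, and its internal ordering may differ from this simplification.

```python
import json
import os


class NameNodeMetadata:
    """Hypothetical sketch: journal each namespace change, then apply it in memory."""

    def __init__(self, edits_path):
        self.edits_path = edits_path  # hypothetical location of the journal file
        self.file_to_blocks = {}      # in-memory metadata: path -> block ids

    def _journal(self, entry):
        # Append one JSON line per change; in this sketch the entry is flushed
        # and fsync'ed before the in-memory state is updated.
        with open(self.edits_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def add_file(self, path, blocks):
        self._journal({"op": "add", "path": path, "blocks": list(blocks)})
        self.file_to_blocks[path] = list(blocks)

    def delete_file(self, path):
        self._journal({"op": "delete", "path": path})
        self.file_to_blocks.pop(path, None)
```

Because every change lands in the journal, the in-memory state could be rebuilt after a restart by replaying the file, which is the role the edits file plays during start-up.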

Throughout the life cycle of the cluster, the DataNode daemons send heartbeats (quick signals) to the NameNode every three seconds to indicate that they’re active. (This default value is configurable.) Every six hours (again, a configurable default), each DataNode sends the NameNode a block report listing which file blocks are on its node. In this way, the NameNode always has a current view of the available resources in the cluster.
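The two defaults above imply that a healthy DataNode sends 6 × 3600 / 3 = 7,200 heartbeats between consecutive block reports. In stock Hadoop these intervals are set in hdfs-site.xml through `dfs.heartbeat.interval` (seconds) and `dfs.blockreport.intervalMsec` (milliseconds). The Python sketch below shows how a NameNode-like process could use heartbeats to track liveness; the class, its methods, and the `timeout_s` cutoff are hypothetical and do not reproduce the real HDFS dead-node rules.

```python
# Defaults quoted in the text: heartbeats every 3 s, block reports every 6 h.
HEARTBEAT_INTERVAL_S = 3
BLOCK_REPORT_INTERVAL_S = 6 * 60 * 60

# Heartbeats a healthy DataNode sends between two block reports.
HEARTBEATS_PER_REPORT = BLOCK_REPORT_INTERVAL_S // HEARTBEAT_INTERVAL_S


class HeartbeatMonitor:
    """Hypothetical sketch of NameNode-side liveness tracking."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # hypothetical cutoff, not the HDFS default
        self.last_seen = {}         # DataNode name -> time of last heartbeat (s)

    def heartbeat(self, datanode, now):
        # Record the most recent heartbeat time for this DataNode.
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        return sorted(dn for dn, t in self.last_seen.items()
                      if now - t <= self.timeout_s)

    def dead_nodes(self, now):
        return sorted(dn for dn, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

Passing `now` explicitly instead of reading a clock keeps the sketch deterministic and easy to test; a long-running service would pass something like `time.monotonic()`.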

  Modified On Dec-16-2017 12:53:09 AM
