Whenever a user stores a file in HDFS, the file is first broken down into data blocks, and three replicas of each data block are stored on slave nodes (data nodes) throughout the Hadoop cluster. That's a lot of data blocks to keep track of. The NameNode acts as the address book for HDFS because it knows not only which blocks make up individual files but also where each of these blocks and their replicas are stored. As you might expect, knowing exactly where the bodies are buried makes the NameNode a critical component in a Hadoop cluster. If the NameNode is not available, applications cannot access any data stored in HDFS.
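To see this address-book role in action, you can ask the NameNode for a file's block locations through the HDFS Java API. The following is a minimal sketch, not a complete application; the NameNode address hdfs://namenode:8020 and the file /user/hadoop/sample.txt are placeholders you would swap for your own cluster and path.

import java.net.URI;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address; point this at your own NameNode.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path file = new Path("/user/hadoop/sample.txt"); // hypothetical file
        FileStatus status = fs.getFileStatus(file);
        // The NameNode answers this call from its in-memory block map.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}

Each BlockLocation reports the hosts holding a replica of that block, which is exactly the mapping the NameNode maintains.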

If we take a look at the adjoining figure, we can observe that the NameNode daemon runs on a master node server. All mapping information for the data blocks and their corresponding files is stored in a file named fsimage. HDFS is a journaling file system, which means that any data changes are logged in an edit journal that tracks events since the last checkpoint, that is, the last time the edit log was merged with fsimage. In HDFS, the edit journal is maintained in a file named edits that's stored on the NameNode.
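You can check where a given NameNode keeps these two files by reading its configuration. The sketch below assumes the Hadoop client libraries and hdfs-site.xml are on the classpath, and it prints the stock properties dfs.namenode.name.dir (the local directory holding fsimage and edits) and dfs.namenode.checkpoint.period (how often, in seconds, edits is merged into fsimage).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ShowNameNodeStorage {
    public static void main(String[] args) {
        // HdfsConfiguration loads hdfs-default.xml and hdfs-site.xml from the classpath.
        Configuration conf = new HdfsConfiguration();
        // Local directory (or directories) where the NameNode keeps fsimage and edits.
        System.out.println("dfs.namenode.name.dir = " + conf.get("dfs.namenode.name.dir"));
        // Seconds between checkpoints, i.e. merges of the edits journal into fsimage.
        System.out.println("dfs.namenode.checkpoint.period = " + conf.get("dfs.namenode.checkpoint.period"));
    }
}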
NameNode Start-up:
To understand how the NameNode does its work, it helps to look at how it starts up. Because the NameNode's purpose is to inform applications of how many data blocks they need to process and to keep track of the exact location where each one is stored, it needs all of the block locations and block-to-file mappings available in RAM.
Immediately after start-up, the NameNode takes the following steps to load all the information it needs:
1. The NameNode loads the fsimage file into memory.
2. The NameNode loads the edits file and replays the journaled changes to update the block metadata that's already in memory.
3. The DataNode daemons send their block reports to the NameNode.
For every slave node (data node), there's a block report that lists all of the data blocks stored on that node and describes the health of each one.
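There's no public client API for reading raw block reports, but the NameNode's aggregated view of its DataNodes, which it builds from those reports and from the heartbeats described further below, is easy to query. A sketch, again assuming a placeholder NameNode address:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ListDataNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address; adjust for your cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // The NameNode answers from the state it built out of block reports and heartbeats.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.println(dn.getHostName()
                    + " capacity=" + dn.getCapacity()
                    + " remaining=" + dn.getRemaining());
        }
        fs.close();
    }
}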
After the startup process is complete, the NameNode has a full view of all the data stored in HDFS, and it's ready to receive application requests from Hadoop clients. As data files are added and removed based on client requests, the changes are written to the slave nodes' disk volumes, journal updates are written to the edits file, and the changes are reflected in the block locations and metadata stored in the NameNode's memory.
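To make that concrete, here is a minimal sketch of a client write. Creating the file is a namespace change that the NameNode records in the edits journal, while the file's bytes stream to the DataNodes; the path and NameNode address are placeholders.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // Creating the file is journaled in edits by the NameNode before
        // the client streams the block data to the DataNodes.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"))) {
            out.writeUTF("hello hdfs");
        }
        fs.close();
    }
}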
Throughout the entire life cycle of the cluster, the DataNode daemons send the NameNode heartbeats (a quick signal) every three seconds to confirm that they're active. (This default value is configurable.) Every six hours (again, a configurable default), each DataNode sends the NameNode a block report outlining which file blocks are on its node. In this manner, the NameNode always has a current view of the available resources in the cluster.
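Both intervals are ordinary HDFS settings, so a cluster administrator can override them in hdfs-site.xml. A minimal sketch showing the stock property names with their default values (dfs.heartbeat.interval is in seconds, dfs.blockreport.intervalMsec in milliseconds):

<!-- hdfs-site.xml: tune how often DataNodes check in with the NameNode -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- heartbeat interval in seconds (the default) -->
</property>
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value> <!-- block report interval in milliseconds (six hours, the default) -->
</property>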