Design of the NameNode and how it works in HDFS
When a user stores a file in HDFS, the file is first broken down into data blocks, and (by default) three replicas of these
data blocks are stored on slave nodes (DataNodes) throughout the Hadoop
cluster. That's a lot of data blocks to keep track of. The NameNode acts as the address book for HDFS
because it knows not only which blocks make up each file but also where each of
these blocks and their replicas are stored. As you might expect, knowing exactly where the bodies are buried makes the
NameNode a critical component of a Hadoop cluster. If the NameNode is not
available, applications cannot access any data stored in HDFS.
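The address-book role described above can be sketched as two in-memory maps: one from each file to its ordered list of blocks, and one from each block to the DataNodes holding its replicas. This is a simplified model, not Hadoop's actual code. The 128 MB block size (a common HDFS default), the round-robin replica placement, and the names `NameNodeMetadata`, `add_file` and `locate` are assumptions made for illustration; real HDFS placement is rack-aware.

```python
import itertools

BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MB blocks (a common HDFS default)
REPLICATION = 3                 # three replicas per block, as described above


class NameNodeMetadata:
    """Toy model of the NameNode's 'address book' (hypothetical names)."""

    def __init__(self, datanodes):
        if len(datanodes) < REPLICATION:
            raise ValueError("need at least as many DataNodes as replicas")
        self.file_to_blocks = {}   # file path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> DataNodes holding a replica
        self._next_block_id = itertools.count(1)
        self._node_cycle = itertools.cycle(datanodes)

    def add_file(self, path, size_bytes):
        # Break the file into fixed-size blocks; the last block may be partial.
        num_blocks = (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE
        blocks = []
        for _ in range(num_blocks):
            block_id = next(self._next_block_id)
            # Hypothetical round-robin placement: consecutive picks from the
            # cycle are distinct nodes because there are >= REPLICATION nodes.
            self.block_locations[block_id] = [
                next(self._node_cycle) for _ in range(REPLICATION)
            ]
            blocks.append(block_id)
        self.file_to_blocks[path] = blocks
        return blocks

    def locate(self, path):
        # What a client asks: which blocks make up this file, and where are they?
        return [(b, self.block_locations[b]) for b in self.file_to_blocks[path]]


# Usage: a 300 MB file on a four-node cluster is split into three blocks.
nn = NameNodeMetadata(["dn1", "dn2", "dn3", "dn4"])
nn.add_file("/logs/app.log", 300 * 1024 * 1024)
print(nn.locate("/logs/app.log"))
```

Losing these maps is exactly why an unavailable NameNode makes the data unreachable: the blocks still sit on the DataNodes, but nothing else knows which blocks form which file.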
If we take a look at the adjoining figure, we can observe that the NameNode daemon
runs on a master node server. All mapping information related to the data
blocks and their corresponding files is stored in a file named fsimage.
HDFS is a journaling file system, which means that data changes are
logged in an edit journal. The journal tracks events since the last checkpoint, that is,
the last time the edit log was merged with fsimage. In HDFS, the edit journal
is maintained in a file named edits that's stored on the NameNode.
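The relationship between fsimage and edits can be modeled as a snapshot plus an append-only log: every change is appended to the journal, and a checkpoint folds the journal into a new snapshot and starts an empty journal. The sketch below is a toy model under that assumption; the class name `Namespace`, the `("create" | "delete", path, blocks)` operation tuples, and the dict standing in for fsimage are hypothetical, whereas real HDFS keeps both files on disk in its own binary formats.

```python
import copy


class Namespace:
    """Toy model of fsimage (snapshot) + edits (journal). Names are hypothetical."""

    def __init__(self):
        self.fsimage = {}  # last checkpointed snapshot: path -> list of block IDs
        self.edits = []    # journal of changes made since the last checkpoint

    def log(self, op, path, blocks=None):
        # Journal the change; the snapshot is untouched until the next checkpoint.
        self.edits.append((op, path, blocks))

    @staticmethod
    def apply(state, entry):
        op, path, blocks = entry
        if op == "create":
            state[path] = list(blocks)
        elif op == "delete":
            state.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")

    def current_state(self):
        # Snapshot plus replayed journal gives the up-to-date namespace.
        state = copy.deepcopy(self.fsimage)
        for entry in self.edits:
            self.apply(state, entry)
        return state

    def checkpoint(self):
        # Merge the journal into a new fsimage and start an empty edits log.
        self.fsimage = self.current_state()
        self.edits = []
```

Checkpointing keeps the journal short, which matters because every entry left in edits has to be replayed the next time the NameNode starts.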
To understand how the NameNode does its work, it is wise to look closely at how it
starts up. Because the purpose of the NameNode is to inform applications of
how many data blocks they need to process and to keep track of the
exact location where those blocks are stored, it needs all the block locations and block-to-file mappings
to be available in RAM. These are the steps the NameNode takes to
load all the information it needs immediately after it starts up:
1. The NameNode loads the fsimage file into memory.
2. The NameNode loads the edits file and replays the journaled changes to update
   the block metadata that's already in memory.
3. The DataNode daemons send block reports to the NameNode. Each slave node
   (DataNode) sends a block report that lists all of the data blocks stored on it
   and describes the health of each one.
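The three startup steps can be sketched as a single function. In this toy model the block-to-DataNode locations are rebuilt only from the block reports, which matches how HDFS behaves (replica locations are not persisted in fsimage). The function name `start_namenode`, the input shapes, and the `(block_id, is_healthy)` report entries are hypothetical simplifications of the real block report.

```python
from collections import defaultdict


def start_namenode(fsimage, edits, block_reports):
    """Toy NameNode startup (hypothetical function; not Hadoop's API).

    fsimage:       dict path -> list of block IDs (last checkpoint)
    edits:         list of ("create", path, blocks) or ("delete", path, None)
    block_reports: dict datanode -> list of (block_id, is_healthy) pairs
    """
    # Step 1: load the fsimage into memory.
    file_to_blocks = {path: list(blocks) for path, blocks in fsimage.items()}

    # Step 2: replay the journaled changes on top of the loaded image.
    for op, path, blocks in edits:
        if op == "create":
            file_to_blocks[path] = list(blocks)
        elif op == "delete":
            file_to_blocks.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")

    # Step 3: rebuild block locations from the DataNodes' block reports.
    known_blocks = {b for blocks in file_to_blocks.values() for b in blocks}
    block_locations = defaultdict(list)
    for datanode, report in sorted(block_reports.items()):
        for block_id, is_healthy in report:
            # Record only healthy replicas of blocks that belong to a current file.
            if is_healthy and block_id in known_blocks:
                block_locations[block_id].append(datanode)

    return file_to_blocks, dict(block_locations)
```

For example, if fsimage holds `/a` but edits later deletes it, replicas of `/a`'s blocks that still appear in a block report are simply not recorded as locations in this sketch.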
On completion of the startup process, the NameNode has a complete view of all
the data stored in HDFS and is ready to receive application
requests from Hadoop clients. As data files are added or removed based on
client requests, the changes are written to the slave nodes' disk
volumes, journal updates are written to the edits file, and the changes are reflected in the block
locations and metadata stored in the NameNode's memory.
Throughout the life of the cluster, the DataNode daemons send the NameNode
heartbeats (a quick signal) every three seconds, indicating that they're active. (This
default value is configurable.) Every six hours (again, a configurable
default), the DataNodes send the NameNode a block report outlining which
file blocks are on their nodes. In this way, the NameNode always has
a current view of the available resources in the cluster.
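On the NameNode side, heartbeats boil down to remembering when each DataNode was last heard from. The sketch below uses the defaults mentioned above (3-second heartbeats, 6-hour block reports), which in Hadoop correspond to the `dfs.heartbeat.interval` and `dfs.blockreport.intervalMsec` properties. The dead-node timeout formula and the 300-second recheck interval are assumptions meant to mirror HDFS's defaults (about 10.5 minutes in total); `HeartbeatMonitor` and its methods are hypothetical.

```python
HEARTBEAT_INTERVAL_S = 3               # default heartbeat interval described above
BLOCK_REPORT_INTERVAL_S = 6 * 60 * 60  # default block report interval (six hours)
RECHECK_INTERVAL_S = 300               # assumed recheck interval (5 minutes)
# Assumed dead-node timeout: 2 * recheck + 10 * heartbeat = 630 seconds.
DEAD_NODE_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S


class HeartbeatMonitor:
    """Toy NameNode-side liveness tracker (hypothetical; not Hadoop's API)."""

    def __init__(self):
        self.last_heartbeat = {}     # datanode -> time of last heartbeat (seconds)
        self.last_block_report = {}  # datanode -> time of last block report

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def block_report(self, datanode, now):
        self.last_block_report[datanode] = now

    def live_nodes(self, now):
        # Nodes heard from within the timeout window.
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t <= DEAD_NODE_TIMEOUT_S)

    def dead_nodes(self, now):
        # Nodes silent for longer than the timeout.
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t > DEAD_NODE_TIMEOUT_S)

    def block_report_overdue(self, datanode, now):
        # True if the node never reported, or its last report is older than the interval.
        last = self.last_block_report.get(datanode)
        return last is None or now - last > BLOCK_REPORT_INTERVAL_S
```

Because the timeout spans many heartbeat intervals, a single missed heartbeat does not mark a node dead in this model; only a sustained silence does.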