Checkpointing process in HDFS

Jayden Bell, 04-May-2016

As we already know, HDFS is a journaled file system: new changes to files in HDFS are captured in an edit log that is stored on the NameNode in a file named edits. When the edits file reaches a certain size threshold, or after a particular period of time has elapsed, the journaled entries need to be committed to the master fsimage file. The NameNode does not perform this task itself, since it is designed to respond to application requests as fast as it can. More importantly, a considerable amount of risk is involved in having this metadata update operation handled by a single master server.

If the metadata that describes the mappings between the data blocks and their corresponding data files becomes corrupted, then the original data is as good as lost.
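The trigger condition described earlier (a threshold on the edit log, or an elapsed period of time) can be sketched as a small check. This is only an illustration: the function name should_checkpoint is hypothetical, and the default values are assumptions modeled on the Hadoop 2 properties dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns, which express the edit-log threshold as a transaction count. Check your own cluster configuration for the values actually in use.

```python
# Hypothetical helper for illustration only; this is not Hadoop's own code.
# Assumed defaults, modeled on dfs.namenode.checkpoint.period (seconds)
# and dfs.namenode.checkpoint.txns (uncheckpointed transactions).
CHECKPOINT_PERIOD_SECONDS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS, txns=CHECKPOINT_TXNS):
    """Return True when either the time limit or the edit-log limit is reached."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


if __name__ == "__main__":
    # Ten minutes after the last checkpoint, with few edits: no checkpoint yet.
    print(should_checkpoint(600, 2_500))
    # Same elapsed time, but the edit log has grown past the threshold.
    print(should_checkpoint(600, 1_200_000))
```

Either condition alone is enough to start a checkpoint, so a busy cluster checkpoints more often than the time period by itself would suggest.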

Checkpointing services for a Hadoop cluster are handled by one of four possible daemons, each of which needs to run on its own dedicated master node alongside the NameNode daemon’s master node:

1. Secondary NameNode: Before Hadoop 2, this was the only checkpointing daemon, performing the checkpointing process described above. The Secondary NameNode has a notoriously inaccurate name, since it is in no way a “secondary” or a “standby” for the NameNode.

2. Checkpoint Node: The Checkpoint Node is the replacement for the Secondary NameNode. It performs checkpointing and nothing more.

3. Backup Node: It provides a checkpointing service and also maintains a backup of the fsimage and edits files.

4. Standby NameNode: It provides a checkpointing service but, unlike the old Secondary NameNode, the Standby NameNode is a true standby server, enabling a hot swap of the NameNode process to avoid any downtime.

The following steps describe the checkpointing process as it is carried out by the NameNode and the checkpointing service (note that any of the four daemons listed above can act as the checkpointing service):

1.    When it’s time to perform the checkpoint, the NameNode creates a new file to accept the journaled file system changes. It names the new file edits.new.

2.    As a result, the edits file accepts no further changes and is copied to the checkpointing service, along with the fsimage file.

3.    The checkpointing service merges these two files, creating a file named fsimage.ckpt.

4.    The checkpointing service copies the fsimage.ckpt file to the NameNode.

5.    The NameNode overwrites the file fsimage with fsimage.ckpt.

6.    The NameNode renames the edits.new file to edits.
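The six steps above can be traced with a small in-memory simulation. This is a minimal sketch under stated assumptions, not how Hadoop implements checkpointing: a plain dict stands in for the NameNode's storage directory, the fsimage is modeled as a mapping from file path to block IDs, and the edit log is a list of hypothetical ("add" or "delete") operations. The function names apply_edits and checkpoint are made up for this example.

```python
def apply_edits(fsimage, edits):
    """Merge journaled operations into a copy of the namespace image."""
    merged = dict(fsimage)
    for op, path, blocks in edits:
        if op == "add":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
    return merged


def checkpoint(namenode):
    """Run the six steps on a dict that stands in for the NameNode's storage."""
    # Step 1: create edits.new so new changes have somewhere to go.
    namenode["edits.new"] = []
    # Step 2: copy the now-frozen edits and the fsimage to the checkpointing service.
    service_fsimage = dict(namenode["fsimage"])
    service_edits = list(namenode["edits"])
    # Step 3: the service merges the two files into fsimage.ckpt.
    fsimage_ckpt = apply_edits(service_fsimage, service_edits)
    # Step 4: the service copies fsimage.ckpt back to the NameNode.
    namenode["fsimage.ckpt"] = fsimage_ckpt
    # Step 5: the NameNode overwrites fsimage with fsimage.ckpt.
    namenode["fsimage"] = namenode.pop("fsimage.ckpt")
    # Step 6: the NameNode renames edits.new to edits.
    namenode["edits"] = namenode.pop("edits.new")
    return namenode


# Toy state: one file in the image, two journaled changes waiting in edits.
namenode = {
    "fsimage": {"/data/a.txt": ["blk_1"]},
    "edits": [("add", "/data/b.txt", ["blk_2", "blk_3"]),
              ("delete", "/data/a.txt", None)],
}
checkpoint(namenode)
print(namenode["fsimage"])
print(namenode["edits"])
```

After the call, the fsimage reflects both journaled changes (only /data/b.txt remains) and the edits list is empty, because no new changes arrived in edits.new while this toy checkpoint was running.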

