In HDFS, the data block size needs to be large enough to justify the resources dedicated to an individual unit of data processing. On the other hand, the block size can’t be so large that the system waits a very long time for a single unit of data processing to complete its work. Both recommendations obviously depend on the kind of work being done on the data blocks.
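To make the trade-off concrete, here is a minimal sketch that counts how many blocks (and therefore how many units of processing work) a file breaks into at different block sizes. The function name `block_count`, the 1 GiB file, and the candidate block sizes are illustrative assumptions, not values taken from any particular cluster.

```python
MIB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Blocks needed to hold a file; the last block may be only partly filled."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


file_size = 1024 * MIB  # hypothetical 1 GiB file

# Hypothetical block sizes, from very small to as large as the whole file.
for size_mib in (1, 64, 128, 1024):
    blocks = block_count(file_size, size_mib * MIB)
    print(f"{size_mib:>5} MiB blocks -> {blocks} units of work")
```

Very small blocks produce many tiny units of work whose overhead may outweigh the processing itself, while one huge block leaves nothing to process in parallel.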
HDFS is designed to store data on inexpensive, and therefore less reliable, hardware. Inexpensive has an attractive ring to it from an infrastructure point of view, but it does raise concerns about the reliability of the system as a whole, especially when it comes to ensuring high availability of the data. Planning ahead for disaster, the minds behind HDFS set up the system so that it stores three (count ’em, three) copies of every data block.
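Keeping three copies has a direct storage cost, which the following rough sketch works out. The helper names and the sample sizes are assumptions made for illustration, and the arithmetic ignores metadata and any space reserved for other uses.

```python
GIB = 1024 ** 3


def raw_storage_bytes(file_size_bytes: int, replication: int = 3) -> int:
    """Disk space consumed across the cluster once every block is replicated."""
    return file_size_bytes * replication


def usable_capacity_bytes(raw_capacity_bytes: int, replication: int = 3) -> int:
    """Approximate amount of unique data a cluster can hold at a given replication."""
    return raw_capacity_bytes // replication


# A hypothetical 10 GiB file occupies 30 GiB of raw disk with three copies.
print(raw_storage_bytes(10 * GIB) // GIB, "GiB raw")

# A hypothetical cluster with 300 GiB of raw disk holds about 100 GiB of data.
print(usable_capacity_bytes(300 * GIB) // GIB, "GiB usable")
```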
HDFS also assumes that each disk drive and each slave node is inherently unreliable, so care must be taken in choosing where the three copies of each data block are stored. The figure below shows how the data blocks of a massive file are spread across the Hadoop cluster, meaning they are evenly distributed across the slave nodes in such a way that a copy of each block is still available even if a disk, node, or rack fails.
The file shown in the figure has five data blocks, labelled A, B, C, D, and E. A closer look reveals two important things:
1. The cluster consists of two racks with two slave nodes each.
2. The three copies (instances) of every data block are spread across different slave nodes.
Each component in the Hadoop cluster is seen as a potential point of failure, so when HDFS distributes the replicas of a file's blocks across the cluster, it tries to make sure that the replicas of each block are stored at different failure points.
For instance, take a careful look at Block A. When it needed to be stored, Slave Node 3 was chosen, and the first copy of Block A was stored there. For multiple-rack systems, HDFS then determines that the remaining two copies of Block A need to be stored on a different rack. Hence, the second copy of Block A is stored on Slave Node 1. The final copy can be stored on the same rack as the second copy, but not on the same slave node, so it gets stored on Slave Node 2.
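The placement rule walked through for Block A can be sketched in a few lines. This is a simplified illustration, not the actual HDFS placement code. The topology dictionary, the node and rack names, and the functions `place_replicas` and `survives` are all hypothetical. The sketch assumes Slave Nodes 1 and 2 share one rack and Slave Nodes 3 and 4 share the other, which is what the Block A example implies.

```python
import random

# Hypothetical topology matching the example: two racks, two slave nodes each.
TOPOLOGY = {
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
}


def rack_of(node, topology):
    """Return the name of the rack that contains the given node."""
    for rack, nodes in topology.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def place_replicas(first_node, topology, rng=random):
    """Pick three nodes: first_node, then two distinct nodes on one other rack."""
    local_rack = rack_of(first_node, topology)
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need another rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    # Second and third copies share a rack but never a node.
    second, third = rng.sample(topology[remote_rack], 2)
    return [first_node, second, third]


def survives(placement, topology, failed_nodes=(), failed_racks=()):
    """True if at least one replica sits on a node and rack that are still up."""
    return any(
        node not in failed_nodes and rack_of(node, topology) not in failed_racks
        for node in placement
    )


# Block A: the first copy lands on Slave Node 3, the others on the other rack.
block_a = place_replicas("node3", TOPOLOGY)
print("Block A replicas:", block_a)
print("Survives loss of rack1:", survives(block_a, TOPOLOGY, failed_racks=["rack1"]))
print("Survives loss of rack2:", survives(block_a, TOPOLOGY, failed_racks=["rack2"]))
```

With this layout, losing any single node or either rack still leaves at least one copy of Block A readable, which is the point of spreading the replicas across different failure points.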