When choosing storage options, consider the impact of using commodity drives rather than more expensive enterprise-quality drives. Imagine a 750-node cluster in which each node has 12 hard disk drives dedicated to HDFS storage. Based on an annual failure rate (AFR) of about 4% for commodity disk drives (in other words, a given hard disk drive has roughly a 4% probability of failing in a given year), our cluster will likely experience a hard disk failure every day of the year.
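The "one failure a day" figure follows from simple arithmetic on the numbers in this example. The short Python sketch below works it through; the function name is only illustrative, and the calculation assumes that failures are independent and spread evenly over the year.

```python
# Back-of-the-envelope estimate of disk failures for the example cluster.
# Assumes independent failures spread evenly across the year.

def expected_disk_failures(nodes, drives_per_node, afr):
    """Return (total drives, expected failures per year, expected failures per day)."""
    total_drives = nodes * drives_per_node
    per_year = total_drives * afr
    per_day = per_year / 365
    return total_drives, per_year, per_day

# Figures from the example: 750 nodes, 12 drives each, AFR of about 4%.
total_drives, per_year, per_day = expected_disk_failures(750, 12, 0.04)

print(f"Drives in cluster:          {total_drives}")
print(f"Expected failures per year: {per_year:.0f}")
print(f"Expected failures per day:  {per_day:.2f}")
```

With 9,000 drives and a 4% AFR, the expected number of failures is about 360 per year, which is very close to one per day.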
Because a cluster can contain so many slave nodes, node failure is also a common occurrence in large clusters with hundreds of nodes or more. With these statistics in mind, HDFS was designed on the assumption that almost all hardware components, even those at the slave node level, are unreliable. HDFS overcomes the unreliability of individual hardware components through redundancy: that is the idea behind the three replicas of each file stored in HDFS, distributed throughout the system. More specifically, every file block stored in HDFS has a total of three replicas. If a system holding a file block we need fails, we can immediately turn to the other two replicas.
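The idea of turning to another replica can be illustrated with a toy model. The Python sketch below is not HDFS code and does not use any Hadoop API; the block IDs, node names, and the find_live_replica helper are all hypothetical, chosen only to show how three replicas let a read survive the loss of one or two nodes.

```python
# Toy model of replica failover. This is NOT how HDFS is implemented;
# block IDs, node names, and the helper below are hypothetical.

# Each block is stored on three different nodes (three replicas in total).
block_locations = {
    "blk_0001": ["node-a", "node-b", "node-c"],
    "blk_0002": ["node-b", "node-d", "node-e"],
}

def find_live_replica(block_id, failed_nodes):
    """Return the first node holding the block that has not failed, or None."""
    for node in block_locations[block_id]:
        if node not in failed_nodes:
            return node
    return None  # all three replicas are unavailable

# One node fails: the block is still readable from another replica.
print(find_live_replica("blk_0001", {"node-a"}))

# Two nodes fail: the third replica still serves the block.
print(find_live_replica("blk_0001", {"node-a", "node-b"}))
```

Only when every node holding a replica of the block is down at the same time does the lookup come back empty.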
To balance key factors such as total cost of ownership, storage capacity, and performance, the slave nodes must be designed very carefully.
Slave nodes today commonly have between 12 and 16 locally attached 3TB hard disk drives each. They use moderately fast dual-socket CPUs with six to eight cores each (no speed demons, in other words). This is complemented by 48GB of RAM. In short, this server is optimized for dense storage.
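To get a feel for what "dense storage" means here, the following Python sketch estimates per-node capacity from the drive counts above. It is a rough calculation under stated assumptions: every drive is used entirely for HDFS, the replication factor is three, and space needed for the operating system, temporary data, and file system overhead is ignored. The function name is illustrative.

```python
# Rough per-node capacity estimate for the slave node profile described above.
# Assumes all drives are dedicated to HDFS and ignores OS, temporary data,
# and file system overhead.

DRIVE_TB = 3            # 3TB drives
REPLICATION_FACTOR = 3  # three replicas of every block

def node_capacity_tb(drives_per_node):
    """Return (raw TB per node, TB of unique data after replication)."""
    raw = drives_per_node * DRIVE_TB
    unique = raw / REPLICATION_FACTOR
    return raw, unique

for drives in (12, 16):
    raw, unique = node_capacity_tb(drives)
    print(f"{drives} drives: {raw} TB raw, about {unique:.0f} TB of unique data")
```

Under these assumptions a node offers 36 to 48 TB of raw disk, which corresponds to roughly 12 to 16 TB of unique data once each block is stored three times.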
Because HDFS is a user-space file system, it is important to optimize the local file system on the slave nodes to work with HDFS. In this regard, one high-impact decision when setting up our servers is choosing the file system for the Linux installation on the slave nodes. Ext3 is very commonly deployed because it has been the most stable choice for a number of years. Ext4 is also worth considering: it is the successor to Ext3, and it has been available long enough to be widely considered stable and reliable. Most importantly for our purposes, it includes a number of optimizations for handling large files, which makes it an ideal choice for HDFS slave node servers.