When we are choosing storage options, consider the impact of using commodity
drives rather than more expensive enterprise-quality drives. Imagine that we
have a 750-node cluster, where each node has 12 hard disk drives dedicated to
HDFS storage, for 9,000 drives in total. Based on an annualized failure rate
(AFR) of about 4% for commodity disk drives (in other words, a given hard disk
drive has approximately a 4% probability of failing in a given year), we should
expect about 360 drive failures per year, so our cluster will likely experience
a hard disk failure almost every day of the year.
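
The arithmetic behind that estimate can be sketched in a few lines of Python. This is a minimal illustration using the figures from the paragraph above (750 nodes, 12 drives per node, 4% AFR). The function name is hypothetical, and the calculation assumes that drive failures are independent and spread evenly over the year.

```python
def expected_annual_failures(nodes: int, drives_per_node: int, afr: float) -> float:
    """Expected number of drive failures per year across the whole cluster."""
    return nodes * drives_per_node * afr


# Figures taken from the text: 750 nodes, 12 HDFS drives each, 4% AFR.
total_drives = 750 * 12
failures_per_year = expected_annual_failures(750, 12, 0.04)

# Assumption: failures are spread evenly across a 365-day year.
failures_per_day = failures_per_year / 365

print(f"Total drives:      {total_drives}")
print(f"Failures per year: {failures_per_year:.0f}")
print(f"Failures per day:  {failures_per_day:.2f}")
```

With these inputs the expected rate comes out just under one failure per day, which is where the "every day of the year" rule of thumb comes from.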
Because there can be so many slave nodes, their failure is also a very common
occurrence in large clusters with hundreds of nodes or more. With these
statistics in mind, HDFS has been designed on the assumption that almost all
hardware components, even at the slave node level, are unreliable. HDFS
overcomes the unreliability of individual hardware components by way of
redundancy: that's the rationale behind the three replicas, by default, of each
file stored in HDFS, distributed throughout the system. More precisely, every
file block stored in HDFS has a total of three replicas. If a system holding a
specific file block that we need breaks, we can immediately turn to the other
two replicas.
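
To get a rough feel for why three replicas help, the sketch below computes a deliberately naive probability that every drive holding a replica of one block fails within the same year. It assumes the 4% AFR quoted earlier, independent failures, and no repair at all. In practice HDFS re-replicates blocks after a failure is detected, so the real risk is far lower than this figure. The function name is hypothetical.

```python
def naive_all_replicas_lost(afr: float, replicas: int = 3) -> float:
    """Naive probability that all replica drives fail in the same year.

    Assumes independent drive failures and no re-replication, so it
    overstates the real risk of losing a block.
    """
    return afr ** replicas


p_single = naive_all_replicas_lost(0.04, replicas=1)  # one copy only
p_triple = naive_all_replicas_lost(0.04, replicas=3)  # three replicas

print(f"One copy:       {p_single:.4%}")
print(f"Three replicas: {p_triple:.4%}")
```

Even under these pessimistic assumptions, going from one copy to three reduces the chance of losing a block by several orders of magnitude.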
To manage such key factors as total cost of ownership, storage capacity, and
performance, the slave nodes must be designed very carefully. We
commonly see slave nodes now where each node typically has between 12 and 16
locally attached 3TB hard disk drives. Slave nodes use moderately fast
dual-socket CPUs with six to eight cores each: no speed demons, in
other words. This is supported by 48GB of RAM. In short, this server is
optimized for dense storage.
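
As a rough sizing aid, the following sketch turns the drive counts above into per-node capacity figures. It assumes 3TB drives and three replicas of every block, and it ignores space reserved for the operating system, temporary data, and file system overhead. The function names are hypothetical.

```python
def node_raw_capacity_tb(drives: int, drive_size_tb: int = 3) -> int:
    """Raw disk capacity of one slave node, in terabytes."""
    return drives * drive_size_tb


def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Capacity left for unique data once every block is stored `replication` times."""
    return raw_tb / replication


# The text describes nodes with between 12 and 16 drives.
for drives in (12, 16):
    raw = node_raw_capacity_tb(drives)
    usable = usable_capacity_tb(raw)
    print(f"{drives} drives: {raw} TB raw, about {usable:.0f} TB of unique data")
```

Under these assumptions, a node contributes 36 to 48 TB of raw disk but only about 12 to 16 TB of unique data.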
Because HDFS is a user-space-level file system, it's important to optimize the
local file system on the slave nodes to work with HDFS. In this regard, one
high-impact decision when setting up our servers is choosing a file system
for the Linux installation on the slave nodes. Ext3 is a very commonly
deployed file system because it has been the most stable choice for a number of
years. Also consider Ext4. It's the next version
of Ext3, and it has been available long enough to be widely considered stable
and reliable. Most important for our purposes, it includes a
number of optimizations for handling large files, which makes it an ideal
choice for HDFS slave node servers.