Data Replication in Hadoop: Slave node disk failures (Part -2)
In Part 1 (Data Replication in Hadoop: Replicating Data Blocks), I explained how HDFS replicates data blocks. In this post I explain how HDFS handles slave node disk failures.
HDFS was originally designed to store data at petabyte scale,
with any potential limitations to scaling out minimized. The huge block
size is a direct result of this requirement to store data on a very large
scale. Firstly, each data block stored in HDFS has its own metadata and
needs to be tracked by a central server, so that applications that
need to access a particular file can be directed to wherever all of the
file’s blocks are stored. If the block size
were in the kilobyte range, even modest data volumes at terabyte scale
would overwhelm the metadata server with too many blocks to track. Secondly,
HDFS is built to provide high throughput so that the parallel processing of
these massive data sets happens as fast as possible.
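To make the metadata argument concrete, here is a rough back-of-the-envelope sketch. The block sizes are assumptions chosen for illustration (128 MiB as a typical large HDFS block size, 4 KiB as a hypothetical kilobyte-range size), and `blocks_needed` is a made-up helper, not part of any Hadoop API.

```python
# Rough count of the block entries a metadata server would have to track.
# Both block sizes are illustrative assumptions, not values from this post.
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB


def blocks_needed(data_bytes: int, block_bytes: int) -> int:
    """Blocks required to hold data_bytes; the last block may be partial."""
    return -(-data_bytes // block_bytes)  # integer ceiling division


small = blocks_needed(TIB, 4 * KIB)      # hypothetical KB-range block size
large = blocks_needed(TIB, 128 * MIB)    # assumed large HDFS block size

print(f"1 TiB in 4 KiB blocks:   {small:,} blocks to track")
print(f"1 TiB in 128 MiB blocks: {large:,} blocks to track")
print(f"Small blocks mean {small // large:,}x more entries")
```

Under these assumptions, a single terabyte already produces hundreds of millions of entries with kilobyte-sized blocks, compared with a few thousand for large blocks.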
The most important thing for Hadoop’s scalability on the data processing side is, and
will always be, parallelism: the ability to process the individual
blocks of these massive files in parallel. To facilitate efficient processing,
a balance needs to be struck: blocks must be large enough to keep the metadata
manageable, yet numerous enough to be processed in parallel.
Just like death and taxes, disk failures (and given enough time, even node or rack
failures) are inevitable. Given the example in the adjoining figure, even if
one of the racks were to fail, the cluster would continue functioning. Performance
would suffer since we would have lost half our processing resources, but the system
would still be online and the data would still be available.
In this situation, where a disk drive or a slave node fails, the central
metadata server for HDFS (called the NameNode) eventually finds out
that the file blocks stored on
the failed resource are no longer available. For example, if Slave Node 3 in
this figure fails, it would mean that Blocks A, C, and D are under-replicated.
In other words, too few copies of these blocks are now available in HDFS. When HDFS
detects that a block is under-replicated, it orders a new copy.
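The detection step can be illustrated with a small toy model. This is only a sketch, not how the NameNode is implemented: the block placement, the node names, and the helpers `under_replicated` and `re_replicate` are all hypothetical, chosen so that Slave Node 3 holds Blocks A, C, and D as in the example, with a target of three copies per block.

```python
# Toy model of under-replication handling; not actual NameNode logic.
# The placement below is hypothetical: node3 holds blocks A, C, and D.
REPLICATION_FACTOR = 3

block_locations = {
    "A": {"node1", "node2", "node3"},
    "B": {"node1", "node2", "node4"},
    "C": {"node1", "node3", "node4"},
    "D": {"node2", "node3", "node4"},
}


def under_replicated(locations, live_nodes, target=REPLICATION_FACTOR):
    """Return {block: missing_copies} for blocks with too few live replicas."""
    result = {}
    for block, nodes in locations.items():
        live = len(nodes & live_nodes)
        if live < target:
            result[block] = target - live
    return result


def re_replicate(locations, live_nodes, target=REPLICATION_FACTOR):
    """Add copies on live nodes that lack the block until the target is met."""
    for nodes in locations.values():
        candidates = sorted(live_nodes - nodes)
        while len(nodes & live_nodes) < target and candidates:
            nodes.add(candidates.pop(0))


live_nodes = {"node1", "node2", "node4"}  # node3 has failed
missing = under_replicated(block_locations, live_nodes)
print("Under-replicated:", missing)

re_replicate(block_locations, live_nodes)
print("After re-replication:", under_replicated(block_locations, live_nodes))
```

In this model the failed node stays in each block's location set, mirroring the fact that its copies still exist on disk and will reappear if the node returns.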
To continue the example, let’s say that Slave Node 3 comes back online after a few
hours. In the meantime, HDFS will have ensured that there are three copies of all
the file blocks. So now, Blocks A, C, and D have four copies apiece and are over-replicated.
As with under-replicated blocks, the HDFS central metadata server will find out
about this as well, and will order one copy of each over-replicated block to be
deleted.
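The over-replication cleanup can be sketched in the same toy style. Again, this is an illustration under stated assumptions: the placement is hypothetical, `over_replicated` and `prune_excess` are made-up helpers, and the choice of which replica to drop is arbitrary here, whereas HDFS applies its own policy.

```python
# Toy model of over-replication cleanup; not actual NameNode logic.
# Hypothetical state after node3 rejoins: A, C, and D have four copies each.
REPLICATION_FACTOR = 3

block_locations = {
    "A": {"node1", "node2", "node3", "node4"},
    "B": {"node1", "node2", "node4"},
    "C": {"node1", "node2", "node3", "node4"},
    "D": {"node1", "node2", "node3", "node4"},
}


def over_replicated(locations, target=REPLICATION_FACTOR):
    """Return {block: excess_copies} for blocks with too many replicas."""
    return {
        block: len(nodes) - target
        for block, nodes in locations.items()
        if len(nodes) > target
    }


def prune_excess(locations, target=REPLICATION_FACTOR):
    """Drop replicas until each block is back at the target count.

    The victim is simply the last node name in sorted order, which is an
    arbitrary choice made for this sketch.
    """
    removed = {}
    for block, nodes in locations.items():
        while len(nodes) > target:
            victim = sorted(nodes)[-1]
            nodes.remove(victim)
            removed.setdefault(block, []).append(victim)
    return removed


excess = over_replicated(block_locations)
print("Over-replicated:", excess)

removed = prune_excess(block_locations)
print("Replicas removed:", removed)
```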
A nice result of this data availability is that when disk failures do occur,
there’s no need to immediately replace failed hard drives. This can be done
more effectively at regularly scheduled intervals.