With the release of Hadoop 2, however, YARN was introduced, which opened the door to a whole
new world of data processing opportunities. Before Hadoop 2, MapReduce was the
only way to process data in a distributed Hadoop environment.
YARN, an acronym for Yet Another Resource Negotiator, is a tool that enables other
data processing frameworks to run on Hadoop. A more substantive take on YARN
would describe it as a general-purpose resource management tool that schedules
and assigns CPU cycles and memory (and, in the future, other
resources such as network bandwidth) from the Hadoop cluster to waiting
applications. YARN raises exciting possibilities. Single-handedly, YARN has
converted Hadoop from simply a batch processing engine into a platform for
several distinct modes of data processing, from traditional batch to interactive
queries to streaming analysis.
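To make "schedule and assign CPU cycles and memory to waiting applications" concrete, here is a minimal Python sketch of a resource negotiator. It is a toy model, not YARN's code: the `Node` and `ContainerRequest` classes, the `allocate` function, and the application names are hypothetical, and the first-fit policy is a deliberate simplification (YARN ships pluggable schedulers, such as the Capacity Scheduler and the Fair Scheduler, that are considerably more involved). Each application asks for a container with a given amount of memory and number of virtual cores, and the negotiator places it on the first node with enough free capacity; requests that fit nowhere stay pending.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker machine and the capacity it still has free (hypothetical model)."""
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ContainerRequest:
    """A waiting application's request for one container of resources."""
    app_id: str
    memory_mb: int
    vcores: int


def allocate(nodes, requests):
    """First-fit: place each request on the first node with enough free capacity.

    Returns (grants, pending): grants is a list of (app_id, node_name) pairs,
    pending holds the requests that no single node could satisfy.
    """
    grants, pending = [], []
    for req in requests:
        for node in nodes:
            if node.free_memory_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                # Reserve the capacity on this node for the application.
                node.free_memory_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                grants.append((req.app_id, node.name))
                break
        else:
            # Loop finished without a break: no node could host the container.
            pending.append(req)
    return grants, pending


nodes = [Node("node1", 8192, 4), Node("node2", 4096, 2)]
requests = [
    ContainerRequest("mapreduce-job", 4096, 2),
    ContainerRequest("tez-query", 6144, 2),
    ContainerRequest("storm-topology", 4096, 2),
]
grants, pending = allocate(nodes, requests)
print(grants)                       # [('mapreduce-job', 'node1'), ('storm-topology', 'node1')]
print([r.app_id for r in pending])  # ['tez-query']
```

The 6 GB request stays pending even though 8 GB is free across the two nodes at that moment, because a container has to fit on a single node.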
YARN is meant to provide more efficient and flexible workload scheduling as well as
resource management benefits, both of which ultimately enable Hadoop to
run more than just MapReduce jobs.
With the introduction of YARN, the Hadoop architecture underwent major changes at
certain layers, which made it more flexible and powerful.
HDFS has not changed much with the transition from MapReduce to YARN; it
still works as the storage layer for Hadoop.
The key underlying concept in the transition to YARN from Hadoop 1 is
decoupling resource management from data processing. Because YARN is a
general-purpose resource management facility, this decoupling lets it
allocate cluster resources to any data processing framework implemented for
Hadoop. Each processing framework then manages its own application runtime
concerns more efficiently. To maintain compatibility with all the code
developed for Hadoop 1, MapReduce serves as the first framework available for
use on YARN.
At the time of writing, the Apache Tez project, then an incubator project still
in development, acts as an alternative framework for executing Pig and Hive
applications. Tez is expected to emerge as a standard Hadoop Application
Programming Interface (API).
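The decoupling of resource management from data processing can be sketched as two layers that meet only at a narrow interface. In this hypothetical Python model (none of these class names come from Hadoop), `ResourceLayer` plays the role of YARN: it grants and reclaims memory and vcores without knowing what the work is. The two frameworks, a MapReduce-style word count and a Tez-style pipeline of stages, each state their container needs and run their own logic inside whatever they are granted. Containers are plain `(memory_mb, vcores)` tuples here and nothing actually runs in parallel; the sketch only shows where the boundary between the layers sits.

```python
from abc import ABC, abstractmethod
from collections import Counter


class ProcessingFramework(ABC):
    """Contract a framework fulfils; the resource layer never looks inside run()."""

    @abstractmethod
    def container_requests(self):
        """Return a list of (memory_mb, vcores) tuples the job needs."""

    @abstractmethod
    def run(self, containers):
        """Execute the framework's own logic using the granted containers."""


class ResourceLayer:
    """Hypothetical YARN-like layer: tracks capacity only, knows nothing about frameworks."""

    def __init__(self, memory_mb, vcores):
        self.memory_mb = memory_mb
        self.vcores = vcores

    def submit(self, framework):
        granted = []
        for mem, cores in framework.container_requests():
            if mem <= self.memory_mb and cores <= self.vcores:
                self.memory_mb -= mem
                self.vcores -= cores
                granted.append((mem, cores))
        try:
            return framework.run(granted)
        finally:
            # Reclaim the capacity once the framework has finished.
            for mem, cores in granted:
                self.memory_mb += mem
                self.vcores += cores


class MapReduceLike(ProcessingFramework):
    """Word count: one 'map' per granted container, then a single merge ('reduce')."""

    def __init__(self, words):
        self.words = words

    def container_requests(self):
        return [(1024, 1), (1024, 1)]

    def run(self, containers):
        if not containers:
            return None  # nothing granted, so the job cannot start
        n = len(containers)
        # Map phase: each container counts its own slice of the input.
        partials = [Counter(self.words[i::n]) for i in range(n)]
        # Reduce phase: merge the partial counts.
        return dict(sum(partials, Counter()))


class DagLike(ProcessingFramework):
    """Tez-style pipeline: one container per stage, stages applied in order."""

    def __init__(self, data, stages):
        self.data = data
        self.stages = stages

    def container_requests(self):
        return [(2048, 1)] * len(self.stages)

    def run(self, containers):
        if len(containers) < len(self.stages):
            return None  # not enough containers for every stage
        result = self.data
        for stage in self.stages:
            result = stage(result)
        return result


cluster = ResourceLayer(memory_mb=8192, vcores=4)
print(cluster.submit(MapReduceLike(["yarn", "hdfs", "yarn"])))  # {'yarn': 2, 'hdfs': 1}
print(cluster.submit(DagLike(["b", "a", "b"], [sorted, lambda xs: list(dict.fromkeys(xs))])))  # ['a', 'b']
```

Adding another engine, for example a graph or streaming framework, only means writing another `ProcessingFramework` subclass; `ResourceLayer` stays unchanged, which mirrors the reason YARN can host frameworks other than MapReduce.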
With support for additional processing frameworks, support for additional
APIs will follow. At the time of writing, Hoya (for running HBase on YARN), Apache Giraph
(for graph processing), Open MPI (for message passing in parallel systems), and Apache
Storm (for data stream processing) are in active development.