With the release of Hadoop 2, however, YARN was introduced, opening the door to a whole new world of data processing opportunities. Before Hadoop 2, MapReduce was the only way to process data in a distributed Hadoop environment.

YARN is an acronym for Yet Another Resource Negotiator; it is a tool that enables data processing frameworks other than MapReduce to run on Hadoop. More precisely, YARN is a general-purpose resource management tool that schedules and assigns CPU cycles and memory (and, potentially in the future, other resources such as network bandwidth) from the Hadoop cluster to waiting applications. This raises exciting possibilities: singlehandedly, YARN has converted Hadoop from a pure batch processing engine into a platform for several distinct modes of data processing, from traditional batch jobs to interactive queries to streaming analysis.
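
To make the idea of assigning CPU and memory to waiting applications more concrete, here is a minimal Python sketch of a first-fit scheduler. It is a toy model under simplifying assumptions, not YARN's actual scheduler or API: the Node, Request, and schedule names are hypothetical, and real YARN schedulers (such as the Capacity Scheduler and the Fair Scheduler) also apply queues, priorities, and data-locality rules that are omitted here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node with free memory (MB) and free CPU cores."""
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class Request:
    """Hypothetical request from an application for one container of a given size."""
    app: str
    mem_mb: int
    vcores: int


def schedule(nodes, requests):
    """Assign each request, in FIFO order, to the first node with enough free
    memory and cores. Requests that fit on no node are returned as pending."""
    queue = deque(requests)
    assignments = []  # (app, node_name) pairs that were granted resources
    pending = []      # requests no node could satisfy right now
    while queue:
        req = queue.popleft()
        for node in nodes:
            if node.free_mem_mb >= req.mem_mb and node.free_vcores >= req.vcores:
                # Reserve this node's memory and cores for the application.
                node.free_mem_mb -= req.mem_mb
                node.free_vcores -= req.vcores
                assignments.append((req.app, node.name))
                break
        else:
            # No node had enough capacity; the application keeps waiting.
            pending.append(req)
    return assignments, pending


if __name__ == "__main__":
    nodes = [Node("node1", 4096, 4), Node("node2", 2048, 2)]
    requests = [
        Request("mapreduce-job", 3072, 2),
        Request("hive-query", 2048, 2),
        Request("stream-app", 2048, 2),
    ]
    granted, waiting = schedule(nodes, requests)
    print(granted)                   # [('mapreduce-job', 'node1'), ('hive-query', 'node2')]
    print([r.app for r in waiting])  # ['stream-app']
```

Note that the scheduler never looks at what kind of application is asking: a batch job, an interactive query, and a streaming application all compete for the same pool of memory and cores.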

YARN is meant to provide more efficient and flexible workload scheduling and resource management, both of which ultimately enable Hadoop to run more than just MapReduce jobs.

With the introduction of YARN, the Hadoop architecture changed significantly at several layers, making it more flexible and powerful.

Distributed storage: Not much has changed here with the transition from MapReduce to YARN; HDFS still serves as the storage layer for Hadoop.

Resource management: The key underlying concept in the transition to YARN from Hadoop 1 is decoupling resource management from data processing. This decoupling enables YARN to provide resources to any processing framework implemented for Hadoop, including MapReduce.
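
The decoupling of resource management from data processing can be illustrated with a small Python sketch under simplifying assumptions. A generic ResourceManager hands out a fixed number of containers without knowing what runs in them, while each framework decides for itself how to split a job into tasks: a MapReduce-style framework and a hypothetical DAG-style framework standing in for engines such as Tez. All class and function names here are hypothetical; this is not the YARN API, and task dependencies (for example, reduce waiting for map) are deliberately ignored.

```python
from abc import ABC, abstractmethod


class ResourceManager:
    """Hypothetical generic resource manager: it only tracks free capacity
    and knows nothing about how any framework processes data."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, wanted):
        """Grant up to `wanted` containers, limited by what is free."""
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        """Return containers to the shared pool."""
        self.free += count


class Framework(ABC):
    """Hypothetical plug-in point: each processing framework decides how
    to turn a job into a list of tasks."""

    @abstractmethod
    def tasks(self, job):
        ...


class MapReduceFramework(Framework):
    def tasks(self, job):
        # One map task per input split plus one reduce task
        # (ordering between map and reduce is ignored in this toy).
        return [f"map-{split}" for split in job] + ["reduce"]


class DagFramework(Framework):
    def tasks(self, job):
        # Hypothetical DAG engine: one task per vertex of the job graph.
        return [f"vertex-{vertex}" for vertex in job]


def run(rm, framework, job):
    """Run a job in waves, asking the resource manager for containers."""
    pending = framework.tasks(job)
    completed = []
    while pending:
        n = rm.allocate(len(pending))
        if n == 0:
            raise RuntimeError("no containers available")
        wave, pending = pending[:n], pending[n:]
        completed.extend(wave)  # pretend each task ran in its own container
        rm.release(n)           # give the containers back after the wave
    return completed


if __name__ == "__main__":
    rm = ResourceManager(total_containers=2)
    print(run(rm, MapReduceFramework(), ["split0", "split1", "split2"]))
    # ['map-split0', 'map-split1', 'map-split2', 'reduce']
    print(run(rm, DagFramework(), ["scan", "join"]))
    # ['vertex-scan', 'vertex-join']
```

The design point is that supporting a new processing framework only requires a new Framework subclass; the ResourceManager itself does not change.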

Processing framework: Because YARN is a general-purpose resource management facility, it can allocate cluster resources to any data processing framework implemented for Hadoop. The processing framework then manages application runtime issues itself, which makes execution more efficient. To maintain compatibility with all the code developed for Hadoop 1, MapReduce was the first framework available for use on YARN. Apache Tez, originally an incubator project, is an alternative framework designed for executing Pig and Hive applications, and Tez is expected to become a standard part of Hadoop configurations.

Application Programming Interface (API): Support for additional processing frameworks brings support for additional APIs. At the time of writing, Hoya (for running HBase on YARN), Apache Giraph (for graph processing), Open MPI (for message passing in parallel systems), and Apache Storm (for data stream processing) were in active development.

  Modified On Dec-16-2017 01:14:43 AM
