Three modes of Hadoop Cluster Architecture
Hadoop is primarily designed to be deployed on a massive cluster of networked systems or nodes, featuring master nodes (which host the services that maintain Hadoop's storage and processing capabilities) and slave nodes (where the data sets are stored and processed). We can, however, run Hadoop on a single computer, which is a great way to learn the basics of Hadoop by experimenting in a controlled space. Hadoop has three deployment modes, which arrange these components as follows: local standalone mode, pseudo-distributed (single node) mode, and fully distributed mode (a cluster of nodes).
Local standalone mode:
This is the default mode if we don't configure anything else. In this mode, all the components of Hadoop, such as the NameNode, DataNode, JobTracker, and TaskTracker, run in a single Java process.
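Standalone mode corresponds to an essentially empty configuration. As a minimal sketch (assuming a Hadoop 1.x-era installation, to match the JobTracker/TaskTracker terminology above), conf/core-site.xml can be left with no properties at all, so the filesystem property keeps its default local value and no daemons are started:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: a sketch of standalone mode.
     With no properties set, fs.default.name keeps its default
     value of file:///, so jobs read and write the local file
     system and everything runs in one Java process. -->
<configuration>
</configuration>
```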
Pseudo-distributed mode (single node):
A single-node Hadoop deployment that runs all the Hadoop services, including both the master and the slave services, on a single compute node is considered to be running in pseudo-distributed mode. This kind of deployment is typically useful for quickly testing applications while we develop them, without having to claim Hadoop cluster resources that might be needed by someone else. It's also an easy way to experiment with Hadoop, as most people don't have a cluster of computers at their disposal.
In this architecture, a separate JVM is spawned for each Hadoop component, and the components communicate across network sockets, effectively producing a fully functioning miniature cluster on a single host.
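As a sketch of what pseudo-distributed configuration looks like in practice (assuming a Hadoop 1.x-era installation; the localhost addresses and ports shown are conventional defaults, not something mandated by the text), every service is pointed at the local machine and HDFS replication is reduced to one, since there is only a single DataNode:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: the NameNode runs on this host -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<?xml version="1.0"?>
<!-- conf/hdfs-site.xml: one DataNode, so keep a single replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<?xml version="1.0"?>
<!-- conf/mapred-site.xml: the JobTracker also runs locally -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

Each fragment belongs in its own file under conf/; they are shown together here only for brevity.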
Fully distributed mode (a cluster of nodes):
A Hadoop deployment in which the master and slave services run on a cluster of separate nodes is said to be running in fully distributed mode. This is the appropriate mode for production and development clusters. We can make a further distinction, since a development cluster generally has a smaller number of nodes and is used to prototype the workloads that will eventually run on a production cluster.
In this architecture, Hadoop is distributed across multiple machines, some of which act as general-purpose workers and others of which are dedicated hosts for components such as the NameNode and JobTracker.
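A minimal sketch of what this separation looks like in the configuration (again assuming a Hadoop 1.x-era installation; the hostnames are hypothetical placeholders, not names from the text): the master services live on dedicated hosts, and a slaves file lists the worker nodes on which DataNode and TaskTracker services start.

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml on every node: HDFS is served by a
     dedicated NameNode host (hostname is a placeholder) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>

<?xml version="1.0"?>
<!-- conf/mapred-site.xml on every node: a dedicated
     JobTracker host coordinates MapReduce jobs -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
</configuration>

<!-- conf/slaves on the master: one worker hostname per line,
     e.g.
       worker01.example.com
       worker02.example.com
       worker03.example.com -->
```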
As elsewhere in software development, each mode has its benefits and drawbacks. Fully distributed mode is obviously the only one that can scale Hadoop across a cluster of machines, but it requires more configuration work, not to mention the cluster of machines. Local, or standalone, mode is the easiest to set up, but you interact with it in a different manner than you would with the fully distributed mode.