Hadoop is primarily structured and designed to be deployed on a massive cluster of networked systems or nodes, featuring master nodes (which host the services that maintains Hadoop’s storage and manipulating power ) and slave nodes (where the data sets are stored and processed). We can, however, run Hadoop on a single computer, which is a great way to learn the basics of Hadoop by experimenting in a controlled space.
Hadoop has three deployment modes and deploy these components as follows: Local standalone mode, pseudo-distributed (single node) mode and fully distributed mode (clustered nodes).
Local standalone mode: This is the default mode if, we don't configure anything else. In this mode, all the components of Hadoop, such NameNode, DataNode, JobTracker, and TaskTracker, run in a single Java process.
Pseudo-distributed mode (single node): A single-node Hadoop deployment is considered as running Hadoop system in pseudo-distributed mode, in which all the Hadoop services, including both the master and the slave services, were executed on a single compute node. This kind of deployment is typically useful for running fast testing applications while we are developing these without having to think about using Hadoop cluster resources which might be needed by someone else. It’s also a easy way to experiment with Hadoop, as most of the people don’t have any cluster of computers at their disposal.
In this architecture, a separate JVM is spawned for every Hadoop components as they could communicate across network sockets, effectively producing a fully functioning and optimized mini-cluster on a single host.
Fully distributed mode (a cluster of nodes): A Hadoop deployment in which the Hadoop master and slave services run on a separate cluster of nodes is running in what’s stated as fully distributed mode. It is a perfect mode for production and development clusters. We can also make a further distinction since a development cluster generally has a lesser number of nodes and is used to prototype the workloads that will finally run on a production cluster.
In this architecture, Hadoop is designed to be distributed across multiple machines, some of which might act as general-purpose workers and others might be dedicated hosts for components, such as NameNode and JobTracker.
In software development, each mode has its benefits and drawbacks. Fully distributed mode is obviously the only one that can scale Hadoop across a cluster of machines, but it requires more configuration work, not to mention the cluster of machines. Local, or standalone, mode is the easiest to set up, but you interact with it in a different manner than you would with the fully distributed mode.