Local and Distributed Modes of Running Pig Scripts
Pig has two modes for running scripts: local mode and MapReduce mode.
Local mode:
All scripts run on a single machine, without requiring a Hadoop cluster or HDFS. This is useful for developing and testing Pig logic: when we use a small set of data to develop or test our code, local mode can be much quicker than going through the MapReduce infrastructure. In local mode, the Pig program executes in a local Java Virtual Machine (JVM), and data access goes through the local file system of that single machine. Local mode is essentially a local simulation of MapReduce that uses Hadoop’s LocalJobRunner class.
MapReduce mode (also called Hadoop mode):
Pig runs on a Hadoop cluster. In this mode, the Pig script is translated into a series of MapReduce jobs that are then executed on the cluster.
If we have a terabyte (TB) of data that we want to perform operations on and we want to develop a program interactively, we may soon find things slowing down considerably, and we may start growing our storage. Local mode allows us to work with a subset of our data in a more interactive manner so that we can figure out the logic (and work out the bugs) of our Pig program. After we have things set up as we want them and our operations are running smoothly, we can run the script against the full data set using MapReduce mode.
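The subset-first workflow described above can be sketched in shell. This is only an illustration under stated assumptions: the helper make_sample, the file names, and the sample size are hypothetical, and the commented pig invocations assume the script reads its input path from an $input parameter passed with -param.

```shell
#!/bin/sh
# Hypothetical helper: copy the first N lines of a large input file
# into a small sample file for local-mode development.
#   $1 = source file, $2 = sample file, $3 = number of lines to keep
make_sample() {
    head -n "$3" "$1" > "$2"
}

# Example usage (file names are assumptions, not from the article):
#   make_sample full_data.tsv sample_data.tsv 1000
#   pig -x local -param input=sample_data.tsv MindStick.pig
#   pig -x mapreduce -param input=/data/full_data.tsv MindStick.pig
```

Taking the first lines is the simplest way to cut a sample; if the data is sorted or grouped, a random sample would be more representative.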
Pig Script Interfaces
Pig programs can be packaged in three different ways:
Script: A script is nothing more than a file containing Pig Latin commands, identified by the .pig suffix (MindStick.pig, for example). Ending our Pig program with the .pig extension is a convention, not a requirement. The commands are interpreted by the Pig Latin compiler and then run in the order determined by the Pig optimizer.
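Grunt: Grunt acts as a command interpreter where we can interactively enter Pig Latin at the Grunt command line and immediately see the response. This method is useful for prototyping during the early development stage and for exploring what-if scenarios.
Embedded: Pig Latin statements can be executed from within a host program written in a language such as Java, Python, or JavaScript.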
Pig scripts, Grunt shell commands, and embedded Pig programs can be executed in either local mode or MapReduce mode. The Grunt shell provides an interactive shell for submitting Pig commands and running Pig scripts. To start the Grunt shell in interactive mode, we enter the command pig at the shell prompt.
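To make the script form concrete, here is a minimal sketch that writes a small Pig Latin script to MindStick.pig and runs it in local mode if Pig is installed. The input file users.txt, its contents, and the relation names are made up for illustration.

```shell
#!/bin/sh
# Tiny sample input: tab-separated (name, age) records. Hypothetical data.
printf 'alice\t34\nbob\t15\ncarol\t52\n' > users.txt

# Write a minimal Pig Latin script. The quoted 'EOF' stops the shell
# from expanding anything inside the here-document.
cat > MindStick.pig <<'EOF'
-- Load tab-separated records and declare a schema
users = LOAD 'users.txt' USING PigStorage('\t') AS (name:chararray, age:int);
-- Keep only the adults
adults = FILTER users BY age >= 18;
-- Print the result to the console
DUMP adults;
EOF

# Run the script in local mode, but only if the pig command is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local MindStick.pig
fi
```

The same three Pig Latin statements could also be typed one at a time at the Grunt prompt, which is often the quicker way to experiment before saving them to a file.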
To tell Pig whether a script or the Grunt shell should run locally or in Hadoop mode, specify the mode with the -x flag to the pig command. The following example runs our Pig script in local mode:
pig -x local MindStick.pig
Here’s how we would run the Pig script in Hadoop mode, which is the default if we don’t specify the flag:
pig -x mapreduce MindStick.pig
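The mode selection can also be wrapped in a small shell helper. The sketch below only prints the command it would run. The function name pig_command is hypothetical, and it accepts just the two modes discussed here; a given Pig installation may support others.

```shell
#!/bin/sh
# Hypothetical helper: print the pig command for a script and an optional mode.
# When no mode is given it falls back to mapreduce, mirroring Pig's default.
#   $1 = script file, $2 = mode (local or mapreduce)
pig_command() {
    script=$1
    mode=${2:-mapreduce}
    case "$mode" in
        local|mapreduce)
            printf 'pig -x %s %s\n' "$mode" "$script"
            ;;
        *)
            printf 'unknown mode: %s\n' "$mode" >&2
            return 1
            ;;
    esac
}

pig_command MindStick.pig local   # prints: pig -x local MindStick.pig
pig_command MindStick.pig         # prints: pig -x mapreduce MindStick.pig
```

Printing the command rather than running it keeps the sketch usable on a machine without Pig; replacing printf with a direct call to pig would turn it into a real launcher.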
By default, when we run the pig command without any parameters, it starts the Grunt shell in Hadoop mode. To start the Grunt shell in local mode, add the -x local flag to the command. Here is an example: