Managing files with Hadoop File System Commands

HDFS (the Hadoop Distributed File System) is one of the two main components of the Hadoop framework; the other is the computational paradigm known as MapReduce. A distributed file system is one that manages storage across a networked cluster of machines.

HDFS stores data in blocks, units whose default size is 64 MB in Hadoop 1.x (128 MB from Hadoop 2.x onward). A file stored in HDFS is split into block-sized chunks, which are then stored independently throughout the cluster.
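
As a rough illustration of this splitting, the number of blocks a file occupies is its size divided by the block size, rounded up. The sketch below assumes a made-up 200 MB file and the 64 MB block size mentioned above; the variable names are purely illustrative.

```shell
# Block size and file size in megabytes (the 200 MB file is a hypothetical example)
block_size_mb=64
file_size_mb=200

# Integer ceiling division: full blocks plus one more if there is a remainder
num_blocks=$(( (file_size_mb + block_size_mb - 1) / block_size_mb ))

# Data held by the final, partially filled block (0 means the file fits exactly)
last_block_mb=$(( file_size_mb % block_size_mb ))

echo "blocks: $num_blocks, last block: ${last_block_mb} MB"
```

Under these assumptions the file is stored as three full blocks plus one 8 MB block. The last block occupies only as much storage as the data it holds, not a full 64 MB.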


fsck command:

We can use the fsck command to list the blocks that make up each file in HDFS, as follows:

         $ hadoop fsck / -files -blocks


Hadoop File System Shell:

We access the Hadoop file system shell by running one form of the hadoop command. All hadoop commands are invoked by the bin/hadoop script. (To retrieve a description of all hadoop commands, run the hadoop script without specifying any arguments.) The hadoop command has the syntax:


      hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]


The --config confdir option overrides the default configuration directory ($HADOOP_HOME/conf), so we can easily customize our Hadoop environment configuration. The generic options are a common set of options supported by several commands, whereas the command options are specific to the individual command.

Hadoop file system shell commands (for command line interfaces) take uniform resource identifiers (URIs) as arguments. A URI is a string of characters that’s used to identify a name or a web resource. The string can include a scheme name — a qualifier for the nature of the data source. For HDFS, the scheme name is hdfs, and for the local file system, the scheme name is file. If we don’t specify a scheme name, the default is the scheme name that’s specified in the configuration file. A file or directory in HDFS can be specified in a fully qualified way, such as in this example:


Or it can simply be /parent/child if the configuration file points to
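
To make the scheme rule concrete, here is a small sketch of how a scheme could be read off a URI, with a fallback to a default when none is given. It is not how Hadoop itself resolves paths; the function name scheme_of, the default_scheme variable, and the host name example-host are all assumptions made for illustration.

```shell
# Default scheme; on a real cluster this comes from the configuration file (assumed here)
default_scheme="hdfs"

# Print the scheme of a URI, falling back to the default when none is given
scheme_of() {
  case "$1" in
    *://*) echo "${1%%://*}" ;;        # keep the text before the first "://"
    *)     echo "$default_scheme" ;;   # no scheme present, use the default
  esac
}

scheme_of "hdfs://example-host/parent/child"   # hypothetical host name
scheme_of "file:///tmp/data.txt"
scheme_of "/parent/child"
```

The first two calls print the scheme that is spelled out in the URI; the last one has no scheme, so it falls back to the configured default.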


The Hadoop file system shell commands, which are similar to Linux file commands, have the following general syntax:

              hdfs dfs -file_cmd


mkdir command:

As we might expect, we use the mkdir command to create a directory in HDFS, just as we would on Linux or other Unix-based operating systems. Though HDFS has a default working directory, /user/$USER, where $USER is our login username, we need to create it ourselves by using the syntax:

         $ hdfs dfs -mkdir /user/login_user_name

For example, to create a directory named “mindstick”, run this mkdir command:

         $ hdfs dfs -mkdir /user/mindstick


put command:

Use the Hadoop put command to copy a file from our local file system to HDFS:

         $ hdfs dfs -put file_name /user/login_user_name

For example, to copy a file named data.txt to this new directory, run the following put command:

         $ hdfs dfs -put data.txt /user/mindstick

ls command:

Run the ls command to get an HDFS file listing:

         $ hdfs dfs -ls .
         Found 2 items
         drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
         -rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt
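
Each line of the listing has eight columns: permissions, replication factor, owner, group, size in bytes, date, time, and path. Directories start with d and show - in the replication column. The sketch below reads those columns with awk; it works on a copy of the sample listing above rather than on a live cluster, and the variable names are illustrative.

```shell
# Sample listing copied from the ls output shown above
listing='drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt'

# Columns: $1 permissions, $2 replication, $3 owner, $4 group, $5 size, $6 date, $7 time, $8 path
summary=$(printf '%s\n' "$listing" | awk '
  /^d/ { print "dir  " $8 }
  /^-/ { print "file " $8 " (" $5 " bytes, replication " $2 ")" }
')

echo "$summary"
```

On a machine with a Hadoop client, the same awk program could be fed from the ls command directly instead of from the sample text.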

get command:

Use the Hadoop get command to copy a file from HDFS to our local file system.
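
The original example for this command is not available, so the following is only a sketch. It assumes the usual form of get, with the HDFS path first and the local destination second, and reuses the data.txt file from the earlier examples. The local name data_copy.txt and the run helper are made up for illustration; the helper only prints the command unless RUN=1 is set, so the snippet can be tried on a machine without a Hadoop client.

```shell
# Print a command; execute it only when RUN=1 (hypothetical helper for safe experimentation)
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Copy data.txt from HDFS into the current local directory under a new name
run hdfs dfs -get /user/mindstick/data.txt ./data_copy.txt
```

Setting RUN=1 in the environment before running the snippet would perform the actual copy, provided the hdfs client is installed and the file exists in HDFS.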
