
Managing files with Hadoop File System Commands

David Miller, 05 May 2016 (Updated 28 Nov 2017)

HDFS (the Hadoop Distributed File System) is one of the two main components of the Hadoop framework; the other is the computational paradigm known as MapReduce. A distributed file system is a file system that manages storage across a networked cluster of machines.

HDFS stores data in blocks, units whose default size is 64 MB in Hadoop 1.x (Hadoop 2.x raised the default to 128 MB). Files stored in HDFS are broken into block-size chunks that are then stored independently throughout the cluster.
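The block size is configurable per cluster. A minimal sketch of the relevant hdfs-site.xml fragment, assuming a Hadoop 2.x deployment (the `dfs.blocksize` property name is the 2.x form; Hadoop 1.x used `dfs.block.size`):

```xml
<!-- hdfs-site.xml: set the HDFS block size; 134217728 bytes = 128 MB -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
```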

fsck command:

 We can use the fsck command to list the blocks that make up each file in HDFS, as follows:

    % hadoop fsck / -files -blocks 

Hadoop System Shell:

We access the Hadoop file system shell by running one form of the hadoop command. All hadoop commands are invoked by the bin/hadoop script. (To retrieve a description of all hadoop commands, run the hadoop script without specifying any arguments.) The hadoop command has the syntax: 

      hadoop [--config confdir] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS] 

The --config confdir option overrides the default configuration directory ($HADOOP_HOME/conf), so we can easily customize our Hadoop environment configuration. The generic options and command options are a common set of options that are supported by several commands.
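For example, to run a file listing against an alternative configuration directory, we might use something like the following (the directory path here is hypothetical; substitute your own):

```shell
# Use a non-default configuration directory for a single invocation
hadoop --config /opt/hadoop/conf.cluster2 fs -ls /
```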

Hadoop file system shell commands (for command line interfaces) take uniform resource identifiers (URIs) as arguments. A URI is a string of characters used to identify a resource. The string can include a scheme name — a qualifier for the nature of the data source. For HDFS, the scheme name is hdfs, and for the local file system, the scheme name is file. If we don’t specify a scheme name, the default is the scheme that’s specified in the configuration file. A file or directory in HDFS can be specified in a fully qualified way, such as in this example:

             hdfs://namenodehost/parent/child

Or it can simply be /parent/child if the configuration file points to

             hdfs://namenodehost.
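To illustrate, the following invocations are equivalent when `fs.defaultFS` in core-site.xml is set to hdfs://namenodehost (the host name is a placeholder, as in the example above):

```shell
# Fully qualified URI
hdfs dfs -ls hdfs://namenodehost/parent/child

# Equivalent relative form, resolved against the configured default file system
hdfs dfs -ls /parent/child

# The file scheme addresses the local file system instead
hdfs dfs -ls file:///tmp
```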

The Hadoop file system shell commands, which are similar to Linux file commands, have the following general syntax:

              hdfs dfs -file_cmd 

mkdir command:

As we might expect, we use the mkdir command to create a directory in HDFS, just as we would on Linux or other Unix-based operating systems. Although HDFS has a default working directory, /user/$USER (where $USER is our login username), we need to create it ourselves by using the syntax:

         $ hdfs dfs -mkdir /user/login_user_name

For example, to create a directory named “mindstick”, run this mkdir command:

       $ hdfs dfs -mkdir /user/mindstick 
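If the parent directory (/user here) does not exist yet, the plain mkdir fails; the -p flag creates any missing parent directories along the path, just as it does on Linux:

```shell
# Create /user/mindstick and any missing parents in one step
hdfs dfs -mkdir -p /user/mindstick
```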

put command:

Use the Hadoop put command to copy a file from our local file system to HDFS:

   $ hdfs dfs -put file_name /user/login_user_name

For example, to copy a file named data.txt to this new directory, run the following put command:

     $ hdfs dfs -put data.txt /user/mindstick

ls command:

Run the ls command to get an HDFS file listing:

     $ hdfs dfs -ls .
     Found 2 items
     drwxr-xr-x   - mindstick supergroup          0 2016-05-03 12:25 /user/mindstick
     -rw-r--r--   1 mindstick supergroup        118 2016-05-03 12:15 /user/mindstick/data.txt

get command:

Use the Hadoop get command to copy a file from HDFS to our local file system:
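For example, to copy the data.txt file uploaded earlier back out of HDFS into the current local directory, a sketch mirroring the put example above (paths follow the earlier examples):

```shell
# Copy a file from HDFS into the local working directory
hdfs dfs -get /user/mindstick/data.txt ./data.txt
```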
