Managing files with Hadoop File System Commands

HDFS (the Hadoop Distributed File System) is one of the two main components of the Hadoop framework; the other is the computational paradigm known as MapReduce. A distributed file system is a file system that manages storage across a networked cluster of machines.

HDFS stores data in blocks, units whose default size is 64 MB in Hadoop 1.x (Hadoop 2.x and later default to 128 MB). Files stored in HDFS are broken into block-size chunks, which are then stored independently throughout the cluster.
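To make the block arithmetic concrete, here is a minimal sketch that estimates how many blocks a file occupies. The function name blocks_needed is hypothetical, sizes are given in megabytes, and the 64 MB default simply mirrors the figure quoted above; a real cluster may be configured with a different block size.

```shell
# Number of HDFS blocks needed for a file: ceil(file_size / block_size).
# Both sizes are in megabytes; the block size defaults to 64.
blocks_needed() {
    file_mb=$1
    block_mb=${2:-64}
    # Integer ceiling division, no floating point required.
    echo $(( (file_mb + block_mb - 1) / block_mb ))
}

blocks_needed 200      # prints 4 (three full 64 MB blocks plus one 8 MB block)
blocks_needed 128 64   # prints 2
```

Note that the last block of a file holds only the remaining data, so a 200 MB file does not consume four full blocks of disk space.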

fsck command:

We can use the fsck command to list the blocks that make up each file in HDFS, as follows:

$ hadoop fsck / -files -blocks

Hadoop file system shell:

We access the Hadoop file system shell by running one form of the hadoop command. All hadoop commands are invoked by the bin/hadoop script. (To retrieve a description of all hadoop commands, run the hadoop script without specifying any arguments.) The hadoop command has the syntax:

hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

The --config confdir option overrides the default configuration directory ($HADOOP_HOME/conf), so we can easily customize our Hadoop environment configuration. The generic options are a common set of options that are supported by several commands, whereas the command options are specific to each command.

Hadoop file system shell commands (for command line interfaces) take uniform resource identifiers (URIs) as arguments. A URI is a string of characters that’s used to identify a name or a web resource. The string can include a scheme name — a qualifier for the nature of the data source. For HDFS, the scheme name is hdfs, and for the local file system, the scheme name is file. If we don’t specify a scheme name, the default is the scheme name that’s specified in the configuration file. A file or directory in HDFS can be specified in a fully qualified way, such as in this example:

hdfs://namenodehost/parent/child

Or it can simply be /parent/child if the configuration file points to hdfs://namenodehost.
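The resolution rule described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how Hadoop implements it: DEFAULT_FS and qualify_path are hypothetical names, and DEFAULT_FS stands in for whatever default file system the configuration file specifies.

```shell
# Hypothetical stand-in for the default file system set in the configuration.
DEFAULT_FS="hdfs://namenodehost"

# Print the fully qualified form of a path given to a file system shell command.
qualify_path() {
    case "$1" in
        *://*) echo "$1" ;;               # already has a scheme: keep as-is
        /*)    echo "${DEFAULT_FS}$1" ;;  # absolute path: prepend the default file system
        *)     echo "${DEFAULT_FS}/user/${USER:-unknown}/$1" ;;  # relative: working directory
    esac
}

qualify_path /parent/child          # prints hdfs://namenodehost/parent/child
qualify_path file:///tmp/data.txt   # prints file:///tmp/data.txt
```

A path that already carries a scheme, such as file, is left untouched, which is why the same shell commands can address both HDFS and the local file system.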

The Hadoop file system shell commands, which are similar to Linux file commands, have the following general syntax:

hadoop fs -file_cmd

mkdir command:

As we might expect, we use the mkdir command to create a directory in HDFS, just as we would on Linux or other Unix-based operating systems. Though HDFS has a default working directory, /user/$USER, where $USER is our login username, we need to create it ourselves by using the syntax:

$ hadoop fs -mkdir /user/login_user_name

For example, to create a directory named “mindstick”, run this mkdir command:

$ hadoop fs -mkdir /user/mindstick

put command:

Use the Hadoop put command to copy a file from our local file system to HDFS:

$ hadoop fs -put file_name /user/login_user_name

For example, to copy a file named data.txt to this new directory, run the following put command:

$ hadoop fs -put data.txt /user/mindstick
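The mkdir and put steps can be combined into one repeatable helper. The following is a minimal sketch, assuming the file system shell is invoked as hadoop fs; the function name upload_to_hdfs and the HADOOP_CMD variable are hypothetical conveniences of this example, with HADOOP_CMD allowing a dry run by setting it to echo.

```shell
# Assumption of this sketch: HADOOP_CMD defaults to the real "hadoop" launcher,
# but can be set to "echo" to print the commands instead of running them.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Create the target directory if needed, then copy a local file into it.
upload_to_hdfs() {
    local_file=$1
    hdfs_dir=$2
    # -p creates missing parent directories and does not fail if the directory exists.
    "$HADOOP_CMD" fs -mkdir -p "$hdfs_dir" || return 1
    # -f overwrites the destination file if it is already present.
    "$HADOOP_CMD" fs -put -f "$local_file" "$hdfs_dir"
}

# Example (requires a configured Hadoop client):
#   upload_to_hdfs data.txt /user/mindstick
```

Using -p and -f makes the helper safe to rerun; without them, a second run would fail because the directory and the file already exist.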

ls command:

Run the ls command to get an HDFS file listing:

$ hadoop fs -ls .
Found 2 items
drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt

get command:

Use the Hadoop get command to copy a file from HDFS to our local file system:
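As an illustration, the sketch below wraps the get subcommand, which takes an HDFS source path followed by a local destination. It assumes the file system shell is invoked as hadoop fs; the function name download_from_hdfs and the HADOOP_CMD variable are hypothetical, and the example path reuses the data.txt file and mindstick directory from earlier.

```shell
# Assumption of this sketch: HADOOP_CMD defaults to the real "hadoop" launcher,
# but can be set to "echo" to print the command instead of running it.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Copy a file from HDFS to the local file system.
# The local destination defaults to the current directory.
download_from_hdfs() {
    hdfs_path=$1
    local_dest=${2:-.}
    "$HADOOP_CMD" fs -get "$hdfs_path" "$local_dest"
}

# Example (requires a configured Hadoop client):
#   download_from_hdfs /user/mindstick/data.txt .
```

The argument order is the mirror image of put: the HDFS path comes first and the local path second.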
