HDFS (the Hadoop Distributed File System) is one of the two main components of the Hadoop framework; the other is the computational paradigm known as MapReduce. A distributed file system is a file system that manages storage across a networked cluster of machines.
HDFS stores data in blocks, units whose default size is 64 MB in Hadoop 1.x (128 MB in Hadoop 2.x and later). Files stored in HDFS are broken into block-sized chunks, which are then stored independently throughout the cluster.
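To make the chunking concrete, here is a minimal shell sketch of the arithmetic: the number of blocks for a file is the file size divided by the block size, rounded up. The function name hdfs_block_count is hypothetical (it is not a Hadoop command), and the 64 MB block size is simply the default quoted above.

```shell
# Number of HDFS blocks needed for a file: ceiling(file_size / block_size).
# Both arguments are in bytes. hdfs_block_count is a hypothetical helper,
# not part of Hadoop.
hdfs_block_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

BLOCK_SIZE=$((64 * 1024 * 1024))   # the 64 MB default quoted above

# A 200 MB file: three full 64 MB blocks plus one 8 MB block.
hdfs_block_count $((200 * 1024 * 1024)) "$BLOCK_SIZE"   # prints 4

# A 118-byte file: a single block.
hdfs_block_count 118 "$BLOCK_SIZE"                      # prints 1
```

Note that a file smaller than one block does not consume a full block's worth of underlying storage; the last block of a file is only as large as the data it holds.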
We can use the fsck command to list the blocks that make up each file in HDFS, as follows:
% hadoop fsck / -files -blocks
Hadoop System Shell:
We access the Hadoop file system shell by running one form of the hadoop command. All hadoop commands are invoked by the bin/hadoop script. (To retrieve a description of all hadoop commands, run the hadoop script without specifying any arguments.) The hadoop command has the syntax:
hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
The --config confdir option overrides the default configuration directory ($HADOOP_HOME/conf), so we can easily customize our Hadoop environment configuration. The generic options are a common set of options supported by several commands, while the command options are specific to each command.
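As a rough illustration of how the pieces of that syntax line up, the sketch below assembles a command line with and without --config and prints it instead of running it, so it works without a cluster. The helper hadoop_cmdline and the directory /etc/hadoop/alt-conf are made up for this example; fs is the hadoop command that starts the file system shell.

```shell
# Print (rather than run) a hadoop command line in the order:
#   hadoop [--config confdir] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]
# hadoop_cmdline is a hypothetical helper, not part of Hadoop.
hadoop_cmdline() {
  confdir=$1   # empty string means "use the default configuration directory"
  shift
  if [ -n "$confdir" ]; then
    echo "hadoop --config $confdir $*"
  else
    echo "hadoop $*"
  fi
}

# Default configuration directory ($HADOOP_HOME/conf):
hadoop_cmdline "" fs -ls /
# prints: hadoop fs -ls /

# Alternate configuration directory (the path is an assumption):
hadoop_cmdline /etc/hadoop/alt-conf fs -ls /
# prints: hadoop --config /etc/hadoop/alt-conf fs -ls /
```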
Hadoop file system shell commands (for command line interfaces) take uniform resource identifiers (URIs) as arguments. A URI is a string of characters that’s used to identify a name or a web resource. The string can include a scheme name — a qualifier for the nature of the data source. For HDFS, the scheme name is hdfs, and for the local file system, the scheme name is file. If we don’t specify a scheme name, the default is the scheme name that’s specified in the configuration file. A file or directory in HDFS can be specified in a fully qualified way, such as in this example:
Or it can simply be /parent/child if the configuration file points to
The Hadoop file system shell commands, which are similar to Linux file commands, have the following general syntax:
hadoop fs -file_cmd
As we might expect, we use the mkdir command to create a directory in HDFS, just as we would on Linux or other Unix-based operating systems. Though HDFS has a default working directory, /user/$USER, where $USER is our login username, we need to create it ourselves by using the syntax:
$ hadoop fs -mkdir /user/login_user_name
For example, to create a directory named “mindstick”, run this mkdir command:
$ hadoop fs -mkdir /user/mindstick
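The working directory matters because relative paths are interpreted against it: for the user mindstick, data.txt means /user/mindstick/data.txt. Here is a simplified shell sketch of that rule. The function hdfs_resolve is hypothetical, and it deliberately ignores cases such as ".." or "./name".

```shell
# Resolve a path against the HDFS working directory /user/<user>.
# hdfs_resolve is a hypothetical helper that only mimics the basic rule;
# it does not handle "..", "./name", or trailing slashes.
hdfs_resolve() {
  user=$1
  path=$2
  case "$path" in
    *://*) echo "$path" ;;              # fully qualified URI: keep as-is
    /*)    echo "$path" ;;              # absolute path: keep as-is
    .)     echo "/user/$user" ;;        # the working directory itself
    *)     echo "/user/$user/$path" ;;  # relative path
  esac
}

hdfs_resolve mindstick data.txt        # prints /user/mindstick/data.txt
hdfs_resolve mindstick .               # prints /user/mindstick
hdfs_resolve mindstick /tmp/data.txt   # prints /tmp/data.txt
```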
Use the Hadoop put command to copy a file from our local file system to HDFS:
$ hadoop fs -put file_name /user/login_user_name
For example, to copy a file named data.txt to this new directory, run the following put command:
$ hadoop fs -put data.txt /user/mindstick
ls command:
Run the ls command to get an HDFS file listing:
$ hadoop fs -ls .
Found 2 items
drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt
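Each line of the listing has the same columns: permissions, replication factor (shown as - for a directory), owner, group, size in bytes, modification date and time, and path. As a small sketch, the hypothetical helper below splits one listing line into labeled fields with awk; it assumes the path contains no spaces.

```shell
# Split one line of file-listing output into labeled fields.
# parse_ls_line is a hypothetical helper; it assumes whitespace-separated
# columns and a path without spaces.
parse_ls_line() {
  printf '%s\n' "$1" | awk '{
    print "permissions=" $1
    print "replication=" $2
    print "owner=" $3
    print "group=" $4
    print "size_bytes=" $5
    print "modified=" $6 " " $7
    print "path=" $8
  }'
}

parse_ls_line "-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt"
# prints:
#   permissions=-rw-r--r--
#   replication=1
#   owner=mindstick
#   group=supergroup
#   size_bytes=118
#   modified=2016-05-03 12:15
#   path=/user/mindstick/data.txt
```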
Use the Hadoop get command to copy a file from HDFS to our local file system: