HDFS (Hadoop Distributed File System) is one of the two main components of the Hadoop framework; the other is the computational paradigm known as MapReduce. A distributed file system is a file system that manages storage across a networked cluster of machines.
HDFS stores data in blocks, units whose default size is 64 MB in Hadoop 1.x (Hadoop 2.x and later raise the default to 128 MB). Files that we want to store in HDFS are broken into block-size chunks, which are then stored independently throughout the cluster.
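To make the block arithmetic concrete, here is a minimal shell sketch that computes how many blocks a file of a given size occupies, assuming the 64 MB default block size. The functions num_blocks and last_block_bytes are hypothetical helpers, not part of Hadoop. Because a file smaller than a block does not take up a full block's worth of storage, the sketch also reports how many bytes land in the final block.

```shell
# Sketch: how a file of SIZE bytes is split into HDFS blocks.
# Assumption: the 64 MB default block size described above.
BLOCK_SIZE=$((64 * 1024 * 1024))

# num_blocks SIZE_BYTES -> number of blocks the file occupies (ceiling division).
# An empty file yields 0.
num_blocks() {
  echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# last_block_bytes SIZE_BYTES -> bytes actually stored in the final block.
last_block_bytes() {
  local rem="$(( $1 % BLOCK_SIZE ))"
  if [ "$1" -gt 0 ] && [ "$rem" -eq 0 ]; then
    echo "$BLOCK_SIZE"   # file ends exactly on a block boundary
  else
    echo "$rem"
  fi
}
```

For example, num_blocks 157286400 (a 150 MB file) prints 3: two full 64 MB blocks plus a final block holding 22 MB (23068672 bytes, as reported by last_block_bytes).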
fsck command:
We can use the fsck command to list the blocks that make up each file in HDFS, as follows:
$ hadoop fsck / -files -blocks
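fsck does not have to walk the whole namespace; it accepts any HDFS path. The sketch below wraps it in a hypothetical show_blocks function that defaults to / and, on request, adds fsck's -locations option, which also prints the DataNodes holding each block. HADOOP_BIN is an assumed override variable (defaulting to hadoop) so the wrapper can be dry-run without a cluster.

```shell
# Sketch: run fsck on one HDFS path instead of the whole file system.
# HADOOP_BIN is a hypothetical override; it defaults to the hadoop launcher on PATH.
# show_blocks [PATH] [--locations]
show_blocks() {
  local path="${1:-/}"
  if [ "${2:-}" = "--locations" ]; then
    # -locations additionally reports which DataNodes store each block
    "${HADOOP_BIN:-hadoop}" fsck "$path" -files -blocks -locations
  else
    "${HADOOP_BIN:-hadoop}" fsck "$path" -files -blocks
  fi
}
```

For example, show_blocks /user/mindstick --locations runs hadoop fsck /user/mindstick -files -blocks -locations.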
Hadoop File System Shell:
We access the Hadoop file system shell by running one form of the hadoop command. All hadoop commands are invoked by the bin/hadoop script. (To retrieve a description of all hadoop commands, run the hadoop script without specifying any arguments.) The hadoop command has the syntax:
hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
The --config confdir option overrides the default configuration directory ($HADOOP_HOME/conf), so we can easily customize our Hadoop environment configuration. The generic options and command options are a common set of options supported by several commands.
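Because --config has to appear before the command name in the syntax above, it can be convenient to wrap it. The following is a minimal sketch of a hypothetical hadoop_with_conf helper that passes --config only when a directory is supplied and otherwise lets Hadoop fall back to its default configuration directory. HADOOP_BIN is an assumed override variable used for dry runs.

```shell
# Sketch: run a hadoop command against an alternative configuration directory.
# hadoop_with_conf CONF_DIR COMMAND [ARGS...]
# An empty CONF_DIR falls back to the default ($HADOOP_HOME/conf).
hadoop_with_conf() {
  if [ $# -lt 2 ]; then
    echo "usage: hadoop_with_conf CONF_DIR COMMAND [ARGS...]" >&2
    return 2
  fi
  local conf_dir="$1"
  shift
  if [ -n "$conf_dir" ]; then
    # --config must come before COMMAND
    "${HADOOP_BIN:-hadoop}" --config "$conf_dir" "$@"
  else
    "${HADOOP_BIN:-hadoop}" "$@"
  fi
}
```

For example, hadoop_with_conf /etc/hadoop/conf.test fs -ls / would run hadoop --config /etc/hadoop/conf.test fs -ls / (the directory name here is purely illustrative).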
Hadoop file system shell commands (for command-line interfaces) take uniform resource identifiers (URIs) as arguments. A URI is a string of characters used to identify a name or a web resource. The string can include a scheme name: a qualifier for the nature of the data source. For HDFS, the scheme name is hdfs, and for the local file system, the scheme name is file. If we don’t specify a scheme name, the default is the one specified in the configuration file (the fs.defaultFS property in core-site.xml, called fs.default.name in Hadoop 1.x). A file or directory in HDFS can be specified in a fully qualified way, such as in this example:
hdfs://namenodehost/parent/child
Or it can simply be /parent/child if the configuration file points to hdfs://namenodehost.
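The resolution rule just described can be modelled in a few lines of shell. This is a simplified, hypothetical qualify_path helper, not Hadoop's own path resolution (which also normalizes . and .. components): a path that already carries a scheme is left alone, an absolute path is prefixed with the default file system URI, and a relative path is resolved against the /user/$USER working directory.

```shell
# Sketch: qualify an HDFS shell path against a default file system URI.
# Simplified model; Hadoop's own resolution also normalizes "." and "..".
# qualify_path PATH DEFAULT_FS [USER]
qualify_path() {
  local path="$1"
  local default_fs="${2%/}"   # e.g. hdfs://namenodehost, trailing slash dropped
  local user="${3:-$USER}"    # owner of the default working directory
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                              # already fully qualified
    /*)    printf '%s\n' "${default_fs}${path}" ;;               # absolute: prefix default FS
    *)     printf '%s\n' "${default_fs}/user/${user}/${path}" ;; # relative: working directory
  esac
}
```

For example, qualify_path /parent/child hdfs://namenodehost prints hdfs://namenodehost/parent/child, while qualify_path file:///tmp/data.txt hdfs://namenodehost returns the local file URI unchanged.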
The Hadoop file system shell commands, which are similar to Linux file commands, have the following general syntax (hadoop fs -file_cmd is an equivalent form):
hdfs dfs -file_cmd
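The -file_cmd pattern maps directly onto a small dispatcher. In this sketch, dfs is a hypothetical function name and HDFS_BIN an assumed override variable that defaults to the hdfs launcher; the first argument becomes the dash-prefixed shell command, and the remaining arguments are passed through unchanged.

```shell
# Sketch: dfs CMD [ARGS...] runs "hdfs dfs -CMD ARGS...".
# HDFS_BIN is a hypothetical override; it defaults to the hdfs launcher on PATH.
dfs() {
  if [ $# -lt 1 ]; then
    echo "usage: dfs CMD [ARGS...]" >&2
    return 2
  fi
  local cmd="$1"
  shift
  "${HDFS_BIN:-hdfs}" dfs "-$cmd" "$@"
}
```

With it, dfs ls /user runs hdfs dfs -ls /user, and dfs mkdir /user/mindstick runs hdfs dfs -mkdir /user/mindstick.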
mkdir command:
As we might expect, we use the mkdir command to create a directory in HDFS, just as we would on Linux or other Unix-based operating systems. Although HDFS has a default working directory, /user/$USER, where $USER is our login username, we need to create it ourselves, using this syntax:
$ hdfs dfs -mkdir /user/login_user_name
For example, to create a directory named “mindstick”, run this mkdir command:
$ hdfs dfs -mkdir /user/mindstick
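Since the home directory is not created for us, a setup script might create it for whichever user runs the script. The sketch below is a hypothetical make_home_dir helper. It adds mkdir's -p flag, which, much like Unix mkdir -p, creates any missing parent directories (such as /user itself). This assumes a Hadoop 2.x or later shell, where -p is available; HDFS_BIN is an assumed override variable.

```shell
# Sketch: create /user/<name> in HDFS for the given (or current) user.
# HDFS_BIN is a hypothetical override; it defaults to the hdfs launcher on PATH.
# make_home_dir [USER_NAME]
make_home_dir() {
  local user="${1:-$USER}"
  if [ -z "$user" ]; then
    echo "make_home_dir: no user name given and \$USER is empty" >&2
    return 1
  fi
  # -p also creates missing parent directories, as with Unix mkdir -p
  "${HDFS_BIN:-hdfs}" dfs -mkdir -p "/user/${user}"
}
```

For example, make_home_dir mindstick runs hdfs dfs -mkdir -p /user/mindstick.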
put command:
Use the Hadoop put command to copy a file from our local file system to HDFS:
$ hdfs dfs -put file_name /user/login_user_name
For example, to copy a file named data.txt to this new directory, run the following put command:
$ hdfs dfs -put data.txt /user/mindstick
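A small guard can catch a common mistake before HDFS is contacted at all. This hypothetical upload helper checks that the local source file exists and only then hands it to hdfs dfs -put. The check is our own addition, not something put requires, and HDFS_BIN is an assumed override variable.

```shell
# Sketch: copy a local file into HDFS, failing early if the source is missing.
# HDFS_BIN is a hypothetical override; it defaults to the hdfs launcher on PATH.
# upload LOCAL_FILE HDFS_DEST
upload() {
  local src="$1" dest="$2"
  if [ ! -f "$src" ]; then
    echo "upload: local file '$src' not found" >&2
    return 1
  fi
  "${HDFS_BIN:-hdfs}" dfs -put "$src" "$dest"
}
```

For example, upload data.txt /user/mindstick performs the same copy as the put command above.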
ls command:
Run the ls command to get an HDFS file listing:
$ hdfs dfs -ls .
Found 2 items
drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt
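The columns of an ls listing are, in order: permissions, replication factor (shown as - for directories), owner, group, size in bytes, modification date, modification time, and path. The sketch below uses awk to pull the size and path of regular files out of such output. list_files is a hypothetical name, and the sketch assumes paths contain no spaces, since awk splits fields on whitespace.

```shell
# Sketch: print "size path" for each regular file in `hdfs dfs -ls` output.
# Assumes the 8-column layout described above and paths without spaces.
# Usage: hdfs dfs -ls . | list_files
list_files() {
  # Regular files have permissions starting with "-"; directories start with "d"
  # and the "Found N items" header starts with "Found", so both are skipped.
  awk '$1 ~ /^-/ { print $5, $8 }'
}
```

For the data.txt entry in the listing above, list_files would print 118 /user/mindstick/data.txt.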
get command:
Use the Hadoop get command to copy a file from HDFS to our local file system:
$ hdfs dfs -get /user/login_user_name/file_name local_path
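The following hypothetical download sketch wraps hdfs dfs -get, defaults the local destination to the current directory, and refuses to run if the destination is an existing local file. That no-overwrite check is a policy assumed for this example, not a description of get's own behaviour; HDFS_BIN is again an assumed override variable.

```shell
# Sketch: copy a file from HDFS to the local file system without clobbering.
# HDFS_BIN is a hypothetical override; it defaults to the hdfs launcher on PATH.
# download HDFS_SRC [LOCAL_DEST]   (LOCAL_DEST defaults to ".")
download() {
  local src="$1" dest="${2:-.}"
  if [ -f "$dest" ]; then
    # assumed policy for this sketch: never overwrite an existing local file
    echo "download: '$dest' already exists locally; not overwriting" >&2
    return 1
  fi
  "${HDFS_BIN:-hdfs}" dfs -get "$src" "$dest"
}
```

For example, download /user/mindstick/data.txt copies data.txt into the current local directory.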