Hadoop File System Commands: ls Command Output Analysis
In my previous post, I explained various Hadoop file system commands, including the “ls command”. I gave its syntax and described how it is used to get an HDFS file listing:
(From the previous post: Managing files with Hadoop File System Commands)
Use the ls command to get an HDFS file listing:
$ hdfs dfs -ls .
Found 2 items
drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt
Here, let’s look carefully at the last two lines of the output (which I didn’t explain in my previous post):
drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick
-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt
Let’s dig into what each part of these lines tells us about the files.
For ease of understanding, let’s break the file listing down column by column:
Column 1: Shows the file mode (“d” for a directory and “-” for a regular file, followed by the permissions). The three permission types, read (r), write (w), and execute (x), are the same as those on Linux- and Unix-based systems. The execute permission is ignored for files because we cannot execute a file on HDFS; for directories, it is needed to access the directory’s children. The permissions are grouped by owner, group, and others (everyone else).
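To make the mode column concrete, here is a minimal Python sketch that splits a mode string into its type and its owner/group/others triplets, and converts it to the octal form that chmod accepts. The function names are hypothetical, and the sketch assumes the plain 10-character form; it does not handle the sticky bit (t) or the trailing “+” that HDFS shows when ACLs are set.

```python
def decode_mode(mode: str) -> dict:
    """Split an ls mode string such as 'drwxr-xr-x' into its parts.

    Assumes the plain 10-character form (no sticky bit, no ACL '+').
    """
    if len(mode) != 10:
        raise ValueError(f"expected 10 characters, got {mode!r}")
    result = {"type": "directory" if mode[0] == "d" else "file"}
    # Characters 1-3, 4-6 and 7-9 hold the owner, group and others triplets.
    for name, start in (("owner", 1), ("group", 4), ("others", 7)):
        triplet = mode[start:start + 3]
        result[name] = {
            "read": triplet[0] == "r",
            "write": triplet[1] == "w",
            "execute": triplet[2] == "x",
        }
    return result


def mode_to_octal(mode: str) -> str:
    """Convert the nine permission characters to octal, e.g. '755'."""
    digits = []
    for start in (1, 4, 7):
        triplet = mode[start:start + 3]
        # r = 4, w = 2, x = 1, summed per triplet.
        value = ((4 if triplet[0] == "r" else 0)
                 + (2 if triplet[1] == "w" else 0)
                 + (1 if triplet[2] == "x" else 0))
        digits.append(str(value))
    return "".join(digits)


print(mode_to_octal("drwxr-xr-x"))  # 755
print(mode_to_octal("-rw-r--r--"))  # 644
```

For the two entries above, the directory is 755 (owner can write, everyone can read and traverse) and data.txt is 644 (only the owner can write).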
Column 2: Shows the replication factor for files; directories show “-” because the concept of replication doesn’t apply to them. The blocks that make up a file in HDFS are replicated to ensure fault tolerance. The replication factor, that is, the number of replicas kept for a specific file, is configurable: we can specify it when the file is created or change it later, for example from our application.
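Because every block of a file is stored once per replica, column 2 also tells us roughly how much raw cluster storage an entry consumes. The following Python sketch, with hypothetical function names, reads the replication column (treating “-” as “no replicas”) and estimates raw usage as size times replication; it is an approximation that ignores erasure-coded files.

```python
from typing import Optional


def parse_replication(field: str) -> Optional[int]:
    """Column 2 holds a number for files and '-' for directories."""
    return None if field.strip() == "-" else int(field)


def raw_storage_bytes(size: int, replication_field: str) -> int:
    """Approximate cluster-wide bytes used by one listing entry.

    Each block of a replicated file is stored `replication` times, so raw
    usage is about size * replication. Directories hold no blocks and
    contribute 0. Erasure-coded files are not modeled here.
    """
    replication = parse_replication(replication_field)
    return 0 if replication is None else size * replication


print(raw_storage_bytes(118, "1"))  # data.txt as listed: 118
print(raw_storage_bytes(118, "3"))  # same file with 3 replicas: 354
```

To change the factor of an existing file, the HDFS shell provides the -setrep command, for example `hdfs dfs -setrep 3 /user/mindstick/data.txt`; the new value then appears in column 2.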
Columns 3 and 4: Show the file owner and group. The supergroup is the group of superusers, and the superuser is the user with the same identity as the NameNode process; loosely, if we started the NameNode, we are the superuser. This is a special group: regular users’ IDs belong to ordinary groups without special privileges, groups that are simply defined by a Hadoop administrator.
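The owner and group columns work together with the mode column when HDFS checks access: if the user is the owner, the owner triplet is tested; otherwise, if the user belongs to the file’s group, the group triplet is tested; otherwise the others triplet is tested. Permission checks never fail for the superuser. Here is a simplified Python sketch of that order; the function name and the superuser name “hdfs” in the usage are hypothetical, and ACLs and checks on parent directories are not modeled.

```python
from typing import Set


def can_access(mode: str, owner: str, group: str, user: str,
               user_groups: Set[str], action: str, *,
               superuser: str, supergroup: str = "supergroup") -> bool:
    """Simplified HDFS-style check of one action: 'r', 'w' or 'x'."""
    position = {"r": 0, "w": 1, "x": 2}[action]
    # The superuser and members of the supergroup always pass.
    if user == superuser or supergroup in user_groups:
        return True
    # Exactly one permission class is tested, in owner -> group -> others order.
    if user == owner:
        triplet = mode[1:4]
    elif group in user_groups:
        triplet = mode[4:7]
    else:
        triplet = mode[7:10]
    return triplet[position] == action


# data.txt from the listing: owner mindstick, group supergroup, mode -rw-r--r--
print(can_access("-rw-r--r--", "mindstick", "supergroup",
                 "alice", {"analysts"}, "w", superuser="hdfs"))  # False
```

Note that the owner triplet is used for the owner even if the group or others triplet would be more permissive.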
Column 5: Shows the size of the file, in bytes, or 0 if it’s a directory.
Columns 6 and 7: Show the date and time of the last modification.
Column 8: Shows the unqualified name of the file or directory (meaning that the scheme name, such as hdfs://, isn’t specified).
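Putting the eight columns together, the following Python sketch parses lines of `hdfs dfs -ls` output into a small record. The class and function names are hypothetical, and it assumes the layout shown above: whitespace-separated columns, the date as YYYY-MM-DD and the time as HH:MM, and a “Found N items” header line that is skipped.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ListingEntry:
    mode: str                   # column 1, e.g. "drwxr-xr-x"
    replication: Optional[int]  # column 2, None for directories ("-")
    owner: str                  # column 3
    group: str                  # column 4
    size: int                   # column 5, in bytes (0 for directories)
    modified: datetime          # columns 6 and 7 combined
    path: str                   # column 8

    @property
    def is_dir(self) -> bool:
        return self.mode.startswith("d")


def parse_ls_line(line: str) -> ListingEntry:
    # maxsplit=7 keeps any spaces inside the final path column intact.
    fields = line.split(maxsplit=7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    mode, repl, owner, group, size, date, time, path = fields
    return ListingEntry(
        mode=mode,
        replication=None if repl == "-" else int(repl),
        owner=owner,
        group=group,
        size=int(size),
        modified=datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M"),
        path=path,
    )


def parse_ls_output(text: str) -> List[ListingEntry]:
    """Parse every entry line, skipping blanks and the 'Found N items' header."""
    return [parse_ls_line(line) for line in text.splitlines()
            if line.strip() and not line.startswith("Found ")]


sample = """Found 2 items
drwxr-xr-x   - mindstick supergroup          0 2016-05-03 12:25 /user/mindstick
-rw-r--r--   1 mindstick supergroup        118 2016-05-03 12:15 /user/mindstick/data.txt"""

for entry in parse_ls_output(sample):
    print(entry.path, entry.is_dir, entry.replication, entry.size)
```

Splitting on runs of whitespace rather than fixed positions copes with the column padding the shell adds.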