Hadoop File System Commands: ls Command Output Analysis

In my previous post, I explained various Hadoop file system commands, including the “ls” command. I gave its syntax and described how it is used to get an HDFS file listing:

(From the previous post: Managing files with Hadoop File System Commands)

ls command:

Run the ls command to get an HDFS file listing:

                  $ hdfs dfs -ls .

                  Found 2 items

                  drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick

                  -rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt


Now look carefully at the last two lines of the output, which I didn’t explain in my previous post:

               drwxr-xr-x - mindstick supergroup 0 2016-05-03 12:25 /user/mindstick

               -rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt

Let’s dig into what each of these long lines tells us about the files.

For convenience, we break the file listing down column by column:

Column 1:

It shows the file mode: “d” for a directory or “-” for a regular file, followed by the permissions. The three permission types, read (r), write (w), and execute (x), are the same as on Linux- and Unix-based systems. The execute permission is ignored for a file because we cannot execute a file on HDFS; for a directory, it is needed to access the directory’s children. The permissions are grouped by owner, group, and others (everyone else).
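To make the grouping concrete, here is a minimal Python sketch that splits a mode string from column 1 into its type flag and the three permission triplets. The function name decode_mode is hypothetical, and the sketch assumes a plain 10-character mode with no extra markers appended.

```python
def decode_mode(mode: str) -> dict:
    """Split a 10-character mode string such as 'drwxr-xr-x' into its parts."""
    if len(mode) != 10:
        raise ValueError(f"expected 10 characters, got {len(mode)}: {mode!r}")

    # First character: 'd' for a directory, '-' for a regular file.
    kind = "directory" if mode[0] == "d" else "file"

    # Remaining nine characters: three triplets for owner, group, others.
    permissions = {}
    for i, name in enumerate(("owner", "group", "others")):
        triplet = mode[1 + 3 * i : 4 + 3 * i]
        permissions[name] = {
            "read": triplet[0] == "r",
            "write": triplet[1] == "w",
            "execute": triplet[2] == "x",
        }
    return {"type": kind, "permissions": permissions}


info = decode_mode("-rw-r--r--")
print(info["type"], info["permissions"]["owner"])
```

For the data.txt line in the listing, this reports a regular file that its owner can read and write, while the group and everyone else can only read it.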

Column 2:

It shows the replication factor for files. The concept of replication doesn’t apply to directories, so a directory shows “-” in this column. The blocks that make up a file in HDFS are replicated to ensure fault tolerance. The replication factor, or the number of replicas that are kept for a specific file, is configurable. We can specify the replication factor when the file is created or change it later, via our application.

Columns 3 and 4:

They show the file owner and group. Supergroup is the name of the group of superusers, and the superuser is the user with the same identity as the NameNode process, so whoever starts the NameNode is the superuser. This is a special group; regular users belong to ordinary groups with no special characteristics, which are simply defined by a Hadoop administrator.

Column 5:

It shows the size of the file, in bytes, or 0 if it’s a directory.

Columns 6 and 7:

They show the date and time of the last modification, respectively.
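If we want to work with the modification time in a script, the two columns can be joined and parsed into one timestamp. This is a small sketch, assuming the YYYY-MM-DD and HH:MM layout seen in the listing above; the helper name parse_modified is hypothetical.

```python
from datetime import datetime


def parse_modified(date_field: str, time_field: str) -> datetime:
    """Combine columns 6 and 7 of the listing into one datetime value."""
    # Format assumed from the sample output: 2016-05-03 12:15
    return datetime.strptime(f"{date_field} {time_field}", "%Y-%m-%d %H:%M")


modified = parse_modified("2016-05-03", "12:15")
print(modified.isoformat())
```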

Column 8:

It shows the unqualified name (meaning that the scheme name isn’t specified) of the file or directory.
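Putting the eight columns together, the following Python sketch splits one listing line into named fields. The names LsEntry and parse_ls_line are hypothetical, and the sketch assumes the whitespace-separated layout shown in the sample output; it is meant as an illustration of the column breakdown, not a robust parser.

```python
from typing import NamedTuple, Optional


class LsEntry(NamedTuple):
    mode: str                   # column 1: type flag plus permissions
    replication: Optional[int]  # column 2: None for directories ("-")
    owner: str                  # column 3
    group: str                  # column 4
    size: int                   # column 5: bytes, 0 for a directory
    date: str                   # column 6
    time: str                   # column 7
    path: str                   # column 8


def parse_ls_line(line: str) -> LsEntry:
    """Split one line of the listing into its eight columns."""
    # At most 7 splits, so a path containing spaces stays in one piece.
    parts = line.strip().split(None, 7)
    if len(parts) != 8:
        raise ValueError(f"expected 8 columns, got {len(parts)}: {line!r}")
    mode, repl, owner, group, size, date, time, path = parts
    replication = None if repl == "-" else int(repl)
    return LsEntry(mode, replication, owner, group, int(size), date, time, path)


entry = parse_ls_line(
    "-rw-r--r-- 1 mindstick supergroup 118 2016-05-03 12:15 /user/mindstick/data.txt"
)
print(entry.owner, entry.replication, entry.size, entry.path)
```

The summary line “Found 2 items” has fewer than eight columns, so it should be skipped before calling the function.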
