Episode 011: du – disk usage

The du command provides a summary of disk usage for files and directories. By default it shows the number of blocks used by the contents of each directory it is run on. Usage is calculated recursively: when du encounters a directory it descends into its subdirectories, reports the disk usage of everything beneath each one, and then presents a total for the topmost directory. This cascades down the tree, so every subdirectory is treated in turn as a parent whose children are summarized before the parent itself is totaled. For instance, the screen shot below shows the du command run on a directory that contains subdirectories:

du screen shot 1
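
As a rough text equivalent of the screen shot, here is a minimal sketch that builds a small tree and runs du on it. It assumes GNU du; the directory and file names (project, docs, src, lib) are made up for illustration.

```shell
# Build a small throwaway tree; all names here are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/project/docs" "$tmp/project/src/lib"
dd if=/dev/zero of="$tmp/project/docs/manual" bs=1024 count=16 2>/dev/null
dd if=/dev/zero of="$tmp/project/src/lib/code" bs=1024 count=32 2>/dev/null

# One line per directory; each directory is printed after its
# subdirectories, so the topmost directory comes last.
du "$tmp/project"

# Keep a few facts for inspection, then clean up.
top_dir="$tmp/project"
dir_lines=$(du "$tmp/project" | wc -l)
last_path=$(du "$tmp/project" | tail -n 1 | cut -f2)
rm -rf "$tmp"
```

The exact numbers depend on the filesystem, but the shape of the output (children first, parent total last) should be the same.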

The default values are shown in units of 1024 bytes (1 kilobyte). This value can be adjusted using the -B or --block-size= option:

  • -BK, -k, --block-size=K, --block-size=1024 = display values in kilobytes (1024 bytes)
  • -BM, -m, --block-size=M, --block-size=1048576 = display values in megabytes
  • -BG, --block-size=G = display values in gigabytes
  • -BT, --block-size=T = display values in terabytes
  • -BP, --block-size=P = display values in petabytes
  • -BE, --block-size=E = display values in exabytes
  • -BZ, --block-size=Z = display values in zettabytes
  • -BY, --block-size=Y = display values in yottabytes

The practical use of these values varies depending on your system and storage capabilities. Trying to display values in zettabytes may produce the error:

-B argument ‘Z’ too large

This typically happens because a block size that large does not fit in the integer type du uses for its arithmetic, so it is rejected no matter how much storage you have. Similarly, asking for a unit far greater than the amount of storage the object actually uses may report a value of 1. For instance, running:

du -BG some_file

Where “some_file” is less than 1 gigabyte will report 1G for the object, even though the file may actually be only a few kilobytes in size, because du rounds up to the next whole unit. So make an effort to stick with units that are reasonable for the file system you are reading. Newer versions of du have the -h or --human-readable switch, which displays each value in the largest unit that keeps the number below 1024. That is:

  • 1-1023 KB = kilobytes
  • 1024+ KB = megabytes
  • 1024+ MB = gigabytes
  • 1024+ GB = terabytes

The letter representing the unit displayed will be appended to the end of the amount.
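
To see the rounding effect in practice, the following sketch creates a 10 KB file and reports it in several units. It assumes GNU du (the -B and -h options are not available in every du implementation), and some_file lives in a temporary directory created just for the test.

```shell
# Work in a throwaway directory so nothing else is touched.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/some_file" bs=1024 count=10 2>/dev/null

du -BG "$tmp/some_file"    # a few kilobytes still round up to one gigabyte
du -BK "$tmp/some_file"    # kilobytes give a meaningful number
du -h  "$tmp/some_file"    # -h chooses the unit for you

# Keep the size columns for inspection, then clean up.
size_in_g=$(du -BG "$tmp/some_file" | cut -f1)
size_in_k=$(du -BK "$tmp/some_file" | cut -f1)
rm -rf "$tmp"
```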

By default du shows totals for directories and not for individual files. The -a or --all switch reports a value for every directory and file, recursively.
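
A quick way to see the difference is to compare the number of lines each form prints. This is only a sketch with made-up file names in a temporary directory:

```shell
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/top_file" "$tmp/sub/inner_file"

du "$tmp"       # directories only: sub and the top directory
du -a "$tmp"    # the two files are listed as well

# Count the lines of each listing, then clean up.
dirs_only=$(du "$tmp" | wc -l)
with_files=$(du -a "$tmp" | wc -l)
rm -rf "$tmp"
```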

The total usage is displayed on the last line of output by default. If you only want to see this amount instead of the values for each directory or file, use the -s or --summarize flag:

du -sh

This will display the disk usage total for the current directory. If you are just looking at the values of a few files:

du -h file1 file2 file3

The values will be shown for each file but no total will be calculated. To generate the total in this case use the -c or --total option:

du -hc file1 file2 file3

This will display the usage for each file and then a total for all three files on the last line, all in human-readable units.
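
The commands above can be tried safely on scratch files. The sketch below assumes GNU du and creates file1, file2 and file3 in a temporary directory; LC_ALL=C is set only so the word "total" is not translated by the locale.

```shell
tmp=$(mktemp -d)
for name in file1 file2 file3; do
    dd if=/dev/zero of="$tmp/$name" bs=1024 count=8 2>/dev/null
done

# A single summary line for the whole directory.
du -sh "$tmp"

# One line per file plus a final "total" line.
LC_ALL=C du -hc "$tmp/file1" "$tmp/file2" "$tmp/file3"

# Record the shape of the output, then clean up.
summary_lines=$(du -s "$tmp" | wc -l)
total_lines=$(LC_ALL=C du -c "$tmp/file1" "$tmp/file2" "$tmp/file3" | wc -l)
total_label=$(LC_ALL=C du -c "$tmp/file1" "$tmp/file2" "$tmp/file3" | tail -n 1 | cut -f2)
rm -rf "$tmp"
```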

With the -S or --separate-dirs option, du reports for each directory only the files directly inside it and leaves out the values of its subdirectories. The subdirectories are still listed on their own lines, but the figure for a parent, including the one on the last line, no longer includes them. For instance:

du -S somedir

If somedir contains the following files and directories:

file1 5K
file2 8K
directory1/file1 10K
directory1/file2 30K

The resulting output would be:

40K somedir/directory1
13K somedir

instead of:

40K somedir/directory1
53K somedir

if just `du somedir` was run.
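
The same layout can be recreated to compare the two forms. This is a sketch assuming GNU du; the real figures will not match the simplified ones above exactly, because du rounds each file up to whole blocks and also counts the space used by the directories themselves.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/somedir/directory1"
dd if=/dev/zero of="$tmp/somedir/file1" bs=1024 count=5 2>/dev/null
dd if=/dev/zero of="$tmp/somedir/file2" bs=1024 count=8 2>/dev/null
dd if=/dev/zero of="$tmp/somedir/directory1/file1" bs=1024 count=10 2>/dev/null
dd if=/dev/zero of="$tmp/somedir/directory1/file2" bs=1024 count=30 2>/dev/null

du "$tmp/somedir"       # last line includes directory1
du -S "$tmp/somedir"    # last line covers only file1 and file2

# Compare the figures for the top directory and for the subdirectory.
combined=$(du "$tmp/somedir" | tail -n 1 | cut -f1)
separate=$(du -S "$tmp/somedir" | tail -n 1 | cut -f1)
sub_plain=$(du "$tmp/somedir/directory1" | cut -f1)
sub_separate=$(du -S "$tmp/somedir/directory1" | cut -f1)
rm -rf "$tmp"
```

The figure for directory1 is the same either way, since it has no subdirectories of its own; only the parent's figure shrinks.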

Recursion depth can be controlled with the -d or --max-depth= flag. Specifying a depth of 0 is the same as -s or --summarize. The flag controls what is displayed; it does not alter the values. The list will only show entries down to the maximum depth. That is, with a max depth of 1, du shows the current directory and one level of subdirectories, but it still reports the usage values normally. If a directory contains a child directory that in turn has two child directories beneath it, only the first child directory is listed, rather than the child plus an entry for each directory under it. In either case the total usage reported for the child directory is the same, and the overall total is the same.

du screen shot 1

Compared with --max-depth=1

du screenshot 2
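
The parent, child and grandchild arrangement described above can be sketched like this. It assumes GNU du; the directory names are hypothetical.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/parent/child/grandchild1" "$tmp/parent/child/grandchild2"
dd if=/dev/zero of="$tmp/parent/child/grandchild1/data" bs=1024 count=20 2>/dev/null

du "$tmp/parent"                  # grandchildren, child, parent
du --max-depth=1 "$tmp/parent"    # child and parent only
du --max-depth=0 "$tmp/parent"    # parent only, like -s

# The number of lines changes with the depth...
full_lines=$(du "$tmp/parent" | wc -l)
depth1_lines=$(du --max-depth=1 "$tmp/parent" | wc -l)
depth0_lines=$(du --max-depth=0 "$tmp/parent" | wc -l)

# ...but the total on the last line does not.
full_total=$(du "$tmp/parent" | tail -n 1 | cut -f1)
depth1_total=$(du --max-depth=1 "$tmp/parent" | tail -n 1 | cut -f1)
summary_total=$(du -s "$tmp/parent" | cut -f1)
rm -rf "$tmp"
```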

The --exclude= option excludes any directories or files matching the listed pattern:

du --exclude="*.txt"

This would exclude any file with the “.txt” extension from being counted in the usage values. If you need to specify a number of different exclude rules, put them in a text file, one pattern per line, and pass that file with the -X or --exclude-from= flag:

du --exclude-from=excludefile
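
Here is a small sketch of both forms, assuming GNU du. The file names and the two patterns in the exclude file are invented for the example.

```shell
tmp=$(mktemp -d)
touch "$tmp/notes.txt" "$tmp/draft.bak" "$tmp/data.bin"

# Single pattern on the command line; quote it so the shell does not expand it.
du -a --exclude="*.txt" "$tmp"

# Several patterns, one per line, kept in a separate file.
excludefile=$(mktemp)
printf '%s\n' '*.txt' '*.bak' > "$excludefile"
du -a --exclude-from="$excludefile" "$tmp"

# Count which names survive each filter (|| true: grep exits 1 on zero matches).
txt_listed=$(du -a --exclude="*.txt" "$tmp" | grep -c '\.txt$' || true)
kept=$(du -a --exclude-from="$excludefile" "$tmp" | grep -c '\.bin$' || true)
dropped=$(du -a --exclude-from="$excludefile" "$tmp" | grep -c -e '\.txt$' -e '\.bak$' || true)
rm -rf "$tmp" "$excludefile"
```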

The output of du is a list with each entry terminated by a newline. You can change the newline to a NUL (zero) byte using the -0 or --null option, which outputs the entries on a single line separated by NUL bytes.
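
One case where this matters is a name that itself contains a newline. The sketch below, assuming GNU du and tr, builds such a directory (the names are made up) and shows that counting lines overcounts while counting NUL bytes does not.

```shell
tmp=$(mktemp -d)
# A directory whose name contains a newline: unusual, but legal.
mkdir "$tmp/plain" "$tmp/two
lines"

# Counting lines overcounts, because one name spans two lines.
line_count=$(du "$tmp" | wc -l)

# Counting NUL bytes gives the real number of entries.
entry_count=$(du -0 "$tmp" | tr -cd '\0' | wc -c)

echo "lines: $line_count, entries: $entry_count"
rm -rf "$tmp"
```

Tools that accept NUL-separated input, such as xargs -0 or sort -z, can then process the entries without being confused by odd file names.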

The du command reports the space allocated to an object in blocks, not the number of bytes of data it holds. Blocks are the units in which data is stored on a storage device like a disk, which is why this type of storage is called block storage. The disk is divided into partitions, and the chosen filesystem formats a partition into usable blocks of storage. Files are written to the filesystem in blocks. You can see the block size set for the filesystem by running the appropriate tool for your filesystem. For ext filesystems run:

dumpe2fs /dev/### | grep -i "block size"

More than likely you will not be able to run the command as a normal user, so run it as root or use the sudo command. A common value is 4096 bytes, or 4K blocks. In that case files are written out in 4K blocks on the filesystem. A file of 1 byte and a file of 4095 bytes each occupy one whole block, because a block cannot be shared between files. A 4097 byte file consumes 2 blocks of disk storage, using 4097 out of 8192 bytes and leaving 4095 bytes of the second block unused. There is an easy way to demonstrate this using the dd command:

dd if=/dev/zero of=dutest bs=4096 count=1

This will create a single file called dutest that is 4096 bytes long. Issue the du command on this file:

du dutest

And the result will be 4, that is, 4K. Repeat the dd command to create two more files of different sizes:

dd if=/dev/zero of=dutest2 bs=4097 count=1
dd if=/dev/zero of=dutest3 bs=7000 count=1

Run the du command again on these three files:

du -h dutest*

Both dutest2 and dutest3 will show 8 or 8.0K as being used, even though these files are actually different sizes. This is because du reports usage in blocks, not actual file size. To change this behavior you can use the --apparent-size switch:

du --apparent-size -h dutest*

The values reported are now more closely related to how much space the data actually consumes:

du screenshot 3 --apparent-size

Be aware of these differences when comparing the output of du with the results of other applications like ls or wc.
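
The block rounding described above is simple arithmetic, sketched here as a small helper. The function name allocated_bytes is invented for this example, and the 4096 byte block size is the assumption used throughout this section; your filesystem may differ. The second half assumes GNU du and checks that the apparent size agrees with what wc counts.

```shell
# Round a size in bytes up to a whole number of blocks.
# Usage: allocated_bytes SIZE BLOCK_SIZE
allocated_bytes() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block * block ))
}

# The three dutest sizes from above, with 4096 byte blocks.
for size in 4096 4097 7000; do
    echo "$size bytes of data -> $(allocated_bytes "$size" 4096) bytes allocated"
done

# Apparent size in bytes should match the byte count from wc -c.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/dutest3" bs=7000 count=1 2>/dev/null
apparent=$(du --apparent-size --block-size=1 "$tmp/dutest3" | cut -f1)
counted=$(wc -c < "$tmp/dutest3")
rm -rf "$tmp"
```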

The du command has options for handling hard and symbolic links. By default du will not count multiple instances of a hard link and it will not dereference, or follow, symbolic links. The latter behavior corresponds to -P (or --no-dereference), but as it is the default you probably will not need to use this flag. If you want to include the targets of symbolic links use the -L or --dereference flag; du will then follow symbolic links to their original files and include them in the value.

The -l or --count-links option counts a hard-linked file every time one of its links is encountered. If you had three hard links to a file, du would normally count the file only once. With the -l flag each hard link is counted, so in this case the file's size would be included in the usage three times.
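
The effect of -l and -L can be checked on a scratch directory. This sketch assumes GNU du and a filesystem that supports hard and symbolic links; all file names are hypothetical.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/dir" "$tmp/elsewhere"

# A 64K file with a second hard link in the same directory.
dd if=/dev/zero of="$tmp/dir/original" bs=1024 count=64 2>/dev/null
ln "$tmp/dir/original" "$tmp/dir/hardlink"

# A symbolic link pointing at a 64K file outside the directory.
dd if=/dev/zero of="$tmp/elsewhere/target" bs=1024 count=64 2>/dev/null
ln -s "$tmp/elsewhere/target" "$tmp/dir/symlink"

du -s  "$tmp/dir"    # hard-linked file counted once, symlink not followed
du -sl "$tmp/dir"    # hard-linked file counted once per link
du -sL "$tmp/dir"    # symlink followed, so the target is counted too

# Keep the three totals for comparison, then clean up.
default_total=$(du -s "$tmp/dir" | cut -f1)
count_links_total=$(du -sl "$tmp/dir" | cut -f1)
dereference_total=$(du -sL "$tmp/dir" | cut -f1)
rm -rf "$tmp"
```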

Aside from disk space usage the du command can show time related information about a file or directory. This information includes mtime, atime, and ctime:

  • mtime = modification time, the last time a file was modified
  • atime = access time, the last time a file was accessed or read
  • ctime = change time, the last time the inode was changed

Note that when mtime changes so does ctime. But ctime is tied to the inode, which holds information about a file other than its name and data: time values, permissions, ownership, etc. Therefore the ctime can change without altering the mtime if you run a command like chmod on the file. To view mtime information use the --time flag:

du --time

You can change the time shown to a different value like this:

du --time=word

Where word is:

  • atime (or access, use)
  • ctime (or status)

How these times are displayed can be altered using the --time-style= switch with one of these values:

  • full-iso = YYYY-MM-DD HH:MM:SS with nanoseconds and the time zone offset
  • long-iso = YYYY-MM-DD HH:MM (the default)
  • iso = YYYY-MM-DD
  • +FORMAT where FORMAT is interpreted like the date command

The latter option takes the date format as you would specify it in the date command. For instance, to display just the hour and minute the command would be:

du --time --time-style=+"%H %M"

Note that the format is enclosed in double quotes. The double quotes are needed because of the space. If a space was not used the double quotes could be left off:

du --time --time-style=+%H-%M
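
To tie the time options together, the sketch below gives a scratch file a known modification time and then runs chmod on it, which should update ctime but leave mtime alone. It assumes GNU du; the file name report and the date used are arbitrary.

```shell
tmp=$(mktemp -d)
touch "$tmp/report"

# Set a known modification time: 2001-02-03 04:05 local time.
touch -t 200102030405 "$tmp/report"

du --time "$tmp/report"                          # mtime in the default style
du --time=ctime "$tmp/report"                    # inode change time instead
du --time --time-style=+"%H %M" "$tmp/report"    # custom format

# chmod changes the inode, so ctime moves to "now" while mtime stays put.
chmod 600 "$tmp/report"
mtime_stamp=$(du --time --time-style=+"%Y-%m-%d %H:%M" "$tmp/report" | cut -f2)
ctime_year=$(du --time=ctime --time-style=+%Y "$tmp/report" | cut -f2)
this_year=$(date +%Y)
rm -rf "$tmp"
```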

