Episode 022 - sort

The sort command does just what its name says: it sorts input. Input can be a list of files, standard input, or files combined with standard input. The first example uses a simple file, shopping.txt, containing a list of items:

chicken
fish
sour cream
bread crumbs
milk
eggs
bread
sinkers
fishing hooks

Issuing the sort command on this file:

sort shopping.txt

Would present the following output:

bread
bread crumbs
chicken
eggs
fish
fishing hooks
milk
sinkers
sour cream
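
The three kinds of input mentioned at the start can be sketched as follows. The file names dairy.txt and meat.txt are hypothetical and are created in a temporary directory only for this illustration. LC_ALL=C is set so the order does not depend on the local language settings.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
printf 'milk\neggs\n' > "$workdir/dairy.txt"     # hypothetical file
printf 'chicken\nfish\n' > "$workdir/meat.txt"   # hypothetical file

# Several files at once: all lines end up in one sorted list.
from_files=$(LC_ALL=C sort "$workdir/dairy.txt" "$workdir/meat.txt")

# Standard input only.
from_stdin=$(printf 'bread\napple\n' | LC_ALL=C sort)

# A file combined with standard input; "-" stands for standard input.
mixed=$(printf 'bread\n' | LC_ALL=C sort "$workdir/dairy.txt" -)

printf '%s\n' "$from_files" "--" "$from_stdin" "--" "$mixed"
rm -r "$workdir"
```

The "-" operand is what lets a pipeline and a named file feed the same sort.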

Sort presents the items in alphanumeric order and by case. Note that symbols rank first. So if this list is passed to sort:

flounder
2lb sinker
5 bobbers
Strike Caster Reel
swivels
Three minnows
#zee banjo minnow

The output would be:

#zee banjo minnow
2lb sinker
5 bobbers
Strike Caster Reel
Three minnows
flounder
swivels

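Notice the output starts with symbols, then numbers, and finally moves to the alphabet, ranking upper case letters first. This is the byte order used in the C locale; other locale settings can order the lines differently.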
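
To get this ordering regardless of the system's language settings, the locale can be pinned for a single command. This is a minimal sketch that assumes the byte-wise C locale is the ordering wanted; the variable names are only for illustration.

```shell
# The tackle list from above, one item per line.
tackle=$(printf '%s\n' 'flounder' '2lb sinker' '5 bobbers' \
  'Strike Caster Reel' 'swivels' 'Three minnows' '#zee banjo minnow')

# LC_ALL=C makes sort compare raw bytes: symbols, digits,
# upper case letters, then lower case letters.
c_sorted=$(printf '%s\n' "$tackle" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

Without LC_ALL=C the result depends on the locale in effect, so the same list may come out in a different order on another machine.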

There are a number of options to control how sort behaves. The -d or --dictionary-order option sorts the output considering only blank spaces and alphanumeric characters. It ignores symbols. Take this list:

1000
#bannana
#apple
 zinger
02
20

A regular sort on this list produces the following output (note there is a space before the “z” in “zinger”):

 zinger
#apple
#bannana
02
1000
20

Executed with the -d option, sort produces this output:

 zinger
02
1000
20
#apple
#bannana

With -d, sort no longer ranks the symbols first. The -b or --ignore-leading-blanks option ignores leading blanks when sorting, which places “ zinger” at the bottom of the list:

#apple
#bannana
02
1000
20
 zinger
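
The -d and -b behavior can be compared side by side with a short sketch. It assumes GNU sort and the C locale, and the variable names are only for illustration.

```shell
# The list from above; note the leading space in " zinger".
items=$(printf '%s\n' '1000' '#bannana' '#apple' ' zinger' '02' '20')

# Dictionary order: the "#" is skipped, so the comparison
# effectively sees "apple" and "bannana".
dict=$(printf '%s\n' "$items" | LC_ALL=C sort -d)

# Ignore leading blanks: " zinger" is compared as "zinger".
noblank=$(printf '%s\n' "$items" | LC_ALL=C sort -b)

printf '%s\n' "-d:" "$dict" "-b:" "$noblank"
```

In both cases the lines themselves are printed unchanged; the options only affect how they are compared.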

The -f or --ignore-case option sorts alphanumerically but, as the name states, ignores case. A regular sort on the following list:

bannana
Apple
Carrot
orange
Grape

Produces the following sort:

Apple
Carrot
Grape
bannana
orange

But with the -f option the list is sorted in this manner:

Apple
bannana
Carrot
Grape
orange

The entry “bannana” occurs after “Apple” as the case of the items is ignored.
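
The difference between the two runs can be reproduced with this sketch, which assumes the C locale. The variable names are hypothetical.

```shell
fruit=$(printf '%s\n' bannana Apple Carrot orange Grape)

# Plain sort: upper case letters rank before lower case ones.
plain=$(printf '%s\n' "$fruit" | LC_ALL=C sort)

# With -f, lower case letters are treated as upper case for comparison.
folded=$(printf '%s\n' "$fruit" | LC_ALL=C sort -f)

printf '%s\n' "plain:" "$plain" "-f:" "$folded"
```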

The -r, or --reverse, option reverses the sort order. So this list:

Apple
Carrot
Grape
bannana
orange

With -r becomes:

orange
bannana
Grape
Carrot
Apple

Sort has a month sorting option, -M or --month-sort, that will sort a list of months in their proper order:

April
Jun
May
January
Dec
february

Issuing sort -M produces the following output:

January
february
April
May
Jun
Dec
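
A quick way to try month sorting is sketched below. It assumes a sort that supports -M (GNU sort does) and the C locale, where month names are the English ones.

```shell
months=$(printf '%s\n' April Jun May January Dec february)

# -M matches the month by its leading abbreviation, ignoring case,
# so "January", "Jun" and "february" are all recognized.
by_month=$(printf '%s\n' "$months" | LC_ALL=C sort -M)
printf '%s\n' "$by_month"
```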

There are a few other options to sort that determine the output:

  • -h or --human-numeric-sort
  • -g or --general-numeric-sort
  • -n or --numeric-sort
  • -i or --ignore-nonprinting
  • -V or --version-sort

Human numeric sort first determines the sign of the number (negative, zero, or positive) and then looks at whether there is a suffix. Suffixes can be one of:

  • K or k
  • M
  • G
  • T
  • P
  • E
  • Z
  • Y

Note that the case of the suffix matters, and that the suffix is compared before the numeric value. Given this list:

1G
1042M
15
-32P

The output of sort -h on this list would be:

-32P
15
1042M
1G

Here 1G sorts after 1042M because its suffix ranks higher, even though 1042M is the larger value.
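
The suffix-first rule can be checked with a small sketch, assuming GNU sort. The variable names are only for illustration.

```shell
sizes=$(printf '%s\n' 1G 1042M 15 -32P)

# -h compares the sign first, then the suffix, then the number itself.
by_size=$(printf '%s\n' "$sizes" | LC_ALL=C sort -h)
printf '%s\n' "$by_size"
```

This option is often paired with tools that print human-readable sizes, for example du -h piped into sort -h. Such tools normally pick the largest fitting suffix, so the suffix-first rule gives the expected order for their output.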

General numeric sort, -g or --general-numeric-sort, follows a different rule set from standard numeric sort. It converts each line to a long double-precision floating point number and treats lines that do not start with numbers as equal. Take this list:

15
+12
zeta
-32
5.8880
0
alpha

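A regular numeric sort, sort -n, does not recognize a leading plus sign, so it treats +12 as zero, just like the non-numeric lines, and produces this list: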

-32
+12
0
alpha
zeta
5.8880
15

While a general numeric sort, sort -g, produces this list:

alpha
zeta
-32
0
5.8880
+12
15
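
Both numeric modes can be run on the same input with the sketch below. It assumes GNU sort and the C locale, so that the decimal point is "." and ties are broken byte by byte. The variable names are hypothetical.

```shell
values=$(printf '%s\n' 15 +12 zeta -32 5.8880 0 alpha)

# -n: "+12", "alpha" and "zeta" all count as 0 and tie with the real 0.
numeric=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

# -g: "+12" is parsed as 12; lines that are not numbers come first.
general=$(printf '%s\n' "$values" | LC_ALL=C sort -g)

printf '%s\n' "-n:" "$numeric" "-g:" "$general"
```

Since -g has to convert every line to floating point, -n is usually the better choice when the input holds only plain numbers.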

Sort also has a random option, -R or --random-sort:

sort -R some_file

This randomizes the order of the output. Because it sorts by a random hash of each line, identical lines stay grouped together.
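
A small sketch of what -R does and does not change, assuming GNU sort. The variable names are only for illustration, and the shuffled order itself will differ from run to run.

```shell
# Shuffle five lines; the order varies between runs.
shuffled=$(printf '%s\n' a b c d e | sort -R)

# The same lines are still there: sorting again restores the original.
restored=$(printf '%s\n' "$shuffled" | LC_ALL=C sort)

# Identical lines hash alike, so they stay next to each other
# and uniq collapses them into one run per distinct value.
distinct_runs=$(printf '%s\n' x y x y x | sort -R | uniq | wc -l)

printf '%s\n' "$shuffled" "--" "$restored" "--" "$distinct_runs"
```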

Version sorting acts a bit differently than the previously mentioned sorts. Version sorts compare the indices and version numbers embedded in each line as whole numbers rather than character by character. For instance, in a directory listing of these files:

myapp-012.tar.gz
myapp-012b.tar.gz
myapp-013.tar.gz
myapp-0013b.tar.gz

A normal sort would produce the following list:

myapp-0013b.tar.gz
myapp-012.tar.gz
myapp-012b.tar.gz
myapp-013.tar.gz

Whereas sort -V would produce:

myapp-012.tar.gz
myapp-012b.tar.gz
myapp-013.tar.gz
myapp-0013b.tar.gz
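
The same effect shows up with plain version strings. The sketch below uses hypothetical tags (v1.2, v1.9, v1.10) rather than the file names above, and assumes GNU sort.

```shell
tags=$(printf '%s\n' v1.10 v1.9 v1.2)

# Plain sort compares character by character, so "v1.10" lands
# before "v1.2" because "1" is less than "2".
plain=$(printf '%s\n' "$tags" | LC_ALL=C sort)

# Version sort compares each run of digits as a number: 2 < 9 < 10.
by_version=$(printf '%s\n' "$tags" | LC_ALL=C sort -V)

printf '%s\n' "plain:" "$plain" "-V:" "$by_version"
```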

As shown earlier, the -r, or --reverse, option reverses the sort. It can be combined with any of the other options listed above to reverse their results.
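
Combining -r with another option might look like the following sketch, which assumes the C locale. The sample values and variable names are made up for illustration.

```shell
# Numeric sort, largest first.
largest_first=$(printf '%s\n' 3 20 100 | LC_ALL=C sort -nr)

# Case-insensitive sort, reversed.
folded_reverse=$(printf '%s\n' bannana Apple Carrot | LC_ALL=C sort -fr)

printf '%s\n' "$largest_first" "--" "$folded_reverse"
```

The short options can be bundled as above or given separately, as in sort -n -r.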

These are the basic options to sort. The last note about sort is that the sort type can be specified using the --sort=WORD switch, where the value of WORD is one of the following:

  • general-numeric
  • human-numeric
  • month
  • numeric
  • random
  • version
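
Each WORD is a long form of one of the single-letter options. The sketch below checks one pair; it assumes GNU sort, since --sort=WORD is not available in every implementation.

```shell
# --sort=numeric should behave the same as -n.
long_form=$(printf '%s\n' 10 9 100 | LC_ALL=C sort --sort=numeric)
short_form=$(printf '%s\n' 10 9 100 | LC_ALL=C sort -n)

printf '%s\n' "$long_form" "--" "$short_form"
```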

Sort is a handy utility for managing lists. Combined with other commands like uniq, cut, and grep, it can reduce raw input to just the pertinent data, in a format that is quick to process further.
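
One possible pipeline of this kind is sketched below. The comma-separated log lines and the variable names are invented for the example.

```shell
# Hypothetical event log: user,action,result
log=$(printf '%s\n' \
  'alice,login,ok' 'bob,login,fail' 'alice,upload,ok' \
  'alice,login,ok' 'carol,login,ok' 'bob,login,ok' 'alice,login,fail')

# grep keeps the login events, cut extracts the user name,
# sort groups equal names so uniq -c can count them,
# and a final numeric reverse sort puts the busiest user first.
login_counts=$(printf '%s\n' "$log" | grep ',login,' | cut -d, -f1 \
  | LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr)

printf '%s\n' "$login_counts"
```

The first sort is needed because uniq only collapses lines that are adjacent.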

Bibliography

  • man sort
  • info sort


Thank you very much!

2 Responses to Episode 022 - sort

  1. Elex says:

    Really Great Post…
    Please add sort -u for unique sort and sort -r for reverse sort.
    You have defined sort -R but it is not what I was searching..

    • dannSWashko says:

      Ok I squeezed reverse in there but I have to spend a bit more time to get unique. Thanks for catching this!
