Episode 003 – cut

The cut command, as the man page states, “removes sections from each line of a file.”  The cut command can also be used on a stream, and it can do more than just remove sections.  If a file is not specified, or “-” is used, the cut command takes input from standard input.  The cut command can be used to extract sections from a file or stream based upon specific criteria.  An example of this would be cutting specific fields from a CSV (comma-separated values) file.  For instance, cut can be used to extract the name and email address from a CSV file with the following content:

id, date, username, first name, last name, email address, phone, fax
1,2012-01-01,franklinf, Ford, Franklin, ff@gmail.com, 7575551212, 7775551234
2,2012-02-01,levona, Allan, Levon, allanl@tllts.org, 3177771212,
3,2012-02-17,mannyt,  Trish, Manny, tmanny@hpr.org,7275551212,8885551236

The syntax for cut would be:

cut -d"," -f4,5,6 users.csv

The result would be displayed on standard out:

 first name, last name, email address
 Ford, Franklin, ff@gmail.com
 Allan, Levon, allanl@tllts.org
  Trish, Manny, tmanny@hpr.org

The -d option specifies the delimiter, which defaults to TAB.  In the example above the cut command will “cut” the line at each “,” instead of at each TAB.  The -f option indicates which fields to select, in this case fields 4, 5, and 6, which correspond to “first name,” “last name,” and “email address.”
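
As mentioned at the start, cut also reads from standard input when no file is given or when “-” is used as the file name.  The following is a minimal sketch of that, using two rows of the sample data held in a shell variable; the variable names (csv, names, same) are only illustrative:

```shell
# Two rows of the sample CSV, kept in a variable so no file is needed.
csv='id, date, username, first name, last name, email address, phone, fax
1,2012-01-01,franklinf, Ford, Franklin, ff@gmail.com, 7575551212, 7775551234'

# No file argument: cut reads standard input.
names=$(printf '%s\n' "$csv" | cut -d"," -f4,5,6)

# An explicit "-" also means standard input.
same=$(printf '%s\n' "$csv" | cut -d"," -f4,5,6 -)

printf '%s\n' "$names"
```

Note that cut does not trim whitespace, so each selected field keeps the space that follows the comma in the input.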

The cut command can operate on fields, characters or bytes and must include one and only one of these options.

The field option operates on the cuts defined by the delimiter (-d), which is TAB by default.  The -d option can only be used with the field option.  Attempting to use the -d option with the character (-c) or byte (-b) options will result in an error.  The -f value can be a comma-separated list or a range separated by a “-”:

cut -d"," -f 1,2,3,4
cut -d"," -f 1-4
cut -f 1-4,7,9
cut -d"," -f -7
cut -d"," -f 7-

Specifying a range of the form “-#”, such as “-7” above, displays the first field through field #, here the first through the seventh.  The last example, “7-”, displays field 7 and all remaining fields to the end of the line.
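
To see the list and range forms side by side, here is a small sketch that runs each of them against one made-up line of nine comma-separated letters.  The sample line and the variable names are assumptions for illustration only:

```shell
# A hypothetical nine-field line; each letter is one field.
line="a,b,c,d,e,f,g,h,i"

first_four=$(echo "$line" | cut -d"," -f 1-4)     # a,b,c,d
mixed=$(echo "$line" | cut -d"," -f 1-4,7,9)      # a,b,c,d,g,i
up_to_seven=$(echo "$line" | cut -d"," -f -7)     # a,b,c,d,e,f,g
seven_onward=$(echo "$line" | cut -d"," -f 7-)    # g,h,i

# Fields are printed in input order, however the list is written.
reordered=$(echo "$line" | cut -d"," -f 3,1)      # a,c

printf '%s\n' "$first_four" "$mixed" "$up_to_seven" "$seven_onward" "$reordered"
```

The last case is worth remembering: cut selects fields but does not rearrange them.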

The -f operator will also print lines that do not contain the delimiter character.  For instance in the example above, if a line was added to the end of the file producing:

id, date, username, first name, last name, email address, phone, fax
1,2012-01-01,franklinf, Ford, Franklin, ff@gmail.com, 7575551212, 7775551234
2,2012-02-01,levona, Allan, Levon, allanl@tllts.org, 3177771212,
3,2012-02-17,mannyt,  Trish, Manny, tmanny@hpr.org,7275551212,8885551236
this is a line without the delimiter

Executing:

cut -d"," -f4,5,6 users.csv

Would produce the following output:

 first name, last name, email address
 Ford, Franklin, ff@gmail.com
 Allan, Levon, allanl@tllts.org
  Trish, Manny, tmanny@hpr.org
this is a line without the delimiter

To prevent the -f option from printing lines that do not contain the delimiter, use the --only-delimited, or -s, option:

cut -d"," -f4,5,6 -s users.csv
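
The effect of -s is easiest to see by running the same selection with and without it.  This is a small sketch using one data row plus the undelimited line; the variable names are illustrative:

```shell
# One delimited row and one line that contains no comma at all.
data='1,2012-01-01,franklinf, Ford, Franklin, ff@gmail.com
this is a line without the delimiter'

# Without -s the undelimited line is passed through unchanged.
without_s=$(printf '%s\n' "$data" | cut -d"," -f3)

# With -s the undelimited line is suppressed.
with_s=$(printf '%s\n' "$data" | cut -d"," -f3 -s)

printf 'without -s:\n%s\n' "$without_s"
printf 'with -s:\n%s\n' "$with_s"
```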

The other two selection options, -c and -b, do not work with a delimiter.  The --characters, or -c, option works on columns.  The man and info pages refer to the -c option as working on characters, but many other references refer to columns.  Technically cut works on characters, but since the output of cut is “fields” or “columns” of data, one can think of each character in a line as a column.  Thus, the delimiter in this case is each individual character.  The value passed to -c must be a list of numbers separated by commas, or a range:

echo "here is a line of text" | cut -c "1,2,3,4"
echo "here is a line of text" | cut -c "1-4"

Both examples produce the same output:

here

Whereas:

echo "here is a line of text" | cut -c "6-"

Would produce:

is a line of text

Recall that specifying a value “#-” outputs from # to the end of the line.
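
Character positions are handy for fixed-width values.  As a sketch, the date format from the CSV example (YYYY-MM-DD) can be split by position; the variable names here are only illustrative:

```shell
# A date in the same layout as the CSV example above.
d="2012-02-17"

year=$(echo "$d" | cut -c 1-4)         # 2012
month=$(echo "$d" | cut -c 6-7)        # 02
day=$(echo "$d" | cut -c 9-)           # 17

# A list may mix ranges; the dashes at positions 5 and 8 are skipped.
compact=$(echo "$d" | cut -c 1-4,9-)   # 201217

printf '%s %s %s %s\n' "$year" "$month" "$day" "$compact"
```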

The cut command can also work with bytes using the -b option and specifying a byte, or range of bytes like you would a field or character:

echo "here is a line of text" | cut -b "1-6"

Would produce the following output:

here i

Note that the result is the same as specifying cut -c "1-6".  In most cases you will be working with a single-byte character set, where each character is a single byte, so the two options behave identically and you will rarely need to worry about multi-byte characters.
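
For the cases where bytes and characters do differ, here is a small sketch using the word “héllo” encoded as UTF-8, in which “é” occupies two bytes.  It sticks to -b, whose behavior is unambiguous; how -c treats multi-byte characters may vary between cut implementations, so that is not shown.  The variable names are illustrative:

```shell
# "héllo" written with octal escapes so the bytes are explicit:
# h, then é as the two UTF-8 bytes 0xC3 0xA9, then l, l, o.
word=$(printf 'h\303\251llo')

# Five characters, but six bytes.
byte_count=$(printf '%s' "$word" | wc -c | tr -d ' ')

# Bytes 1-3 are "h" plus both bytes of "é".
first_three_bytes=$(printf '%s\n' "$word" | cut -b 1-3)

printf '%s bytes, first three: %s\n' "$byte_count" "$first_three_bytes"
```

Selecting bytes 1-2 instead would split “é” in the middle and leave an incomplete character in the output.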

Cut allows a few more flags to control the output.  When discussing the -f, or field, option, the -s, or --only-delimited, flag was mentioned.  The -s flag suppresses the output of lines not containing the delimiter.

Cut will produce the complement of the standard output when --complement is used.  That is, it will output everything except what the cut command would normally select:

echo "here is a line of text" | cut -b "9-"

Produces everything from the ninth byte to the end of the line:

a line of text

Whereas:

echo "here is a line of text" | cut -b "9-" --complement

Produces the complement, which is bytes 1 to 8:

here is

Note that the space between “is” and “a” is included in this output even though it is not easy to show in the example.
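
One way to make that trailing space visible is to wrap the result in brackets.  The sketch below does that and also applies --complement to fields; it assumes GNU cut, since --complement is a GNU extension, and the variable names are illustrative:

```shell
# Bytes 1-8, i.e. everything except byte 9 onward.
rest=$(echo "here is a line of text" | cut -b "9-" --complement)

# The brackets expose the trailing space.
printf '[%s]\n' "$rest"    # [here is ]

# --complement also works with fields: drop fields 2 and 4, keep the rest.
dropped=$(echo "1:2:3:4:5" | cut -d":" -f 2,4 --complement)
printf '%s\n' "$dropped"   # 1:3:5
```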

Finally, the flag --output-delimiter=STRING will allow you to change the output delimiter to something else:

echo "1:2:3:4:5:6:7" | cut -d":" -f "2-5"

Will produce the following output:

2:3:4:5

But the “:” output delimiter can be altered with --output-delimiter:

echo "1:2:3:4:5:6:7" | cut -d":" -f "2-5" \
--output-delimiter=","

Producing:

2,3,4,5
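
Because the value is a STRING, the output delimiter does not have to be a single character, unlike the input delimiter given to -d.  A short sketch, assuming GNU cut (--output-delimiter is a GNU extension); the variable names are illustrative:

```shell
# A three-character output delimiter.
spaced=$(echo "1:2:3:4:5:6:7" | cut -d":" -f "2-5" --output-delimiter=" | ")
printf '%s\n' "$spaced"    # 2 | 3 | 4 | 5

# Convert the selected fields to TAB-separated output.
tab=$(printf '\t')
tabbed=$(echo "1:2:3:4:5:6:7" | cut -d":" -f 1,7 --output-delimiter="$tab")
printf '%s\n' "$tabbed"
```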

 


Thank you very much

 


One Response to Episode 003 – cut

  1. ron says:

    professionally well done
