Awk Knowledge Base

From Matts Wiki

This is my ongoing knowledge base for the awk command in Linux.

General Knowledge

Return n-th column

e.g. the first one (replace 1 with the number of the column you need):

awk '{print $1}' file.csv
cat file.csv | awk '{print $1}'
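If the column number should come from the shell instead of being hard-coded, it can be passed in with -v. This is only a minimal sketch: the function name nth_column, the variable n and the sample data are made up for illustration, and the input is assumed to be whitespace-separated.

```shell
# Hypothetical sample input: three whitespace-separated columns.
data='a b c
d e f'

# Hypothetical helper: -v n=... sets the awk variable n before the
# program runs, so $n refers to the n-th field of every line.
nth_column() {
    awk -v n="$1" '{print $n}'
}

printf '%s\n' "$data" | nth_column 2
# prints:
# b
# e
```

Without -v, n would be an uninitialized awk variable, $n would be the same as $0, and the whole line would be printed.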

Return last column or n-th to last column

awk provides the built-in variable NF, which holds the number of fields (columns) in the current line. This allows printing the last or the second-to-last column:

cat file.csv | awk '{print $NF}'           # Last column
cat file.csv | awk '{print $(NF-1)}'       # Second to last column

Note, however, that by default awk splits fields on whitespace (spaces and tabs) and pays no attention to quoting.

Fields that contain spaces are usually enclosed in delimiters such as quotation marks.
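The effect of default whitespace splitting on a quoted value can be seen with a small test. The sample line below is invented for illustration and does not come from a real log:

```shell
# Hypothetical sample line: the last value is quoted and contains a space.
line='10.0.0.1 200 "Mozilla Firefox"'

# Default splitting yields 4 fields, because the space inside the
# quotes separates fields as well.
printf '%s\n' "$line" | awk '{print NF}'        # prints: 4
printf '%s\n' "$line" | awk '{print $NF}'       # prints: Firefox"
printf '%s\n' "$line" | awk '{print $(NF-1)}'   # prints: "Mozilla
```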

Return Field Delimited by Quotes

cat file.csv | awk -F '"' '{print $(NF-1)}'       # " as field separator: content of the last quoted field, if the line ends with "
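To see why $(NF-1) is the right field here, it helps to look at how a line is split when " is the separator. A small sketch with an invented sample line, assuming every value is quoted and the line ends with a closing quote:

```shell
# Hypothetical sample line with two quoted values.
line='"first value" "second value"'

# Splitting on " gives 5 fields:
#   $1 = empty (text before the first quote)
#   $2 = first value
#   $3 = the single space between the two quoted values
#   $4 = second value
#   $5 = empty (text after the last quote)
printf '%s\n' "$line" | awk -F '"' '{print NF}'        # prints: 5
printf '%s\n' "$line" | awk -F '"' '{print $2}'        # prints: first value
printf '%s\n' "$line" | awk -F '"' '{print $(NF-1)}'   # prints: second value
```

The quoted values end up in the even-numbered fields, and the empty field after the final quote is the reason the last quoted value is $(NF-1) and not $NF.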

Return Lines with Value in n-th Column

e.g. the string /wiki/ in the seventh column:

awk '$7~/\/wiki\//'

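e.g. the string 401 in the tenth column (~ is a regular-expression match, so a value such as 4010 matches as well; use $10 == 401 for an exact comparison):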

awk '$10~401'
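The difference between a regular-expression match and an exact comparison can be checked with a few test lines. The data below is made up, and the value sits in the second column to keep the example short:

```shell
# Hypothetical sample input: a label and a numeric code per line.
data='x 401
x 4010
x 200'

# ~ matches wherever the pattern occurs inside the field.
printf '%s\n' "$data" | awk '$2 ~ 401'
# prints:
# x 401
# x 4010

# == compares the whole field, so only the exact value is selected.
printf '%s\n' "$data" | awk '$2 == 401'
# prints:
# x 401
```

An anchored pattern such as $2 ~ /^401$/ selects the same single line as the == comparison.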

How-tos

Calculate Sum of Column

Requirement

A sum is needed for a column, e.g. the total size of a selection of files in a directory:

ls -la access.log.4?.gz

Output:

-rw-r----- 1 root adm 2173544 Jun  9 00:00 access.log.40.gz
-rw-r----- 1 root adm 2141344 Jun  8 00:00 access.log.41.gz
-rw-r----- 1 root adm 2017922 Jun  7 00:00 access.log.42.gz
-rw-r----- 1 root adm 2083038 Jun  6 00:00 access.log.43.gz
-rw-r----- 1 root adm 2023813 Jun  5 00:00 access.log.44.gz
-rw-r----- 1 root adm 2227250 Jun  4 00:00 access.log.45.gz
-rw-r----- 1 root adm 2382616 Jun  3 00:00 access.log.46.gz
-rw-r----- 1 root adm 2121052 Jun  2 00:00 access.log.47.gz
-rw-r----- 1 root adm 1981452 Jun  1 00:00 access.log.48.gz
-rw-r----- 1 root adm 1873618 May 31 00:00 access.log.49.gz

Use awk to calculate the total size of the files above:

ls -la access.log.4?.gz | awk '{sum += $5} END {print sum}'

Hint: $5 refers to the fifth column of the output of ls -la, which is the size column.

sum += $5 adds every value from the fifth column to the variable sum, which awk creates automatically on first use (starting at 0). Nothing is printed for the individual lines; the END block prints the total once all input has been read.
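The same pattern can be extended to count the lines and print the total in a more readable unit. The following is only a sketch: the helper name sum_sizes and the output format are made up for illustration, and the first three lines of the listing above serve as test input.

```shell
# First three lines of the ls -la listing above, used as sample input.
listing='-rw-r----- 1 root adm 2173544 Jun  9 00:00 access.log.40.gz
-rw-r----- 1 root adm 2141344 Jun  8 00:00 access.log.41.gz
-rw-r----- 1 root adm 2017922 Jun  7 00:00 access.log.42.gz'

# Hypothetical helper: counts the input lines and sums the fifth column,
# then prints both plus the total converted to MiB (1 MiB = 1048576 bytes).
sum_sizes() {
    awk '{ n++; sum += $5 }
         END { printf "%d files, %d bytes, %.2f MiB\n", n, sum, sum / 1048576 }'
}

printf '%s\n' "$listing" | sum_sizes
# prints: 3 files, 6332810 bytes, 6.04 MiB
```

On a real directory the helper would be fed from ls in the same way as above: ls -la access.log.4?.gz | sum_sizes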