Awk Knowledge Base
This is my ongoing knowledge base for the awk command in Linux.
General Knowledge
Return n-th column
e.g. the first one (replace 1 with the number of the desired column):
awk '{print $1}' file.csv
cat file.csv | awk '{print $1}'
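The column number can also be passed in from the shell instead of being hard-coded. The following is a minimal sketch, assuming a hypothetical comma-separated sample (the variable names sample, n and second_column are made up for illustration). -F ',' sets the field separator and -v n=2 defines the awk variable n, so $n refers to the second column.

```shell
# Hypothetical sample data standing in for file.csv
sample='id,name,size
1,alpha,10
2,beta,20'

# -F ',' splits each line on commas; -v n=2 sets the awk variable n,
# so $n is the second field of every line
second_column=$(printf '%s\n' "$sample" | awk -F ',' -v n=2 '{print $n}')
printf '%s\n' "$second_column"
```

Without -v, the variable n is unset inside awk, and $n would then evaluate to $0, which is the whole line.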
Return last column or n-th to last column
awk provides the built-in variable NF, which holds the number of fields (columns) in the current line. This allows printing the last or the second to last column:
cat file.csv | awk '{print $NF}' # Last column
cat file.csv | awk '{print $(NF-1)}' # Second to last column
Note that by default awk splits fields on whitespace (spaces and tabs) and ignores any quoting. For a comma-separated file, set the separator with -F ','.
Fields that contain spaces are usually enclosed in delimiters such as quotation marks.
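The effect of whitespace splitting on a quoted field can be seen in a small sketch. The sample line below is hypothetical, loosely modelled on a web server log, and the variable names are only for illustration.

```shell
# Hypothetical log-like line with one quoted field that contains a space
line='10.0.0.1 "GET /index.html" 200'

# NF is the number of whitespace-separated fields: the quoted part counts as two
field_count=$(printf '%s\n' "$line" | awk '{print NF}')

# $NF is the last field; $2 is only the first half of the quoted field
last_field=$(printf '%s\n' "$line" | awk '{print $NF}')
second_field=$(printf '%s\n' "$line" | awk '{print $2}')

printf '%s\n' "$field_count" "$last_field" "$second_field"
```

The quoted request is split into "GET and /index.html", so counting columns from the left becomes unreliable, while $NF still returns the last value.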
Return Field Delimited by Quotes
cat file.csv | awk -F '"' '{print $(NF-1)}' # " as field separator: prints the last quoted value if the line ends with a closing quote
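To see why $(NF-1) is the right index here, it helps to look at how a line is split on the quote character. This is a sketch with a hypothetical sample line, not taken from a real log:

```shell
# Hypothetical line with two quoted fields, ending with a closing quote
line='10.0.0.1 "GET /index.html" "Mozilla/5.0 (X11)"'

# Splitting on " yields: [10.0.0.1 ] [GET /index.html] [ ] [Mozilla/5.0 (X11)] []
# The trailing quote produces an empty last field, so NF is 5
field_count=$(printf '%s\n' "$line" | awk -F '"' '{print NF}')

# $(NF-1) is therefore the content of the last quoted field
last_quoted=$(printf '%s\n' "$line" | awk -F '"' '{print $(NF-1)}')

# $2 is the content of the first quoted field
first_quoted=$(printf '%s\n' "$line" | awk -F '"' '{print $2}')

printf '%s\n' "$field_count" "$last_quoted" "$first_quoted"
```

With this separator the even-numbered fields hold the quoted values, and the odd-numbered fields hold whatever sits between them.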
Return Lines with Value in n-th Column
e.g. the string /wiki/ in the seventh column:
awk '$7~/\/wiki\//'
e.g. the string 401 in the tenth column:
awk '$10~401'
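The ~ operator performs a regular expression match, so the pattern only has to occur somewhere inside the field. If the field must be exactly 401, a comparison with == is the safer choice. A small sketch of the difference, assuming hypothetical two-column data (the second column stands in for the status column):

```shell
# Hypothetical data: a label and a status-like value
sample='a 401
b 4012
c 200'

# Regex match: every line whose second field contains 401
regex_hits=$(printf '%s\n' "$sample" | awk '$2 ~ /401/ {print $1}')

# Exact comparison: only lines whose second field equals 401
exact_hits=$(printf '%s\n' "$sample" | awk '$2 == 401 {print $1}')

printf 'regex: %s\n' "$regex_hits"
printf 'exact: %s\n' "$exact_hits"
```

Here the regex match returns a and b, while the exact comparison returns only a.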
How-tos
Calculate Sum of Column
Requirement
A sum is needed for a column, e.g. the total size of a selection of files in a directory:
ls -la access.log.4?.gz
Output:
-rw-r----- 1 root adm 2173544 Jun 9 00:00 access.log.40.gz
-rw-r----- 1 root adm 2141344 Jun 8 00:00 access.log.41.gz
-rw-r----- 1 root adm 2017922 Jun 7 00:00 access.log.42.gz
-rw-r----- 1 root adm 2083038 Jun 6 00:00 access.log.43.gz
-rw-r----- 1 root adm 2023813 Jun 5 00:00 access.log.44.gz
-rw-r----- 1 root adm 2227250 Jun 4 00:00 access.log.45.gz
-rw-r----- 1 root adm 2382616 Jun 3 00:00 access.log.46.gz
-rw-r----- 1 root adm 2121052 Jun 2 00:00 access.log.47.gz
-rw-r----- 1 root adm 1981452 Jun 1 00:00 access.log.48.gz
-rw-r----- 1 root adm 1873618 May 31 00:00 access.log.49.gz
Use awk to calculate the total size of the files above:
ls -la access.log.4?.gz | awk '{sum += $5} END {print sum}'
Hint: $5 refers to the fifth column of the output of ls -la, which is the size column.
sum += $5 creates a variable called sum on first use and adds every value from the fifth column to it, without printing intermediate results. The END block then prints the sum once all lines have been processed.
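The same pattern can be tried without the actual files by feeding a few lines of the listing into awk directly. This sketch uses the first three rows of the output above; the variable names listing, total and summary are only for illustration.

```shell
# First three rows of the ls -la output shown above
listing='-rw-r----- 1 root adm 2173544 Jun 9 00:00 access.log.40.gz
-rw-r----- 1 root adm 2141344 Jun 8 00:00 access.log.41.gz
-rw-r----- 1 root adm 2017922 Jun 7 00:00 access.log.42.gz'

# Add up the fifth column and print the total after the last line
total=$(printf '%s\n' "$listing" | awk '{sum += $5} END {print sum}')

# NR holds the number of lines read, so the END block can also report a count
summary=$(printf '%s\n' "$listing" | awk '{sum += $5} END {printf "%d files, %d bytes\n", NR, sum}')

printf '%s\n' "$total" "$summary"
```

If the input may be empty, printing sum + 0 instead of sum makes awk output 0 rather than an empty line.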
