Tutorial Seven

Data Processing and Arithmetic

You will now learn how to combine the powerful formatting commands you have learned with arithmetical and string functions.

Suppose we have a data file containing each employee's name, hourly rate of pay and weekly hours:

Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40

We can use awk to calculate each employee's gross weekly wage and the tax they are required to pay, producing a report like this:

NAME        RATE     HOURS    GROSS      TAX (25%)

Brooks      10.00    35       350.00     87.50
Everest      8.00    40       320.00     80.00
Hatcher     12.00    20       240.00     60.00
Phillips     8.00    30       240.00     60.00
Wilcox      12.00    40       480.00     120.00

TOTALS               165      1630.00    407.50

To achieve this we need to perform the following tasks:

(1) Display the report header
(2) Calculate and display GROSS pay and TAX
(3) Calculate the totals
(4) Display the totals

Let's look at each of these in turn.

(1) Displaying the report header

The report header is displayed by simply printing the column names. This should occur at the beginning of the program, before awk starts to read through all the records.

Actions which have to be executed before the input data is read are associated with the awk pattern called BEGIN. So to display our header information we will need code similar to :-

BEGIN {
    # Print one name for every column in the report, separated by tabs
    print "NAME\tRATE\tHOURS\tGROSS\tTAX (25%)"
}

(2) Calculating and displaying GROSS pay and TAX

To calculate GROSS pay we need to multiply the number of hours an employee works by their hourly rate. The TAX figure is twenty-five percent of the GROSS pay.

We can implement these calculations with simple arithmetic in a print statement: multiplying the rate field ($2) by the hours field ($3) gives GROSS, and multiplying that result by 0.25 gives TAX.
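As a rough sketch of such a print statement, the shell fragment below feeds the sample records to awk. The variable names data and report are assumptions made for this illustration, and the tax rate is written as 0.25 to match the 25% in the report.

```shell
# Sample records from the tutorial: name, hourly rate, weekly hours.
data='Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40'

# $2 is the hourly rate and $3 the weekly hours.
# Column 4 is GROSS (rate * hours); column 5 is TAX (25% of GROSS).
report=$(printf '%s\n' "$data" |
    awk '{ print $1, $2, $3, $2 * $3, $2 * $3 * 0.25 }')

printf '%s\n' "$report"
```

The commas in the print statement separate the values with a single space, so the columns do not yet line up.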

Notice that you can use arithmetical operators (+ - * /) on existing columns to produce new columns.

Of course, a plain print statement does not format the results quite as we would wish, and it is for this reason that we will use the printf statement instead. We need to display columns 2, 4 and 5 as floating point numbers with two decimal places, which the %.2f conversion does.
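A printf statement along these lines might look like the sketch below. The column widths (10, 6, 5, 9 and 8 characters) and the variable names data, gross and rows are assumptions chosen for this illustration, not values taken from the original program.

```shell
# Sample records from the tutorial: name, hourly rate, weekly hours.
data='Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40'

# %-10s  name, left-justified in 10 characters
# %6.2f  rate, two decimal places
# %5d    hours, as a whole number
# %9.2f  gross pay, two decimal places
# %8.2f  tax, two decimal places
rows=$(printf '%s\n' "$data" | awk '{
    gross = $2 * $3
    printf "%-10s %6.2f %5d %9.2f %8.2f\n", $1, $2, $3, gross, gross * 0.25
}')

printf '%s\n' "$rows"
```

Unlike print, printf does not add a newline by itself, which is why the format string ends in \n.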

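A printf statement of this kind can be broken down into two components: the format string, which holds one conversion specification (such as %s, %d or %.2f) for each value to be displayed, and the comma-separated list of expressions whose values fill those specifications in order.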

(3) Calculating the totals

To calculate the total number of hours worked by the employees, the total GROSS pay and the total TAX, we use awk variables to add up column values. Each variable accumulates a running total as the records are read and holds the final sum once the last record has been processed.

To add up all the hours worked by the employees, we add the third field of each record to a variable, for example hours = hours + $3.
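A minimal sketch of this accumulation is shown here. The names data and total_hours are assumptions for the illustration. It relies on the fact that an awk variable which has not been assigned yet counts as zero when used in arithmetic, so hours needs no explicit initialisation.

```shell
# Sample records from the tutorial: name, hourly rate, weekly hours.
data='Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40'

# The main action runs once per record and adds that record's hours ($3)
# to the running total; the END action prints the final sum.
total_hours=$(printf '%s\n' "$data" |
    awk '{ hours = hours + $3 } END { print hours }')

printf '%s\n' "$total_hours"
```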

Instead of writing the addition out in full, we could use the shorthand borrowed from the C programming language, reducing hours = hours + $3 to hours += $3.

We repeat the above for each of the columns we wish to total.
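Repeating the accumulation for every column we wish to total could be sketched as follows, using the += shorthand. The variable names total_gross, total_tax, data and totals are assumptions made for this example.

```shell
# Sample records from the tutorial: name, hourly rate, weekly hours.
data='Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40'

# One accumulator per totalled column: hours, gross pay and tax.
totals=$(printf '%s\n' "$data" | awk '
    {
        gross = $2 * $3
        hours       += $3
        total_gross += gross
        total_tax   += gross * 0.25
    }
    END { print hours, total_gross, total_tax }')

printf '%s\n' "$totals"
```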

(4) Displaying the totals

Above we used the BEGIN pattern to display the report header. Here we will use the END pattern to display the totals. The action associated with END is performed once, after awk has read all the records.

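The finished awk program combines the four steps: a BEGIN action for the header, a main action that calculates, displays and accumulates the figures for each record, and an END action for the totals.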
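One possible version of the complete program is sketched below. It is a reconstruction under stated assumptions rather than the original listing: the column widths, the variable names and the use of printf for the header are all choices made for this illustration.

```shell
# Sample records from the tutorial: name, hourly rate, weekly hours.
data='Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40'

report=$(printf '%s\n' "$data" | awk '
BEGIN {
    # (1) Report header, followed by a blank line
    printf "%-10s %6s %5s %9s %9s\n\n", "NAME", "RATE", "HOURS", "GROSS", "TAX (25%)"
}
{
    # (2) Calculate and display GROSS pay and TAX for this employee
    gross = $2 * $3
    tax   = gross * 0.25
    printf "%-10s %6.2f %5d %9.2f %9.2f\n", $1, $2, $3, gross, tax

    # (3) Accumulate the column totals
    hours       += $3
    total_gross += gross
    total_tax   += tax
}
END {
    # (4) Totals line; the RATE column is left empty
    printf "\n%-10s %6s %5d %9.2f %9.2f\n", "TOTALS", "", hours, total_gross, total_tax
}')

printf '%s\n' "$report"
```

The same awk program could be saved in a file and run with awk -f against a data file instead of reading the records from a pipe.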

After studying the above you can try the programming exercise for tutorial seven.
