Two Wrongs

Grep, sed and awk – The Right Tool For The Job

While writing the post on the awk state machine parsing pattern, I realised there was something else I wanted to share.

grep, sed and awk are all classics when it comes to text processing in a Unix environment. I’m not going to try to convince you to learn them; I’ll just say that things become classics for a reason.

But what if you have a passing familiarity with them and just don’t know which one to use for the task at hand?

Definitions

First, let’s clear up some terminology. All three tools do stream processing, so what does that mean?

Stream
A stream is a (potentially infinite) sequence of records, separated by some string called the RS (record separator). The record separator is often a newline character, which makes the records lines.
Record
A record is a collection of fields, separated by some string called the FS (field separator). The field separator is often whitespace.
Field
Fields are just text strings.
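Of the three tools, awk is the one that lets you set both separators directly, through its built-in RS and FS variables, so it is a handy way to see these definitions in action. The sketch below assumes a made-up input in which records are separated by ; and fields by :, and the function name parse_records is purely illustrative.

```shell
#!/bin/sh
# parse_records is a hypothetical helper name, used only for illustration.
# It reads a stream whose records are separated by ";" (RS) and whose
# fields are separated by ":" (FS), and prints, for each record, its
# record number (NR), its field count (NF) and its first field ($1).
parse_records() {
  awk 'BEGIN { RS = ";"; FS = ":" }
       { print NR, NF, $1 }'
}

# printf (unlike echo) adds no trailing newline, so the last record
# is exactly "bob:25:user".
printf 'alice:30:admin;bob:25:user' | parse_records
```

This prints 1 3 alice and then 2 3 bob: two records, each with three fields.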

Power Scale

All three tools process streams by reading one record at a time. They also sit on a strictly increasing power scale: each tool can do everything the one before it can, and more.

  1. The most basic of the three tools is grep, which can select and reject records based on a pattern. That’s all it does, but sometimes that’s all you need!
  2. If you take grep and add the ability to mutate each record individually, you get sed. Since the scale is strictly increasing, you might expect to be able to emulate grep with sed. And of course you can: grep 'pattern' is equivalent to sed -n '/pattern/p'.
  3. If you then take sed and imagine it could remember things between records, so that earlier records can affect how later records are processed, you get awk! Again, emulating grep is simple: awk '/pattern/ { print $0; }'
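Here is a small sketch of that ladder on a made-up four-line input (the fruits helper is hypothetical). Each tool first reproduces the plain selection and then shows the extra power it adds.

```shell
#!/bin/sh
# Hypothetical sample input, one record per line.
fruits() { printf 'apple\nbanana\ncherry\nmango\n'; }

# 1. grep: select the records that match the pattern.
fruits | grep 'an'                             # banana, mango

# 2. sed: the same selection; -n turns off auto-printing, p prints matches.
fruits | sed -n '/an/p'                        # banana, mango
#    ...plus per-record mutation: substitute, then print the changed lines.
fruits | sed -n 's/an/AN/gp'                   # bANANa, mANgo

# 3. awk: the same selection again...
fruits | awk '/an/ { print $0; }'              # banana, mango
#    ...plus state that survives between records: n keeps counting.
fruits | awk '/an/ { n++; print n ": " $0 }'   # 1: banana, 2: mango
```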

So, in summary: grep gives you plain select/reject; sed gives you select/reject coupled with additional processing of each record individually; awk gives you both of the above and also the ability to save values that are carried over to the next record.

So when I need to choose between the three, I think of them this way and pick the least powerful one that still lets me accomplish the task.
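As a sketch of that rule of thumb, here are three tasks on a hypothetical log format (date, level, duration in milliseconds), each solved with the least powerful tool that can handle it. The sample_log helper and the log format are invented for illustration.

```shell
#!/bin/sh
# Hypothetical log lines: "<YYYY-MM-DD> <LEVEL> <milliseconds>".
sample_log() {
  printf '2024-01-05 INFO 120\n2024-01-05 ERROR 340\n2024-01-06 INFO 80\n'
}

# Task 1: show only the errors. Pure select/reject, so grep is enough.
sample_log | grep ' ERROR '

# Task 2: rewrite each date as DD/MM/YYYY. Every line is changed on its
# own, without looking at any other line, so sed is enough.
# (| is used as the s delimiter so the / in the output needs no escaping.)
sample_log | sed 's|^\([0-9]*\)-\([0-9]*\)-\([0-9]*\)|\3/\2/\1|'

# Task 3: total duration. The running total has to survive from one line
# to the next, which is where awk earns its place.
sample_log | awk '{ total += $3 } END { print total }'
```

The last command prints 540, the sum of the third field over all three lines.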


Reader Vasudev (who blogs at https://jugad2.blogspot.com/) was kind enough to email me this observation.

Re. this:

awk '/pattern/ { print $0; }'

You probably just forgot it, but the default action of awk, when only a pattern is given (and no action), is to print $0. So the braces, and what is in them, are not needed. In fact, even if you used the print statement, IIRC (need to check, manual not handy), just print will suffice, because for awk, the default thing to print is $0.

Thank you, of course you’re right!
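To illustrate, these three awk programs select and print the same records on a made-up input (the fruits helper is hypothetical): the first spells everything out, the second relies on print defaulting to $0, and the third relies on the default action for a pattern with no action.

```shell
#!/bin/sh
# Hypothetical sample input, one record per line.
fruits() { printf 'apple\nbanana\ncherry\n'; }

fruits | awk '/an/ { print $0; }'   # explicit action, explicit $0
fruits | awk '/an/ { print }'       # print with no arguments prints $0
fruits | awk '/an/'                 # no action at all: the default is to print $0
```

All three print just banana.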