AWKish PHP

I love AWK. It is a wonderful tool for data processing on Unix systems. I truly love it. I know of no better tool for processing and aggregating log files. I remember that when I introduced AWK to my students, there was always an immediate, appreciative murmur in the room as soon as the first AWK scripts showed their power.

And it's so stable. Perhaps even more stable than all the new-fashioned stuff nowadays. I have one AWK monster script that has been running unchanged in a production environment since 1992, without ever encountering any problems. No need to change anything after countless system upgrades. A dream within a dream. In the same time PHP would have gone through ten new major releases and certainly would no longer be compatible with itself.

But...

In times of HTML entities, URL encoding and XML you quickly run into the limits of AWK, and using it in this new context becomes a very hard job. Ever tried to URL decode or encode a string within AWK? Or process an XML file?

However...

Since PHP 5 the command line interface of PHP offers the wonderful option -R with its siblings -B and -E. With these three fellows, you can run PHP in a quasi-AWK mode: -R processes the input stream line by line, -B is the equivalent of AWK's BEGIN, and -E the equivalent of AWK's END.

A small example

An example as old as the world: counting lines with AWK.

# time awk 'BEGIN {s=0} {s++} END {print s}' access_log
327970
0.148u 0.040s 0:00.17

The same in awkish PHP:

# time php -B '$s=0;' -R '$s++;' -E 'print "$s\n";' < access_log
327970
1.064u 0.044s 0:01.20

Yes, of course one would use wc in the real world:

# time wc -l access_log
327970 access_log
0.036u 0.048s 0:00.07

The time command in these examples shows the big drawback of PHP: compared to AWK it's really slow. But this is understandable, because PHP is far more complex than AWK, and PHP was made for web programming and not as a sysadmin's power tool. Always use a tool for the job it was designed to do. On the other hand PHP has all the functions you absolutely need in today's work environments. (Please don't mention XMLgawk or WebAWK now.)

But to be honest: PHP is only a brief blink in the history of the Internet. AWK is the alpha and the omega, the beginning and the end, the first and the last. Aho, Weinberger and Kernighan. You're my trinity.

About

Kai 'Oswald' Seidler writes about his life as co-founder of Apache Friends, creator of XAMPP, and technology evangelist for web tier products at Sun Microsystems.
