Recently, I made good use
of Michael
Parker's CLAWK, a
Lisp-embedded variant of AWK.
So far, I had used the real AWK, along with perl and other parts of
the Unix toolbox to analyze data from experiments, and munge it into
HTML and LaTeX tables. However, this time I expected the
experiments to be carried out in
a run–tweak–rerun
fashion with several
iterations, and I did not want to reparse several hundred megabytes
each time, just for a change in table layout, or for adding another
analysis. (Also, I had not much time for lispy things recently, so
this was a good way to sneak some Lisp back into my current C++
hell...)
Enter CLAWK. The following function parses my benchmark data into
Lisp objects:
(defvar *foo-table* (make-hash-table :test 'equal))
(defawk parse-foo-benchmark (&aux model)
(#/^memtime/
(when model
(emit-model model *foo-table*))
(let ((path (parse-namestring $6)))
(setf model (make-instance 'foo-model-record :filename path))))
(#/^Instantiator: Explored/
(setf (get-states model) $#3
(get-transitions model) $#6
(get-bfs-levels model) $#9
(completed? model) t))
((string= $6 "elapsed")
(setf (get-generation-time model) $#5))
((string= $13 "RSS")
(setf (get-generation-memory model) (parse-integer $15 :junk-allowed t)))
(END
(when model
(emit-model model *foo-table*))
*foo-table*))
Parsing the bulk of my data with the above function takes about
80 seconds, probably slower than it would be with AWK.
However, once parsed I have all the data available at my fingertips
inside the Lisp image. I can prod it with
the SLIME
inspector, identify outliers, and play around with it in the REPL
until I am satisfied. In addition, I can selectively rerun parts of
the experiments, and parse just the newly produced output to update
the in-memory representation, again in no time flat. Beats the Unix
everything is a byte stream
way every day of the week.
Rendering the results as HTML is a breeze
with CL-WHO. The same holds
for reordering columns, marking interesting entries
programmatically, cross-referencing with other entries (or earlier
versions of the data) and refining the analyses as much as I wish,
with instant feedback.
Unfortunately, CLAWK has acquired some bitrot since its release in
2002 (?). I needed to tweak it slightly to make it compile
(only tested with SBCL). When I
tried to contact Michael Parker, his email bounced, so I decided to
put up a patched version locally until the changes are folded back
into his distribution. In
addition, CLAWK is now
ASDF-installable (along with its
dependency, REGEX),
courtesy of
Redshank's ASDF Defsystem
skeleton.
If time permits, I will fix up the code some more to get rid of the
warnings, and perhaps
allow CL-PPCRE as
alternative regular expression engine. However, patches from the
open-source fairies are very welcome, too.