Recently, I made good use of Michael Parker's CLAWK, a Lisp-embedded variant of AWK.
Until now, I had used the real AWK, along with Perl and other parts of
the Unix toolbox, to analyze data from experiments and munge it into
HTML and LaTeX tables. However, this time I expected the experiments
to be carried out in a run–tweak–rerun fashion with several
iterations, and I did not want to reparse several hundred megabytes
each time, just for a change in table layout or for adding another
analysis. (Also, I have not had much time for lispy things recently,
so this was a good way to sneak some Lisp back into my current C++
hell...)
Enter CLAWK. The following function parses my benchmark data into Lisp objects:
(defvar *foo-table* (make-hash-table :test 'equal))
(defawk parse-foo-benchmark (&aux model)
  (#/^memtime/
   (when model
     (emit-model model *foo-table*))
   (let ((path (parse-namestring $6)))
     (setf model (make-instance 'foo-model-record :filename path))))
  (#/^Instantiator: Explored/
   (setf (get-states model) $#3
         (get-transitions model) $#6
         (get-bfs-levels model) $#9
         (completed? model) t))
  ((string= $6 "elapsed")
   (setf (get-generation-time model) $#5))
  ((string= $13 "RSS")
   (setf (get-generation-memory model) (parse-integer $15 :junk-allowed t)))
  (END
   (when model
     (emit-model model *foo-table*))
   *foo-table*))
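The parser above relies on a foo-model-record class and an emit-model function that are not shown here. What follows is only a minimal sketch of what they might look like. The accessor names and the :filename initarg are taken from the parser; the slot names, the get-filename reader, and the choice to key the table by the file's namestring are assumptions made for illustration.

```lisp
;; Hypothetical record class: slot names and GET-FILENAME are assumptions,
;; the remaining accessor names are the ones the parser calls.
(defclass foo-model-record ()
  ((filename          :initarg :filename :reader get-filename)
   (states            :initform nil :accessor get-states)
   (transitions       :initform nil :accessor get-transitions)
   (bfs-levels        :initform nil :accessor get-bfs-levels)
   (completed         :initform nil :accessor completed?)
   (generation-time   :initform nil :accessor get-generation-time)
   (generation-memory :initform nil :accessor get-generation-memory)))

(defun emit-model (model table)
  "Store MODEL in TABLE, keyed by the namestring of its file (an assumed
keying scheme). A later record for the same file replaces the earlier one."
  (setf (gethash (namestring (get-filename model)) table) model))
```

Because a newer record for the same file simply overwrites the older one under this keying scheme, reparsing only the output of a rerun experiment would update the table in place.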
Parsing the bulk of my data with the above function takes about
80 seconds, which is probably slower than it would be with AWK.
However, once parsed, I have all the data at my fingertips inside the
Lisp image. I can prod it with the SLIME inspector, identify outliers,
and play around with it in the REPL until I am satisfied. In addition,
I can selectively rerun parts of the experiments and parse just the
newly produced output to update the in-memory representation, again in
no time flat. Beats the Unix "everything is a byte stream" way every
day of the week.
Rendering the results as HTML is a breeze with CL-WHO. The same holds for reordering columns, marking interesting entries programmatically, cross-referencing with other entries (or earlier versions of the data) and refining the analyses as much as I wish, with instant feedback.
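To give an idea of the CL-WHO side, here is a small sketch rather than the code actually used for these tables. It assumes CL-WHO is loaded via Quicklisp (any other way of loading it works too), and the function name render-results-table and the row layout of (name states transitions) lists are made up for illustration.

```lisp
;; Assumption: Quicklisp is available for loading CL-WHO.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload :cl-who))

(defun render-results-table (rows)
  "Return an HTML table as a string. ROWS is a list of
(name states transitions) lists; this layout is hypothetical."
  (cl-who:with-html-output-to-string (out)
    (:table
     (:tr (:th "Model") (:th "States") (:th "Transitions"))
     (loop for (name states transitions) in rows
           do (cl-who:htm
               (:tr (:td (cl-who:esc name))             ; escape HTML special characters
                    (:td (cl-who:fmt "~D" states))      ; FORMAT-style output
                    (:td (cl-who:fmt "~D" transitions))))))))
```

Reordering columns then amounts to shuffling the :td forms, and marking an interesting entry is an ordinary Lisp conditional around the cell.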
Unfortunately, CLAWK has acquired some bitrot since its release in 2002 (?). I needed to tweak it slightly to make it compile (only tested with SBCL). When I tried to contact Michael Parker, his email bounced, so I decided to put up a patched version locally until the changes are folded back into his distribution. In addition, CLAWK is now ASDF-installable (along with its dependency, REGEX), courtesy of Redshank's ASDF Defsystem skeleton.
If time permits, I will fix up the code some more to get rid of the warnings, and perhaps allow CL-PPCRE as an alternative regular expression engine. However, patches from the open-source fairies are very welcome, too.
