Michael Weber: Random Bits and Pieces

...Please don't assume Lisp is only useful for Animation and Graphics, AI, Bioinformatics, B2B and E-Commerce, Data Mining, EDA/Semiconductor applications, Expert Systems, Finance, Intelligent Agents, Knowledge Management, Mechanical CAD, Modeling and Simulation, Natural Language, Optimization, Research, Risk Analysis, Scheduling, Telecom, and Web Authoring just because these are the only things they happened to list.

Kent M. Pitman

Lisp Logo (by Conrad Barsky)

Recently, I made good use of Michael Parker's CLAWK, a Lisp-embedded variant of AWK.

So far, I had used the real AWK, along with perl and other parts of the Unix toolbox to analyze data from experiments, and munge it into HTML and LaTeX tables. However, this time I expected the experiments to be carried out in a run–tweak–rerun fashion with several iterations, and I did not want to reparse several hundred megabytes each time, just for a change in table layout, or for adding another analysis. (Also, I had not much time for lispy things recently, so this was a good way to sneak some Lisp back into my current C++ hell...)

Enter CLAWK. The following function parses my benchmark data into Lisp objects:

(defvar *foo-table* (make-hash-table :test 'equal))

(defawk parse-foo-benchmark (&aux model)
   (when model
     (emit-model model *foo-table*))
   (let ((path (parse-namestring $6)))
     (setf model (make-instance 'foo-model-record :filename path))))
  (#/^Instantiator: Explored/
   (setf (get-states model) $#3
         (get-transitions model) $#6
         (get-bfs-levels model) $#9
         (completed? model) t))
  ((string= $6 "elapsed")
   (setf (get-generation-time model) $#5))
  ((string= $13 "RSS")
   (setf (get-generation-memory model) (parse-integer $15 :junk-allowed t)))
   (when model
     (emit-model model *foo-table*))

Parsing the bulk of my data with the above function takes about 80 seconds, probably slower than it would be with AWK. However, once parsed I have all the data available at my fingertips inside the Lisp image. I can prod it with the SLIME inspector, identify outliers, and play around with it in the REPL until I am satisfied. In addition, I can selectively rerun parts of the experiments, and parse just the newly produced output to update the in-memory representation, again in no time flat. Beats the Unix everything is a byte stream way every day of the week.

Rendering the results as HTML is a breeze with CL-WHO. The same holds for reordering columns, marking interesting entries programmatically, cross-referencing with other entries (or earlier versions of the data) and refining the analyses as much as I wish, with instant feedback.

Unfortunately, CLAWK has acquired some bitrot since its release in 2002 (?). I needed to tweak it slightly to make it compile (only tested with SBCL). When I tried to contact Michael Parker, his email bounced, so I decided to put up a patched version locally until the changes are folded back into his distribution. In addition, CLAWK is now ASDF-installable (along with its dependency, REGEX), courtesy of Redshank's ASDF Defsystem skeleton.

If time permits, I will fix up the code some more to get rid of the warnings, and perhaps allow CL-PPCRE as alternative regular expression engine. However, patches from the open-source fairies are very welcome, too.