Michael Weber: Random Bits and Pieces

Lisp Logo (by Conrad Barsky)

So, by pure chance I ended up looking at the reverse-complement benchmark of the Computer Language Shootout, and the SBCL (0.9.4) implementation of the benchmark was not doing very good. Back then, it was listed at almost 14 seconds, with the best implementation in C doing the same job in 0.69 seconds, and even python doing considerably better at around 3 seconds. Plus, the SBCL implementation used gobs of memory.

Time for some golfing!

A closer look at the benchmark reveals that we are actually measuring I/O speed, and memory speed, as the benchmark is supposed to read in a ~25 MiB FASTA (text) file via standard input, essentially reverse it chunk-wise, and print it out formatted again. There is half an explanation in there why the SBCL implementation fares so badly: depending on the locale it would assume input data is in UTF-8 format and try to decode it, plus internally, SBCL CHARACTERs are 32-bit wide. Also, interpreters like python do relatively well, because they do not actually spend much time in the interpreter, as the python code and perl code undoubtedly show.

To cut a long story short: I provided a revcomp entry which beat every other implementation save the gcc implementation, and even there it came quite close. It is also quite ugly. With SBCL 0.9.4, the shootout benchmarking program measured RAM usage to be 35 MB, showing that my version consed quite less, for what it's worth. Then, several languages were upgraded (among them SBCL and Digital Mars D), and now my entry resides on the third place, behind the D and gcc entry. Oh well.

Lessons learnt (in no particular order):

  • Lisp can be as fast as C! (News at 11)
  • Unfortunately, the optimizations come at a price: complexity, (our own buffer management, no READ-LINE, etc). Approaching the speed of C seems to force use to more or less write C in Lisp. Fortunately, most of the time, we don't have to.
    Note however, that contrary to other languages, we did not have to drop down to use FFI, this is still all standard Common Lisp functionality, and unless we lie to the compiler, we are still safe from buffer overruns and other nastiness.
  • When optimizing an I/O bound application we may consider using block operations (READ-SEQUENCE, WRITE-SEQUENCE) with element type (UNSIGNED-BYTE 8), thus bypassing most of the stream functionality.
  • Listen to your compiler (notes), it has important things to tell! I just wished the compiler would also reveal why it determined that, e.g., code is not reachable... In all cases this was due to errors elsewhere on my side, and SBCL is quite smart at figuring them out, it just does not tell me.
  • I could not get decent performance by growing the input buffer like the C version, realloc-style. If an array is not adjustable, it is presumably copied when using ADJUST-ARRAY, and if we create it adjustable, it is not simple any more, and the compiler loudly complains that it cannot do some optimizations. This complicates matters slightly. I use the moral equivalent of Erlang iolists instead: a list of vectors.
  • (DECLARE (OPTIMIZE ...)) annotations should be used sparingly, and not shotgun-style, as seems to be the default shootout compilation policy.
  • A slower (roughly by a factor of 4, perhaps because of the 32-bit wide CHARACTERs?) but saner revcomp version is probably preferable in all but exceptional situations. It is a lot shorter, too.
  • Optimizations like the above are highly implementation-dependent. Different implementations might exhibit different behavior, or might even have better ways to reach the same performance (e.g., different stream implementations).

Out of all the benchmark entries so far, I liked the python entry quite a bit for it's clarity and conciseness. However, most of it is probably just calling out to C.

UPDATE 2006-08-18: When Lisp is Faster than C

Børge Svingen kindly pointed me to his paper "When Lisp is Faster than C" (ACM tax). He uses dynamic compilation to speed up his Lisp computations. In Common Lisp, this is as easy as calling COMPILE at runtime.

Of course, with some more pain we could do something similar in C by generating C code, compiling it (e.g., with tcc), and loading it dynamically. Or use GNU Lightning for JIT compilation (even more pain).

The strategy of generating specialized C code is used quite often, for example in CADP and SPIN.

JDATE

David Wong's JDATE could have been written by localroger's little brother. Maybe he is. Quote from the site:

...like being run over by a truck made of horror, hauling four tons of stupid...

So true... As Volker said, you just cannot stop reading.

New Paper

Together with Moritz Hammer, I wrote a paper on how to explore very large state spaces: To Store or Not to Store Reloaded: Reclaiming Memory on Demand (accepted for publication to FMICS 2006).

Abstract

Behrmann et al. posed the question whether "To Store or Not To Store" states during reachability analysis, in order to counter the effects of the well-known state space explosion problem in explicit-state model checking. Their answer was to store not all but only some strategical states. They pay in run-time if the answer too often is "Not To Store". We propose a different strategy to adaptively trade time for space: "To Store" as many states as memory limits permit. If memory runs full, we gradually swap states out to secondary storage. We are careful to minimize revisits, and I/O overhead, and also stay sound, i.e. on termination it is guaranteed that the full state space has been explored. It is also available for counterexample reconstruction. In our experiments we tackled state spaces of industrial-sized models with more than 109 explicit states with still modest storage requirements.

We actually have a practical application for this: automatically finding subtle bugs in Embedded Systems software.

Some pictures have been rotting way too long on my cell phone. Unfortunately, I forgot almost all details about them. Here goes.

PainStation

An electrifying game console, literally. Volker and I played it at an exhibition in Aachen earlier this year. The whipping was most unpleasant. Remaining thumbs up for most violent game of Pong ever played.

PainStation

PainStation Warning

Daithi Rua

Cool Irish music, that time in the John Mullins Irish pub in Maastricht; highly recommended! We had some time to chat with David and company before the gig, and watched him eat enormous amounts of fries, too.

Daithi Rua

Art and Operating Systems

Is it just me or do others also see a certain similarity to Plan9's Glenda in the following picture? Surely this cannot be coincidence! Painting: Plan 9 from Outer Space

I forgot in which exhibition I saw it (August-Macke-Haus, Bonn?). I do remember, however, the blank stares from my company when I tried to explain why I found this picture interesting. Oh well...

X Logo

It probably happened to every (X11) Emacs user already. We routinely use backspace to kill a character, C-backspace to kill a word, C-x backspace to kill a sentence, etc. And inevitably, we press C-M-backspace trying to kill an s-expression, just to find ourselves wondering why the screen suddenly turns black for a moment before the login screen appears again. Yup, we zapped the X server, because that is what C-M-backspace does.

Yesterday, it was Riastradh's turn to press the wrong key combination. An easy solution to prevent high blood pressure and lost work next time is to put the following snippet into the X configuration file (e.g., /etc/X11/xorg.conf):


Section "ServerFlags"
	# When this option is enabled, C-M-backspace
	# has no special meaning and is passed to clients.
        Option      "DontZap" "True"
EndSection

However, this requires root priviledges, which we might not have on every machine. There is another solution, which does not require any special rights. A short look with xmodmap shows what the Backspace key is wired to:


% xmodmap -pke|grep -i backspace
keycode  22 = BackSpace Terminate_Server

We may get rid of it with:


% xmodmap -e 'keysym BackSpace = BackSpace BackSpace'

Reenabling the zap behavior is possible as well:


% xmodmap -e 'keysym BackSpace = BackSpace Terminate_Server'

Well, it would be too easy if this would all work now, wouldn't it? Unfortunately, newer versions of X, in particular X.org version X11R7 (Debian xserver-xorg 1:7.0.22), seem to kill the server regardless whether or not Terminate_Server is assigned to a key, unless mentioned DontZap option is added to the global configuration file. Back to Square One.

Please, won't somebody think about the baby seals Emacs users?!