So, by pure chance I ended up looking at the reverse-complement benchmark of the Computer Language Shootout, and the SBCL (0.9.4) implementation of the benchmark was not doing very good. Back then, it was listed at almost 14 seconds, with the best implementation in C doing the same job in 0.69 seconds, and even python doing considerably better at around 3 seconds. Plus, the SBCL implementation used gobs of memory.
Time for some golfing!
A closer look at the benchmark reveals that we are actually measuring
I/O speed, and memory speed, as the benchmark is supposed to read in a
~25 MiB FASTA
(text) file via standard input, essentially reverse it chunk-wise, and
print it out formatted again. There is half an explanation in there
why the SBCL implementation fares so badly: depending on the locale it
would assume input data is
in UTF-8 format and try to decode it, plus
CHARACTERs are 32-bit wide. Also,
interpreters like python do relatively well, because they do not
actually spend much time in the interpreter, as
code undoubtedly show.
To cut a long story short: I provided a revcomp entry which beat every other implementation save the gcc implementation, and even there it came quite close. It is also quite ugly. With SBCL 0.9.4, the shootout benchmarking program measured RAM usage to be 35 MB, showing that my version consed quite less, for what it's worth. Then, several languages were upgraded (among them SBCL and Digital Mars D), and now my entry resides on the third place, behind the D and gcc entry. Oh well.
Lessons learnt (in no particular order):
- Lisp can be as fast as C! (News at 11)
- Unfortunately, the optimizations come at a price: complexity,
(our own buffer management,
etc). Approaching the speed of C seems to force use to more or
less write C in Lisp. Fortunately, most of the time, we don't
Note however, that contrary to other languages, we did not have to drop down to use FFI, this is still all standard Common Lisp functionality, and unless we lie to the compiler, we are still safe from buffer overruns and other nastiness.
- When optimizing an I/O bound application we may consider using
WRITE-SEQUENCE) with element type
(UNSIGNED-BYTE 8), thus bypassing most of the stream functionality.
- Listen to your compiler (notes), it has important things to tell! I just wished the compiler would also reveal why it determined that, e.g., code is not reachable... In all cases this was due to errors elsewhere on my side, and SBCL is quite smart at figuring them out, it just does not tell me.
- I could not get decent performance by growing the input buffer like the C version, realloc-style. If an array is not adjustable, it is presumably copied when using ADJUST-ARRAY, and if we create it adjustable, it is not simple any more, and the compiler loudly complains that it cannot do some optimizations. This complicates matters slightly. I use the moral equivalent of Erlang iolists instead: a list of vectors.
(DECLARE (OPTIMIZE ...))annotations should be used sparingly, and not shotgun-style, as seems to be the default shootout compilation policy.
- A slower (roughly by a factor of 4, perhaps because of the
CHARACTERs?) but saner revcomp version is probably preferable in all but exceptional situations. It is a lot shorter, too.
- Optimizations like the above are highly implementation-dependent. Different implementations might exhibit different behavior, or might even have better ways to reach the same performance (e.g., different stream implementations).
Out of all the benchmark entries so far, I liked the python entry quite a bit for it's clarity and conciseness. However, most of it is probably just calling out to C.
Børge Svingen kindly pointed me to his paper
Lisp is Faster than C"
He uses dynamic compilation to
speed up his Lisp computations. In Common Lisp, this is as easy as calling
Of course, with some more pain we could do something similar in C by generating C code, compiling it (e.g., with tcc), and loading it dynamically. Or use GNU Lightning for JIT compilation (even more pain).