Michael Weber: Random Bits and Pieces

MacPorts is a package repository that aims to be an easy-to-use system for compiling, installing, and upgrading [...] open-source software on the MacOS X operating system. Unfortunately, it is not very good. In fact, I think it sucks quite a bit, in a way that cannot be repaired easily without some fundamental changes. At the end of this story, I will propose to let MacPorts die. Here's why.

First Things First: Why a Package System?

Here's the scenario: In order to accomplish some given task, Jane User wants to use a program which is not yet installed. The sole point of a package system is to get Jane to her goal (running and using said program) as quickly and efficiently as possible. The fact that she has to install the program's package before using it is already overhead. Minimizing this overhead will help Jane focus on her task, and be more efficient.

With MacPorts, chances are high that instead, Jane has to handhold the installation, un- or reinstall packages, jump through hoops, and generally wait a long time before she can use the program. Even worse, she quite likely has to page in knowledge about compilers, build systems, OS idiosyncrasies, etc., if something goes wrong. None of this is related to her original task. All of it slows Jane down.

In the remainder of this longish rant, I will try to highlight why MacPorts fails me. I will use a recent GNUCash installation as a running example, but similar things happened before with other packages. Notice that I am not blaming the GNUCash ports package. Had none of the dependencies been installed already, it would have needed less handholding, I am sure. (Unfortunately, it seems the installed dependencies are never just right in a production system...) However, the overall experience would have been only slightly better.

Don't Waste My Time

Apparently, I am not the only one having trouble installing GNUCash from MacPorts: John had, too. When the efforts to build the package reached the point of becoming ridiculous, I started to keep a build log of the ordeal.
Much of this was trial-and-error, because I actually wanted to do something else. However, there are some gems in there, where the package system tries quite hard to mislead me.

It took me about one-and-a-half days to get GNUCash installed with MacPorts. Granted, there were long periods of waiting in between when the package system was busy doing whatever, and I was not near that computer most of the time. But it gets worse: the whole process is not entirely automatic. It required interaction every other hour or so to help it get unstuck. I have better things to do than babysit a stretched-out installation process.

Okay, GNUCash has a lot of dependencies. Perhaps it is the ultimate package system benchmark. Good! I applaud the developers for reusing whatever is possible. Actually, I don't even care: the package system should shield me from having to think about this at all. As a user, all I really care about is having GNUCash installed.

And GNUCash is not the only behemoth here, this happened before. Take GHC, for example. It takes ages to build with MacPorts.

Download Missing Pieces In One Batch

The Debian equivalent of installing GNUCash looks like this:

% sudo time apt-get -d install gnucash
[...]
0 upgraded, 82 newly installed, 0 to remove and 171 not upgraded.
Need to get 24.3MB/32.9MB of archives.
After unpacking 120MB of additional disk space will be used.
Do you want to continue [Y/n]?
[...]
Get: 65 http://ftp.de.debian.org testing/main gnucash 2.2.1-1 [1965kB]
Fetched 24.3MB in 4s (6024kB/s)
Download complete and in download only mode
1.16user 0.24system 0:30.63elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k

So, I need a network connection for all of 4 seconds to get everything that is needed onto my hard disk for GNUCash to run. Before that, I am asked whether it is okay to continue, so that I can intervene if needed.

With a drawn-out build like GNUCash, MacPorts alternates between downloading a package, compiling it (which can take an arbitrary amount of time), installing it, and then looking at the next dependency. Suck! When I am near a fast net connection, I want the package system to download whatever is possibly needed to complete my initial installation request in one go. When finished, I want to disconnect my machine from the net, and go hiking in the outback while still being able to finish the installation. There is a reason why I chose a laptop as my main workhorse: mobility.

In other words, network access should only be needed right after the user-initiated command to install a new package. When all network activity is done, a clear indication of this fact should be given. See also: Debian's package manager. (While not perfect, it gets some things right. I'll spare you mentioning this in the sections hereafter.)
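To make the idea concrete, here is a minimal sketch of a two-phase installer: first compute the complete plan, then do all the downloading, and only then start building. The dependency table and the `fetch`/`build` callbacks are hypothetical placeholders, not anything MacPorts or Debian actually provides.

```python
# Hypothetical dependency table; the package names are illustrative only.
DEPENDS = {
    "app": ["libgui", "libxml"],
    "libgui": ["libxml", "libpng"],
    "libxml": [],
    "libpng": [],
}

def install_order(root, depends):
    """Return root plus all transitive dependencies, dependencies first."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in depends[name]:
            visit(dep)
        order.append(name)  # appended only after all of its dependencies

    visit(root)
    return order

def install(root, depends, fetch, build):
    """Phase 1 is the only part that touches the network; phase 2 never does."""
    plan = install_order(root, depends)
    archives = {name: fetch(name) for name in plan}  # all downloads up front
    print("All downloads complete; the network is no longer needed.")
    for name in plan:
        build(name, archives[name])  # may take hours, but works offline
    return plan
```

The point of the design is that the slow, unattended part (building) starts only after the last byte has been fetched, so the message in the middle is the signal that it is safe to unplug.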

Don't Install Stuff I Did Not Ask For (Or Explain Why)

No really, why would I ever want firefox-x11 on MacOS X, considering that I already have another Firefox installed, and Camino, and Flock, not to mention Safari, which I actually use most of the time? Oh right, yelp, a help browser for GNOME, depends on firefox-x11.

% port installed gnucash-docs
The following ports are currently installed:
  gnucash-docs @2.0.1_0 (active)

Why on earth does GNUCash pull in the firefox-x11 package? Not to mention evince and goffice! If other packages get pulled in because the package I requested has declared a dependency (perhaps by transitivity) on them, I want at least to be told what will be installed. Maybe I would refine my installation request in light of that information.
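Telling me why something gets installed is not hard. The sketch below walks a dependency graph breadth-first and records, for every package that would be pulled in, one shortest chain that leads to it. Only the yelp to firefox-x11 edge comes from what I observed; the rest of the graph is a made-up assumption for illustration and is not the real MacPorts dependency data.

```python
from collections import deque

# Hypothetical dependency graph. Only "yelp" -> "firefox-x11" is taken
# from the text above; every other edge is an assumption.
DEPENDS = {
    "gnucash": ["gnucash-docs", "goffice"],
    "gnucash-docs": ["yelp"],
    "yelp": ["firefox-x11"],
    "goffice": [],
    "firefox-x11": [],
}

def explain(root, depends):
    """Map each package reachable from root to one shortest chain leading to it."""
    chains = {root: [root]}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        for dep in depends.get(name, []):
            if dep not in chains:  # first visit is via a shortest chain
                chains[dep] = chains[name] + [dep]
                queue.append(dep)
    return chains

# Show the user what will be installed, and because of what.
for name, chain in explain("gnucash", DEPENDS).items():
    print(f"{name}: {' -> '.join(chain)}")
```

With a listing like this in front of me before anything is downloaded, I could decide to skip the documentation package and be spared the second Firefox.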

On a similar note, MacPorts tries really hard to install gcc, perl, and some other basic packages, completely ignoring the ones that ship with MacOS X. This is explained in the MacPorts FAQ:

The drawbacks on this behaviour also are minimal: Wasting 10MB for a Python installation is next to nothing if you have a GB-harddisk and gain consistency all the way in return.

Right. Except, how long does it take to build gcc, perl and python from source? Long enough to make me forget what I actually wanted to install, and why.

Anyway, some programs come bundled with heaps of documentation, or in different flavors. Sometimes, I just don't care, but sometimes I'd like to have some influence on what gets installed. Depending on the circumstances, I would like to choose from a minimal or a batteries included installation, or just the documentation, please. Too much to ask?

Building from Source Considered Harmful

The reason is simple: there are too many variables in the process. Too much can (and routinely does) go wrong, resulting in broken programs, or even installations. Another war story: after a gettext update (which was pulled in as a dependency by some other program, no less), all of a sudden vital programs started to segfault: awk, sed, tcl, you name it. Needless to say, MacPorts needs some of these programs to function. Chicken, meet Egg.

I hope nobody dares to point to the installable packages facility of MacPorts. It is utterly lacking in comparison to, e.g., the Debian package manager. Dependencies, anyone?

I am really astonished that with all the emphasis on optimization, it is not considered odd that every installation of MacPorts repeats all the package compilation jobs over again. Why not cache the work on download servers, and share it? Even if package build bots are generally untrusted, it should only be a Small Matter Of Programming to provide a web of trust with cryptographically signed packages, right?
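As a rough sketch of the client side of such a scheme: look for a prebuilt archive, accept it only if its signature verifies, and otherwise fall back to building from source. Python's standard library has no public-key signatures, so an HMAC with a shared key stands in for the real thing here; an actual web of trust would need proper public-key signatures. The cache layout, the key, and all names are hypothetical.

```python
import hashlib
import hmac

def sign(archive: bytes, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 over the archive bytes."""
    return hmac.new(key, archive, hashlib.sha256).hexdigest()

def fetch_prebuilt(cache: dict, port: str, key: bytes):
    """Return the cached archive if present and correctly signed, else None.

    None means "no trustworthy binary available": the caller would then
    fall back to compiling from source.
    """
    entry = cache.get(port)
    if entry is None:
        return None
    archive, signature = entry
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(signature, sign(archive, key)):
        return None
    return archive

KEY = b"hypothetical-shared-secret"
good = b"prebuilt gnucash archive"
CACHE = {
    "gnucash": (good, sign(good, KEY)),
    # The signature was made for different bytes, so verification must fail.
    "gettext": (b"tampered archive", sign(b"original archive", KEY)),
}

print(fetch_prebuilt(CACHE, "gnucash", KEY) == good)
print(fetch_prebuilt(CACHE, "gettext", KEY))
print(fetch_prebuilt(CACHE, "ghc", KEY))
```

The fallback path matters: a missing or badly signed binary costs compile time, never correctness.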

Build Environment and Run-time Environment Should be Separate

Build Environment
The Build Environment of a package contains the set of dependencies needed to successfully build that package from source code. This includes external packages containing compilers, linkers, header files, libraries, documentation generators, OS settings.
Run-time Environment
The Run-time Environment of a package contains dependencies needed to use the package. Without them present, the contents of the package are not functional, at least not fully.

The point here is that the build environment can differ significantly from the run-time environment, and a package system should acknowledge this.

Extreme cases are cross-compiled packages: such packages might not even be usable in the build environment, but they are usable in the run-time environment. A more common case is that the build environment has perhaps a compiler or document generation system like TeX installed, while the run-time environment does not need a compiler, but instead a PDF reader to view the documentation (which was not needed on the build environment).

Another reason is that it is usually fine to have several versions of a single library installed at the same time, with the loader taking care of choosing the right one for the program at hand. However, when it comes to compiling and linking C code, one set of header files and one (matching) shared library should be installed in standard places in the build environment, such that the brittle mess that is autotools (a fine topic for another rant) can find them without getting confused. When I want to build against the newest library version, I do not want to touch the installed programs which were built against some older version at all. They should keep working just fine.

As a user of the package, I do not care what was needed to get it to the point where I can execute the program. Whether it was written in C or INTERCAL does not matter, as long as I have an executable to run, which works.

Thus, all I need to ensure is that the run-time environment for the package is complete. And fortunately, I do not have to take care of that myself. If the package declares its dependencies, the package system can make sure that everything that is needed gets installed alongside the package.
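One way to model the separation, as a sketch only: each package declares build and run-time dependencies in separate fields, and installing a prebuilt package follows the run-time edges exclusively. The package names are invented for illustration and mirror the TeX versus PDF reader example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Package:
    name: str
    build_deps: tuple = ()  # needed only to produce the package
    run_deps: tuple = ()    # needed to actually use it

# Hypothetical package set.
PACKAGES = {
    "app": Package("app", build_deps=("compiler", "tex"),
                   run_deps=("libfoo", "pdf-reader")),
    "libfoo": Package("libfoo", build_deps=("compiler",)),
    "pdf-reader": Package("pdf-reader"),
    "compiler": Package("compiler"),
    "tex": Package("tex"),
}

def runtime_closure(name, packages):
    """Everything needed to *use* name; build-only dependencies are never followed."""
    needed, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        stack.extend(packages[current].run_deps)
    return needed

print(sorted(runtime_closure("app", PACKAGES)))
```

On a machine that only runs the program, neither the compiler nor TeX ever shows up in the result.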

Transactions and Rollback

If something goes wrong during package installation, I want an easy way out. I want at least the ability to get back to the point before I started installation. Also, if some dependencies cannot be fulfilled, I want the whole transaction to fail, or possibly ask the user whether to proceed with a partial installation.
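The all-or-nothing behaviour can be sketched with an undo log: every completed step registers how to reverse itself, and a failure replays the log backwards. This is a toy model under the assumption that each step has a cheap inverse; real package operations (file unpacking, hooks) are messier. All names are hypothetical.

```python
class Transaction:
    """Collect undo actions; run them in reverse order if anything fails."""

    def __init__(self):
        self._undo = []

    def do(self, action, undo):
        action()                 # if this raises, nothing is registered
        self._undo.append(undo)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            for undo in reversed(self._undo):
                undo()
        return False             # never swallow the original error

installed = set()

def install_all(names, broken=()):
    """Install every name, or leave `installed` exactly as it was."""
    with Transaction() as tx:
        for name in names:
            if name in broken:
                raise RuntimeError(f"cannot install {name}")
            tx.do(lambda n=name: installed.add(n),
                  lambda n=name: installed.discard(n))

try:
    install_all(["a", "b", "c"], broken={"c"})
except RuntimeError as err:
    print(f"rolled back: {err}; installed = {sorted(installed)}")
```

Asking the user whether to keep a partial installation would simply be a question posed before the rollback loop runs.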

User Interaction

Even with the ideal of a short installation time, any user interaction should be at the beginning (or the end) of this transaction, never interspersed with other activity.

You Failed It

How long have the BSDs been around? Is the ports system really the best package system that came out of this? When I was waiting for the GNUCash build to finish, I saw warnings scrolling by which effectively said that something could not be undone because post-remove hooks are not yet implemented. It's 2007 for another couple of days; the much sneered-at Linux package systems (e.g., Debian's) have had {pre,post}-{install,remove} hooks for how many years now?

So, invoking Fred Brooks here ("Fail fast"): please, let MacPorts die. It is a waste of resources. Instead, I recommend spending energy on improving, e.g., Fink packages. At least, it has modern, full-featured package management tools. Speaking of which, I just asked fink to selfupdate via rsync. It is still... COMPILING?!

p.s.: Another option is Zero-Install, the package management nirvana. I actually managed to get Zero-Install installed on MacOS X (which also was not a smooth process, in part thanks to MacPorts). However, before it becomes usable in day-to-day situations it needs a bigger user base. I will give it a spin and see how far I get.

UPDATE 2008-02-13: Et tu, Fink.

Yeah, that totally worked. Not!

Fink sent me through a twisty maze of virtual packages, all alike (i.e., not working). Eventually, I caved in and let it install ghostscript and tetex, too, before it agreed to continue with GNUCash. The end result is not very impressive.

After the fink selfupdate command and fink -b install gnucash, this is what I ended up with:

% du -sh /sw
1.2G    /sw

1.2 GB worth of (mostly build-time) dependencies.

On the good side, all the downloading appeared to happen at the beginning of the command. Had fink not bailed out half a dozen times, the installation process could be called almost automatic. On the bad side, there was again quite some compilation involved. Any number of factors may or may not be the reason that gnucash segfaults.

When will the madness stop?