Michael Weber: Random Bits and Pieces

via Planet Smalltalk:

Andres implemented (product of) powerset functionality in Smalltalk:

  
allProductsTimes: aNumber
do: aBlock

       self
             allProductsTimes: aNumber
	     startingAt: 1
	     do: aBlock

allProductsTimes: aNumber
startingAt: anIndex
do: aBlock

       anIndex > self size
       ifTrue: [aBlock value: aNumber]
       ifFalse:
               [self
	             allProductsTimes: aNumber
		     startingAt: anIndex + 1
		     do: aBlock.

	       self
	             allProductsTimes: aNumber * (self at: anIndex)
		     startingAt: anIndex + 1
		     do: aBlock]

Isn't that absolutely beautiful?

I would not know how to judge beauty of the above code, but on technical grounds I would have to remark that powerset and arithmetic set product functionality is intermingled.

On closer inspection, the code consists of a fold (or reduce) operation over the result of a powerset operation. Some better factored Haskell code might look like this:


powerset :: [a] -> [[a]]
-- returns powerset of given set
powerset []     = [[]]
powerset (x:xs) = concat [ [s, x:s] | s <- powerset xs ]

With lists as set representation, a powerset is constructed recursively just like in the Smalltalk code: the powerset of the empty set [] is the set containing the emptyset [[]], otherwise the powerset is built by picking an element x, then for each set s in the powerset of the remaining elements xs, include it once with and once without picked element x.

powerset works for any type of sets. Trying it out for numbers:

Main> powerset [2 .. 4]
[[],[2],[3],[2,3],[4],[2,4],[3,4],[2,3,4]]

Now, that was only half the work, we still need to calculate the product. I mentioned already that it can be done with a folding operation (we will choose a left fold foldl):

Main> :t foldl
foldl :: (a -> b -> a) -> a -> [b] -> a

We fold the set with multiplication and initial argument 1 (the same definition as builtin function product):

Main> :t foldl (*) 1
foldl (*) 1 :: Num a => [a] -> a

Main> :t product
product :: Num a => [a] -> a

Note that so far, I have written three lines of code, and otherwise presented you a bunch of type signatures. So, putting it all together:


powerset :: [a] -> [[a]]
-- returns powerset of given set
powerset []     = [[]]
powerset (x:xs) = concat [ [s, x:s] | s <- powerset xs ]

allProducts :: Num a => [a] -> [a]
-- returns arithmetic product of each subset of given set
allProducts = map product . powerset
Main> allProducts [2 .. 4]
[1,2,3,6,4,8,12,24]

Now, isn't that absolutely beautiful close to the math underneath it?

For what it's worth, I believe the Smalltalk code could be factored similarly.

Notes:

  • If an initial value is not known, we could (like in the Smalltalk code) choose to take the first element of the set as initial value (requiring the set to contain at least one element):
    Main> :t foldl1 (*)
    foldl1 (*) :: Num a => [a] -> a
    
  • The above code works on sets represented as lists. Replacing [a] with Set a => a is left as an exercise to the reader (it will look uglier).

UPDATE 2008-01-08: Powerset Code Explanation

Andres looked at my Haskell powerset code and asked some questions which I try to address here.

I had a hard time understanding the meaning behind the names a, b, x, s, and what seemed to me as operators of some sort: ->, =>, [a].

I am afraid that's how simple functions are written in idiomatic Haskell. It's not as bad as it might look on first glance, I can try to explain some of it. A more detailed overview of Haskell syntax can be found elsewhere.


powerset :: [a] -> [[a]]

This is a type signature (denoted by ::), declaring function powerset to take list of a ([a]) and returning some list of lists of a (with a being an arbitrary type). That is, the function is polymorphic.

Type variables in Haskell are commonly named a, b, c,... because they denote any type, it does not matter which.


powerset (x:xs) = concat [ [s, x:s] | s <- powerset xs ]

Here x, xs, s are old-fashioned variables. x:xs is a pattern which the first parameter of powerset is matched against. It must be a list, with at least one element (which x will be bound to), and a rest list (which xs will be bound to).

Patterns like x:xs or y:ys can be found all over the place in Haskell, it's a naming convention akin to aBlock in Smalltalk.
More complicated functions usually have more expressive variable names, as you would expect.

The following is list comprehension syntax:


[ x | x <- gen ]

It roughly means: collect all x where each x is generated from gen. Again, Haskell syntax tries to be close to math where you would write something like { x | x in gen }.

Now to Andres' other questions:

  • Does the Haskell version need to be modified to accept the value of the empty product as an argument?

    Not sure what you mean by empty product exactly, but

    
    powerset [] == [[]]
    

    i.e. powerset applied to the empty set yields the set containing the empty set.

  • What would you need to write to iterate through the products with an arbitrary piece of code (the aBlocks in Smalltalk)?

    Haskell supports anonymous functions (lambda's) just like blocks in Smalltalk. The moral equivalent of Smalltalk(ish)'s

    
    1 to: 42 allProducts do: [ x | frob x ]
    

    would perhaps be

    
    mapM (\ x -> frob x) (allProducts [1 .. 42])
    
  • Would it work if the collection had stuff other than integers (such as modular integers, polynomials, functions or matrices)?
    allProducts' type (Num a => [a] -> [a]) suggests that it works for any type a which is in typeclass Num (class not quite in OO sense, though), i.e. the complete numeric tower. If you define multiplication for matrices or polynomials, the code would work unchanged for them, too.
  • With a collection of size 20, it would create (I am no Haskell expert!) 2^20 collections. Is that so?

    Let's try it out:

    Main> take 10 (powerset [1 .. 1000])
    [[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1],[4],[4,1]]
    Main>
    

    The output is instantaneous. Clearly, Haskell does not create the full powerset (of size 21000). Instead, it relies on lazy evalutation to only evaluate as much as is needed to produce the result. In the above case I requested only the first 10 elements of the powerset, so that's what is generated.

    Actually, we can do even better (trading some of the presentational points). With a modified, even lazier version of powerset, we could generate the powerset of e.g. the natural numbers, i.e. an infinite set:

    
    powerset :: [a] -> [[a]]
    powerset xs = []:(powerset' xs [[]])
      where 
        powerset' :: [a] -> [[a]] -> [[a]]
        powerset' []     _  = []
        powerset' (x:xs) ps = let ps' = [ x:s | s <- ps ]
    			  in  ps' ++ powerset' xs (ps ++ ps')
    

    The idea behind this approach is to generate the powerset ps of all elements before x, then add element x to all subsets s of ps, emit those (ps') and continue for the remaining elements xs in the same way.

    Main> take 10 (powerset [1 ..])
    [[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1],[4],[4,1]]
    

    I believe the same can be done in Smalltalk with the standard OO trick to emulate infinite lists: a generator/iterator object (which can be queried for the next element). Actually, I would love to see such a solution to compare to.

  • Regarding the lines of code, I would suggest that overall we're pretty close.

    Yes, I did not mean to imply that the Haskell version is shorter in terms of LOC (which is a poor metric anyway), apologies if it came across like this. My main point is how close Haskell code is to math: if we exchange concat and (++) for set union and curlies for brackets we are almost there.

    A minor point (and this is really with my code police hat on) which I made: the original Smalltalk code could have been factored some more, and the question I have is now would that increase code readability/reuse/beauty even more?.

    I think the following well-known quote from the Wizard Book says it all:

    Programs should be written for people to read, and only incidentally for machines to execute.

UPDATE 2005-09-28: MORE POWERSET!

Interestingly, but not entirely surprisingly, there has already been a Haskell Cafe discussion about different versions of powerset.

I encourage the reader to try these versions out with

Main> take 10 (powerset [1 ..])