Honeycomb puzzle part 2

In part 1, I looked at the honeycomb puzzle and how to solve it. In part 2, I look at ways of making it fast.

Doing less work

The first thing to do is to profile the program to see where it's spending its time. For Python, I used cProfile and for Haskell I used GHC's profiling options.
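
For reference, a Python profiling run can be set up roughly like this. This is only a sketch: solve is a stand-in name I've assumed for the program's main scoring loop, not a function from the original code.

import cProfile
import pstats

# Record a profile of the whole run to a file, then print the most
# expensive functions by cumulative time. 'solve' is an assumed entry point.
cProfile.run('solve()', 'solver.prof')
stats = pstats.Stats('solver.prof')
stats.sort_stats('cumulative').print_stats(20)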

Profiling the solver, I find 1.2 billion calls to present, which come from trying each of the 21,661 distinct scored word sets against the 55,902 possible honeycombs. However, there are only 400 million calls to isSubset or isSubsetOf. (To see this in Haskell, you have to manually insert a cost centre within present; otherwise the profiler doesn't look inside function calls.)

This difference in the number of calls means that two-thirds of the calls to present are weeded out because the scored word set doesn't contain the honeycomb's mandatory letter.

Can I ensure those 800 million wasted calls to present are never made?

One way to do it is the standard approach of trading space for time: tolerating the use of more space (memory) to save time.

I need to ensure that I only consider the scored word sets that contain the honeycomb's mandatory letter. I can do that by having 26 different collections of scored word sets, each guaranteed to have a particular letter. Any particular scored word set will occur in several of the collections, but scoring each honeycomb will take much less time.

Creating these "partitioned" scored word sets requires building a dict or Map to hold them, keyed by mandatory letter.

partitioned_scored_sets = {l: {s: scored_sets[s] 
                               for s in scored_sets if l in s} 
                           for l in string.ascii_lowercase}
mkPartitionedScoredSets scoredSets = M.fromList 
        [(c, scoreSetWithLetter c) | c <- ['a'..'z']]
    where scoreSetWithLetter c =
            M.filterWithKey (\k _ -> c `S.member` k) scoredSets

It also means a change to how I score a honeycomb, by inlining present and removing the check of the mandatory letter.

def partitioned_score_honeycomb(honeycomb):
    return sum(sc for letters, sc in 
               partitioned_scored_sets[honeycomb.mandatory].items() 
               if letters.issubset(honeycomb.letters)
              )
scoreHoneycomb scoredSets (Honeycomb mandatory letters) = 
    M.foldrWithKey scoreLetters 0 (scoredSets ! mandatory)
    where scoreLetters ls score total
            | ls `S.isSubsetOf` letters = score + total
            | otherwise = total

Does it work?

This change produces the same result, but quicker.

 
              Python      Haskell
Original      5 min 38s   4 min 6s
Partitioned   1 min 25s   1 min 10s

A substantial speedup.

Different data structure

Up to now, I've been using sets to store word candidates. Sets are capable of holding arbitrarily large collections of things, but I don't need that here. I'm only storing letters, and there are only 26 of them. Can I exploit that to make a faster lookup?

Letters are either present in a set or not. That means I can represent a set of letters as a 26-bit binary string. Conveniently, that fits in a 32-bit unsigned integer. Bitwise operations allow me to do set operations quickly too. Using this representation, the intersection of two sets is a bitwise-and, and the subset predicate holds when the intersection equals the smaller set: $ A \subset B \equiv (A \cap B) = A $.
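
As a tiny worked example (bit values chosen arbitrarily, not taken from the puzzle), the subset test falls out of a single bitwise-and:

# Two arbitrary bitpatterns standing in for letter sets.
small = 0b00101                 # a two-element set
large = 0b10111                 # a superset of it
print(small & large == small)   # True: every bit of small is also set in large
print(large & small == large)   # False: large has bits that small lacks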

Implementing the bitpattern data structure

To use this data structure, I need to specify exactly how to store sets of letters, define some functions that will convert between a set of letters and an integer, and implement the subset predicate.

I'll use the 26 least significant bits in order, with the bit for $2^{25}$ representing "a", the bit for $2^{24}$ representing "b", down to $2^{1}$ representing "y" and $2^{0}$ representing "z". I store the bitpattern as a straight int for Python and a Word32 from Data.Word for Haskell (and use Data.Bits for bit-twiddling functions).

First I need to encode a set of letters as a bitpattern. In Python, I iterate over all letters, setting a bit if the letter is in the set. In Haskell, I use a fold instead of a loop, as I'm constructing a single value from a list.

def encode(letters):
    encoded = 0
    for l in string.ascii_lowercase:
        encoded <<= 1
        if l in letters:
            encoded |= 1
    return encoded
type LetterSet = Word32

encode :: String -> LetterSet
encode letters = foldl' encodeLetter zeroBits ['a' .. 'z']
    where encodeLetter w l 
            | l `elem` letters = setBit (shift w 1) 0
            | otherwise = shift w 1
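
As a quick check of my own (not from the original post), the Python encoder gives values that follow directly from that bit layout:

print(encode('a'))     # 33554432, i.e. 1 << 25
print(encode('z'))     # 1, i.e. 1 << 0
print(encode('az'))    # 33554433, the two bits combined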

Decoding a bitpattern to a set of letters is another loop in Python, encoding each letter and testing if it's present in the bitpattern. In Haskell, I filter the letters, using testBit and a bit of arithmetic to see if a particular letter is present in a bitpattern.

def decode(letter_bits):
    letters = set()
    for l in string.ascii_lowercase:
        e = encode(l)
        if (e & letter_bits):
            letters.add(l)
    return letters
decode :: LetterSet -> S.Set Char
decode letterSet = S.fromList $ filter present ['a' .. 'z']
    where present c = testBit letterSet $ negate (fromEnum c - fromEnum 'z')
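
A round trip through the two functions gives back the original letters (again a check of my own, not from the post):

print(decode(encode('cab')))                 # the set {'a', 'b', 'c'}, in arbitrary order
print(decode(encode('cab')) == set('cab'))   # True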

Finally, I define the subset function, as a convenience. Note that I'm using the subset symbol in the Haskell definition.

def subset(this, that):
    return this & that == this
(⊂) :: LetterSet -> LetterSet -> Bool
(⊂) this that = (this .&. that == this)
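
Combined with encode, the subset test reads naturally on whole words (the example words are mine, chosen for illustration):

print(subset(encode('bag'), encode('bagel')))   # True: 'bag' uses only letters of 'bagel'
print(subset(encode('box'), encode('bagel')))   # False: 'o' and 'x' aren't available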

Putting it to use

Given the bitpattern representation of letter sets, let's incorporate it into the program.

The Honeycomb dataclass in Python gets a bit more complex as I need to encode letter sets when creating the honeycomb.  

@dataclass(frozen=True, init=False)
class Honeycomb():
    mandatory: int
    letters: int
    
    def __init__(self, mandatory, letters):
        object.__setattr__(self, 'mandatory', encode(mandatory))
        object.__setattr__(self, 'letters', encode(letters))
            
    def __repr__(self):
        return f'{"".join(decode(self.mandatory))}|{"".join(sorted(decode(self.letters) - set(decode(self.mandatory))))}'
data Honeycomb = Honeycomb LetterSet LetterSet
    deriving (Show, Eq, Ord)
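
Creating a honeycomb looks the same from the outside, with the encoding happening inside __init__ (the letters here are chosen arbitrarily for illustration):

hc = Honeycomb('g', 'gaelmpx')
print(hc.mandatory)    # 524288, i.e. 1 << 19, the bit for 'g'
print(hc)              # g|aelmpx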

present uses the new subset function.

def present(target, honeycomb):
    return (subset(honeycomb.mandatory, target)
           and subset(target, honeycomb.letters))
present :: LetterSet -> Honeycomb -> Bool
present target (Honeycomb mandatory letters) =
    mandatory ⊂ target && target ⊂ letters

Scoring letter sets requires knowing how many elements each letter set has. Rather than recalculating that count in Python, I keep the set representation until the scored_sets have been built, and only then encode the keys.

scored_sets = {s: score(s) for s in word_sets}
encoded_scored_sets = {encode(s): scored_sets[s] for s in scored_sets}
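
If the count were ever needed after encoding, Python could still recover it from the bitpattern itself; for instance (my aside, not part of the original program):

# Counting set bits directly from an encoded value (not used above; shown
# only as the alternative that keeping the set representation avoids).
print(bin(encode('bagel')).count('1'))   # 5 letters in the set
# (On Python 3.10+, encode('bagel').bit_count() does the same.)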

Haskell's Data.Bits exports popCount, which returns the number of set bits in a Word. That's the number of elements, so I can convert words to bitpatterns as soon as I import them.

mkWordSets :: [String] -> WordSet
mkWordSets = foldr addWord M.empty
    where addWord w = M.insertWith S.union (encode w) (S.singleton w)
    
scoreLetterSet :: LetterSet -> S.Set String -> Int
scoreLetterSet letterSet scoringWords = bonus + (S.foldr' (\w t -> t + scoreAWord w) 0 scoringWords)
    where scoreAWord w 
            | length w == 4 = 1
            | otherwise     = length w
          bonus = if (popCount letterSet) == 7 then (S.size scoringWords) * 7 else 0 

The rest of the programs remain the same.

Performance

How well does it work? The results are mixed.

 
              Python      Haskell
Original      5 min 38s   4 min 6s
Bitpattern    6 min 18s   20s

The Haskell implementation is much faster using this data structure. The Python version is actually slower. A lot of that slowdown comes from calling the separate subset function: if I inline its definition in the few places it's used, the program takes 4 min 1s.
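
For illustration, inlining here just means replacing each call to subset with the underlying bitwise expression; a sketch of what present becomes (my reconstruction, not the post's exact code):

def present(target, honeycomb):
    # The two subset calls replaced by their bitwise-and definitions.
    return (honeycomb.mandatory & target == honeycomb.mandatory
            and target & honeycomb.letters == target)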

I think the difference is that Haskell is using a data structure that's efficiently implemented on the machine, while the Python version still uses Python's arbitrary-precision int. I considered using the ctypes library to store the values, but ctypes values aren't hashable, so that would require more changes to the code.

Putting the two together

I have two ways of improving performance. What if I put the two together, and use the partitioned set of scores and bitpattern representation?

 
                          Python      Haskell
Original                  5 min 25s   4 min 6s
Partitioned + Bitpattern  1 min 4s    6s

Again, inlining the bitpattern subset function improved Python's speed to about 37 seconds.

Conclusion

This mostly reinforces what we already know about the different languages. Python is good for quickly developing code where performance isn't vital. Haskell can be much faster, but not always.

The other lesson from this is that the overall structure of the solution is similar in both languages. The broad shape of the data structures is the same in both languages, and the algorithms used are the same. Functional and imperative programming aren't that far apart!

Code

You can get the code from my locally-hosted Git repo, or from Gitlab.