Honeycomb puzzle part 2

    In part 1, I looked at the honeycomb puzzle and how to solve it. In part 2, I look at ways of making it fast.

    Doing less work

    The first thing to do is to profile the program to see where it's spending its time. For Python, I used cProfile and for Haskell I used GHC's profiling options.

    If we profile the solver, I find 1.2 billion calls to present, which comes from trying each of the 21,661 distinct scored word sets against the 55,902 possible honeycombs. However, there are only 400 million calls to isSubset or isSubsetOf. (To see this in Haskell, you have to manually insert a cost centre within present, otherwise the profiler doesn't look inside function calls.)

    This difference in the number of calls means that two-thirds of the calls to present are weeded out because the scored word set doesn't contain the honeycomb's mandatory letter.

    Can I ensure those 800 million wasted calls to present are never made?

    One way to do it is the standard approach of trading space for time. In this case, I tolerate using more space (memory) to save time.

    I need to ensure that I only consider the scored word sets that contain the honeycomb's mandatory letter. I can do that by having 26 different collections of scored word sets, each guaranteed to have a particular letter. Any particular scored word set will occur in several of the collections, but scoring each honeycomb will take much less time.

    Creating these "partitioned" scored word sets requires building a dict or Map to hold them, keyed by mandatory letter.

    partitioned_scored_sets = {l: {s: scored_sets[s] 
                                   for s in scored_sets if l in s} 
                               for l in string.ascii_lowercase}
    
    mkPartitionedScoredSets scoredSets = M.fromList 
            [(c, scoreSetWithLetter c) | c <- ['a'..'z']]
        where scoreSetWithLetter c = 
            M.filterWithKey (\k _ -> c `S.member` k) scoredSets
    

    It also means a change to how I score a honeycomb, by inlining present and removing the check of the mandatory letter.

    def partitioned_score_honeycomb(honeycomb):
        return sum(sc for letters, sc in 
                   partitioned_scored_sets[honeycomb.mandatory].items() 
                   if letters.issubset(honeycomb.letters)
                  )
    
    scoreHoneycomb scoredSets (Honeycomb mandatory letters) = 
        M.foldrWithKey scoreLetters 0 (scoredSets ! mandatory)
        where scoreLetters ls score total
                | ls `S.isSubsetOf` letters = score + total
                | otherwise = total
    

    Does it work?

    This change produces the same result, but quicker.

     
    Python
    Haskell
    Original 5 min 38s 4 min 6s
    Partitioned 1 min 25s 1 min 10s

    A substantial speedup.

    Different data structure

    Up to now, I've been using sets to store word candidates. Sets are capable of holding arbitrarily large collections of things, but I don't need that here. I'm only storing letters, and there are only 26 of them. Can I exploit that to make a faster lookup?

    Letters are either present in a set or not. That means I can represent a set of letters as a 26-bit binary string. Conveniently, that fits in a 32-bit unsigned integer. Bitwise operations allow me to do set operations quickly too. Using this representation, the intersection of two sets is performed with bitwise-and. The subset predicate is defined as when the intersection is the same as the smaller set: $ A \subset B \equiv (A \cap B) = A $.

    Implementing the bitpattern data structure

    To use this data structure, I need to specify exactly how to store sets of letters, define some functions that will convert between a set of letters into an integer, and implement the subset predicate.

    I'll use the 26 least significant bits in order, with the bit for 225 representing "a", the bit for 224 representing "b", down to 21 representing "y" and 20 representing "z". I store the bitpattern as a straight int for Python and a Word32 from Data.Word for Haskell (and use Data.Bits for bit-twiddling functions).

    First I need to encode a set of letters as a bitpattern. In Python, I iterate over all letters, setting a bit if the letter is in the set. In Haskell, I use a fold instead of a loop, as I'm constructing a single value from a list.

    def encode(letters):
        encoded = 0
        for l in string.ascii_lowercase:
            encoded <<= 1
            if l in letters:
                encoded |= 1
        return encoded
    
    type LetterSet = Word32
    
    encode :: String -> LetterSet
    encode letters = foldl' encodeLetter zeroBits ['a' .. 'z']
        where encodeLetter w l 
                | l `elem` letters = setBit (shift w 1) 0
                | otherwise = shift w 1
    

    Decoding a bitpattern to a set of letters is another loop in Python, encoding each letter and testing if it's present in the bitpattern. In Haskell, I filter the letters, using testBit and a bit of arithmetic to see if a particular letter is present in a bitpattern.

    def decode(letter_bits):
        letters = set()
        for l in string.ascii_lowercase:
            e = encode(l)
            if (e & letter_bits):
                letters.add(l)
        return letters
    
    decode :: LetterSet -> S.Set Char
    decode letterSet = S.fromList $ filter present ['a' .. 'z']
        where present c = testBit letterSet $ negate (fromEnum c - fromEnum 'z')
    

    Finally, I define the subset function, as a convenience. Note that I'm using the subset symbol in the Haskell definition.

    def subset(this, that):
        return this & that == this
    
    (⊂) :: LetterSet -> LetterSet -> Bool
    (⊂) this that = (this .&. that == this)
    

    Putting it to use

    Given the bitpattern representation of letter sets, let's incorporate it into the program.

    The Honeycomb dataclass in Python gets a bit more complex as I need to encode letter sets when creating the honeycomb.  

    @dataclass(frozen=True, init=False)
    class Honeycomb():
        mandatory: int
        letters: int
        
        def __init__(self, mandatory, letters):
            object.__setattr__(self, 'mandatory', encode(mandatory))
            object.__setattr__(self, 'letters', encode(letters))
                
        def __repr__(self):
            return f'{"".join(decode(self.mandatory))}|{"".join(sorted(decode(self.letters) - set(decode(self.mandatory))))}'
    
    data Honeycomb = Honeycomb LetterSet LetterSet
        deriving (Show, Eq, Ord)
    

    present uses the new subset function.

    def present(target, honeycomb):
        return (subset(honeycomb.mandatory, target)
               and subset(target, honeycomb.letters))
    
    present :: LetterSet -> Honeycomb -> Bool
    present target (Honeycomb mandatory letters) =
        mandatory ⊂ target && target ⊂ letters
    

    Scoring letter sets requires knowing how many elements there are in the letterset. Rather than recalculating it in Python, I keep the set representation until creating the scored_sets.

    scored_sets = {s: score(s) for s in word_sets}
    encoded_scored_sets = {encode(s): scored_sets[s] for s in scored_sets}
    

    Haskell's Data.Bits exports popCount, which returns the number of set bits in a Word. That's the number of elements, so I can convert words to bitpatterns as soon as I import them.

    mkWordSets :: [String] -> WordSet
    mkWordSets = foldr addWord M.empty
        where addWord w = M.insertWith S.union (encode w) (S.singleton w)
        
    scoreLetterSet :: LetterSet -> S.Set String -> Int
    scoreLetterSet letterSet scoringWords = bonus + (S.foldr' (\w t -> t + scoreAWord w) 0 scoringWords)
        where scoreAWord w 
                | length w == 4 = 1
                | otherwise     = length w
              bonus = if (popCount letterSet) == 7 then (S.size scoringWords) * 7 else 0 
    

    The rest of the programs remain the same.

    Performance

    How well does it work? The results are mixed.

     
    Python
    Haskell
    Original 5 min 38s 4 min 6s
    Bitpattern 6 min 18s 20s

    The Haskell implementation is much faster using this data structure. The Python version is actually slower. A lot of that slowdown comes from calling the separate subset function. If I inline the definition the few times it's used, the program then takes 4 min 1s.

    I think the difference is that the Haskell is using a data structure that's efficiently implemented on the machine, while the Python version still uses Python's internal infinite-sized int. I considered using the ctypes library to store values, but those aren't hashable so would require more changes to the code.

    Putting the two together

    I have two ways of improving performance. What if I put the two together, and use the partitioned set of scores and bitpattern representation?

     
    Python
    Haskell
    Original 5 min 25s 4 min 6s
    Partitioned + Bitpattern 1 min 4s 6s

    Again, inlining the bitpattern subset function improved Python's speed to about 37 seconds.

    Conclusion

    This mostly reinforces what we already know about the different languages. Python is good for quickly developing code where performance isn't vital. Haskell can be much faster, but not always.

    The other lesson from this is that the overall structure of the solution is similar in both languages. The broad shape of the data structures is the same in both languages, and the algorithms used are the same. Functional and imperative programming aren't that far apart!

    Code

    You can get the code from my locally-hosed Git repo, or from Gitlab.

    Neil Smith

    Read more posts by this author.