In part 1, I looked at the honeycomb puzzle and how to solve it. In part 2, I look at ways of making it fast.
Doing less work
The first thing to do is to profile the program to see where it's spending its time. For Python, I used cProfile
and for Haskell I used GHC's profiling options.
If we profile the solver, I find 1.2 billion calls to present
, which comes from trying each of the 21,661 distinct scored word sets against the 55,902 possible honeycombs. However, there are only 400 million calls to isSubset
or isSubsetOf
. (To see this in Haskell, you have to manually insert a cost centre within present
, otherwise the profiler doesn't look inside function calls.)
This difference in the number of calls means that two-thirds of the calls to present
are weeded out because the scored word set doesn't contain the honeycomb's mandatory letter.
Can I ensure those 800 million wasted calls to present
are never made?
One way to do it is the standard approach of trading space for time. In this case, I tolerate using more space (memory) to save time.
I need to ensure that I only consider the scored word sets that contain the honeycomb's mandatory letter. I can do that by having 26 different collections of scored word sets, each guaranteed to have a particular letter. Any particular scored word set will occur in several of the collections, but scoring each honeycomb will take much less time.
Creating these "partitioned" scored word sets requires building a dict
or Map
to hold them, keyed by mandatory letter.
partitioned_scored_sets = {l: {s: scored_sets[s]
for s in scored_sets if l in s}
for l in string.ascii_lowercase}
mkPartitionedScoredSets scoredSets = M.fromList
[(c, scoreSetWithLetter c) | c <- ['a'..'z']]
where scoreSetWithLetter c =
M.filterWithKey (\k _ -> c `S.member` k) scoredSets
It also means a change to how I score a honeycomb, by inlining present
and removing the check of the mandatory letter.
def partitioned_score_honeycomb(honeycomb):
return sum(sc for letters, sc in
partitioned_scored_sets[honeycomb.mandatory].items()
if letters.issubset(honeycomb.letters)
)
scoreHoneycomb scoredSets (Honeycomb mandatory letters) =
M.foldrWithKey scoreLetters 0 (scoredSets ! mandatory)
where scoreLetters ls score total
| ls `S.isSubsetOf` letters = score + total
| otherwise = total
Does it work?
This change produces the same result, but quicker.
Python |
Haskell |
|
---|---|---|
Original | 5 min 38s | 4 min 6s |
Partitioned | 1 min 25s | 1 min 10s |
A substantial speedup.
Different data structure
Up to now, I've been using sets to store word candidates. Sets are capable of holding arbitrarily large collections of things, but I don't need that here. I'm only storing letters, and there are only 26 of them. Can I exploit that to make a faster lookup?
Letters are either present in a set or not. That means I can represent a set of letters as a 26-bit binary string. Conveniently, that fits in a 32-bit unsigned integer. Bitwise operations allow me to do set operations quickly too. Using this representation, the intersection of two sets is performed with bitwise-and. The subset predicate is defined as when the intersection is the same as the smaller set: $ A \subset B \equiv (A \cap B) = A $.
Implementing the bitpattern data structure
To use this data structure, I need to specify exactly how to store sets of letters, define some functions that will convert between a set of letters into an integer, and implement the subset predicate.
I'll use the 26 least significant bits in order, with the bit for 225 representing "a", the bit for 224 representing "b", down to 21 representing "y" and 20 representing "z". I store the bitpattern as a straight int
for Python and a Word32
from Data.Word
for Haskell (and use Data.Bits
for bit-twiddling functions).
First I need to encode a set of letters as a bitpattern. In Python, I iterate over all letters, setting a bit if the letter is in the set. In Haskell, I use a fold
instead of a loop, as I'm constructing a single value from a list.
def encode(letters):
encoded = 0
for l in string.ascii_lowercase:
encoded <<= 1
if l in letters:
encoded |= 1
return encoded
type LetterSet = Word32
encode :: String -> LetterSet
encode letters = foldl' encodeLetter zeroBits ['a' .. 'z']
where encodeLetter w l
| l `elem` letters = setBit (shift w 1) 0
| otherwise = shift w 1
Decoding a bitpattern to a set of letters is another loop in Python, encoding each letter and testing if it's present in the bitpattern. In Haskell, I filter
the letters, using testBit
and a bit of arithmetic to see if a particular letter is present in a bitpattern.
def decode(letter_bits):
letters = set()
for l in string.ascii_lowercase:
e = encode(l)
if (e & letter_bits):
letters.add(l)
return letters
decode :: LetterSet -> S.Set Char
decode letterSet = S.fromList $ filter present ['a' .. 'z']
where present c = testBit letterSet $ negate (fromEnum c - fromEnum 'z')
Finally, I define the subset
function, as a convenience. Note that I'm using the subset symbol in the Haskell definition.
def subset(this, that):
return this & that == this
(⊂) :: LetterSet -> LetterSet -> Bool
(⊂) this that = (this .&. that == this)
Putting it to use
Given the bitpattern representation of letter sets, let's incorporate it into the program.
The Honeycomb
dataclass in Python gets a bit more complex as I need to encode letter sets when creating the honeycomb.
@dataclass(frozen=True, init=False)
class Honeycomb():
mandatory: int
letters: int
def __init__(self, mandatory, letters):
object.__setattr__(self, 'mandatory', encode(mandatory))
object.__setattr__(self, 'letters', encode(letters))
def __repr__(self):
return f'{"".join(decode(self.mandatory))}|{"".join(sorted(decode(self.letters) - set(decode(self.mandatory))))}'
data Honeycomb = Honeycomb LetterSet LetterSet
deriving (Show, Eq, Ord)
present
uses the new subset
function.
def present(target, honeycomb):
return (subset(honeycomb.mandatory, target)
and subset(target, honeycomb.letters))
present :: LetterSet -> Honeycomb -> Bool
present target (Honeycomb mandatory letters) =
mandatory ⊂ target && target ⊂ letters
Scoring letter sets requires knowing how many elements there are in the letterset. Rather than recalculating it in Python, I keep the set
representation until creating the scored_sets
.
scored_sets = {s: score(s) for s in word_sets}
encoded_scored_sets = {encode(s): scored_sets[s] for s in scored_sets}
Haskell's Data.Bits
exports popCount
, which returns the number of set bits in a Word
. That's the number of elements, so I can convert words to bitpatterns as soon as I import them.
mkWordSets :: [String] -> WordSet
mkWordSets = foldr addWord M.empty
where addWord w = M.insertWith S.union (encode w) (S.singleton w)
scoreLetterSet :: LetterSet -> S.Set String -> Int
scoreLetterSet letterSet scoringWords = bonus + (S.foldr' (\w t -> t + scoreAWord w) 0 scoringWords)
where scoreAWord w
| length w == 4 = 1
| otherwise = length w
bonus = if (popCount letterSet) == 7 then (S.size scoringWords) * 7 else 0
The rest of the programs remain the same.
Performance
How well does it work? The results are mixed.
Python |
Haskell |
|
---|---|---|
Original | 5 min 38s | 4 min 6s |
Bitpattern | 6 min 18s | 20s |
The Haskell implementation is much faster using this data structure. The Python version is actually slower. A lot of that slowdown comes from calling the separate subset
function. If I inline the definition the few times it's used, the program then takes 4 min 1s.
I think the difference is that the Haskell is using a data structure that's efficiently implemented on the machine, while the Python version still uses Python's internal infinite-sized int
. I considered using the ctypes
library to store values, but those aren't hashable so would require more changes to the code.
Putting the two together
I have two ways of improving performance. What if I put the two together, and use the partitioned set of scores and bitpattern representation?
Python |
Haskell |
|
---|---|---|
Original | 5 min 25s | 4 min 6s |
Partitioned + Bitpattern | 1 min 4s | 6s |
Again, inlining the bitpattern subset
function improved Python's speed to about 37 seconds.
Conclusion
This mostly reinforces what we already know about the different languages. Python is good for quickly developing code where performance isn't vital. Haskell can be much faster, but not always.
The other lesson from this is that the overall structure of the solution is similar in both languages. The broad shape of the data structures is the same in both languages, and the algorithms used are the same. Functional and imperative programming aren't that far apart!
Code
You can get the code from my locally-hosed Git repo, or from Gitlab.