A little late to the day, but I eventually got around to addressing this "honeycomb puzzle" challenge. It's a simple puzzle, but there's scope for a little bit of optimisation.
In this part, I solve the puzzle. In part 2, I look at a couple of optimisations for it.
The puzzle was set by Five Thirty-Eight, based on a game in the New York Times. I'll just quote the rules from Five Thirty-Eight:
The goal is to identify as many words that meet the following criteria:
- The word must be at least four letters long.
- The word must include the central letter.
- The word cannot include any letter beyond the seven given letters.
Note that letters can be repeated. For example, the words GAME and AMALGAM are both acceptable words. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, seven-letter words are worth 7 points, etc. Words that use all of the seven letters in the honeycomb are known as “pangrams” and earn 7 bonus points (in addition to the points for the length of the word). So in the above example, MEGAPLEX is worth 15 points.
The challenge is to find the honeycomb that gives the highest score (using a vocabulary from a cached version of the Enable word list), but with restrictions that the honeycomb not contain any duplicate letters (why would you?), cannot contain the letter "S", and must contain at least one pangram.
That's the challenge, let's go about solving it.
I'll do that in both Python and Haskell. The approach, algorithms, and data structures are the same in both versions, even if the surface level syntax is different.
Imports
In both Python and Haskell, I need to import a few libraries.
Python uses some standard libraries.
import string
import collections
from dataclasses import dataclass
Haskell uses some standard libraries, including the containers
library for sets and maps.
import qualified Data.Set as S
import qualified Data.Map.Strict as M
import Data.Map.Strict ((!))
import Data.List
Key data structures
This problem revolves around sets of letters and checking subsets of letters. These are sets in the mathematical sense: collections of letters that are unordered and without duplicates. The letters in the honeycomb can be used in any order. I can use each letter as many times as we like, so the only thing I care about is whether a letter exists in the honeycomb or not.
Being a bit more precise, I can make a word from a honeycomb if
- The letters in the word are a subset of the letters in the honeycomb
- The central letter is in the word
Honeycomb
From the definition of "is a word present in a honeycomb", it follows that a honeycomb is something that has a set of letters and a mandatory letter. Let's define that.
In Python, I'll use a dataclass
, which I'll make immutable (frozen):
@dataclass(frozen=True)
class Honeycomb():
mandatory: str
letters: frozenset
def __repr__(self):
return f'{self.mandatory}|{"".join(sorted(self.letters - set(self.mandatory)))}'
(I also define __repr__
to pretty-print honeycombs if need be.)
In Haskell, it's an algebraic data type (and a type alias for convenience):
type LetterSet = S.Set Char
data Honeycomb = Honeycomb Char LetterSet
deriving (Show, Eq, Ord)
Word-in-honeycomb test
Given a honeycomb, a word is present in that honeycomb given the two conditions above. They're directly translated into code.
In Python
def present(target, honeycomb):
return ( (honeycomb.mandatory in target)
and target.issubset(honeycomb.letters)
)
and Haskell
present :: LetterSet -> Honeycomb -> Bool
present target (Honeycomb mandatory letters) =
(mandatory `S.member` target) && (target `S.isSubsetOf` letters)
Vocabulary
The next thing is to read and store the vocabulary. I could just store a list of the words, but really I'm only interested in the sets of letters in a word. Instead, I'll group together all the words that have the same set of letters. I'll have a dict
or map
connecting sets of letters to the words that they can make.
For instance, the set {a, e, g, p} can be used to make the words {peage, agapae, page, gape, agape, peag}. The dict
will have an entry with key {a, e, g, p} and the value at that key will be {peage, agapae, page, gape, agape, peag}.
Loading the words
This is basic file reading (and a new type synonym for Haskell). There are a couple of constraints we can apply at this point, which rule out three classes of words:
- Words with fewer than four letters don't score
- Words with more than seven distinct letters can't be made by a honeycomb
- Words containing an 's', as the honeycomb we're aiming for doesn't contain an 's'
with open('enable1.txt') as f:
words = [line.strip() for line in f]
words = set(w for w in words
if len(w) >= 4
if len(frozenset(w)) <= 7
if 's' not in w)
type WordSet = M.Map LetterSet (S.Set String)
main = do
allWords <- readFile "enable1.txt"
let validWords = [w | w <- words allWords,
length w >= 4,
(S.size $ S.fromList w) <= 7,
's' `notElem` w]
This is enough to check which words are in any given honeycomb. But I want to build the word sets, to help me check many honeycombs.
Building the word sets
The basic idea is to walk across the collection of loaded words, inserting each one into the collection of word sets. The key is the set of letters, the value is the set of words that can be made from those letters.
In Python, this looks like iterating over the list of words, inserting each word as I go. I use a collections.defaultdict
as the data structure, as that makes it easier to insert each word (we assume there's always a set of words for each key, regardless of whether we've we've see the key before).
word_sets = collections.defaultdict(set)
for w in words:
letters = frozenset(w)
word_sets[letters].add(w)
In Haskell, I'm fold
ing the words into the map
, starting with an empty map. I use the the insertWith
function to add words to the map.
mkWordSets :: [String] -> WordSet
mkWordSets = foldr addWord M.empty
where addWord w = M.insertWith S.union (S.fromList w) (S.singleton w)
Note that there's a fair bit of partial application in this definition. addWord
has type String -> WordSet -> WordSet
and uses union
to combine this word (as a singleton set) with the existing words (if any).
I end up being able to feed a key of word_sets
into present
and that will tell me if those words can be made with that honeycomb.
Scoring word sets
The end goal is to find the highest-scoring honeycomb. That means I'll be presenting lots of word sets to lots of honeycombs and then scoring them. But each set of words has the same score each time it's present, so I can save time by calculating the score for a set of letters and store that score.
I'll follow the scoring rules as given. A four-letter word scores 1 point. Longer words score their length. If all seven letters are used, score a bonus 7 points. We wrap those rules in a function that, when given a set of letters, finds the words that can be made from that set, finds the score for each word, and returns the sum of all scores.
For Python, I use a comprehension to transform the word_sets
to scored_sets
, and note how the global variable word_sets
is accessed inside score
.
def score(present_set):
score = 0
for w in word_sets[present_set]:
if len(w) == 4:
score += 1
else:
score += len(w)
if len(present_set) == 7:
score += len(word_sets[present_set]) * 7
return score
scored_sets = {s: score(s) for s in word_sets}
For Haskell, first I define a new type for scored letter sets, and I convert a WordSet
to a ScoredSet
by using mapWithKey
to convert each set of letters to a score..
type ScoredSet = M.Map LetterSet Int
let scoredSets = M.mapWithKey scoreLetterSet wordSets
The function scoreLetterSet
is given a set of letters and the words can can be made from it. It finds the score for each word and adds them up (by using a fold
to combine the scores for each word).
scoreLetterSet :: LetterSet -> S.Set String -> Int
scoreLetterSet letterSet scoringWords = bonus + (S.foldr' (\w t -> t + scoreAWord w) 0 scoringWords)
where scoreAWord w
| length w == 4 = 1
| otherwise = length w
bonus = if (S.size letterSet) == 7 then (S.size scoringWords) * 7 else 0
Scoring honeycombs
Now I know the score that comes from any particular set of letters, I can score a honeycomb. The first thing to check is that a set of letters is present in a honeycomb: the honeycomb's mandatory letter is in the set, and all the letters in the set are in the honeycomb.
def present(target, honeycomb):
return ( (honeycomb.mandatory in target)
and target.issubset(honeycomb.letters)
)
present :: LetterSet -> Honeycomb -> Bool
present target (Honeycomb mandatory letters) =
(mandatory `S.member` target) && (target `S.isSubsetOf` letters)
The Python method to score a honeycomb uses the iterator
interface to walk over the scored_sets
dictionary, filter them by whether letters are present
in the honeycomb, then sum the scores.
def score_honeycomb(honeycomb):
return sum(sc for letters, sc in scored_sets.items()
if present(letters, honeycomb)
)
I could do something similar in Haskell, using a filter
to find just the letter sets that are present in the honeycomb, then summing their scores. But really what we're doing is a fold, combining the elements of the scored letter set map into a single value (the overall score); the wrinkle here is that the combining function uses the value of the key to decide if this letter set's score should be included in the total.
scoreHoneycomb :: ScoredSet -> Honeycomb -> Int
scoreHoneycomb scoredSets honeycomb = M.foldrWithKey scoreLetters 0 scoredSets
where scoreLetters letters score total
| present letters honeycomb = score + total
| otherwise = total
Finding the best honeycomb
I'm now in a position to solve the original problem, that of finding the honeycomb with the best score (from the dictionary I have). How can I find the honeycombs to test?
One approach is to find all seven-letter subsets of the alphabet and, for each of those, build the seven possible honeycombs. This gives
$$7 \times \frac{26!}{19! \times 7!} = 4,604,600$$
possible honeycombs. That's a doable number, but still large.
But the puzzle description says I'm after a honeycomb where every letter is used, which means I can base our honeycombs on those letter sets. There are 14,741 words with seven distinct letters, grouped into 7986 distinct groups. That means there are only 55,902 distinct honeycombs to investigate.
I can find those "pangram" letter sets, and hence the possible honeycombs, by filtering the letter sets we have. I generate the plausible honeycombs by generating honeycombs for each letter set.
pangram_sets = [s for s in word_sets.keys() if len(s) == 7]
ps_honeycombs = [Honeycomb(mandatory=mand, letters=ps)
for ps in pangram_sets
for mand in ps]
pangramSets = S.filter ((7 == ) . S.size) $ M.keysSet scoredSets
mkPlausibleHoneycombs :: S.Set LetterSet -> S.Set Honeycomb
mkPlausibleHoneycombs pangramSets = S.foldr S.union S.empty honeycombSets
where honeycombSets = S.map honeycombsOfLetters pangramSets
honeycombsOfLetters ls = S.map (\l -> Honeycomb l ls) ls
Now I know all the plausible honeycombs, and can find the score for each, I can find the honeycomb with the best score. In Python, it's easy using the built-in max
function:
best_honyecomb = max(ps_honeycombs, key=score_honeycomb)
print(best_honyecomb, partitioned_score_honeycomb(best_honyecomb))
In Haskell, I have to build our own version of max
that operates over the Set
of honeycombs, using foldr
to combine the honeycombs into just the one with the best score.
findBestHoneycomb scoredSets honeycombs =
S.foldr (betterHc scoredSets) (0, initHc) honeycombs
where initHc = Honeycomb 'a' $ S.singleton 'a'
betterHc scoredSets hc (bestScore, bestHc)
| thisScore > bestScore = (thisScore, hc)
| otherwise = (bestScore, bestHc)
where thisScore = scoreHoneycomb scoredSets hc
And the best honeycomb is...
Thankfully, both programs agree on the best honeycomb and its score!
The best honecomb is "r|aegint" with a score of 3898.
The bad news is that this takes a long time: about 5½ minutes for Python, and 4 minutes for Haskell. That's far too long. In the next part, I'll look at a couple of ways to optimise these programs.
Code
You can get the code from my locally-hosed Git repo, or from Gitlab.