This is the final optimisation post for my solutions to the Advent of Code 2022. My review showed that day 19 was by far my slowest solution, taking about 20 minutes to find a result. Surely I could do better.
Most of the speedup came from reframing the problem, but there's some actual low-level Haskell optimisation at the end!
Factory, not time
My original solution was an A* search, stepping forward in time. At each minute of the simulation, I generated the states of what could happen in the next minute. In principle, the A* approach guaranteed that the first solution found would be the optimal one, but the heuristic I was using (assuming I could make a geode-cracking robot in every minute after the current time) vastly overestimated the value of early states, leading to what wasn't far off a breadth-first search of the space.
No better heuristic immediately came to mind, so I instead decided to change how I represented the problem. My first attempt revolved around time; instead, I would recast the problem around the actions of the robot-making factory.
At any given time, there are options for what the factory can do. For each type of robot:
- If I already have enough robots of that type, do nothing.
- If I have enough resources on hand, make that robot now.
- If I don't have the resources, but will have them if I wait for a bit, then wait until I have the resources and build the robot.
(These choices are mutually exclusive for each robot type.)
Options 2 and 3 collapse into the same thing if I recast them as "wait until I've got enough resources, then build", on the understanding that the wait may be zero minutes.
There's a global fourth action, which is just to run out the clock and have the existing robots gather what they will. This option will handle the last fragment of time for each run.
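The per-robot choice can be sketched in a toy, single-resource setting (my illustration, with made-up names, not the solution's code): robots gather ore, and we ask when we could afford another robot costing `oreCost`.

```haskell
-- Toy, single-resource sketch of the per-robot decision (illustrative names,
-- not the post's actual code): either waiting will never help, or we can
-- build after waiting zero or more minutes.
decideWait :: Int -> Int -> Int -> Maybe Int
decideWait oreRobots oreOnHand oreCost
  | oreOnHand >= oreCost = Just 0      -- build now: a wait of zero minutes
  | oreRobots == 0       = Nothing     -- nothing gathers ore, so never
  | otherwise            = Just wait   -- wait until we've gathered enough
  where
    shortfall = oreCost - oreOnHand
    wait = (shortfall + oreRobots - 1) `div` oreRobots  -- ceiling division
```

For example, with 2 ore robots, 1 ore on hand, and a cost of 7, the shortfall is 6 and the wait is ⌈6/2⌉ = 3 minutes.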
When waiting to build a robot, I can calculate how much time I need to wait. That suggests I should store the current time in the SearchState:
data SearchState = SearchState
  { _resources :: Collection
  , _robots :: Collection
  , _currentTime :: Int
  } deriving (Eq, Show, Ord)
makeLenses ''SearchState
Next is a function that handles the above decision-making, for a particular robot in a particular state. Note that this returns a Maybe SearchState, as it could be that there's nothing to do for this type of robot. maxRobots is a collection of the maximum quantity needed of each type of robot; recipe is the requirements for this type of robot.
handleRobot :: SearchState -> Collection -> Int -> Resource -> Collection -> Maybe SearchState
handleRobot state maxRobots timeLimit robot recipe
  | sufficientRobots robot state maxRobots = Nothing
  | otherwise = buildWhenReady robot state recipe timeLimit
I then wrote little functions to do each of the steps.
-- do I already have enough of this robot?
sufficientRobots :: Resource -> SearchState -> Collection -> Bool
sufficientRobots robot state maxRobots =
  (robot `MS.member` maxRobots)
  && (MS.occur robot (state ^. robots) >= MS.occur robot maxRobots)
The case of delaying for a while before building a robot is a little more convoluted. This case is viable if, for each resource I'm short of, I have at least one robot that can gather that resource. In that case, I calculate the delay needed to get enough of each resource, and the final delay is the largest of them. (If I already have the resources, there is no shortfall, so the final delay is zero; hence maximum0.) If waiting will never generate enough resources (because a particular type of robot doesn't exist yet), return Nothing.
-- assuming I can't build this robot, how long do I have to wait for the current
-- robots to gather enough to build it?
buildDelay :: SearchState -> Collection -> Maybe Int
buildDelay state recipe
  | all (\r -> MS.member r rbts) (MS.distinctElems shortfall) =
      Just $ maximum0 $ fmap snd $ MS.toOccurList delay
  | otherwise = Nothing
  where
    shortfall = recipe `MS.difference` (state ^. resources)
    delay = MS.foldOccur calcOneDelay MS.empty shortfall
    rbts = state ^. robots
    calcOneDelay resource count acc =
      MS.insertMany resource
        (ceiling $ fromIntegral count / (fromIntegral $ MS.occur resource rbts))
        acc
    maximum0 xs = if null xs then 0 else maximum xs
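The same calculation can be seen in a standalone form using Data.Map instead of MultiSet (a simplified sketch with illustrative names, not the solution's code): the wait is the largest per-resource ceiling division over the shortfall, or Nothing when some needed resource has no robot gathering it.

```haskell
import qualified Data.Map as M

-- Simplified sketch of the delay calculation. shortfalls: how much of each
-- resource we still need; robots: how many robots gather each resource per
-- minute (assumed to hold only positive counts).
simpleDelay :: M.Map String Int -> M.Map String Int -> Maybe Int
simpleDelay shortfalls robots
  | all (`M.member` robots) (M.keys shortfalls) = Just (maximum0 waits)
  | otherwise = Nothing
  where
    waits = [ (need + have - 1) `div` have   -- ceiling division
            | (res, need) <- M.toList shortfalls
            , let have = robots M.! res ]
    maximum0 xs = if null xs then 0 else maximum xs  -- no shortfall: wait 0
```

Needing 6 more clay with 4 clay robots and 3 more ore with 1 ore robot gives max(⌈6/4⌉, ⌈3/1⌉) = 3 minutes; an empty shortfall gives a wait of 0.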
Using that, updating the state for that delay is a little simpler, but it still has to handle a delay of Nothing and a delay that would go beyond the allowed time limit. That's easier with the Maybe monad.
buildWhenReady :: Resource -> SearchState -> Collection -> Int -> Maybe SearchState
buildWhenReady robot state recipe timeLimit = do
  waitDelay <- buildDelay state recipe
  delay <- tooLate (state ^. currentTime) (waitDelay + 1) timeLimit
  let gathered = MS.foldOccur (\res n acc -> MS.insertMany res (n * delay) acc)
                   MS.empty
                   (state ^. robots)
  return ( state & robots %~ MS.insert robot
                 & resources %~ MS.union gathered
                 & resources %~ (`MS.difference` recipe)
                 & currentTime %~ (+ delay)
         )
tooLate :: Int -> Int -> Int -> Maybe Int
tooLate current delay timeLimit
  | current + delay <= timeLimit = Just delay
  | otherwise = Nothing
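The pattern at work here — each step may fail, and any failure makes the whole computation fail — can be seen in isolation (a made-up illustration, stdlib only):

```haskell
-- Illustration of Maybe-monad chaining as used in buildWhenReady: if any
-- step yields Nothing, the do-block short-circuits to Nothing.
halveEven :: Int -> Maybe Int
halveEven n = if even n then Just (n `div` 2) else Nothing

capAt :: Int -> Int -> Maybe Int
capAt limit n = if n <= limit then Just n else Nothing

pipeline :: Int -> Maybe Int
pipeline n = do
  h <- halveEven n   -- fails for odd input
  capAt 10 h         -- fails if the result is over the limit
```

So `pipeline 8` succeeds, while `pipeline 7` and `pipeline 30` both give Nothing, each failing at a different step.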
The body of buildWhenReady demonstrates how lenses can make this kind of update much easier.
The per-robot-type processing is called from successors, using catMaybes to retain just the valid successors. It also handles the "wait until the end" case.
successors :: SearchState -> BlueprintContext (Q.Seq SearchState)
successors state = do
  blueprint <- asks getBlueprint
  maxRobots <- asks getMaxRobots
  timeLimit <- asks getTimeLimit
  let robotSuccessors = Q.fromList $ catMaybes $ M.elems $
        M.mapWithKey (handleRobot state maxRobots timeLimit) blueprint
  let timeRemaining = timeLimit - (state ^. currentTime)
  let gathered = MS.foldOccur (\res n acc -> MS.insertMany res (n * timeRemaining) acc)
                   MS.empty
                   (state ^. robots)
  let delayUntilEnd = state & currentTime .~ timeLimit
                            & resources %~ MS.union gathered
  return (robotSuccessors |> delayUntilEnd)
With that, finding the overall solution still uses the A* search from before.
The final change was to use parallel processing properly, and explore the different blueprints in parallel. This speeds up part 1, where I process all 30 blueprints, and has less effect on part 2, where I only process three.
part1, part2 :: [(Int, Blueprint)] -> Int
part1 blueprints = sum [n * MS.occur Geode (r ^. resources) | (n, r) <- results]
  where results = parMap rdeepseq (scoreBlueprint 24) blueprints
part2 blueprints = product [MS.occur Geode (r ^. resources) | (_, r) <- results]
  where results = parMap rdeepseq (scoreBlueprint 32) $ take 3 blueprints
Performance
How well did this work?
| Program | Wall time | Memory used (kb) |
|---|---|---|
| Original | 15:09.53 | 1,562,024 |
| Factory-based | 0:14.02 | 1,031,664 |
Enormously well. The time went from 15-20 minutes to about 14 seconds. But it can go quicker.
Actual Haskell optimisation
Profiling the program showed something interesting. About two-thirds of the runtime was spent in the compare function defined for the SearchState record. This is called by the Data.Set library, which uses comparison for efficient storage. It's used when checking whether the current SearchState is in the closed set. As a SearchState contains a couple of Multisets, this comparison takes time.

I can reduce this by having a hashed representation of the SearchState that's good enough for checking equality, even if it's not so useful in the rest of the program.
I could manually redefine the Eq and Ord instances for SearchState, but that would require every element of the closed set to be rehashed on every membership check; that's likely to get expensive. An alternative is to store only the hashes in the closed set, and hash each new candidate when I want to check for membership.
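That closed-set discipline can be sketched in miniature (my illustration, not the actual aStar code): keep a Set of Ints and hash each candidate exactly once per membership check.

```haskell
import qualified Data.Set as S

-- Miniature sketch of a hash-only closed set: `hash` stands in for
-- hashSearchState, and `state` is whatever the search explores.
step :: (state -> Int) -> state -> S.Set Int -> (Bool, S.Set Int)
step hash st closed
  | h `S.member` closed = (True, closed)             -- seen before: skip it
  | otherwise           = (False, S.insert h closed) -- new: record its hash
  where h = hash st   -- computed once per check, not per comparison
```

Membership checks now compare Ints instead of whole states; the trade-off is that two states with the same hash are treated as identical, so the hash must be collision-free over the states we'll actually meet.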
What should the hash be? Integers are fast to compare, so let's use them. I want to hash a Multiset to an integer. That will allow me to hash a SearchState into a 3-tuple: the hashes of the resources and robots fields, together with the currentTime.
I can quickly hash a Multiset by adding up the counts of each element, scaling each one so that different Multisets will give different values. A quick inspection of some traces shows that we rarely get over 100 of any resource, so let's be safe and use 200 as the scaling constant. I add the Enum and Bounded classes to Resource to make it easy to guarantee I'm covering all the resource types.
data Resource = Ore | Clay | Obsidian | Geode
  deriving (Show, Eq, Ord, Generic, Enum, Bounded)

type Collection = MS.MultiSet Resource

hashCollection :: Collection -> Int
hashCollection c =
  let k = 200
  in sum $ zipWith (\r n -> MS.occur r c * n) [minBound .. maxBound] $ iterate (* k) 1

type StateHash = (Int, Int, Int)

hashSearchState :: SearchState -> StateHash
hashSearchState s = (hashCollection (s ^. resources), hashCollection (s ^. robots), s ^. currentTime)

type ExploredStates = S.Set StateHash
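To see the scaling at work, here's the same base-200 encoding over a plain list of counts (one per resource, in a fixed order; a standalone illustration, not the MultiSet version above):

```haskell
-- Standalone illustration of the base-200 hash: treat the per-resource
-- counts as the digits of a base-200 number.
hashCounts :: [Int] -> Int
hashCounts counts = sum $ zipWith (*) counts (iterate (* 200) 1)
```

For example, hashCounts [3,1,0,2] = 3 + 1·200 + 0·200² + 2·200³ = 16,000,203. Distinct count vectors give distinct hashes as long as every count stays below 200; a count of 200 "carries" into the next digit (hashCounts [200] equals hashCounts [0,1]), which is why the constant must exceed any count we'll actually see.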
All that's left is to call hashSearchState in the couple of places I examine the closed set, in aStar and candidates.
Performance
This step makes another large improvement.
| Program | Wall time | Memory used (kb) |
|---|---|---|
| Original | 15:09.53 | 1,562,024 |
| Factory-based | 0:14.02 | 1,031,664 |
| Hashing | 0:06.77 | 1,073,516 |
| Hashing, single-thread | 0:13.43 | 508,552 |
The version with the hashed closed set is twice as quick as before, taking less than seven seconds to complete.
Out of curiosity, I removed the parallel processing of the blueprints. This only doubled the runtime, but halved the memory consumption. It seems there is overhead in running in parallel!
Code
You can get the code from my locally-hosted Git repo, or from Gitlab.