Another example of optimising Haskell, this time looking at the Advent of Code 2022 day 16 puzzle and my original solution to it. This post is more about manipulating the problem description and picking better methods, rather than raw optimisation of Haskell code.

## Compressing the cave

The first thing was to look at the structure of the problem. The task is to find a path though a graph (a set of connected rooms), where some nodes have a score if you visit them, and to maximise that score within some time limit. The full task description has 62 rooms, of which only 15 have such a score; the others are there simply to add a cost to moving from room to room.

That means the first optimisation is to "compress" the cave into a form that only has the rooms with scores, and to find the shortest-path cost between each pair of scoring rooms.

First, I needed to modify the `Cave`

data structure to hold this new representation. The main changes is that each `tunnel`

between rooms now comes with a `tunnelLength`

. The other thing is that the time elapsed exploring the cave can't be computing simply from the number of rooms visited; instead, it's convenient to store it in the `SearchState`

.

```
type RoomID = String
data Tunnel = Tunnel { _tunnelTo :: RoomID, _tunnelLength :: Int}
deriving (Eq, Show, Ord)
makeLenses ''Tunnel
data Room = Room
{ _flowRate :: Int
, _tunnels :: S.Set Tunnel
} deriving (Eq, Show, Ord)
makeLenses ''Room
data SingleSearchState = SingleSearchState
{ _currentRoom :: RoomID
, _currentTime :: Int
, _sOpenValves :: S.Set RoomID
} deriving (Eq, Show, Ord)
makeLenses ''SingleSearchState
data DoubleSearchState = DoubleSearchState
{ _personRoom :: RoomID
, _personTime :: Int
, _elephantRoom :: RoomID
, _elephantTime :: Int
, _dOpenValves :: S.Set RoomID
} deriving (Eq, Show, Ord)
makeLenses ''DoubleSearchState
```

Compressing the cave is a simple sequence of breath-first searches, starting from each of the interesting rooms. Each search maintains a `Set`

of `Tunnels`

found (called `found`

) (each representing the shortest path to that room), a separate `Set`

of `Tunnels`

that go to interesting rooms (called `routes`

), and a boundary of rooms to be explored. Each explored room adds a bunch of tunnels to the results and moves the boundary. If a room already exists in `found`

, it's not processed again.

```
compress :: Cave -> Cave
compress cave = M.mapWithKey (compressRoom cave) cave
compressRoom :: Cave -> RoomID -> Room -> Room
compressRoom cave here room = room & tunnels .~ t'
where t' = reachableFrom cave [Tunnel here 0] S.empty S.empty
reachableFrom :: Cave -> [Tunnel] -> S.Set RoomID -> S.Set Tunnel -> S.Set Tunnel
reachableFrom _ [] _ routes = routes
reachableFrom cave (tunnel@(Tunnel here len):boundary) found routes
| here `S.member` found = reachableFrom cave boundary found routes
| otherwise = reachableFrom cave (boundary ++ (S.toList legs)) (S.insert here found) routes'
where exits = (cave ! here) ^. tunnels
exits' = S.filter (\t -> (t ^. tunnelTo) `S.notMember` found) exits
legs = S.map (\t -> t & tunnelLength .~ (len + 1)) exits'
routes' = if (len == 0) || ((cave ! here) ^. flowRate) == 0
then routes
else S.insert tunnel routes
```

The main wrinkle with this representation is that often, either the person or the elephant will be in transit between rooms while the other has arrived at their destination. I record that by storing, in the `DoubleSearchState`

, the time that each agent is available. If we've not reached that time yet, the agent isn't updated in the new search space.

There are a bunch of additional changes made throughout the code to use the new description of the `Cave`

: I won't bore you with all of them.

### Performance

What does this do to the performance of the program?

Program | Wall time | Memory used (kb) |
---|---|---|

Original | 2:18.27 | 150,144 |

Compressed | 0:53.46 | 1,371,700 |

This factor of 2⅔ speedup is even more impressive when I say that the "compressed cave" version of the program doesn't use the beam search restriction. That means the agenda in the search isn't limited to just 5,000 entries, so the compressed version is doing a lot more exploration in the much shorter time.

## Separating search

A sub-one-minute time is good, but much worse than the few seconds of time that were reported by many other people. I had a look around at their solutions to find their secret!

The key idea here is that the two agents moving around the cave (the person and the elephant) have identical behaviours but, in the final solution, will open different valves. Therefore, if I can find the best few solutions to the overall problem, I should be able to pick two of them that don't share any opening valves. Those will form the combined solution.

This is helped by the short time limit. In 26 minutes, one agent can open at most 13 valves (1 minute to move, 1 minute to open the valve). In fact, looking at the map, the most one agent can open is eight or nine valves (2 or 3 minutes to move between valves worth opening), generating around 1300 total flow. That implies that the best solution with two agents will have each performing a near-compete solution. But how to generate them?

One way to work is to find all subsets of the cave that contain only some of the valves, and find the best solution using that subset. I can then pick two of these subsets, check they're disjoint, and find the total flow from the pair. But how to find those subsets? There are roughly 15! such subsets, about 10^{12}. That's too many to examine explicitly.

But I have a way of finding near-complete solutions (the existing search process). I can use that to find good partial solutions. Each step of the search algorithm, I can extrapolate the score of this partial solution, assuming no more valves are opened, and push the solution and the score into some sort of cache. I then continue the search until *all* solutions have been found, and I can assemble the final solution form the cache.

What should that cache look like? I only need to know the flow (score) of the cached solution, and the set of valves opened to ensure that the combination of two cached solutions don't open the same valve twice. It may be that the same set of valves, opened in a different order, will give different final values, so I only need to record the *best* score for each set of valves, updating it as better solutions come along.

That suggests using a `Map`

from sets of valves to total flows, and a function that calculates the total flow of a partial solution and updates that `Map`

if necessary.

```
type PartSolutions = M.Map (S.Set RoomID) Int
includeAgendum :: PartSolutions -> Agendum -> CaveContext PartSolutions
includeAgendum results agendum =
do cf <- currentFlow (agendum ^. current)
timeLimit <- asks getTimeLimit
let timeLeft = timeLimit - timeOf (agendum ^. current)
let remainingFlow = cf * timeLeft
let totalFlow = remainingFlow + agendum ^. trailBenefit
let visitedSet = agendum ^. current . openValves
let currentBest = M.findWithDefault 0 visitedSet results
if totalFlow > currentBest
then return (M.insert visitedSet totalFlow results)
else return results
```

That gets used in `allSolutions`

, modified from the `aStar`

search function. `allSolutions`

both uses `includeAgendum`

and keeps searching while there's something on the agenda (changes in the final `if`

statement).

```
allSolutions :: Agenda -> ExploredStates -> PartSolutions -> CaveContext PartSolutions
allSolutions agenda closed foundSolutions
| P.null agenda = return foundSolutions
| otherwise =
do let (_, currentAgendum) = P.findMax agenda
let reached = currentAgendum ^. current
nexts <- candidates currentAgendum closed
let newAgenda = foldl' (\q a -> P.insert (_benefit a) a q) (P.deleteMax agenda) nexts
reachedGoal <- isGoal currentAgendum
let cl = (reached, currentAgendum ^. trailBenefit)
newFoundSolutions <- includeAgendum foundSolutions currentAgendum
if reachedGoal
then allSolutions (P.deleteMax agenda) closed newFoundSolutions
else if (cl `S.member` closed)
then allSolutions (P.deleteMax agenda) closed foundSolutions
else allSolutions newAgenda (S.insert cl closed) newFoundSolutions
```

This also needs some infrastructure to call it.

```
runSearchAll :: Int -> Cave -> PartSolutions
runSearchAll timeLimit cave = result
where result = runReader (searchCaveAll "AA") (TimedCave cave timeLimit sortedRooms)
sortedRooms = sortOn (\r -> Down $ (cave ! r) ^. flowRate ) $ M.keys $ M.filter (\r -> r ^. flowRate > 0) cave
searchCaveAll :: String -> CaveContext PartSolutions
searchCaveAll startRoom =
do agenda <- initAgenda startRoom
allSolutions agenda S.empty M.empty
```

Running that generates 5435 distinct partial solutions to the problem, for one agent opening valves.

The final step is to work out the best combination of partial solutions to make the final solution. I don't do anything clever here, just using a list comprehension to test all combinations, filtering out any pairs that share valves.

```
part2 cave = maximum combinations
where rawSolutions = runSearchAll 26 cave
solutionList = M.toList rawSolutions
combinations = [ (f1 + f2)
| (p, f1) <- solutionList
, (e, f2) <- solutionList
, p < e
, S.disjoint p e
]
```

This is fast because of lazy generation of the `combinations`

list: it's only created as elements are fed into `maximum`

, so no time or memory is wasted generating the full list.

But, for completeness, I wondered about doing the reduction in parallel across several cores. There's no Haskell-standard way of doing map-reduce operations, so I just split the `maximum`

task across chunks of the list.

```
part2 cave = maximum (fmap maximum chunkSolns `using` parList rdeepseq)
where rawSolutions = runSearchAll 26 cave
solutionList = M.toList rawSolutions
combinations = [ fp + fe
| (p, fp) <- solutionList
, (e, fe) <- solutionList
, p < e
, S.disjoint p e
]
chunkSolns = chunksOf 10000 combinations
```

This does do the reduction in parallel (and is faster), but requires the explicit creation of the `combinations`

list, which slows things down.

### Performance

With all those changes done, how long does it take?

Program | Wall time | Memory used (kb) |
---|---|---|

Original | 2:18.27 | 150,144 |

Compressed | 0:53.46 | 1,371,700 |

Combined | 0:10.77 | 149,876 |

Combined, parallel | 0:11.21 | 167,848 |

A solution that takes about ten seconds rather than over two minutes seems like a good improvement to me!

## Code

You can get the code from my locally-hosted Git repo, or from Gitlab.