Optimising Haskell, example 3

Another example of optimising Haskell, this time looking at the Advent of Code 2022 day 16 puzzle and my original solution to it. This post is more about manipulating the problem description and picking better methods than about raw optimisation of Haskell code.

Compressing the cave

The first thing was to look at the structure of the problem. The task is to find a path through a graph (a set of connected rooms), where some nodes have a score if you visit them, and to maximise that score within some time limit. The full task description has 62 rooms, of which only 15 have such a score; the others are there simply to add a cost to moving from room to room.

The map of my cave

That means the first optimisation is to "compress" the cave into a form that only has the rooms with scores, and to find the shortest-path cost between each pair of scoring rooms.

First, I needed to modify the Cave data structure to hold this new representation. The main change is that each tunnel between rooms now comes with a tunnelLength. The other change is that the time elapsed exploring the cave can't be computed simply from the number of rooms visited; instead, it's convenient to store it in the SearchState.

type RoomID = String

data Tunnel = Tunnel { _tunnelTo :: RoomID, _tunnelLength :: Int } -- length = travel time in minutes
    deriving (Eq, Show, Ord)
makeLenses ''Tunnel

data Room = Room 
    { _flowRate :: Int
    , _tunnels :: S.Set Tunnel
    } deriving (Eq, Show, Ord)
makeLenses ''Room

data SingleSearchState = SingleSearchState
    { _currentRoom :: RoomID
    , _currentTime :: Int           -- elapsed time in the cave
    , _sOpenValves :: S.Set RoomID
    } deriving (Eq, Show, Ord)
makeLenses ''SingleSearchState

data DoubleSearchState = DoubleSearchState
    { _personRoom :: RoomID
    , _personTime :: Int            -- time when the person next becomes available
    , _elephantRoom :: RoomID
    , _elephantTime :: Int          -- time when the elephant next becomes available
    , _dOpenValves :: S.Set RoomID
    } deriving (Eq, Show, Ord)
makeLenses ''DoubleSearchState

Compressing the cave is a simple sequence of breadth-first searches, starting from each of the interesting rooms. Each search maintains a Set of rooms already visited (called found), a Set of Tunnels that lead to interesting rooms (called routes), and a boundary of Tunnels still to explore, each representing the shortest path to the room it leads to. Each room explored adds its onward tunnels to the boundary and, if it's an interesting room, a route to the results. If a room is already in found, it's not processed again.

compress :: Cave -> Cave
compress cave = M.mapWithKey (compressRoom cave) cave

compressRoom :: Cave -> RoomID -> Room -> Room
compressRoom cave here room = room & tunnels .~ t'
  where t' = reachableFrom cave [Tunnel here 0] S.empty S.empty

reachableFrom :: Cave -> [Tunnel] -> S.Set RoomID -> S.Set Tunnel -> S.Set Tunnel
reachableFrom _ [] _ routes = routes
reachableFrom cave (tunnel@(Tunnel here len):boundary) found routes
  | here `S.member` found = reachableFrom cave boundary found routes
  | otherwise = reachableFrom cave (boundary ++ S.toList legs) (S.insert here found) routes'
  where exits = (cave ! here) ^. tunnels
        -- only follow tunnels to rooms not yet visited
        exits' = S.filter (\t -> (t ^. tunnelTo) `S.notMember` found) exits
        -- each step through the original cave is one minute longer
        legs = S.map (\t -> t & tunnelLength .~ (len + 1)) exits'
        -- record a route only if this room is interesting and isn't the start
        routes' = if (len == 0) || ((cave ! here) ^. flowRate) == 0
                  then routes
                  else S.insert tunnel routes
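
For intuition, here's what compression does to a toy three-room cave (hypothetical rooms and flow rates, not from my input; this assumes Cave is a Map from RoomID to Room, as the rest of the code implies):

-- Toy example, assuming type Cave = M.Map RoomID Room.
-- Rooms AA -- BB -- CC in a line; only CC has a non-zero flow rate.
toyCave :: Cave
toyCave = M.fromList
  [ ("AA", Room 0 (S.fromList [Tunnel "BB" 1]))
  , ("BB", Room 0 (S.fromList [Tunnel "AA" 1, Tunnel "CC" 1]))
  , ("CC", Room 2 (S.fromList [Tunnel "BB" 1]))
  ]
-- compress toyCave gives AA the single tunnel Tunnel "CC" 2: the boring
-- room BB vanishes from the routes, its cost folded into the tunnel length.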

The main wrinkle with this representation is that often one of the person and the elephant will be in transit between rooms while the other has arrived at their destination. I record that by storing, in the DoubleSearchState, the time at which each agent next becomes available. If we've not reached that time yet, the agent isn't updated when generating new search states.
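
To make that concrete, here's a minimal sketch of the idea (this is not the code from my solution; personMoves and its exact bookkeeping are illustrative only):

-- Sketch only: generate the person's possible moves at time `now`.
-- If the person is still walking a tunnel, the state passes through
-- unchanged; the elephant is handled symmetrically.
personMoves :: Cave -> Int -> DoubleSearchState -> [DoubleSearchState]
personMoves cave now state
  | state ^. personTime > now = [state]  -- still in transit: nothing to update
  | otherwise =
      [ state & personRoom .~ (t ^. tunnelTo)
              -- arrival time: travel plus one minute to open the valve
              & personTime .~ (now + t ^. tunnelLength + 1)
              & dOpenValves %~ S.insert (t ^. tunnelTo)
      | t <- S.toList ((cave ! (state ^. personRoom)) ^. tunnels)
      , (t ^. tunnelTo) `S.notMember` (state ^. dOpenValves)
      ]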

There are a bunch of additional changes made throughout the code to use the new description of the Cave: I won't bore you with all of them.

Performance

What does this do to the performance of the program?

Program      Wall time   Memory used (kb)
Original     2:18.27       150,144
Compressed   0:53.46     1,371,700

This factor-of-2⅔ speedup is even more impressive given that the "compressed cave" version of the program doesn't use the beam search restriction. That means the agenda in the search isn't limited to just 5,000 entries, so the compressed version is doing a lot more exploration in the much shorter time.

A sub-one-minute time is good, but much worse than the few-second times reported by many other people. I had a look around at their solutions to find their secret!

The key idea here is that the two agents moving around the cave (the person and the elephant) have identical behaviours but, in the final solution, will open different valves. Therefore, if I can find the best few solutions to the overall problem, I should be able to pick two of them that don't open any of the same valves. Those will form the combined solution.

This is helped by the short time limit. In 26 minutes, one agent can open at most 13 valves (1 minute to move, 1 minute to open the valve). In fact, looking at the map, the most one agent can open is eight or nine valves (2 or 3 minutes to move between valves worth opening), generating around 1300 total flow. That implies that the best solution with two agents will have each performing a near-complete solution. But how to generate them?

One way to work is to find all subsets of the cave that contain only some of the valves, and find the best solution using that subset. I can then pick two of these subsets, check they're disjoint, and find the total flow from the pair. But how to find the best solution for each subset? Examining every ordering of the 15 valves would mean roughly 15! possibilities, about 10¹². That's too many to examine explicitly.
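
For scale, a quick ghci check of that factorial:

ghci> product [1..15]
1307674368000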

But I have a way of finding near-complete solutions (the existing search process). I can use that to find good partial solutions. At each step of the search algorithm, I can extrapolate the score of the current partial solution, assuming no more valves are opened, and push the solution and the score into some sort of cache. I then continue the search until all solutions have been found, and I can assemble the final solution from the cache.

What should that cache look like? I only need to know the flow (score) of the cached solution, and the set of valves opened, to ensure that a combination of two cached solutions doesn't open the same valve twice. The same set of valves, opened in a different order, may give different final values, so I only need to record the best score for each set of valves, updating it as better solutions come along.

That suggests using a Map from sets of valves to total flows, and a function that calculates the total flow of a partial solution and updates that Map if necessary.

type PartSolutions = M.Map (S.Set RoomID) Int

includeAgendum :: PartSolutions -> Agendum -> CaveContext PartSolutions
includeAgendum results agendum =
  do cf <- currentFlow (agendum ^. current)
     timeLimit <- asks getTimeLimit
     let timeLeft = timeLimit - timeOf (agendum ^. current)
     -- extrapolate, assuming no more valves are opened from here
     let remainingFlow = cf * timeLeft
     let totalFlow = remainingFlow + agendum ^. trailBenefit
     let visitedSet = agendum ^. current . openValves
     -- keep only the best score seen for this set of valves
     let currentBest = M.findWithDefault 0 visitedSet results
     if totalFlow > currentBest
     then return (M.insert visitedSet totalFlow results)
     else return results

That gets used in allSolutions, modified from the aStar search function. allSolutions both uses includeAgendum and keeps searching while there's something on the agenda (the changes are in the final if statement).

allSolutions :: Agenda -> ExploredStates -> PartSolutions -> CaveContext PartSolutions
allSolutions agenda closed foundSolutions
    | P.null agenda = return foundSolutions
    | otherwise =
        do  let (_, currentAgendum) = P.findMax agenda
            let reached = currentAgendum ^. current
            nexts <- candidates currentAgendum closed
            let newAgenda = foldl' (\q a -> P.insert (_benefit a) a q) (P.deleteMax agenda) nexts
            reachedGoal <- isGoal currentAgendum
            let cl = (reached, currentAgendum ^. trailBenefit)
            newFoundSolutions <- includeAgendum foundSolutions currentAgendum
            if reachedGoal
            then allSolutions (P.deleteMax agenda) closed newFoundSolutions    -- goal states aren't expanded further
            else if cl `S.member` closed
                 then allSolutions (P.deleteMax agenda) closed foundSolutions  -- already seen this state
                 else allSolutions newAgenda (S.insert cl closed) newFoundSolutions

This also needs some infrastructure to call it.

runSearchAll :: Int -> Cave -> PartSolutions
runSearchAll timeLimit cave = result
    where result = runReader (searchCaveAll "AA") (TimedCave cave timeLimit sortedRooms)
          -- the rooms with valves worth opening, most productive first
          sortedRooms = sortOn (\r -> Down $ (cave ! r) ^. flowRate) $ M.keys $ M.filter (\r -> r ^. flowRate > 0) cave

searchCaveAll :: String -> CaveContext PartSolutions
searchCaveAll startRoom = 
    do agenda <- initAgenda startRoom
       allSolutions agenda S.empty M.empty

Running that generates 5435 distinct partial solutions to the problem, for one agent opening valves.
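
For example, in ghci (cave here being my parsed and compressed puzzle input):

ghci> M.size (runSearchAll 26 cave)
5435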

The final step is to work out the best combination of partial solutions to make the final solution. I don't do anything clever here, just using a list comprehension to test all combinations, filtering out any pairs that share valves.

part2 cave = maximum combinations
  where rawSolutions = runSearchAll 26 cave
        solutionList = M.toList rawSolutions
        combinations = [ f1 + f2
                       | (p, f1) <- solutionList
                       , (e, f2) <- solutionList
                       , p < e            -- consider each unordered pair once
                       , S.disjoint p e   -- the two agents must open different valves
                       ]

This is fast because of the lazy generation of the combinations list: it's only produced as elements are consumed by maximum, so the full list never needs to be held in memory.
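
As a tiny, hypothetical illustration of that streaming behaviour (nothing to do with the puzzle):

ghci> maximum [ x + y | x <- [1..3000], y <- [1..3000], x < y ]
5999

The comprehension describes about 4.5 million elements, but maximum consumes them as they're generated rather than the whole list being materialised up front.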

Two-agent solution to my problem

But, for completeness, I wondered about doing the reduction in parallel across several cores. There's no standard Haskell way of doing map-reduce operations, so I just split the maximum task across chunks of the list.

-- parList, rdeepseq and `using` come from Control.Parallel.Strategies;
-- chunksOf comes from Data.List.Split
part2 cave = maximum (fmap maximum chunkSolns `using` parList rdeepseq)
  where rawSolutions = runSearchAll 26 cave
        solutionList = M.toList rawSolutions
        combinations = [ fp + fe
                       | (p, fp) <- solutionList
                       , (e, fe) <- solutionList
                       , p < e
                       , S.disjoint p e
                       ]
        chunkSolns = chunksOf 10000 combinations  -- each chunk's maximum is evaluated in parallel

This does do the reduction in parallel (and the reduction itself is faster), but it requires the explicit creation of the combinations list, which slows things down overall.
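
For reference, Control.Parallel.Strategies also provides parListChunk, which expresses the same chunked evaluation without building the chunks by hand. A sketch I haven't benchmarked:

-- parListChunk splits the list into chunks of the given size and
-- evaluates each chunk in parallel with the supplied strategy.
part2 cave = maximum (combinations `using` parListChunk 10000 rdeepseq)
  where rawSolutions = runSearchAll 26 cave
        solutionList = M.toList rawSolutions
        combinations = [ fp + fe
                       | (p, fp) <- solutionList
                       , (e, fe) <- solutionList
                       , p < e
                       , S.disjoint p e
                       ]

It shares the same drawback, though: the whole combinations list still has to be created.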

Performance

With all those changes done, how long does it take?

Program             Wall time   Memory used (kb)
Original            2:18.27       150,144
Compressed          0:53.46     1,371,700
Combined            0:10.77       149,876
Combined, parallel  0:11.21       167,848

A solution that takes about ten seconds rather than over two minutes seems like a good improvement to me!

Code

You can get the code from my locally-hosted Git repo, or from Gitlab.