Optimising Haskell, example 3

    Another example of optimising Haskell, this time looking at the Advent of Code 2022 day 16 puzzle and my original solution to it. This post is more about manipulating the problem description and picking better methods, rather than raw optimisation of Haskell code.

    Compressing the cave

    The first thing was to look at the structure of the problem. The task is to find a path though a graph (a set of connected rooms), where some nodes have a score if you visit them, and to maximise that score within some time limit. The full task description has 62 rooms, of which only 15 have such a score; the others are there simply to add a cost to moving from room to room.

    The map of my cave

    That means the first optimisation is to "compress" the cave into a form that only has the rooms with scores, and to find the shortest-path cost between each pair of scoring rooms.

    First, I needed to modify the Cave data structure to hold this new representation. The main changes is that each tunnel between rooms now comes with a tunnelLength. The other thing is that the time elapsed exploring the cave can't be computing simply from the number of rooms visited; instead, it's convenient to store it in the SearchState.

    type RoomID = String
    data Tunnel = Tunnel { _tunnelTo :: RoomID, _tunnelLength :: Int}
        deriving (Eq, Show, Ord)
    makeLenses ''Tunnel
    data Room = Room 
        { _flowRate :: Int
        , _tunnels :: S.Set Tunnel
        } deriving (Eq, Show, Ord)
    makeLenses ''Room
    data SingleSearchState = SingleSearchState
        { _currentRoom :: RoomID
        , _currentTime :: Int
        , _sOpenValves :: S.Set RoomID
        } deriving (Eq, Show, Ord)
    makeLenses ''SingleSearchState
    data DoubleSearchState = DoubleSearchState
        { _personRoom :: RoomID
        , _personTime :: Int
        , _elephantRoom :: RoomID
        , _elephantTime :: Int
        , _dOpenValves :: S.Set RoomID
        } deriving (Eq, Show, Ord)
    makeLenses ''DoubleSearchState

    Compressing the cave is a simple sequence of breath-first searches, starting from each of the interesting rooms. Each search maintains a Set of Tunnels found (called found) (each representing the shortest path to that room), a separate Set of Tunnels that go to interesting rooms (called routes), and a boundary of rooms to be explored. Each explored room adds a bunch of tunnels to the results and moves the boundary. If a room already exists in found, it's not processed again.

    compress :: Cave -> Cave
    compress cave = M.mapWithKey (compressRoom cave) cave
    compressRoom :: Cave -> RoomID -> Room -> Room
    compressRoom cave here room = room & tunnels .~ t'
      where t' = reachableFrom cave [Tunnel here 0] S.empty S.empty
    reachableFrom :: Cave -> [Tunnel] -> S.Set RoomID -> S.Set Tunnel -> S.Set Tunnel
    reachableFrom _ [] _ routes = routes
    reachableFrom cave (tunnel@(Tunnel here len):boundary) found routes
      | here `S.member` found = reachableFrom cave boundary found routes
      | otherwise = reachableFrom cave (boundary ++ (S.toList legs)) (S.insert here found) routes'
      where exits = (cave ! here) ^. tunnels
            exits' = S.filter (\t -> (t ^. tunnelTo) `S.notMember` found) exits
            legs = S.map (\t -> t & tunnelLength .~ (len + 1)) exits'
            routes' = if (len == 0) || ((cave ! here) ^. flowRate) == 0
                      then routes
                      else S.insert tunnel routes

    The main wrinkle with this representation is that often, either the person or the elephant will be in transit between rooms while the other has arrived at their destination. I record that by storing, in the DoubleSearchState, the time that each agent is available. If we've not reached that time yet, the agent isn't updated in the new search space.

    There are a bunch of additional changes made throughout the code to use the new description of the Cave: I won't bore you with all of them.


    What does this do to the performance of the program?

    Program  Wall time   Memory used (kb) 
    Original 2:18.27 150,144
    Compressed 0:53.46 1,371,700

    This factor of 2⅔ speedup is even more impressive when I say that the "compressed cave" version of the program doesn't use the beam search restriction. That means the agenda in the search isn't limited to just 5,000 entries, so the compressed version is doing a lot more exploration in the much shorter time.

    A sub-one-minute time is good, but much worse than the few seconds of time that were reported by many other people. I had a look around at their solutions to find their secret!

    The key idea here is that the two agents moving around the cave (the person and the elephant) have identical behaviours but, in the final solution, will open different valves. Therefore, if I can find the best few solutions to the overall problem, I should be able to pick two of them that don't share any opening valves. Those will form the combined solution.

    This is helped by the short time limit. In 26 minutes, one agent can open at most 13 valves (1 minute to move, 1 minute to open the valve). In fact, looking at the map, the most one agent can open is eight or nine valves (2 or 3 minutes to move between valves worth opening), generating around 1300 total flow. That implies that the best solution with two agents will have each performing a near-compete solution. But how to generate them?

    One way to work is to find all subsets of the cave that contain only some of the valves, and find the best solution using that subset. I can then pick two of these subsets, check they're disjoint, and find the total flow from the pair. But how to find those subsets? There are roughly 15! such subsets, about 1012. That's too many to examine explicitly.

    But I have a way of finding near-complete solutions (the existing search process). I can use that to find good partial solutions. Each step of the search algorithm, I can extrapolate the score of this partial solution, assuming no more valves are opened, and push the solution and the score into some sort of cache. I then continue the search until all solutions have been found, and I can assemble the final solution form the cache.

    What should that cache look like? I only need to know the flow (score) of the cached solution, and the set of valves opened to ensure that the combination of two cached solutions don't open the same valve twice. It may be that the same set of valves, opened in a different order, will give different final values, so I only need to record the best score for each set of valves, updating it as better solutions come along.

    That suggests using a Map from sets of valves to total flows, and a function that calculates the total flow of a partial solution and updates that Map if necessary.

    type PartSolutions = M.Map (S.Set RoomID) Int
    includeAgendum :: PartSolutions -> Agendum -> CaveContext PartSolutions
    includeAgendum results agendum =
      do cf <- currentFlow (agendum ^. current)
         timeLimit <- asks getTimeLimit
         let timeLeft = timeLimit - timeOf (agendum ^. current)
         let remainingFlow = cf * timeLeft
         let totalFlow = remainingFlow + agendum ^. trailBenefit
         let visitedSet = agendum ^. current . openValves
         let currentBest = M.findWithDefault 0 visitedSet results
         if totalFlow > currentBest
         then return (M.insert visitedSet totalFlow results)
         else return results

    That gets used in allSolutions,  modified from the aStar search function. allSolutions both uses includeAgendum and keeps searching while there's something on the agenda (changes in the final if statement).

    allSolutions ::  Agenda -> ExploredStates -> PartSolutions -> CaveContext PartSolutions
    allSolutions agenda closed foundSolutions
        | P.null agenda = return foundSolutions
        | otherwise = 
            do  let (_, currentAgendum) = P.findMax agenda
                let reached = currentAgendum ^. current
                nexts <- candidates currentAgendum closed
                let newAgenda = foldl' (\q a -> P.insert (_benefit a) a q) (P.deleteMax agenda) nexts
                reachedGoal <- isGoal currentAgendum
                let cl = (reached, currentAgendum ^. trailBenefit)
                newFoundSolutions <- includeAgendum foundSolutions currentAgendum
                if reachedGoal
                then allSolutions (P.deleteMax agenda) closed newFoundSolutions 
                else if (cl `S.member` closed)
                     then allSolutions (P.deleteMax agenda) closed foundSolutions 
                     else allSolutions newAgenda (S.insert cl closed) newFoundSolutions 

    This also needs some infrastructure to call it.

    runSearchAll :: Int -> Cave -> PartSolutions
    runSearchAll timeLimit cave = result
        where result = runReader (searchCaveAll "AA") (TimedCave cave timeLimit sortedRooms)
              sortedRooms = sortOn (\r -> Down $ (cave ! r) ^. flowRate ) $ M.keys $ M.filter (\r -> r ^. flowRate > 0) cave
    searchCaveAll :: String -> CaveContext PartSolutions
    searchCaveAll startRoom = 
        do agenda <- initAgenda startRoom
           allSolutions agenda S.empty M.empty

    Running that generates 5435 distinct partial solutions to the problem, for one agent opening valves.

    The final step is to work out the best combination of partial solutions to make the final solution. I don't do anything clever here, just using a list comprehension to test all combinations, filtering out any pairs that share valves.

    part2 cave = maximum combinations 
      where rawSolutions = runSearchAll 26 cave
            solutionList = M.toList rawSolutions
            combinations = [ (f1 + f2) 
                           | (p, f1) <- solutionList
                           , (e, f2) <- solutionList
                           , p < e
                           , S.disjoint p e

    This is fast because of lazy generation of the combinations list: it's only created as elements are fed into maximum, so no time or memory is wasted generating the full list.

    Two-agent solution to my problem

    But, for completeness, I wondered about doing the reduction in parallel across several cores. There's no Haskell-standard way of doing map-reduce operations, so I just split the maximum task across chunks of the list.

    part2 cave = maximum (fmap maximum chunkSolns `using` parList rdeepseq)
      where rawSolutions = runSearchAll 26 cave
            solutionList = M.toList rawSolutions
            combinations = [ fp + fe 
                           | (p, fp) <- solutionList
                           , (e, fe) <- solutionList
                           , p < e
                           , S.disjoint p e
            chunkSolns = chunksOf 10000 combinations

    This does do the reduction in parallel (and is faster), but requires the explicit creation of the combinations list, which slows things down.


    With all those changes done, how long does it take?

    Program  Wall time   Memory used (kb) 
    Original 2:18.27 150,144
    Compressed 0:53.46 1,371,700
    Combined 0:10.77 149,876
    Combined, parallel 0:11.21 167,848

    A solution that takes about ten seconds rather than over two minutes seems like a good improvement to me!


    You can get the code from my locally-hosted Git repo, or from Gitlab.

    Neil Smith

    Read more posts by this author.