Optimising Haskell, example 4

    This is the final optimisation post for my solutions to the Advent of Code 2022. My review of my solutions showed that day 19 was by far the slowest, taking about 20 minutes to find a result. Surely I could do better.

    Most of the speedup came from reframing the problem, but there's some actual low-level Haskell optimisation at the end!

    Factory, not time

    My original solution was an A* search, stepping forward in time. At each minute of the simulation, I generated a state of what could happen in the next minute. The A* approach guaranteed that the first solution found would be the optimal one, but the heuristic I was using (assuming a geode-cracking robot could be built each minute after the current time) vastly overestimated the value of early states, leading to what wasn't far off a breadth-first search of the space.

    No better heuristic immediately came to mind, so I instead decided to change how I represented the problem. My first attempt revolved around time; instead, I would recast the problem around the actions of the robot-making factory.

    At any given time, there are options for what the factory can do. For each type of robot, one of three cases applies:

    1. If I have enough of that type of robot, do nothing.
    2. If I have enough resources on hand, make that robot now.
    3. If I don't have the resources, but will have them if I wait for a bit, then wait until I have the resources and build the robot.

    (These choices are mutually exclusive for each robot type.)

    Options 2 and 3 collapse to the same thing if I recast it as "wait until I've got enough resources, then build" on the understanding that the wait may be zero minutes.

    There's a global fourth action, which is just to run out the clock and have the existing robots gather what they will. This option will handle the last fragment of time for each run.
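    The resulting action set can be sketched as a small data type (a standalone sketch; these names are illustrative, not the post's actual code):

```haskell
-- Illustrative sketch of the factory's choices; names are hypothetical.
data Resource = Ore | Clay | Obsidian | Geode
  deriving (Show, Eq, Enum, Bounded)

-- Options 2 and 3 collapse into WaitThenBuild (a wait of zero or more
-- minutes, then build); RunOutClock is the global fourth action.
data Action = WaitThenBuild Resource | RunOutClock
  deriving Show

-- Every candidate action from a state: one per robot type, plus the clock.
allActions :: [Action]
allActions = map WaitThenBuild [minBound .. maxBound] ++ [RunOutClock]

main :: IO ()
main = mapM_ print allActions
```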

    When waiting to build a robot, I can calculate how much time I need to wait. That suggests I should store the current time in the SearchState:

    data SearchState = SearchState
        { _resources :: Collection
        , _robots :: Collection
        , _currentTime :: Int
        } deriving (Eq, Show, Ord)
    makeLenses ''SearchState

    Next is a function that handles the above decision making, for a particular robot at a particular state. Note that this returns a Maybe SearchState, as there may be nothing to do for this type of robot. maxRobots is a collection of the maximum quantity needed of each type of robot; recipe is the requirements for this type of robot.

    handleRobot :: SearchState -> Collection -> Int -> Resource -> Collection -> Maybe SearchState
    handleRobot state maxRobots timeLimit robot recipe
      | sufficientRobots robot state maxRobots = Nothing
      | otherwise = buildWhenReady robot state recipe timeLimit

    I then wrote little functions to do each of the steps.

    -- do I already have enough of this robot?
    sufficientRobots :: Resource -> SearchState -> Collection -> Bool
    sufficientRobots robot state maxRobots = 
        (robot `MS.member` maxRobots)
        && 
        ((MS.occur robot (state ^. robots)) >= (MS.occur robot maxRobots))

    The case of delaying for a while until building a robot is a little more convoluted. This case is viable if, for each resource I'm short of, I have at least one robot that can gather that resource. In that case, I calculate the delay needed to get enough of each resource, and the final delay is the largest of them. (If I have the resources, there is no shortfall, so the final delay is zero; hence maximum0.) If waiting will never generate enough resources (because a particular type of robot doesn't exist yet), return Nothing.

    -- assuming I can't build this robot, how long do I have to wait for the current
    -- robots to gather enough to build it?
    buildDelay :: SearchState -> Collection -> Maybe Int
    buildDelay state recipe 
      | all (\r -> MS.member r rbts) (MS.distinctElems shortfall) = Just $ maximum0 $ fmap snd $ MS.toOccurList delay
      | otherwise = Nothing
      where shortfall = recipe `MS.difference` (state ^. resources)
            delay = MS.foldOccur calcOneDelay MS.empty shortfall
            rbts = state ^. robots
            calcOneDelay resource count acc = 
              MS.insertMany resource 
                            (ceiling $ (fromIntegral count) / (fromIntegral $ MS.occur resource rbts))
                            acc
            maximum0 xs = if (null xs) then 0 else maximum xs
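    As a standalone check of the ceiling division at the heart of buildDelay (plain Ints rather than Multisets; minutesToGather is an illustrative name):

```haskell
-- Minutes for `robots` gatherers to close a shortfall of `needed` resources,
-- rounding up: being 7 ore short with 2 ore robots means a 4-minute wait.
minutesToGather :: Int -> Int -> Int
minutesToGather needed robots =
  ceiling (fromIntegral needed / fromIntegral robots :: Double)

main :: IO ()
main = print [minutesToGather 7 2, minutesToGather 6 2, minutesToGather 0 3]
-- [4,3,0]
```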

    Using that, updating the state for that delay is a little simpler, but it still has to handle a delay of Nothing and a delay that would push past the allowed time limit. That's easier in the Maybe monad.

    buildWhenReady :: Resource -> SearchState -> Collection -> Int -> Maybe SearchState
    buildWhenReady robot state recipe timeLimit =
      do  waitDelay <- buildDelay state recipe
          delay <- tooLate (state ^. currentTime) (waitDelay + 1) timeLimit
          let gathered = MS.foldOccur (\res n acc -> MS.insertMany res (n * delay) acc) 
                                    MS.empty 
                                    (state ^. robots)
          return (state & robots %~ MS.insert robot
                        & resources %~ (MS.union gathered)
                        & resources %~ ( `MS.difference` recipe )
                        & currentTime %~ (+ delay)
                 )
    
    tooLate :: Int -> Int -> Int -> Maybe Int
    tooLate current delay timeLimit
      | (current + delay) <= timeLimit = Just delay
      | otherwise = Nothing

    The body of buildWhenReady demonstrates how lenses can make this kind of update much easier.

    The per-robot-type processing is called from successors, using catMaybes to retain just the valid successors. It also handles the "wait until the end" case.

    successors :: SearchState -> BlueprintContext (Q.Seq SearchState)
    successors state = 
      do blueprint <- asks getBlueprint
         maxRobots <- asks getMaxRobots
         timeLimit <- asks getTimeLimit
    
         let robotSuccessors = Q.fromList $ catMaybes $ M.elems $ M.mapWithKey (handleRobot state maxRobots timeLimit) blueprint
         
         let timeRemaining = timeLimit - (state ^. currentTime)
         let gathered = MS.foldOccur (\res n acc -> MS.insertMany res (n * timeRemaining) acc) 
                                    MS.empty 
                                    (state ^. robots)
         let delayUntilEnd = (state & currentTime .~ timeLimit
                                    & resources %~ (MS.union gathered)
                             )
         return ( robotSuccessors |> delayUntilEnd )

    With that, finding the overall solution still uses the A* search from before.

    The final change was to use parallel processing properly, and explore the different blueprints in parallel. This speeds up part 1, where I process all 30 blueprints, but has less effect on part 2, where I only process three.

    part1, part2 :: [(Int, Blueprint)] -> Int
    part1 blueprints = sum [n * (MS.occur Geode (r ^. resources)) | (n, r) <- results]
      where results = parMap rdeepseq (scoreBlueprint 24) blueprints
    
    part2 blueprints = product [MS.occur Geode (r ^. resources) | (_, r) <- results]
        where results = parMap rdeepseq (scoreBlueprint 32) $ take 3 blueprints

    Performance

    How well did this work?

    Program        Wall time   Memory used (kb)
    Original        15:09.53          1,562,024
    Factory-based    0:14.02          1,031,664

    Enormously well. The time went from 15-20 minutes to about 14 seconds. But it can go quicker.

    Actual Haskell optimisation

    Profiling the program showed something interesting. About two-thirds of the runtime was spent in the compare function defined for the SearchState record. This comes from the Data.Set library, which uses comparisons to store elements efficiently. It's used when checking whether the current SearchState is in the closed set. As a SearchState contains a couple of Multisets, each comparison takes time.

    I can reduce this by having a hashed representation of the SearchState that's good enough for checking equality, even if it's not so useful in the rest of the program.

    I could manually redefine the Eq and Ord functions for SearchState, but that would require that every element of the closed set be rehashed for every membership check; that's likely to get expensive. An alternative is to store only the hashes in the closed set, and hash the new member when I want to check for membership.

    What should the hash be? Integers are fast, so let's use them. I want to hash a Multiset to an integer. That will allow me to hash a SearchState into a 3-tuple of the hashes of the resources and robots fields, together with the currentTime.

    I can quickly hash a Multiset by adding the amounts of each element, scaling each one so that different Multisets will give different values. A quick inspection of some traces shows that we rarely get to over 100 of any resource, so let's be safe and use 200 as a scaling constant. I derive the Enum and Bounded classes for Resource to make it easy to enumerate all the resource types.

    data Resource = Ore | Clay | Obsidian | Geode
        deriving (Show, Eq, Ord, Generic, Enum, Bounded)
    
    type Collection = MS.MultiSet Resource
    
    hashCollection :: Collection -> Int
    hashCollection c = 
        let k = 200 
        in sum $ zipWith (\r n -> (MS.occur r c) * n) [minBound .. maxBound] $ iterate (* k) 1
    
    type StateHash = (Int, Int, Int)
    hashSearchState :: SearchState -> StateHash
    hashSearchState s = (hashCollection (s ^. resources), hashCollection (s ^. robots), s ^. currentTime)
    
    type ExploredStates = S.Set StateHash
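    To sanity-check the scheme, here's the same hash computed over a plain association list standing in for the Multiset (a standalone sketch; occur and hashAssoc are my stand-in names for MS.occur and hashCollection):

```haskell
import Data.Maybe (fromMaybe)

data Resource = Ore | Clay | Obsidian | Geode
  deriving (Show, Eq, Enum, Bounded)

-- Association-list stand-in for MS.occur.
occur :: Resource -> [(Resource, Int)] -> Int
occur r = fromMaybe 0 . lookup r

-- Same scheme as hashCollection: scale each count by a power of 200.
hashAssoc :: [(Resource, Int)] -> Int
hashAssoc c =
  sum $ zipWith (\r n -> occur r c * n) [minBound .. maxBound] (iterate (* 200) 1)

main :: IO ()
main = print (hashAssoc [(Ore, 3), (Clay, 2)])  -- 3*1 + 2*200 = 403
```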

    All that's left is to call hashSearchState the couple of times I examine the closed set, in aStar and candidates.
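    The membership check itself then looks something like this (a minimal sketch; isExplored is my name, not the post's):

```haskell
import qualified Data.Set as S

type StateHash = (Int, Int, Int)
type ExploredStates = S.Set StateHash

-- The closed set stores only hashes: a candidate state is hashed once per
-- membership test, and existing members are never rehashed.
isExplored :: StateHash -> ExploredStates -> Bool
isExplored = S.member

main :: IO ()
main = do
  let closed = S.fromList [(403, 2, 5), (3, 1, 0)] :: ExploredStates
  print (isExplored (403, 2, 5) closed)  -- True
  print (isExplored (0, 0, 0) closed)    -- False
```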

    Performance

    This step makes another large improvement.

    Program                  Wall time   Memory used (kb)
    Original                  15:09.53          1,562,024
    Factory-based              0:14.02          1,031,664
    Hashing                    0:06.77          1,073,516
    Hashing, single-thread     0:13.43            508,552

    The version with the hashed closed set is twice as quick as before, taking less than seven seconds to complete.

    Out of curiosity, I removed the parallel processing of the blueprints. This only doubled the runtime, but halved the memory consumption. It seems there is overhead in running in parallel!

    Code

    You can get the code from my locally-hosted Git repo, or from Gitlab.

    Neil Smith
