Optimising Haskell, example 2

    The day 23 solution takes too long (around seven minutes on my machine) so this post shows how a different representation can make a big difference in performance.

    Faster sets

    I originally decided to represent the set of moving elves as a Set, where each Elf knew its position. The problem is that the standard Set implementation, in the containers package, isn't the fastest in the world. (Internally, it uses tree structures to store the sets.)

    The unordered-containers package has another implementation of Set, called HashSet. This package is based on hastables of elements and is stresses performance of the data structures.

    The good news is that the interface to HashSets is just about the same as Sets, so very little had to change in the code. The main difference was that conversion between a HashSet and a MultiSet now had to go via an intermediate List. You can see the code as advent23/MainUnordered.hs in the repos linked below.

    The bad news is that the performance didn't improve much, shaving only a few seconds off the six-minute runtime.

    Related to the Set is the Map data structure. This allows faster lookup of the elves in a particular situation, but at the cost of a more involved program. Over on Reddit, /u/ephrion/ did a great writeup of this approach. The upshot is that the extra work yielded less benefit than the approach I describe below.

    Another approach is needed.

    Array, not Set

    The representation elves as a Set has the advantage of easy bounds-maintaining: as the elves moved apart, I could just change the elements of the Set of elves and not have to keep track of the range of their positions.

    This was easy to implement, but led to very slow performance. Each update to each elf had to find the elements of the set that were nearby.

    A better representation of the elves would be an Array, which gives fast, constant-time access to each position in the grid. The downside is that I'd have to check and possibly change the bounds of the array each generation.

    Array representation and interface

    As I was making lots of updates to the array of elves each simulation round, I thought that using mutable arrays would be the best bet. That should avoid lots of copying of immutable arrays, even if the compiler was able to optimise most of it away.

    The (mutable) array interface is split across a few libraries: Data.Array.MArray is the immutable array interface, and Data.Array.ST wraps them in an ST monad, which itself is provided by Control.Monad.ST. The point of the ST monad is that it keeps all the impure mutating code behind a fence, where it can't infect the pure functional code.

    While I was updating the elf positions, I decided to also track how many elves were proposing to move to a particular space. That would make the identification of clashes easier.

    Data structures

    That led to these new data structures. The Population is an array, indexed by position. The cell holds Nothing if it is vacant, and Just Elf if it is occupied.

    The Elf type becomes a newtype, wrapping a Position. This stores where the Elf is moving to in the next round. Elves that stay still with contain their current position.

    The mutable arrays contain the reference to the ST monad they were in, a dummy type parameter to keep things hygienic.

    newtype Elf = Elf Position
      deriving (Eq, Ord, Show)
    
    type Population = A.Array Position (Maybe Elf)
    
    type MPopulation s = STArray s Position (Maybe Elf)
    type MClashCounts s = STArray s Position Int

    I decided to maintain a border of at least one unoccupied cell around the elves. That meant each elf could check where it was going without having to worry about bounds checking.

    Using arrays

    The top-level change was localised to the simulateOnce function, that did one round of the simulation. This changed from

    simulateOnce =
      do proposeMoves
         removeClashes
         moveElves
         updateDirections
         updateCount

    to

    simulateOnce =
      do  updateGrove
          growGrove
          updateDirections
          updateCount

    where updateGrove handles all the array manipulation and growGrove updates the size of the array after each round.

    Below that level, a lot of things changed!

    updateGrove uses the runSTArray function to handle the conversion of the Array of elves into a mutable one, the updates, then the conversion back to the immutable array.

    updateGrove :: GroveState ()
    updateGrove  = 
      do  grove <- gets currentGrove
          proposalsInf <- gets proposalDirections
          let proposals = take 4 proposalsInf
          let newGrove = 
                runSTArray $ 
                  do mPopulation <- M.thaw grove
                     mCounts <- M.mapArray (const 0) mPopulation
                     proposeMoves mPopulation mCounts proposals
                     removeClashes mPopulation mCounts
                     moveElves mPopulation
                     return mPopulation
          modify' (\g -> g { currentGrove = newGrove})

    It maintains two arrays: one of Maybe Elf, showing where the elves are; and one of Int, showing how many elves will move into a particular position. proposeMoves updates both of these as each elf decides where to move.

    The functions below here aren't that interesting to explain (you can see them yourself). The main consequence is that some of the code has to be split to have explicit intermediate results, if different parts do and don't use the arrays. For instance, the noElves function returns whether there are no elves in a set of locations. The Set-based version looks like this:

    noElves :: Population -> S.Set Position -> Bool
    noElves elves tests = S.null $ S.intersection tests $ S.map _current elves

    but the Array-based version looks like this:

    noElves :: MPopulation s -> [Position] -> ST s Bool
    noElves elves tests = 
      do others <- mapM (M.readArray elves) tests
         return $ all isNothing others

    Performance

    Using Arrays reduces the runtime from seven minutes to seven seconds, but at the cost of now using 18Mb of RAM rather than the original 14Mb.

    Program  Wall time   Memory used (kb) 
    Original 6:10.18 13,408
    Hash sets 6:00.52 14,208
    Arrays 0:06:49 17,604

    The main downside of this approach is the change in programming style it mandates. This program is fundamentally about the mutation of a shared piece of state (the grove of elves). As such, it dictates the shape of the program that does that mutation. The code that solves this problem has a very imperative style, where I've outlined the steps to take and the order to take them, rather than expressing what I want done and allowing results to come together as needed.

    Code

    You can get the code from my locally-hosted Git repo, or from Gitlab.

    Neil Smith

    Read more posts by this author.