December 20, 2020

# Advent of Code 2020 day 11

Cellular automata, and being inefficient

Day 11 was, like day 8, another return to a stalwart of AoC puzzles, but this time it was the cellular automaton. It was also a good example where a bit of optimisation paid off, using GHC's profiler to identify where the problems were.

## Part 1

I stared with a similar approach of to part 2 of 2019 day 24, where I developed the automaton directly. First off, a few data structures. A Position is a (row, column) pair; a Seat is a sum type of states, and the collection of Seats is a Map from position to what's at that position. Note that the Seats only contains the positions with a seats, and not the positions that are floor.

type Position = (Int, Int)
data Seat = Floor | Empty | Occupied deriving (Eq, Ord)
type Seats = M.Map Position Seat

instance Show Seat where
show Floor = "."
show Empty = "L"
show Occupied = "#"


Note the custom Show instance, to make the display of the grid neat.

Reading and showing grids are just cycling over the rows and columns. I don't use showGrid in the code, but it's useful to have for debugging.

readGrid :: String -> (Seats, Position)
readGrid input = (seats, (maxR, maxC))
where seats = M.fromList $concat [ [((r, c), Empty) | (t, c) <- zip row [0..], t == 'L'] | (row, r) <- zip rows [0..] ] rows = lines input maxC = (length$ head rows) - 1
maxR = (length rows) - 1

showGrid seats (maxR, maxC) =
unlines $[ concat [showSeat (r, c) | c <- [0..maxC] ] | r <- [0..maxR]] where showSeat here = show$ M.findWithDefault Floor here seats


And now for the code the drive the automaton. The key part is the rule. It's applied to every seat in the sets and returns the new state of that seat.

ruleA seats here thisSeat
| thisSeat == Empty && nOccs == 0 = Occupied
| thisSeat == Occupied && nOccs >= 4 = Empty
| otherwise = thisSeat
where nOccs = M.size $occupiedNeighbours seats here  It relies on counting the occupied neighbours, and that in turn means finding all the neighbours. neighbours (r, c) = S.delete (r, c)$ S.fromList [ (r + dr, c + dc)
| dr <- [-1, 0, 1], dc <- [-1, 0, 1]]

neighbourhood seats here = M.restrictKeys seats (neighbours here)
occupiedNeighbours seats here = M.filter (== Occupied) $neighbourhood seats here  To run the simulation, I iterate through all the steps and stop when the state doesn't change. I rely on GHC's optimisations to notice that the two occurrences of runSteps rule seats are the same. part1 seats = M.size$ M.filter (== Occupied) $runUntilSame ruleA seats step rule seats = M.mapWithKey (rule seats) seats runSteps rule seats = iterate (step rule) seats seatDifferences rule seats = zip (runSteps rule seats) (tail$ runSteps rule seats)

runUntilSame rule seats = fst $head$ dropWhile (uncurry (/=)) $seatDifferences rule seats  ## Part 2 This followed the same logic as part 1, but with a different rule and a different definition of the neighbourhood: a chair is a neighbour of this one if it's the closest in a given direction, not necessarily adjacent. One way to do that (for one direction) is to generate all the locations in that direction outwards from the starting point, and stop either when you find a seat or the edge of the grid. But that requires passing the grid size into the lowest-level neighbour-finding routines, and I couldn't be bothered to do that. Instead, I wrote a predicate onSightLine that defined whether a seat was on a particular sight line from a location: data Direction = Up | UpRight | Right | DownRight | Down | DownLeft | Left | UpLeft deriving (Eq, Ord, Show, Enum) onSightLine :: Position -> Direction -> Position -> Bool onSightLine (r0, c0) Down (r, c) = (c0 == c) && (r > r0) onSightLine (r0, c0) Up (r, c) = (c0 == c) && (r < r0) onSightLine (r0, c0) Right (r, c) = (r0 == r) && (c > c0) onSightLine (r0, c0) Left (r, c) = (r0 == r) && (c < c0) onSightLine (r0, c0) DownRight (r, c) = ((r - r0) > 0) && ((r - r0) == (c - c0)) onSightLine (r0, c0) UpLeft (r, c) = ((r - r0) < 0) && ((r - r0) == (c - c0)) onSightLine (r0, c0) DownLeft (r, c) = ((r - r0) > 0) && ((r - r0) == (c0 - c)) onSightLine (r0, c0) UpRight (r, c) = ((r - r0) < 0) && ((r - r0) == (c0 - c))  I could then filter all the seats known by whether they were in a particular direction and use a Manhattan distance metric to find the closest one. manhattan (r1, c1) (r2, c2) = abs (r1 - r2) + abs (c1 - c2) closestInDirection seats here direction = take 1 sortedSeats where seatsInDirection = filter (onSightLine here direction)$ M.keys seats
sortedSeats = sortOn (manhattan here) seatsInDirection

closestInSight :: Seats -> Position -> (S.Set Position)
closestInSight seats here =
S.fromList $concatMap (closestInDirection seats here) [d | d <- [Up .. UpLeft]]  I could then filter those seats by whether they were occupied and hence count them and update the rule. ruleB seats here thisSeat | thisSeat == Empty && nOccs == 0 = Occupied | thisSeat == Occupied && nOccs >= 5 = Empty | otherwise = thisSeat where nOccs = M.size$ occupiedInSight seats here

occupiedInSight :: Seats -> Position -> Seats
occupiedInSight seats here = M.filter (== Occupied)
$M.restrictKeys seats$ closestInSight seats here


It worked, but performance was terrible: it took over half an hour to find a solution.

## Profiling and optimisation

Running the code with profiling immediately highlighted the problem:

COST CENTRE                         MODULE SRC                                  %time %alloc



The problem was in the creation of seatsInDirection in the closestInDirection function. This took 95.7% of the runtime. This was a clear target for optimisation.

The neighbourhood of each seat was the same throughout the simulation. Rather than recalculating it for every cell in every step of the simulation, I could calculate the neighbourhoods at the start of the run, and just look them up when needed. But that would require passing in yet another parameter to the simulation functions, which would get messy, especually as the parameter is read-only.

Time for a Reader monad to hid the plumbing. And as I'm passing in the neighbourhoods, I might as well pass in the other unchanging part of each simulation, the rule.

type Neighbourhood = M.Map Position (S.Set Position)
type Rule = Seats -> Neighbourhood -> Position -> Seat -> Seat
type CachedSeats a = Reader (Neighbourhood, Rule) a


That allowed me to keep everything the same for the two parts, just creating a different environment for runSteps to work on.

part1 seats = M.size $M.filter (== Occupied) stableSeats where cachedNeighbours = allNeighbourhoods seats env = (cachedNeighbours, ruleA) stableSeats = snd$ runReader (runSteps seats) env

part2 seats = M.size $M.filter (== Occupied) stableSeats where cachedNeighbours = allSightNeighbourhoods seats env = (cachedNeighbours, ruleB) stableSeats = snd$ runReader (runSteps seats) env


Doing the simulation also became easier, using iterateUntilM from Control.Monad.Loops.

runSteps :: Seats -> CachedSeats (Seats, Seats)
runSteps seats = iterateUntilM (uncurry (==)) seatChanges (M.empty, seats)

seatChanges :: (Seats, Seats) -> CachedSeats (Seats, Seats)
seatChanges (_, seats0) =
do seats <- step seats0
return (seats0, seats)

step :: Seats -> CachedSeats Seats
step seats =
return \$ M.mapWithKey (rule seats nbrs) seats


That dropped the time from 30 minutes to 17 seconds. Not brilliant, but must better than it was.

Profiling the new version still showed that seatsInDirection was expensive to calculate, taking 63% of the runtime. But at least I was only doing it once!

COST CENTRE                         MODULE    SRC                       %time %alloc