Advent of Code 2022 day 7

I didn't expect to be using zippers as early as day 7! But at least I got to learn about the standard Data.Tree and Data.Tree.Zipper libraries.

A zipper is a data structure with the notion of a "current element". It's easy to inspect and alter the current element, and to move in relative steps around the structure. For instance, a list may have the focus on a particular element and allow you to move left and right to adjacent positions.

I used this to navigate around the directory tree, using the cd commands to change the focus.

Data types

First, I defined a type to hold the results of the parsing, with each line parsing into some object. I also needed types to hold the nodes in the trees I was generating (one for the full directory structure, one for just the sizes of directories). I also needed types for the various trees.

data ParsedObject = CD String 
                  | LS 
                  | PDirectory String 
                  | PFile Int String 
                  deriving Show

data Directory = Dir String (M.Map String Int)
  deriving (Show, Eq)

data ContainedSize = CSize String Integer
  deriving (Show, Eq)

type DTree = Tree Directory
type ZDTree = TreePos Full Directory
type STree = Tree ContainedSize

A Directory holds the name of the directory and the files it holds, stored as a Map of file names to sizes.

The Data.Tree library uses  the Node constructor to hold the details of each node in the tree (a rose tree), where each node also holds a list of subtrees, like this empty tree.

emptyTree :: DTree
emptyTree = Node {rootLabel = (Dir "/" M.empty), subForest = []}

Parsing

There wasn't much to this. Each line was parsed into a separate object, and parsing returned a list of them. The only wrinkle was defining a parser for names, being a sequence of non-space characters.

traceP = lineP `sepBy` endOfLine

lineP = cdP <|> lsP <|> directoryP <|> fileP
cdP = CD <$> ("$ cd " *> nameP)
lsP = LS <$ "$ ls"
fileP = PFile <$> (decimal <* " ") <*> nameP
directoryP = PDirectory <$> ("dir " *> nameP)

nameP = many1 letterP
letterP = satisfy (not . isSpace)

Building the tree

This was where the bulk of the effort went. mkTree handles the conversion to and from zipped trees and makeTree does the work of assembling the tree.

mkTree :: [ParsedObject] -> DTree -> DTree
mkTree trace tree = toTree $ root $ makeTree trace $ fromTree tree

makeTree :: [ParsedObject] -> ZDTree -> ZDTree
makeTree trace tree = foldl' processCommand tree trace

ProcessCommand takes each element from the command trace and updates the tree accordingly.

processCommand :: ZDTree -> ParsedObject -> ZDTree
processCommand tree (CD name)
  | name == "/" = root tree
  | name == ".." = fromJust $ parent tree
  | otherwise = fromJust $ childWithName name tree
processCommand tree LS = tree
processCommand tree (PFile size name) = 
  modifyLabel (\ (Dir n fs) -> Dir n (M.insert name size fs)) tree
processCommand tree (PDirectory name) = 
  if (isJust $ childWithName name tree)
  then tree
  else fromJust $ parent $ insert (Node { rootLabel = (Dir name M.empty)
                                        , subForest = []
                                        }) $ children tree

Processing cd commands moves up and down the tree, moving into parents or children.

Finding out about a new file means I update the file listing of the current node.

Finding out about a new directory means checking if the directory exists. If it does, leave it. If it doesn't, create a new empty directory node. (This moves the focus to the new node, so I use parent to return to the current directory.)

This relies on the utility function childWithName that searches for the directory with a given name, while the library uses finding by position. It starts with the first child and keeps taking the next one until it finds the directory with the right name.

childWithName :: String -> ZDTree -> Maybe ZDTree
childWithName name tree = searchForChild name (firstChild tree)

searchForChild :: String -> Maybe ZDTree -> Maybe ZDTree
searchForChild _name Nothing = Nothing
searchForChild name (Just tree)
  | name == labelName = Just tree
  | otherwise = searchForChild name (next tree)
  where (Dir labelName _) = label tree

Solving the puzzle

Now I have the directory tree, actually solving the puzzle is easy!

containingSizes finds the sizes of the files directly in each directory. It adds up the file sizes in the Map then fmaps the same function to all the subtrees. transitiveSizes takes those sized directories and adds up the sizes of the subdirectories.

containingSizes :: DTree -> STree
containingSizes (Node {rootLabel = (Dir name files), subForest = sf}) = 
  (Node {rootLabel = (CSize name sizeHere), subForest = sizedTrees})
  where sizeHere = M.foldl (+) 0 $ M.map fromIntegral files
        sizedTrees = fmap containingSizes sf

transitiveSizes :: STree -> STree
transitiveSizes (Node {rootLabel = (CSize name sizeHere), subForest = sf}) =
  (Node {rootLabel = (CSize name (sizeHere + subSizes)), subForest = sizedTrees })
  where sizedTrees = fmap transitiveSizes sf
        subSizes = sum $ fmap (extractCSize . rootLabel) sizedTrees

extractCSize, cancelLarge :: ContainedSize -> Integer
extractCSize (CSize _ s) = s

Part 1 throws away the large sizes and adds up the rest.

Part 2 starts by flattening the lovely tree to just a list of nodes, then does some arithmetic to find how much space is needed, then finds the smallest directory larger than that size.

part1, part2 :: STree -> Integer
part1 = foldTree (\x xs -> sum (x:xs)) . fmap cancelLarge 

cancelLarge (CSize _ s) 
  | s <= reportingThreshold = s
  | otherwise = 0

part2 tree = spaceFreed
  where nodes = fmap extractCSize $ flatten tree
        spaceUsed = extractCSize $ rootLabel tree
        spaceUnused = spaceAvailable - spaceUsed
        spaceToFree = spaceRequired - spaceUnused
        viableNodes = filter (>= spaceToFree) nodes
        spaceFreed = head $ sort viableNodes

And that's it! Over-engineered, but thorough.

Code

You can get the code from my locally-hosted Git repo, or from Gitlab.