Advent of Code 2022 day 7

    I didn't expect to be using zippers as early as day 7! But at least I got to learn about the standard Data.Tree and Data.Tree.Zipper libraries.

    A zipper is a data structure with the notion of a "current element". It's easy to inspect and alter the current element, and to move in relative steps around the structure. For instance, a list may have the focus on a particular element and allow you to move left and right to adjacent positions.

    I used this to navigate around the directory tree, using the cd commands to change the focus.

    Data types

    First, I defined a type to hold the results of the parsing, with each line parsing into some object. I also needed types to hold the nodes in the trees I was generating (one for the full directory structure, one for just the sizes of directories). I also needed types for the various trees.

    data ParsedObject = CD String 
                      | LS 
                      | PDirectory String 
                      | PFile Int String 
                      deriving Show
    
    data Directory = Dir String (M.Map String Int)
      deriving (Show, Eq)
    
    data ContainedSize = CSize String Integer
      deriving (Show, Eq)
    
    type DTree = Tree Directory
    type ZDTree = TreePos Full Directory
    type STree = Tree ContainedSize

    A Directory holds the name of the directory and the files it holds, stored as a Map of file names to sizes.

    The Data.Tree library uses  the Node constructor to hold the details of each node in the tree (a rose tree), where each node also holds a list of subtrees, like this empty tree.

    emptyTree :: DTree
    emptyTree = Node {rootLabel = (Dir "/" M.empty), subForest = []}

    Parsing

    There wasn't much to this. Each line was parsed into a separate object, and parsing returned a list of them. The only wrinkle was defining a parser for names, being a sequence of non-space characters.

    traceP = lineP `sepBy` endOfLine
    
    lineP = cdP <|> lsP <|> directoryP <|> fileP
    cdP = CD <$> ("$ cd " *> nameP)
    lsP = LS <$ "$ ls"
    fileP = PFile <$> (decimal <* " ") <*> nameP
    directoryP = PDirectory <$> ("dir " *> nameP)
    
    nameP = many1 letterP
    letterP = satisfy (not . isSpace)

    Building the tree

    This was where the bulk of the effort went. mkTree handles the conversion to and from zipped trees and makeTree does the work of assembling the tree.

    mkTree :: [ParsedObject] -> DTree -> DTree
    mkTree trace tree = toTree $ root $ makeTree trace $ fromTree tree
    
    makeTree :: [ParsedObject] -> ZDTree -> ZDTree
    makeTree trace tree = foldl' processCommand tree trace

    ProcessCommand takes each element from the command trace and updates the tree accordingly.

    processCommand :: ZDTree -> ParsedObject -> ZDTree
    processCommand tree (CD name)
      | name == "/" = root tree
      | name == ".." = fromJust $ parent tree
      | otherwise = fromJust $ childWithName name tree
    processCommand tree LS = tree
    processCommand tree (PFile size name) = 
      modifyLabel (\ (Dir n fs) -> Dir n (M.insert name size fs)) tree
    processCommand tree (PDirectory name) = 
      if (isJust $ childWithName name tree)
      then tree
      else fromJust $ parent $ insert (Node { rootLabel = (Dir name M.empty)
                                            , subForest = []
                                            }) $ children tree

    Processing cd commands moves up and down the tree, moving into parents or children.

    Finding out about a new file means I update the file listing of the current node.

    Finding out about a new directory means checking if the directory exists. If it does, leave it. If it doesn't, create a new empty directory node. (This moves the focus to the new node, so I use parent to return to the current directory.)

    This relies on the utility function childWithName that searches for the directory with a given name, while the library uses finding by position. It starts with the first child and keeps taking the next one until it finds the directory with the right name.

    childWithName :: String -> ZDTree -> Maybe ZDTree
    childWithName name tree = searchForChild name (firstChild tree)
    
    searchForChild :: String -> Maybe ZDTree -> Maybe ZDTree
    searchForChild _name Nothing = Nothing
    searchForChild name (Just tree)
      | name == labelName = Just tree
      | otherwise = searchForChild name (next tree)
      where (Dir labelName _) = label tree

    Solving the puzzle

    Now I have the directory tree, actually solving the puzzle is easy!

    containingSizes finds the sizes of the files directly in each directory. It adds up the file sizes in the Map then fmaps the same function to all the subtrees. transitiveSizes takes those sized directories and adds up the sizes of the subdirectories.

    containingSizes :: DTree -> STree
    containingSizes (Node {rootLabel = (Dir name files), subForest = sf}) = 
      (Node {rootLabel = (CSize name sizeHere), subForest = sizedTrees})
      where sizeHere = M.foldl (+) 0 $ M.map fromIntegral files
            sizedTrees = fmap containingSizes sf
    
    transitiveSizes :: STree -> STree
    transitiveSizes (Node {rootLabel = (CSize name sizeHere), subForest = sf}) =
      (Node {rootLabel = (CSize name (sizeHere + subSizes)), subForest = sizedTrees })
      where sizedTrees = fmap transitiveSizes sf
            subSizes = sum $ fmap (extractCSize . rootLabel) sizedTrees
    
    extractCSize, cancelLarge :: ContainedSize -> Integer
    extractCSize (CSize _ s) = s

    Part 1 throws away the large sizes and adds up the rest.

    Part 2 starts by flattening the lovely tree to just a list of nodes, then does some arithmetic to find how much space is needed, then finds the smallest directory larger than that size.

    part1, part2 :: STree -> Integer
    part1 = foldTree (\x xs -> sum (x:xs)) . fmap cancelLarge 
    
    cancelLarge (CSize _ s) 
      | s <= reportingThreshold = s
      | otherwise = 0
    
    part2 tree = spaceFreed
      where nodes = fmap extractCSize $ flatten tree
            spaceUsed = extractCSize $ rootLabel tree
            spaceUnused = spaceAvailable - spaceUsed
            spaceToFree = spaceRequired - spaceUnused
            viableNodes = filter (>= spaceToFree) nodes
            spaceFreed = head $ sort viableNodes

    And that's it! Over-engineered, but thorough.

    Code

    You can get the code from my locally-hosted Git repo, or from Gitlab.

    Neil Smith

    Read more posts by this author.