I didn't expect to be using zippers as early as day 7! But at least I got to learn about the standard Data.Tree
and Data.Tree.Zipper
libraries.
A zipper is a data structure with the notion of a "current element". It's easy to inspect and alter the current element, and to move in relative steps around the structure. For instance, a list may have the focus on a particular element and allow you to move left and right to adjacent positions.
I used this to navigate around the directory tree, using the cd
commands to change the focus.
Data types
First, I defined a type to hold the results of the parsing, with each line parsing into some object. I also needed types to hold the nodes in the trees I was generating (one for the full directory structure, one for just the sizes of directories). I also needed types for the various trees.
data ParsedObject = CD String
| LS
| PDirectory String
| PFile Int String
deriving Show
data Directory = Dir String (M.Map String Int)
deriving (Show, Eq)
data ContainedSize = CSize String Integer
deriving (Show, Eq)
type DTree = Tree Directory
type ZDTree = TreePos Full Directory
type STree = Tree ContainedSize
A Directory
holds the name of the directory and the files it holds, stored as a Map
of file names to sizes.
The Data.Tree
library uses the Node
constructor to hold the details of each node in the tree (a rose tree), where each node also holds a list of subtrees, like this empty tree.
emptyTree :: DTree
emptyTree = Node {rootLabel = (Dir "/" M.empty), subForest = []}
Parsing
There wasn't much to this. Each line was parsed into a separate object, and parsing returned a list of them. The only wrinkle was defining a parser for names, being a sequence of non-space characters.
traceP = lineP `sepBy` endOfLine
lineP = cdP <|> lsP <|> directoryP <|> fileP
cdP = CD <$> ("$ cd " *> nameP)
lsP = LS <$ "$ ls"
fileP = PFile <$> (decimal <* " ") <*> nameP
directoryP = PDirectory <$> ("dir " *> nameP)
nameP = many1 letterP
letterP = satisfy (not . isSpace)
Building the tree
This was where the bulk of the effort went. mkTree
handles the conversion to and from zipped trees and makeTree
does the work of assembling the tree.
mkTree :: [ParsedObject] -> DTree -> DTree
mkTree trace tree = toTree $ root $ makeTree trace $ fromTree tree
makeTree :: [ParsedObject] -> ZDTree -> ZDTree
makeTree trace tree = foldl' processCommand tree trace
ProcessCommand
takes each element from the command trace and updates the tree accordingly.
processCommand :: ZDTree -> ParsedObject -> ZDTree
processCommand tree (CD name)
| name == "/" = root tree
| name == ".." = fromJust $ parent tree
| otherwise = fromJust $ childWithName name tree
processCommand tree LS = tree
processCommand tree (PFile size name) =
modifyLabel (\ (Dir n fs) -> Dir n (M.insert name size fs)) tree
processCommand tree (PDirectory name) =
if (isJust $ childWithName name tree)
then tree
else fromJust $ parent $ insert (Node { rootLabel = (Dir name M.empty)
, subForest = []
}) $ children tree
Processing cd
commands moves up and down the tree, moving into parents or children.
Finding out about a new file means I update the file listing of the current node.
Finding out about a new directory means checking if the directory exists. If it does, leave it. If it doesn't, create a new empty directory node. (This moves the focus to the new node, so I use parent
to return to the current directory.)
This relies on the utility function childWithName
that searches for the directory with a given name, while the library uses finding by position. It starts with the first child and keeps taking the next one until it finds the directory with the right name.
childWithName :: String -> ZDTree -> Maybe ZDTree
childWithName name tree = searchForChild name (firstChild tree)
searchForChild :: String -> Maybe ZDTree -> Maybe ZDTree
searchForChild _name Nothing = Nothing
searchForChild name (Just tree)
| name == labelName = Just tree
| otherwise = searchForChild name (next tree)
where (Dir labelName _) = label tree
Solving the puzzle
Now I have the directory tree, actually solving the puzzle is easy!
containingSizes
finds the sizes of the files directly in each directory. It adds up the file sizes in the Map
then fmap
s the same function to all the subtrees. transitiveSizes
takes those sized directories and adds up the sizes of the subdirectories.
containingSizes :: DTree -> STree
containingSizes (Node {rootLabel = (Dir name files), subForest = sf}) =
(Node {rootLabel = (CSize name sizeHere), subForest = sizedTrees})
where sizeHere = M.foldl (+) 0 $ M.map fromIntegral files
sizedTrees = fmap containingSizes sf
transitiveSizes :: STree -> STree
transitiveSizes (Node {rootLabel = (CSize name sizeHere), subForest = sf}) =
(Node {rootLabel = (CSize name (sizeHere + subSizes)), subForest = sizedTrees })
where sizedTrees = fmap transitiveSizes sf
subSizes = sum $ fmap (extractCSize . rootLabel) sizedTrees
extractCSize, cancelLarge :: ContainedSize -> Integer
extractCSize (CSize _ s) = s
Part 1 throws away the large sizes and adds up the rest.
Part 2 starts by flattening the lovely tree to just a list of nodes, then does some arithmetic to find how much space is needed, then finds the smallest directory larger than that size.
part1, part2 :: STree -> Integer
part1 = foldTree (\x xs -> sum (x:xs)) . fmap cancelLarge
cancelLarge (CSize _ s)
| s <= reportingThreshold = s
| otherwise = 0
part2 tree = spaceFreed
where nodes = fmap extractCSize $ flatten tree
spaceUsed = extractCSize $ rootLabel tree
spaceUnused = spaceAvailable - spaceUsed
spaceToFree = spaceRequired - spaceUnused
viableNodes = filter (>= spaceToFree) nodes
spaceFreed = head $ sort viableNodes
And that's it! Over-engineered, but thorough.
Code
You can get the code from my locally-hosted Git repo, or from Gitlab.