Advent of Code 2024 day 9

    Day 9 was the first challenge that needed a real change of representation from part 1 to part 2.

    Reading the input is just opening the file and converting each character to a number.

    let diskMap = fmap digitToInt text

    That's the same for both parts.

    Part 1

    This uses a block-based representation of the disk. I keep track of the files and the free spaces. The free space is a Set of free block numbers; the files are a Map from block number to the file ID in that block.

    type Disk = M.IntMap Int
    type Free = S.IntSet

    First, I expand the disk map into the initial block representation. I walk over the elements in the disk map, building up the disk and free as I go: that's a fold over the disk map.

    Each step in the fold needs to know a few things:

    • is the current element of the map a file (or free space)?
    • what is the current block?
    • what is the current file ID?
    • what's the disk and free found so far?

    That means I pass a 5-tuple around to keep track of these things. There's a lot to keep track of, but the expandMapItem function is mostly plumbing.

    expand :: [Int] -> (Disk, Free)
    expand diskMap = (disk, free)
      where (_, _, _, disk, free) = foldl' expandMapItem (True, 0, 0, M.empty, S.empty) diskMap
    
    expandMapItem :: (Bool, Int, Int, Disk, Free) -> Int -> (Bool, Int, Int, Disk, Free)
    expandMapItem (True, pos, fileID, disk, free) size =
      (False, pos + size, fileID + 1, disk', free)
      where
        fileExtent = take size $ zip [pos..] (repeat fileID)
        file = M.fromList fileExtent
        disk' = M.union disk file
    expandMapItem (False, pos, fileID, disk, free) size =
      (True, pos + size, fileID, disk, free')
      where
        gap = S.fromList $ take size [pos..]
        free' = S.union free gap
     

    For my own peace of mind, I have a little function that shows the layout of the disk. I can compare this with the examples in the problem description.

    showDiskFree :: Disk -> Free -> String
    showDiskFree disk free = showDisk disk ++ "\n" ++ showFree free
    
    showDisk :: Disk -> String
    showDisk disk = [showBlock i | i <- [0..pMax]]
      where (pMax, _) = M.findMax disk
            showBlock i = maybe '.' intToDigit $ M.lookup i disk
    
    showFree :: Free -> String  
    showFree free = [if S.member i free then '+' else '.' | i <- [0..pMax]]
      where pMax = S.findMax free

    Once I've got the disk map in the block-based form I want, it's time to shuffle blocks!

    First, I need to work out when to stop. The disk is packed when the highest-location file block is before the lowest-location free block.

    ispackedBlock :: (Disk, Free) -> Bool
    ispackedBlock (disk, free) = dMax < fMin
      where (dMax, _) = M.findMax disk
            fMin = S.findMin free

    I pack one block (the highest location) by finding its location, finding the location of the smallest free block, and swapping them.

    packBlocksStep :: (Disk, Free) -> (Disk, Free)
    packBlocksStep (disk, free) 
      | ispackedBlock (disk, free) = (disk, free)
      | otherwise = (M.insert to fID disk1, S.insert from free1)
      where ((from, fID), disk1) = M.deleteFindMax disk
            (to, free1) = S.deleteFindMin free

    I pack all the blocks by calling iterate packBlockStep (disk, free), pulling out the first arrangment that isPackedBlock.

    packBlocks :: (Disk, Free) -> (Disk, Free)
    packBlocks (disk, free) = head $ dropWhile (not . ispackedBlock) $ iterate packBlocksStep (disk, free)

    Calculating the checksum follows the definition in the task.

    checksum :: Disk -> Int
    checksum disk = sum $ fmap (uncurry (*)) $ M.toAscList disk

    Part 2

    Part 2 operates on whole files rather than blocks, so I need a representation at that level. I settled in having a list of Region, with each Region being either Free or Used.

    data Region = Free Int -- size
                | Used Int Int -- size, fileID
                deriving (Show, Eq)
    
    type RDisk = [Region] -- key is start of region

    I define a couple of "getter" utility functions.

    fileID :: Region -> Int
    fileID (Used _ fID) = fID
    fileID _ = -1
    
    freeSize :: Region -> Int
    freeSize (Free size) = size
    freeSize _ = 0

    Converting the disk map to the region-based form follows a similar pattern to the previous version of expand.

    expand :: [Int] -> RDisk
    expand diskMap = reverse disk
      where (_, _, _, disk) = foldl' expandRegion (True, 0, 0, []) diskMap
    
    expandRegion :: (Bool, Int, Int, RDisk) -> Int -> (Bool, Int, Int, RDisk)
    expandRegion (True, pos, fID, disk) size =
      (False, pos + size, fID + 1, (Used size fID) : disk)
    expandRegion (False, pos, fID, disk) 0 =
      (True, pos, fID, disk)
    expandRegion (False, pos, fID, disk) size =
      (True, pos + size, fID, (Free size) : disk)

    Note that I don't create zero-sized free spaces.

    I also need the same block-based representation for part 1, and for calculating the checksum. That's handled by toBlocks.

    toBlocks :: RDisk -> (Disk, Free)
    toBlocks rdisk = (disk, free)
      where (_, disk, free) = foldl' toBlock (0, M.empty, S.empty) rdisk
    
    toBlock :: (Int, Disk, Free) -> Region -> (Int, Disk, Free)
    toBlock (pos, disk, free) (Free size) = (pos + size, disk, free')
      where
        gap = S.fromList $ take size [pos..]
        free' = S.union free gap
    toBlock (pos, disk, free) (Used size fileID) = (pos + size, disk', free)
      where
        fileExtent = take size $ zip [pos..] (repeat fileID)
        file = M.fromList fileExtent
        disk' = M.union disk file    

    Now I've got the region-based representation, time to pack files. Following the puzzle definition, I work in decreasing order of file ID, making one attempt to pack each file.

    packFiles :: RDisk -> RDisk
    packFiles disk = packBelow maxID disk
      where maxID = maximum $ fmap fileID disk
    
    packBelow :: Int -> RDisk -> RDisk
    packBelow _ [] = []
    packBelow 0 disk = disk
    packBelow fid disk = packBelow (fid - 1) disk'
      where disk' = tidy $ packFile fid disk

    packFile does the work. I need to identify two regions: the file that's moving and the free space it's moving into. Once I've got those two, I can split the list of regions into three parts: the chunk before the free space, the chunk between the free space and the file, and the chunk after the file.

    From the puzzle description, you can see the chunks in the diagram below.

    0099.111...2...333.44.5555.6666.777.8888..
           |^^^|                   |^^^|
     prefix            middle            suffix

    Diagram showing how the regions are split

    I use span to split the list into the prefix-and-middle and suffix. findFree finds the gap to put the file in, if it exists. Once I've got all the bits, I move the file and add the new Free chunks as needed.

    packFile :: Int -> RDisk -> RDisk
    packFile fid disk
      | isNothing gap = disk
      | otherwise = prefix ++ [Used fSize fid, Free (gapSize - fSize)] ++ mid ++ [Free fSize] ++ suffix
      where
        (prefixMid, ((Used fSize _) : suffix)) = span ((/= fid) . fileID) disk
        gap = findFree fSize prefixMid
        (prefix, Free gapSize, mid) = fromJust gap
    
    findFree :: Int -> RDisk -> Maybe (RDisk, Region, RDisk)
    findFree size disk
      | null suffix = Nothing
      | otherwise = Just (prefix, head suffix, tail suffix)
      where (prefix, suffix) = break ((>= size) . freeSize) disk

    The final utility is tidy, which removes zero-sized Free regions, and combines adjacent Free regions.

    tidy :: RDisk -> RDisk
    tidy = foldr tidyRegion []
    
    tidyRegion :: Region -> RDisk -> RDisk
    tidyRegion (Free 0) rdisk = rdisk
    tidyRegion (Free size) ((Free size1) : rdisk) = tidyRegion (Free (size + size1)) rdisk
    tidyRegion region rdisk = region : rdisk

    This is quite slow, but I'm doing a lot of list copying with all the splitting and joining. I'm sure it would be much faster using a Sequence, or even using another Map-based representation. But it's not so slow that I can be bothered to make the change.

    Code

    You can get the code from my locally-hosted Git repo, or from Codeberg.

    Neil Smith

    Read more posts by this author.