
Delete Duplicate Folders in System

Difficulty: Hard


Problem Description

Due to a bug, there are many duplicate folders in a file system. You are given a 2D array paths, where paths[i] is an array representing an absolute path to the ith folder in the file system.

Two folders are identical if they contain the same non-empty set of identical subfolders and the same underlying subfolder structure. The folders do not need to be at the root level to be identical.

If two or more folders are identical, mark those folders as well as all their subfolders. Once all the identical folders and their subfolders have been marked, the file system deletes all of them. Return the 2D array ans containing the paths of the remaining folders after deleting all the marked folders. The paths may be returned in any order.
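As a small illustrative case (the variable names below are for illustration only):

```python
paths = [["a"], ["c"], ["d"], ["a", "b"], ["c", "b"], ["d", "a"]]
# /a and /c are identical: each contains exactly one empty subfolder, b.
# Both are marked, together with their subfolders /a/b and /c/b.
# /d is kept: its only subfolder is named "a", not "b", so its structure
# differs from that of /a and /c -- names matter, not just shape.
remaining = [["d"], ["d", "a"]]
```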


Key Insights

  • Folders can be represented as a tree structure where each folder can have multiple subfolders.
  • Identical folders can occur at different levels in the tree.
  • A hash map keyed on each folder's serialized subtree signature can count occurrences and identify duplicates: two folders are identical exactly when their signatures are equal.
  • Depth-first search (DFS) can help traverse the folder structure to build the representation needed for comparison.
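The tree structure mentioned above can be built as a trie of nested dictionaries, one node per folder. A minimal sketch (the helper name build_trie is an assumption, not part of the problem):

```python
def build_trie(paths):
    # One nested dict per folder; setdefault creates missing nodes.
    root = {}
    for path in paths:
        node = root
        for folder in path:
            node = node.setdefault(folder, {})
    return root

trie = build_trie([["a"], ["a", "b"], ["c"], ["c", "b"]])
# trie == {"a": {"b": {}}, "c": {"b": {}}}
```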

Space and Time Complexity

Time Complexity: O(N · L) plus an O(N log N) term for sorting siblings, where N is the total number of folders and L bounds the length of a serialized subtree signature; every subtree is serialized into a string that is hashed and counted, so signature construction is not constant-time. Space Complexity: O(N · L), for the trie and the signature-to-count hash map.


Solution

To solve the problem, we first build a trie of the folder structure, with one node per folder. A post-order depth-first search then serializes each subtree into a canonical string: the node's children are sorted by name, and each child contributes name(childSignature). Because the serialization is canonical, two folders are identical exactly when their signatures are equal, so a hash map counting signature occurrences identifies duplicates. Finally, a second traversal collects the answer: whenever a non-empty folder's signature occurs more than once, that folder is skipped together with its entire subtree; every other folder's path is appended to the result.
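The heart of the approach is the canonical serialization. A self-contained sketch (the serialize helper and the nested-dict trie representation are assumptions matching the full solution below in spirit, not a fixed API):

```python
def serialize(node):
    # Post-order: each child contributes name(childSignature).
    # Sorting by name makes the signature independent of sibling order.
    return "".join(f"{name}({serialize(child)})"
                   for name, child in sorted(node.items()))

a = {"b": {}, "x": {"y": {}}}
c = {"x": {"y": {}}, "b": {}}   # same structure, different insertion order
d = {"b": {"y": {}}}

serialize(a) == serialize(c)    # True: identical structures, identical signatures
serialize(a) == serialize(d)    # False: different nesting
```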


Code Solutions

from collections import defaultdict

def deleteDuplicateFolders(paths):
    # Build a trie of the folder structure: one nested dict per folder.
    trie = {}
    for path in paths:
        node = trie
        for folder in path:
            node = node.setdefault(folder, {})

    # Post-order serialization: structurally identical subtrees get
    # identical signatures. Children are sorted by name so sibling
    # order cannot affect the signature.
    signature = {}              # id(node) -> serialized subtree
    count = defaultdict(int)    # signature -> number of occurrences

    def serialize(node):
        if not node:
            return ""           # empty folders are never considered duplicates
        sig = "".join(f"{name}({serialize(child)})"
                      for name, child in sorted(node.items()))
        signature[id(node)] = sig
        count[sig] += 1
        return sig

    serialize(trie)

    # Collect every folder whose subtree signature is unique. A duplicated
    # signature marks the folder; skipping recursion there also drops all
    # of its subfolders, as required.
    result = []
    path = []

    def collect(node):
        for name, child in node.items():
            sig = signature.get(id(child), "")
            if sig and count[sig] > 1:
                continue        # marked for deletion with its whole subtree
            path.append(name)
            result.append(path[:])
            collect(child)
            path.pop()

    collect(trie)
    return result

# Example:
# deleteDuplicateFolders([["a"], ["c"], ["d"], ["a","b"], ["c","b"], ["d","a"]])
# returns [["d"], ["d", "a"]]