Python: Trees

Theory: Aggregation 2

Let's practice another kind of data aggregation on file systems. We'll write a function that takes a directory and returns a list of its first-level subdirectories. Each subdirectory is paired with the number of files it contains, including files in all of its nested subdirectories:

from hexlet import fs

tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkdir('apache'),
        fs.mkdir('nginx', [
            fs.mkfile('nginx.conf'),
        ]),
    ]),
    fs.mkdir('consul', [
        fs.mkfile('config.json'),
        fs.mkfile('file.tmp'),
        fs.mkdir('data'),
    ]),
    fs.mkfile('hosts'),
    fs.mkfile('resolve'),
])

print(get_subdirectories_info(tree))
# => [('etc', 1), ('consul', 2)]
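The example depends on the `hexlet` package. To experiment without it, the helpers can be imitated with plain dictionaries. The sketch below is a hypothetical stand-in, not Hexlet's implementation. It assumes each node is a dict with `name` and `type` keys, plus a `children` list for directories, and it exposes only the helper names this lesson uses through an `fs` namespace.

```python
from types import SimpleNamespace

# Hypothetical dict-based stand-in for hexlet's fs helpers (not the real library).
fs = SimpleNamespace(
    # A directory stores its own copy of the children list (empty by default).
    mkdir=lambda name, children=None: {
        'name': name, 'type': 'directory', 'children': list(children or []),
    },
    # A file is a leaf node without children.
    mkfile=lambda name: {'name': name, 'type': 'file'},
    is_file=lambda node: node['type'] == 'file',
    is_directory=lambda node: node['type'] == 'directory',
    get_children=lambda node: node['children'],
    get_name=lambda node: node['name'],
)

tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkdir('apache'),
        fs.mkdir('nginx', [fs.mkfile('nginx.conf')]),
    ]),
    fs.mkdir('consul', [
        fs.mkfile('config.json'),
        fs.mkfile('file.tmp'),
        fs.mkdir('data'),
    ]),
    fs.mkfile('hosts'),
    fs.mkfile('resolve'),
])

# The root's direct children: two directories followed by two files.
print([(fs.get_name(child), fs.is_directory(child)) for child in fs.get_children(tree)])
# => [('etc', True), ('consul', True), ('hosts', False), ('resolve', False)]
```

Because the stand-in keeps the same `fs.` names, the functions from this lesson should run against it unchanged.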

We can break this task down into two steps:

  • Counting the files inside a directory, including those in nested subdirectories
  • Calling this counting function on each first-level subdirectory

Let's start by counting the files. This is a classic aggregation task:

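def get_files_count(node):
    # A file contributes exactly one to the count
    if fs.is_file(node):
        return 1
    # A directory's count is the sum of its children's counts (0 if empty)
    children = fs.get_children(node)
    descendant_counts = list(map(get_files_count, children))
    return sum(descendant_counts)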
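The counting step can also pass a generator expression straight to `sum`, which avoids building the intermediate list. This is an equivalent variant rather than the lesson's own code, and it runs against a hypothetical dict-based stand-in for `fs` (not the real `hexlet` module):

```python
from types import SimpleNamespace

# Hypothetical dict-based stand-in for hexlet's fs helpers (not the real library).
fs = SimpleNamespace(
    mkdir=lambda name, children=None: {
        'name': name, 'type': 'directory', 'children': list(children or []),
    },
    mkfile=lambda name: {'name': name, 'type': 'file'},
    is_file=lambda node: node['type'] == 'file',
    get_children=lambda node: node['children'],
)


def get_files_count(node):
    # A file counts as exactly one.
    if fs.is_file(node):
        return 1
    # A directory sums its children's counts lazily; an empty one yields 0.
    return sum(get_files_count(child) for child in fs.get_children(node))


etc = fs.mkdir('etc', [
    fs.mkdir('apache'),
    fs.mkdir('nginx', [fs.mkfile('nginx.conf')]),
])
consul = fs.mkdir('consul', [
    fs.mkfile('config.json'),
    fs.mkfile('file.tmp'),
    fs.mkdir('data'),
])

print(get_files_count(etc))     # => 1
print(get_files_count(consul))  # => 2
```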

The next step is to take the direct children of the source node, keep only the directories, and apply the count to each of them:

def get_subdirectories_info(node):
    children = fs.get_children(node)
    # We are only interested in directories
    filtered = filter(fs.is_directory, children)
    # Running the count for each directory
    result = map(
        lambda child: (fs.get_name(child), get_files_count(child)),
        filtered,
    )
    return list(result)

In other words, we took the node's direct children, filtered out everything except directories, and then mapped each directory to a tuple containing its name and the number of files inside it.
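The same filter-then-map pipeline can be written as a single list comprehension: the `if` clause plays the role of `filter`, and the tuple expression plays the role of `map`. This is a sketch that again uses a hypothetical dict-based stand-in for `fs` instead of the real `hexlet` package:

```python
from types import SimpleNamespace

# Hypothetical dict-based stand-in for hexlet's fs helpers (not the real library).
fs = SimpleNamespace(
    mkdir=lambda name, children=None: {
        'name': name, 'type': 'directory', 'children': list(children or []),
    },
    mkfile=lambda name: {'name': name, 'type': 'file'},
    is_file=lambda node: node['type'] == 'file',
    is_directory=lambda node: node['type'] == 'directory',
    get_children=lambda node: node['children'],
    get_name=lambda node: node['name'],
)


def get_files_count(node):
    if fs.is_file(node):
        return 1
    return sum(get_files_count(child) for child in fs.get_children(node))


def get_subdirectories_info(node):
    # Keep only directories among the direct children and pair each
    # directory's name with the number of files anywhere inside it.
    return [
        (fs.get_name(child), get_files_count(child))
        for child in fs.get_children(node)
        if fs.is_directory(child)
    ]


tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkdir('apache'),
        fs.mkdir('nginx', [fs.mkfile('nginx.conf')]),
    ]),
    fs.mkdir('consul', [
        fs.mkfile('config.json'),
        fs.mkfile('file.tmp'),
        fs.mkdir('data'),
    ]),
    fs.mkfile('hosts'),
    fs.mkfile('resolve'),
])

print(get_subdirectories_info(tree))
# => [('etc', 1), ('consul', 2)]
```

Choosing between `filter`/`map` and a comprehension is mostly a matter of style; both keep the directories in their original order.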