Accumulator — Python: Trees

In some situations, additional information is needed when traversing the tree, depending on the node location. We cannot obtain this information from the node description because the node doesn't contain it. We should collect this information during the crawl. This information may include the full path to the file or the depth of the current node. Any given node won't know where it is. The file location in the file structure depends on the nodes that lead to a particular file.

In this lesson, we'll look at the accumulator, a parameter that collects the necessary data during tree traversal. It may further complicate code, but it's impossible to perform these tasks without it. We will take this task as an example and try to find all empty directories in our file system. First, we will implement a simple version, complicate it, and introduce an accumulator. Here's an example of the file system:

from hexlet import fs

tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkdir('apache'),
        fs.mkdir('nginx', [
            fs.mkfile('nginx.conf'),
        ]),
        fs.mkdir('consul', [
            fs.mkfile('config.json'),
            fs.mkdir('data'),
        ]),
    ]),
    fs.mkdir('logs'),
    fs.mkfile('hosts'),
])

There are three empty directories in this structure:

/logs
/etc/apache
/etc/consul/data

The code that does it looks like this:

def find_empty_dir_paths(tree):
    name = fs.get_name(tree)
    # Getting the children of the node
    children = fs.get_children(tree)
    # Returning the directory if there are no children
    if len(children) == 0:
        return name
    # Filtering the files because we do not need them
    dir_names = filter(fs.is_directory, children)
    # Looking for empty directories inside the current one
    empty_dir_names = list(map(find_empty_dir_paths, dir_names))
    # Flattening the list
    return fs.flatten(empty_dir_names)


print(find_empty_dir_paths(tree))
# => ['apache', 'data', 'logs']

The most unusual thing in this implementation is the flatten() function. Why is it needed? If you leave only map(), then the result may be surprising:

find_empty_dir_paths(tree)
# [['apache', [], ['data']], 'logs']

It happens due to the list returned at each nesting level. The output is a list of lists that resembles the structure of the original file tree. To prevent this from happening, you need to correct the code using flatten().

Let us try to complicate the task. We will find all the empty directories with a maximum search depth of 2 levels. In other words, the directories /logs and /etc/apache fit this condition, but /etc/consul/data does not. First, you need to understand where to get the depth. Python calculates tree depth as the number of edges from the root to the desired node. It is easy to calculate visually, but what about the code? We can represent the depth of a particular node as the depth of the previous node plus one.

The next step is to add a variable we passed at each recursive call falling into the directory. This variable, in the case of our task, contains the current depth within itself. It means we add a unit at each level inside each directory. This variable is an accumulator since it accumulates data.

The only problem is that the original function find_empty_dir_paths() has node as one parameter. We cannot pass the depth to all nested directories and files using it.

We will have to introduce an internal function that will be able to throw the accumulator further up the tree:

def find_empty_dir_paths(tree):
    # There is an internal function that can transfer the battery
    # The accumulator is `depth`, a variable containing the current depth
    def walk(node, depth):
        name = fs.get_name(node)
        children = fs.get_children(node)
        # If there are no children, we return the directory
        if len(children) == 0:
            return name
        # There is no point in looking any further
        # if it is the second nesting level and the directory is not empty
        if depth == 2:
            # We got an empty list because the outside is flat
            # It reveals blank lists
            return []
        # Leaving only the directories
        dir_paths = filter(fs.is_directory, children)
        # Don't forget to increase the depth
        output = list(map(
            lambda child: walk(child, depth + 1),
            dir_paths,
          ))
        # Straightening the list before returning it
        return fs.flatten(output)

    # We start with a depth of `0`
    return walk(tree, 0)


print(find_empty_dir_paths(tree))
# => ['apache', 'logs']

You can go even further and allow it to specify the maximum depth from outside:

def find_empty_dir_paths(tree, max_depth=2):
    # ...

But how to make it to view the whole tree by default? For example, you can take a deliberately large number and make it the default value. This approach will work, but it is more of a workaround.

The correct way to do this is to use infinity as the default value:

def find_empty_dir_paths(tree, max_depth=math.inf):
    # ...

For full access to the course you need a professional subscription.

A professional subscription will give you full access to all Hexlet courses, projects and lifetime access to the theory of lessons learned. You can cancel your subscription at any time.

Get access

130

courses

1000

exercises

2000+

hours of theory

3200

tests

Accumulator — Python: Trees

For full access to the course you need a professional subscription.

Sign up