Python: Trees
Theory: Accumulator
In some situations, additional information is needed when traversing the tree, depending on the node location. We cannot obtain this information from the node description because the node doesn't contain it. We should collect this information during the crawl. This information may include the full path to the file or the depth of the current node. Any given node won't know where it is. The file location in the file structure depends on the nodes that lead to a particular file.
In this lesson, we'll look at the accumulator, a parameter that collects the necessary data during tree traversal. It may further complicate code, but it's impossible to perform these tasks without it. We will take this task as an example and try to find all empty directories in our file system. First, we will implement a simple version, complicate it, and introduce an accumulator. Here's an example of the file system:
There are three empty directories in this structure:
- /logs
- /etc/apache
- /etc/consul/data
The code that does it looks like this:
The most unusual thing in this implementation is the flatten() function. Why is it needed? If you leave only map(), then the result may be surprising:
It happens due to the list returned at each nesting level. The output is a list of lists that resembles the structure of the original file tree. To prevent this from happening, you need to correct the code using flatten().
Let us try to complicate the task. We will find all the empty directories with a maximum search depth of 2 levels. In other words, the directories /logs and /etc/apache fit this condition, but /etc/consul/data does not. First, you need to understand where to get the depth. Python calculates tree depth as the number of edges from the root to the desired node. It is easy to calculate visually, but what about the code? We can represent the depth of a particular node as the depth of the previous node plus one.
The next step is to add a variable we passed at each recursive call falling into the directory. This variable, in the case of our task, contains the current depth within itself. It means we add a unit at each level inside each directory. This variable is an accumulator since it accumulates data.
The only problem is that the original function find_empty_dir_paths() has node as one parameter. We cannot pass the depth to all nested directories and files using it.
We will have to introduce an internal function that will be able to throw the accumulator further up the tree:
You can go even further and allow it to specify the maximum depth from outside:
But how to make it to view the whole tree by default? For example, you can take a deliberately large number and make it the default value. This approach will work, but it is more of a workaround.
The correct way to do this is to use infinity as the default value: