In some situations, we need additional information while traversing a tree, and that information depends on the node's location. We can't obtain it from the node description because the node doesn't contain it; we have to collect it during the traversal. Examples include the full path to a file or the depth of the current node. A node on its own doesn't know where it is: a file's location in the file structure is determined by the nodes that lead to it.
In this lesson, we'll look at the accumulator, a parameter that collects the necessary data during tree traversal. It complicates the code somewhat, but some tasks can't be solved without it. As an example, we'll try to find all empty directories in our file system. First we'll implement a simple version, then complicate the task and introduce an accumulator. Here's an example of the file system:
from hexlet import fs

tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkdir('apache'),
        fs.mkdir('nginx', [
            fs.mkfile('nginx.conf'),
        ]),
        fs.mkdir('consul', [
            fs.mkfile('config.json'),
            fs.mkdir('data'),
        ]),
    ]),
    fs.mkdir('logs'),
    fs.mkfile('hosts'),
])
There are three empty directories in this structure:
- /logs
- /etc/apache
- /etc/consul/data
The code that finds them looks like this:
def find_empty_dir_paths(tree):
    name = fs.get_name(tree)
    # Getting the children of the node
    children = fs.get_children(tree)
    # Returning the directory name if there are no children
    if len(children) == 0:
        return name
    # Filtering out the files because we do not need them
    dir_names = filter(fs.is_directory, children)
    # Looking for empty directories inside the current one
    empty_dir_names = list(map(find_empty_dir_paths, dir_names))
    # Flattening the list
    return fs.flatten(empty_dir_names)
print(find_empty_dir_paths(tree))
# => ['apache', 'data', 'logs']
https://repl.it/@hexlet/python-trees-search-find-empty-dirs
The most unusual thing in this implementation is the flatten() function. Why is it needed? If you leave only map(), the result may be surprising:
find_empty_dir_paths(tree)
# [['apache', [], ['data']], 'logs']
This happens because a list is returned at each nesting level, so the output is a list of lists that resembles the structure of the original file tree. To prevent this, we straighten the result with flatten().
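For reference, here's a minimal sketch of what such a helper might do; the actual fs.flatten() may be implemented differently, but the idea is to merge nested lists into a single flat one:

def flatten(items):
    # Merges nested lists into one flat list, keeping plain values as they are
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result

print(flatten([['apache', [], ['data']], 'logs']))
# => ['apache', 'data', 'logs']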
Let's try to complicate the task: we'll find all the empty directories with a maximum search depth of 2 levels. In other words, the directories /logs and /etc/apache fit this condition, but /etc/consul/data does not. First, you need to understand where to get the depth from. The depth of a node is the number of edges from the root to that node. It is easy to calculate visually, but what about the code? We can express the depth of a particular node as the depth of its parent plus one.
The next step is to add a variable that we pass along at each recursive call when descending into a directory. In our task, this variable holds the current depth: we add one to it at each level, inside each directory. This variable is an accumulator, since it accumulates data during the traversal.
The only problem is that the original function find_empty_dir_paths() takes a single parameter, the node. With that signature alone, we cannot pass the depth down to the nested directories and files.
We will have to introduce an inner function that can pass the accumulator further down the tree:
def find_empty_dir_paths(tree):
    # The inner function carries the accumulator along
    # The accumulator is `depth`, a variable containing the current depth
    def walk(node, depth):
        name = fs.get_name(node)
        children = fs.get_children(node)
        # If there are no children, we return the directory name
        if len(children) == 0:
            return name
        # There is no point in looking any further
        # if we are already at the second nesting level
        if depth == 2:
            # Return an empty list: flatten() will remove it
            # from the final result
            return []
        # Leaving only the directories
        dir_paths = filter(fs.is_directory, children)
        # Don't forget to increase the depth
        output = list(map(
            lambda child: walk(child, depth + 1),
            dir_paths,
        ))
        # Straightening the list before returning it
        return fs.flatten(output)

    # We start with a depth of `0`
    return walk(tree, 0)
print(find_empty_dir_paths(tree))
# => ['apache', 'logs']
https://repl.it/@hexlet/python-trees-accumulator-find-empty-dirs-with-depth
You can go even further and allow the maximum depth to be specified from the outside:
def find_empty_dir_paths(tree, max_depth=2):
    # ...
But how do we make it traverse the whole tree by default? You could take a deliberately large number and make it the default value. This approach works, but it is more of a workaround.
The correct way to do this is to use infinity as the default value:
import math

def find_empty_dir_paths(tree, max_depth=math.inf):
    # ...
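Putting it together, a possible version might look like this. It is a sketch based on the code above rather than the only way to write it; the depth check uses >= so that with the default math.inf it never triggers and the whole tree is traversed:

import math

def find_empty_dir_paths(tree, max_depth=math.inf):
    def walk(node, depth):
        name = fs.get_name(node)
        children = fs.get_children(node)
        if len(children) == 0:
            return name
        # Stop descending once the maximum depth is reached;
        # with the default math.inf this check never triggers
        if depth >= max_depth:
            return []
        dir_paths = filter(fs.is_directory, children)
        output = list(map(
            lambda child: walk(child, depth + 1),
            dir_paths,
        ))
        return fs.flatten(output)

    return walk(tree, 0)

print(find_empty_dir_paths(tree, 2))
# => ['apache', 'logs']
print(find_empty_dir_paths(tree))
# => ['apache', 'data', 'logs']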