Data aggregation is the most essential operation when working with trees. Examples of data aggregation are:
- Calculating the total number of files in the directory or the overall size of all files
- Getting a list of all files
- Finding all files by template
The point in aggregating operations is accumulating the result. Traversing the tree in depth using a recursive process, which we discussed in detail in the previous lesson, is well-suited for this. Using it, we can go through all the tree nodes and collect the result, starting from the lowest level.
Let's look at aggregation using a recursive process by counting the total number of nodes in a tree as an example. Essentially, we want to find out how many files and directories are in our file tree:
from hexlet import fs
tree = fs.mkdir('/', [
fs.mkdir('etc', [
fs.mkfile('bashrc'),
fs.mkfile('consul.cfg'),
]),
fs.mkfile('hexletrc'),
fs.mkdir('bin', [
fs.mkfile('ls'),
fs.mkfile('cat'),
]),
])
# Getting to the tree bottom using a recursive process in our implementation
def get_nodes_count(node):
if fs.is_file(node):
# Returning `1` to account for the current file
return 1
# If the node is a directory, we get its children
children = fs.get_children(node)
# Here is the hardest part
# Counting the descendants for each child by recursively calling our function
descendant_counts = list(map(get_nodes_count, children))
# Returning current directory `1` plus overall number of descendants
return 1 + sum(descendant_counts)
get_nodes_count(tree)
# 8
There's not much code here, but it's pretty tricky. There are several key points:
- The function checks the node type:
- If it is a file, the function returns the node
- If the node is a directory, the function returns the child nodes, so we call our function again for each child node and repeat the algorithm
- Calls to this function on children return the number of their descendants, so we get a list of the numbers that we need to combine
- At the end, we get the total number of all descendants of the node plus one calculated as the current node itself
Before moving on, have a play around with this code. It's the only way to get to know how it works.
For full access to the course you need a professional subscription.
A professional subscription will give you full access to all Hexlet courses, projects and lifetime access to the theory of lessons learned. You can cancel your subscription at any time.