Python: Trees
Theory: Aggregation
Data aggregation is the most essential operation when working with trees. Examples of data aggregation are:
- Calculating the total number of files in the directory or the overall size of all files
- Getting a list of all files
- Finding all files by template
The point in aggregating operations is accumulating the result. Traversing the tree in depth using a recursive process, which we discussed in detail in the previous lesson, is well-suited for this. Using it, we can go through all the tree nodes and collect the result, starting from the lowest level.
Let's look at aggregation using a recursive process by counting the total number of nodes in a tree as an example. Essentially, we want to find out how many files and directories are in our file tree:
There's not much code here, but it's pretty tricky. There are several key points:
- The function checks the node type:
- If it is a file, the function returns the node
- If the node is a directory, the function returns the child nodes, so we call our function again for each child node and repeat the algorithm
- Calls to this function on children return the number of their descendants, so we get a list of the numbers that we need to combine
- At the end, we get the total number of all descendants of the node plus one calculated as the current node itself
Before moving on, have a play around with this code. It's the only way to get to know how it works.