Register to get access to free programming courses with interactive exercises

Aggregation Python: Trees

Data aggregation is the most essential operation when working with trees. Examples of data aggregation are:

  • Calculating the total number of files in the directory or the overall size of all files
  • Getting a list of all files
  • Finding all files by template

The point in aggregating operations is accumulating the result. Traversing the tree in depth using a recursive process, which we discussed in detail in the previous lesson, is well-suited for this. Using it, we can go through all the tree nodes and collect the result, starting from the lowest level.

Let's look at aggregation using a recursive process by counting the total number of nodes in a tree as an example. Essentially, we want to find out how many files and directories are in our file tree:

from hexlet import fs

tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkfile('bashrc'),
        fs.mkfile('consul.cfg'),
    ]),
    fs.mkfile('hexletrc'),
    fs.mkdir('bin', [
        fs.mkfile('ls'),
        fs.mkfile('cat'),
    ]),
])

# Getting to the tree bottom using a recursive process in our implementation
def get_nodes_count(node):
    if fs.is_file(node):
        # Returning `1` to account for the current file
        return 1
    # If the node is a directory, we get its children
    children = fs.get_children(node)
    # Here is the hardest part
    # Counting the descendants for each child by recursively calling our function
    descendant_counts = list(map(get_nodes_count, children))
    # Returning current directory `1` plus overall number of descendants
    return 1 + sum(descendant_counts)

get_nodes_count(tree)
# 8

https://repl.it/@hexlet/python-trees-aggregation-get-nodes-count

There's not much code here, but it's pretty tricky. There are several key points:

  1. The function checks the node type:
    • If it is a file, the function returns the node
    • If the node is a directory, the function returns the child nodes, so we call our function again for each child node and repeat the algorithm
  2. Calls to this function on children return the number of their descendants, so we get a list of the numbers that we need to combine
  3. At the end, we get the total number of all descendants of the node plus one calculated as the current node itself

Before moving on, have a play around with this code. It's the only way to get to know how it works.


Are there any more questions? Ask them in the Discussion section.

The Hexlet support team or other students will answer you.

About Hexlet learning process

For full access to the course you need a professional subscription.

A professional subscription will give you full access to all Hexlet courses, projects and lifetime access to the theory of lessons learned. You can cancel your subscription at any time.

Get access
130
courses
1000
exercises
2000+
hours of theory
3200
tests

Sign up

Programming courses for beginners and experienced developers. Start training for free

  • 130 courses, 2000+ hours of theory
  • 1000 practical tasks in a browser
  • 360 000 students
By sending this form, you agree to our Personal Policy and Service Conditions

Our graduates work in companies:

Bookmate
Health Samurai
Dualboot
ABBYY
Suggested learning programs
profession
new
Developing web applications with Django
10 months
from scratch
under development
Start at any time

Use Hexlet to the fullest extent!

  • Ask questions about the lesson
  • Test your knowledge in quizzes
  • Practice in your browser
  • Track your progress

Sign up or sign in

By sending this form, you agree to our Personal Policy and Service Conditions
Toto Image

Ask questions if you want to discuss a theory or an exercise. Hexlet Support Team and experienced community members can help find answers and solve a problem.