Let's practice another kind of data aggregation on file systems. We'll write a function that accepts a directory and returns a list of its first-level subdirectories, each paired with the number of files it contains, counting files in all nested subdirectories as well:
from hexlet import fs

tree = fs.mkdir('/', [
    fs.mkdir('etc', [
        fs.mkdir('apache'),
        fs.mkdir('nginx', [
            fs.mkfile('nginx.conf'),
        ]),
    ]),
    fs.mkdir('consul', [
        fs.mkfile('config.json'),
        fs.mkfile('file.tmp'),
        fs.mkdir('data'),
    ]),
    fs.mkfile('hosts'),
    fs.mkfile('resolve'),
])

print(get_subdirectories_info(tree))
# => [('etc', 1), ('consul', 2)]
We can break this task down into two steps:
- Counting the number of files inside a directory
- Calling the file counting function on each of the subdirectories
Let's start by counting the number of files. It is a classic aggregation task:
def get_files_count(node):
    if fs.is_file(node):
        return 1

    children = fs.get_children(node)
    descendant_counts = list(map(get_files_count, children))
    return sum(descendant_counts)
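To see how the recursion behaves on empty directories, here is a minimal self-contained sketch. It does not use the `fs` library: the helpers `mkfile`, `mkdir`, `is_file`, and `get_children` below are hypothetical stand-ins built on plain dicts, assumed only to mimic the shape of the functions used in the lesson.

```python
# Hypothetical stand-ins built on plain dicts; they only mimic
# the shape of the fs functions used in the lesson.
def mkfile(name):
    return {'name': name, 'type': 'file'}


def mkdir(name, children=None):
    return {'name': name, 'type': 'directory', 'children': children or []}


def is_file(node):
    return node['type'] == 'file'


def get_children(node):
    return node['children']


def get_files_count(node):
    # A file counts as exactly one
    if is_file(node):
        return 1
    # A directory is the sum of its children's counts;
    # sum() of an empty sequence is 0, so empty directories add nothing
    return sum(get_files_count(child) for child in get_children(node))


tree = mkdir('/', [
    mkdir('etc', [
        mkdir('apache'),
        mkdir('nginx', [mkfile('nginx.conf')]),
    ]),
    mkdir('consul', [
        mkfile('config.json'),
        mkfile('file.tmp'),
        mkdir('data'),
    ]),
    mkfile('hosts'),
    mkfile('resolve'),
])

print(get_files_count(mkdir('apache')))  # => 0
print(get_files_count(tree))  # => 5
```

The empty directories `apache` and `data` need no special case: the sum over an empty list of children is already zero.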
The next step is to take the children of the source node, keep only the directories, and run the count on each of them:
def get_subdirectories_info(node):
    children = fs.get_children(node)
    # We are only interested in directories
    filtered = filter(fs.is_directory, children)
    # Running the count for each directory
    result = map(
        lambda child: (fs.get_name(child), get_files_count(child)),
        filtered,
    )
    return list(result)
In other words, we took the children directly, filtered out everything that is not a directory, and then mapped each remaining directory to a tuple of its name and its file count. The result is a list of these tuples.
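The same filter-then-map idea can be tried on a real directory. The sketch below is an illustration under assumptions, not the lesson's method: it uses the standard `pathlib` module instead of `fs`, swaps the hand-written recursion for `Path.rglob`, and sorts the result because `iterdir` does not guarantee any order. The helper name `count_files` is made up for this example.

```python
import tempfile
from pathlib import Path


def count_files(path):
    # rglob('*') walks the whole subtree; keep only regular files
    return sum(1 for entry in path.rglob('*') if entry.is_file())


def get_subdirectories_info(root):
    # Filter step: keep only first-level directories.
    # Map step: pair each name with its recursive file count.
    return sorted(
        (child.name, count_files(child))
        for child in Path(root).iterdir()
        if child.is_dir()
    )


# Rebuild the lesson's tree in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'etc' / 'apache').mkdir(parents=True)
    (root / 'etc' / 'nginx').mkdir()
    (root / 'etc' / 'nginx' / 'nginx.conf').touch()
    (root / 'consul' / 'data').mkdir(parents=True)
    (root / 'consul' / 'config.json').touch()
    (root / 'consul' / 'file.tmp').touch()
    (root / 'hosts').touch()
    (root / 'resolve').touch()

    info = get_subdirectories_info(root)
    print(info)  # => [('consul', 2), ('etc', 1)]
```

The generator expression plays both roles at once: the `if child.is_dir()` clause is the filter, and the tuple expression is the map.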