In the previous lesson, we looked at for loops and the term iteration. We can apply this word to any loop in any language, but in Python it has an additional meaning: iteration is also the traversal of an object that supports the iteration protocol.
First, let us break down what a protocol is in the context of Python. A protocol is an agreed-upon set of actions that can be performed on an object.
If object A allows you to perform actions on it defined by protocol B, then we say:
- Object A implements protocol B
- Or object A supports protocol B
In the courses that follow, you will learn that there are many different protocols in Python.
Many of the language's syntactic constructs work the same way for many different objects precisely because those objects implement specific protocols. For example, we can substitute strings and values of other types into a string template because those types implement the string-conversion protocol. In Python, we find protocols at every turn.
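As a rough illustration of the idea, here is a small sketch. The Ticket class is invented for this example; the point is that any object whose class defines the special __str__ method supports string conversion, so str and string formatting can work with it:

class Ticket:
    def __init__(self, number):
        self.number = number

    def __str__(self):
        # str() and string formatting call this special method for us
        return f"Ticket #{self.number}"

t = Ticket(42)
print(str(t))  # => Ticket #42
print(f"Your {t}")  # => Your Ticket #42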
What is iteration
Iteration is one of the most essential protocols in Python. After all, it is what allows the for loop to work with collections consistently.
What is this protocol all about? It requires the object to be iterable — that is, to have the special __iter__ method.
Calling the __iter__ method on an iterable object should return a new specific object, the so-called iterator. The iterator, in turn, must have the __next__ method.
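Before we move on to built-in collections, here is a minimal sketch of what the protocol looks like written out by hand. The Countdown and CountdownIterator classes are made up for this illustration, and the StopIteration exception used to signal the end of the sequence is explained just below:

class CountdownIterator:
    def __init__(self, start):
        self.current = start

    def __next__(self):
        # Signal the end of iteration when nothing is left
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Every call returns a fresh iterator with its own position
        return CountdownIterator(self.start)

for n in Countdown(3):
    print(n)
# => 3
# => 2
# => 1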
Now let us look at a ready-made example: iterating through a list. Lists are iterable, so they fit perfectly.
Let us create a list and an iterator for it:
l = [1, 2, 3, 5, 8, 11]
i = iter(l)
print(i) # => <list_iterator object at 0x7f517843a240>
We have called the iter function on the list; this function, in turn, calls the __iter__ method of the list.
It works this way for readability: names like __foo__ are not pleasant to read or write. Other functions behave similarly, such as the len function, which calls the __len__ method.
Most special methods with names like these are called for us by functions and language constructs.
Now we have an iterator i. Let us try to call the __next__ method on it, both directly and with the more convenient next function:
i.__next__() # 1
i.__next__() # 2
next(i) # 3
next(i) # 5
As we can see, each time we call the method, it returns another item from the original list. It also remembers the position in the list between calls. In this way, the iterator acts as a cursor in your text editor: if you press the arrow keys, the cursor moves and points to a new location in the text. The only difference is that the iterator is a cursor that can only move in one direction.
But what happens when the list runs out of items? Let us check it out:
next(i) # 8
next(i) # 11
next(i)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# StopIteration
When the iterator reached the end of the original list, the next call raised StopIteration. Despite the traceback, this is not really an error: everything simply ends at some point.
StopIteration is an exception. We will talk about exceptions later. For now, it is enough to know that the language tools built on the iteration protocol know how to respond to this exception. The for loop, for example, simply finishes quietly.
Now you can picture how the for loop works. It gets a new iterator from the iterable object, then calls the __next__ method on that iterator until it raises a StopIteration exception.
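Here is a rough sketch of that behavior written out by hand with a while loop. The try/except construct used here handles exceptions, which we will cover in detail later:

l = [1, 2, 3, 5, 8, 11]
i = iter(l)  # what the for loop does first
while True:
    try:
        item = next(i)  # repeated __next__ calls
    except StopIteration:
        break  # the for loop stops quietly at this point
    print(item)
# => 1
# => 2
# => 3
# => 5
# => 8
# => 11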
The for loop and iterators
What happens if you first get an iterator yourself and then pass it to a for loop? This works because the for loop is clever: when it is handed an iterator, it understands that it can start calling __next__ right away. (An iterator's own __iter__ method simply returns the iterator itself.)
Let's write a function that loops through a list until it finds a string of at least five characters:

def search_long_string(source):
    for item in source:
        if len(item) >= 5:
            return item
Now create a list containing several matching strings and run the function for that list a couple of times:
animals = ['cat', 'mole', 'tiger', 'lion', 'camel']
search_long_string(animals) # 'tiger'
search_long_string(animals) # 'tiger'
The function returned the same string both times because we passed it an iterable, so the for loop created a new iterator on each call.
Let's create the iterator ourselves and pass it to the function:
animals = ['cat', 'mole', 'tiger', 'lion', 'camel']
cursor = iter(animals)
search_long_string(cursor) # 'tiger'
search_long_string(cursor) # 'camel'
search_long_string(cursor)  # None
search_long_string(cursor)  # None
The iterator remembered its state between function calls, so we found both long words. The subsequent calls returned None because the iterator had reached the end and stayed there.
But you can create several iterators for the same list, and each one will keep track of its own position, as the example below shows. As you work with Python code, you're bound to see many interesting applications of the iteration protocol.
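Here is that independence in action:

animals = ['cat', 'mole', 'tiger', 'lion', 'camel']
first = iter(animals)
second = iter(animals)

next(first)  # 'cat'
next(first)  # 'mole'
next(second)  # 'cat' (the second iterator starts from the beginning)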
Generators
In Python, not only collections are iterable. There are also generators.
Generators do not store their elements; they create them as needed. Let's take the range generator as an example. Here's how it works:
numbers = range(3, 11, 2)
for n in numbers:
    print(n)
# => 3
# => 5
# => 7
# => 9
list(numbers) # [3, 5, 7, 9]
Here range generates a sequence of numbers from 3 up to, but not including, 11 in steps of 2. We can omit the step and the start value; in that case, counting starts from zero in steps of one.
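For example, with only the stop value given:

for n in range(3):
    print(n)
# => 0
# => 1
# => 2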
The for loop iterates over numbers as usual. Then we use the list function to get a list out of the same object. This function takes a single argument, an iterable object or an iterator, and puts its elements into a newly created list.
The list function accumulates values into a list, while tuple accumulates values into a tuple.
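Continuing with the same range object from above:

numbers = range(3, 11, 2)
list(numbers)  # [3, 5, 7, 9]
tuple(numbers)  # (3, 5, 7, 9)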
Note that range is a restartable generator: you can create as many iterators for it as you want, and each of them will generate the values anew.
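A quick check of that:

numbers = range(3)
a = iter(numbers)
b = iter(numbers)
list(a)  # [0, 1, 2]
list(b)  # [0, 1, 2] (each call to iter() returns a fresh iterator)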
There are also non-restartable generators. These always return the same iterator when the __iter__ method is called, so you can only go over the values of such a generator once.
An example of this kind of generator is enumerate. Let's take another look at it:
l = enumerate("asdf")
list(l) # [(0, 'a'), (1, 's'), (2, 'd'), (3, 'f')]
list(l) # []
The second attempt to build a list from the object in the l variable produces an empty list because the generator was already exhausted by the first pass.
And here's another built-in generator — zip. This generator takes several iterable objects or iterators as input and groups their elements into tuples, element by element:
keys = ["foo", "bar", "baz"]
values = [1, 2, 3, 4]
for k, v in zip(keys, values):
    print(k, "=", v)
# => foo = 1
# => bar = 2
# => baz = 3
z = zip(range(10), "hello", [True, False])
list(z) # [(0, 'h', True), (1, 'e', False)]
list(z) # []
The example demonstrates two points:
- zip is a non-restartable generator
- zip stops generating tuples as soon as any of its sources runs out of elements
Generators and lazy calculations
Most programming languages execute code in the order in which it is written:
- It executes statements from top to bottom
- It computes expressions after their components
- It calls functions after calculating their arguments
This model of execution is called eager.
There's also the lazy computation model. In this model, calculations are performed only when their results are actually needed.
Depending on the input data, some calculations in a program may turn out to be unnecessary. This is where the lazy model has an advantage: if a result is not needed, it is never computed. In this way, laziness is a kind of optimization.
Python has an eager computation model, so it almost always computes everything right away. However, some aspects of laziness are present in Python as well.
Generators are one such aspect. They only do work when we ask them for values. There are even whole pipelines built out of generators. They work like conveyor belts, assembling composite values one element at a time.
So the composite generator zip(range(100000000), "abc") will not produce all 100 million numbers, because the string "abc" is too short: it cannot form that many pairs. And even those few pairs will not exist until we iterate over the result of this expression.
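A small sketch of this in action: creating the object is instant, and only three pairs are ever produced:

z = zip(range(100000000), "abc")  # nothing has been generated yet
list(z)  # [(0, 'a'), (1, 'b'), (2, 'c')]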
So laziness saves memory when processing large streams of data. We don't need to load all the data. It is enough to load and process it in small chunks.
Additionally, you can learn more about itertools. It is a standard library module with many functions for creating iterators and working with them.
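For instance, itertools.count produces an endless stream of numbers, and itertools.islice lazily cuts a piece out of any iterable. A quick sketch:

from itertools import count, islice

evens = count(start=0, step=2)  # an infinite generator: 0, 2, 4, ...
list(islice(evens, 5))  # [0, 2, 4, 6, 8]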