In the previous lesson, we looked at for loops and the term iteration. We can apply this word to any loop in any language, but in Python it has an additional meaning: iteration is also the traversal of an object that supports the iteration protocol.
First, let us break down what a protocol is in the context of Python. A protocol is an agreed-upon set of actions that can be performed on an object.
If object A allows you to perform actions on it defined by protocol B, then we say:
- Object A implements protocol B
- Or object A supports protocol B
In the courses that follow, you will learn that there are many different protocols in Python.
Many of the language's syntactic constructs work the same way for many different objects precisely because those objects implement specific protocols. For example, we can substitute strings and values of other types into a string template because those types implement the string-conversion protocol. In Python, we find protocols at every turn.
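As a rough illustration of the idea, here is a small sketch. The Ticket class is invented for this example; the point is that any object whose class defines the special __str__ method supports string conversion, so str and string formatting can work with it:

class Ticket:
    def __init__(self, number):
        self.number = number

    def __str__(self):
        # str() and string formatting call this special method for us
        return f"Ticket #{self.number}"

t = Ticket(42)
print(str(t))  # => Ticket #42
print(f"Your {t}")  # => Your Ticket #42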
What is iteration
Iteration is one of the most essential protocols in Python. After all, it is what allows the for loop to work with collections consistently.
What is this protocol all about? It requires the object to be iterable — that is, to have the special __iter__ method.
Calling the __iter__ method on an iterable object should return a new specific object, the so-called iterator. The iterator, in turn, must have the __next__ method.
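Before we move on to built-in collections, here is a minimal sketch of what the protocol looks like written out by hand. The Countdown and CountdownIterator classes are made up for this illustration, and the StopIteration exception used to signal the end of the sequence is explained just below:

class CountdownIterator:
    def __init__(self, start):
        self.current = start

    def __next__(self):
        # Signal the end of iteration when nothing is left
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Every call returns a fresh iterator with its own position
        return CountdownIterator(self.start)

for n in Countdown(3):
    print(n)
# => 3
# => 2
# => 1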
Now let us look at a ready-made example: iterating through a list. Lists are iterable, so they fit perfectly.
Let us create a list and an iterator for it:
l = [1, 2, 3, 5, 8, 11]
i = iter(l)
print(i) # => <list_iterator object at 0x7f517843a240>
We have called the iter function on the list; this function, in turn, calls the __iter__ method of the list.
It works this way for readability: names like __foo__ are not pleasant to read or write. Other functions behave similarly, such as the len function, which calls the __len__ method.
Most special methods with names like these are called for us by functions and language constructs.
Now we have an iterator i. Let us try to call the __next__ method on it, both directly and with the more convenient next function:
i.__next__() # 1
i.__next__() # 2
next(i) # 3
next(i) # 5
As we can see, each time we call the method, it returns another item from the original list. It also remembers the position in the list between calls. In this way, the iterator acts as a cursor in your text editor: if you press the arrow keys, the cursor moves and points to a new location in the text. The only difference is that the iterator is a cursor that can only move in one direction.
But what happens when the list runs out of items? Let us check it out:
next(i) # 8
next(i) # 11
next(i)
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# StopIteration
When the iterator reached the end of the original list, the next call raised StopIteration. Despite the traceback, this is not really an error: everything simply ends at some point.
StopIteration is an exception. We will talk about exceptions later. For now, it is enough to know that the language tools built on the iteration protocol know how to respond to this exception. The for loop, for example, simply finishes quietly.
Now you can picture how the for loop works. It gets a new iterator from the iterable object, then calls the __next__ method on that iterator until it raises a StopIteration exception.
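Here is a rough sketch of that behavior written out by hand with a while loop. The try/except construct used here handles exceptions, which we will cover in detail later:

l = [1, 2, 3, 5, 8, 11]
i = iter(l)  # what the for loop does first
while True:
    try:
        item = next(i)  # repeated __next__ calls
    except StopIteration:
        break  # the for loop stops quietly at this point
    print(item)
# => 1
# => 2
# => 3
# => 5
# => 8
# => 11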
The for loop and iterators
What happens if you first get an iterator yourself and then pass it to a for loop? This works because the for loop is clever: when it is handed an iterator, it understands that it can start calling __next__ right away. (An iterator's own __iter__ method simply returns the iterator itself.)
Let's write a function that loops through a list until it finds a string of at least five characters:

def search_long_string(source):
    for item in source:
        if len(item) >= 5:
            return item
Now create a list containing several matching strings and run the function for that list a couple of times:
animals = ['cat', 'mole', 'tiger', 'lion', 'camel']
search_long_string(animals) # 'tiger'
search_long_string(animals) # 'tiger'
The function returned the same string both times because we passed it an iterable, so the for loop created a new iterator on each call.
Let's create the iterator ourselves and pass it to the function:
animals = ['cat', 'mole', 'tiger', 'lion', 'camel']
cursor = iter(animals)
search_long_string(cursor) # 'tiger'
search_long_string(cursor) # 'camel'
search_long_string(cursor)  # None
search_long_string(cursor)  # None
The iterator remembered its state between function calls, so we found both long words. The subsequent calls returned None because the iterator had reached the end and stayed there.
But you can create several iterators for the same list, and each one will keep track of its own position, as the example below shows. As you work with Python code, you're bound to see many interesting applications of the iteration protocol.
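Here is that independence in action:

animals = ['cat', 'mole', 'tiger', 'lion', 'camel']
first = iter(animals)
second = iter(animals)

next(first)  # 'cat'
next(first)  # 'mole'
next(second)  # 'cat' (the second iterator starts from the beginning)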
Generators
In Python, not only collections are iterable. There are also generators.
Generators do not store their elements; they create them as needed. Let's take the range generator as an example. Here's how it works:
numbers = range(3, 11, 2)
for n in numbers:
    print(n)
# => 3
# => 5
# => 7
# => 9
list(numbers) # [3, 5, 7, 9]
Here range generates a sequence of numbers from 3 up to, but not including, 11 in steps of 2. We can omit the step and the start value; in that case, counting starts from zero in steps of one.
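For example, with only the stop value given:

for n in range(3):
    print(n)
# => 0
# => 1
# => 2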
The for loop iterates over numbers as usual. Then we use the list function to get a list out of the same object. This function takes a single argument, an iterable object or an iterator, and puts its elements into a newly created list.
The list function accumulates values into a list, while tuple accumulates values into a tuple.
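Continuing with the same range object from above:

numbers = range(3, 11, 2)
list(numbers)  # [3, 5, 7, 9]
tuple(numbers)  # (3, 5, 7, 9)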
Note that range is a restartable generator: you can create as many iterators for it as you want, and each of them will generate the values anew.
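A quick check of that:

numbers = range(3)
a = iter(numbers)
b = iter(numbers)
list(a)  # [0, 1, 2]
list(b)  # [0, 1, 2] (each call to iter() returns a fresh iterator)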
There are also non-restartable generators. These always return the same iterator when the __iter__ method is called, so you can only go over the values of such a generator once.
An example of this kind of generator is enumerate. Let's take another look at it:
l = enumerate("asdf")
list(l) # [(0, 'a'), (1, 's'), (2, 'd'), (3, 'f')]
list(l) # []
The second attempt to build a list from the object in the l variable produces an empty list because the generator was already exhausted by the first pass.
And here's another built-in generator — zip. This generator takes several iterable objects or iterators as input and groups their elements into tuples, element by element:
keys = ["foo", "bar", "baz"]
values = [1, 2, 3, 4]
for k, v in zip(keys, values):
    print(k, "=", v)
# => foo = 1
# => bar = 2
# => baz = 3
z = zip(range(10), "hello", [True, False])
list(z) # [(0, 'h', True), (1, 'e', False)]
list(z) # []
The example demonstrates two points:
- zip is a non-restartable generator
- zip stops generating tuples as soon as any of its sources runs out of elements
Generators and lazy calculations
Most programming languages execute code in the order in which it is written:
- It executes statements from top to bottom
- It computes expressions after their components
- It calls functions after calculating their arguments
This model of execution is called eager.
There's also the lazy computation model. In this model, calculations are performed only when their results are actually needed.
Depending on the input data, some calculations in a program may turn out to be unnecessary. This is where the lazy model has an advantage: if a result is not needed, it is never computed. In this way, laziness is a kind of optimization.
Python has an eager computation model, so it almost always computes everything right away. However, some aspects of laziness are present in Python as well.
Generators are one such aspect. They only do work when we ask them for values. There are even whole pipelines built out of generators. They work like conveyor belts, assembling composite values one element at a time.
So the composite generator zip(range(100000000), "abc") will not produce all 100 million numbers, because the string "abc" is too short: it cannot form that many pairs. And even those few pairs will not exist until we iterate over the result of this expression.
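A small sketch of this in action: creating the object is instant, and only three pairs are ever produced:

z = zip(range(100000000), "abc")  # nothing has been generated yet
list(z)  # [(0, 'a'), (1, 'b'), (2, 'c')]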
So laziness saves memory when processing large streams of data. We don't need to load all the data. It is enough to load and process it in small chunks.
Additionally, you can learn more about itertools. It is a standard library module with many functions for creating iterators and working with them.
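For instance, itertools.count produces an endless stream of numbers, and itertools.islice lazily cuts a piece out of any iterable. A quick sketch:

from itertools import count, islice

evens = count(start=0, step=2)  # an infinite generator: 0, 2, 4, ...
list(islice(evens, 5))  # [0, 2, 4, 6, 8]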