Data preparation — Python: Automated testing

Most tests that test a given functionality are very similar, especially in initial data preparation. In the last lesson, each test started with the line: stack = []. It isn't duplication yet, but it's a step in a dangerous direction.

This lesson will learn how to prevent duplication when preparing identical data for different tests. Let's say we are developing the funcy library, which provides functions for working with collections in Python.

Among them, we can find such functions as:

compact
select
flatten

How do we test them? For these functions to work, you need a prepared collection. Let's create the necessary collection, pass it to the function, and look at the result:

def test_compact():
    # Preparing a collection of `coll`
    coll = ['One', True, 3, [1, 'hexlet', [0]], 'cat', {}, '', [], False]

    # Using `coll` for testing
    result = compact(coll)
    assert result == # Here is the expected value

Then we'll do the same for all the other functions:

def test_select():
    # The same collection, there is a duplication
    coll = ['One', True, 3, [1, 'hexlet', [0]], 'cat', {}, '', [], False]

    # Using `coll` for testing
    result = select(coll)
    assert result == # Here is the expected value

Next, we perform this operation for all other tested functions. The collection initialization fragment will start moving from place to place, generating more and more of the same code.

The easiest way to avoid this is to move the collection definition to the module level:

# We describe the creation of the collection in one place for all functions
coll = ['One', True, 3, [1, 'hexlet', [0]], 'cat', {}, '', [], False]

def test_compact():
    result = compact(coll)
    assert result == # Here is the expected value

This simple solution eliminates unnecessary duplication. Keep in mind that this only works within a single module. The collection still needs to be defined in each test module. In our case, this is more of a plus than a minus. However, it's not always possible to transfer data to the module level.

First of all, what about dynamic data? What happens if at least one of the test functions changes this collection?

We use one collection for all tests. If it changes, it means that the tests are dependent on the order of execution. Subsequent tests will receive the modified collection. These situations are unacceptable in testing because they lead to fragile and hard-to-debug tests.

Test frameworks provide hooks — special functions that run before or after tests to solve this problem. In Pytest, we refer to them as fixtures.

Let us look at an example that creates a collection before each test:

import pytest

# Creating a fixture
# Running before each test
@pytest.fixture
def coll(): # We choose the fixture name arbitrarily
    return ['One', True, 3, [1, 'hexlet', [0]], 'cat', {}, '', [], False]

# Pytest returns the result of a function call where we specify the function in the argument
# The parameter name is the same as the fixture name
def test_compact(coll):
    result = compact(coll)
    assert result == # Here is the expected value

# It doesn't matter what the previous test did to the collection
# The collection will be new here because Pytest calls `coll()` again
def test_select(coll):
    result = select(coll, ...)
    assert result == # Here is the expected value

The @pytest.fixture decorator adds an arbitrary function to the test execution process. The fixture runs before the tests requested by function parameters. The argument name must be the same as the fixture name.

Now for another example. Consider code that works with the current time:

from datetime import datetime
# The is one current date for all tests
now = datetime.now()
def test_foo():
    print(now)

def test_bar():
    print(now)

test_foo()
# => 2021-05-19 14:09:12.068421
test_bar()
# => 2021-05-19 14:09:12.068421

The output will show two identical dates. The catch is that the code module is loaded into memory exactly once. It means any code defined at the module level will run once. In the example, we define the now variable before the tests, and only then does Pytest start executing them.

With each subsequent test, the discrepancy between the now variable and the actual value of now becomes larger:

Pytest loads test modules into memory
The now variable is filled with the date when that piece of code runs
Pytest starts executing tests using the date from the previous step

Why might this be a problem? Code that works with the concept of now can expect now to be almost a snapshot of a particular moment. But in the example above, the new variable is starting to lag behind the real one.

The more tests and the more difficult they are, the greater the lag:

import pytest

@pytest.fixture
def current_time():
    return datetime.now()

def test_foo(current_time):
    print(current_time)

def test_bar(current_time):
    print(current_time)

# You can see that the time has changed
# 2021-05-19 14:46:14.109220
# 2021-05-19 14:46:14.110206

Fixtures are ideal for extracting common data needed in different tests. However, using fixtures can lead to more complex code than without them if the data slightly differs.

Recommended materials

Pytest Fixtures

For full access to the course you need a professional subscription.

A professional subscription will give you full access to all Hexlet courses, projects and lifetime access to the theory of lessons learned. You can cancel your subscription at any time.

Get access

130

courses

1000

exercises

2000+

hours of theory

3200

tests