Initializing new values and defaultdicts — Python: Dictionaries and Sets


Imagine the following situation: you need to store mutable values, such as lists, in a dictionary. While working with this dictionary, you have a key and an element to add to its list, but the key may not be in the dictionary yet. Here is the code you have to write:

if key not in dictionary:
    dictionary[key] = []  # Initializing the list
dictionary[key].append(value)  # Changing the list
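As a concrete sketch, here is that pattern grouping words by their first letter. The word list and the group_by_first_letter name are made up for illustration.

```python
def group_by_first_letter(words):
    groups = {}
    for word in words:
        key = word[0]
        if key not in groups:
            groups[key] = []  # First time we see this letter: create the list
        groups[key].append(word)  # The list now exists, so we can append
    return groups

groups = group_by_first_letter(['apple', 'avocado', 'banana', 'cherry', 'blueberry'])
print(groups)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```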

This situation is not particularly rare. The Python developers realized this too and gave dictionaries the setdefault method. We can rewrite the code above using it:

dictionary.setdefault(key, []).append(value)

It is compact and concise. But what does the setdefault method do? It takes a key and a default value and returns a reference to the value associated with that key. If the key is not in the dictionary, the method first stores the default value under that key and then returns a reference to it. In the example above, the default value is an empty list [].
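A short check of both cases, with made-up keys: when the key already exists, setdefault leaves the stored value alone and returns it; otherwise it stores the default first. Either way, the returned object is the one inside the dictionary, so mutating it changes the dictionary.

```python
d = {'fruits': ['apple']}

# Key exists: setdefault returns the stored list and ignores the default
fruits = d.setdefault('fruits', [])
fruits.append('banana')  # Mutates the list stored in d

# Key is missing: setdefault stores the default and returns that same object
vegetables = d.setdefault('vegetables', [])
vegetables.append('carrot')

print(d)  # {'fruits': ['apple', 'banana'], 'vegetables': ['carrot']}
print(fruits is d['fruits'])  # True
```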

The `defaultdict` type

Python's standard library includes the collections module. Among other things, this module provides the defaultdict type. A defaultdict is an ordinary dictionary (a subclass of dict) with one special property: where a regular dictionary raises KeyError for a missing key, a defaultdict creates a default value, stores it under that key, and returns it. Let us look at an example:

from collections import defaultdict
d = defaultdict(int)
d['a'] += 5
d['b'] = d['c'] + 10
d  # defaultdict(<class 'int'>, {'a': 5, 'c': 0, 'b': 10})

When we created the dictionary, we passed int as the argument. Calling int without arguments returns 0, and the d dictionary makes this call whenever you read the value of a key that does not exist yet.

Therefore, d['a'] += 5 results in 5 because:

First, Python creates an initial value for the 'a' key by calling int(), which returns 0
Second, it adds 5 to that value and stores the result under 'a'

In the line d['b'] = d['c'] + 10, we:

Read the missing 'c' key, which creates it with the value 0
Then write the sum 0 + 10 to the 'b' key (a plain assignment, so no default is created for 'b')
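To see exactly when the initializer runs, the sketch below swaps int for a hypothetical counting_int helper that returns the same 0 but counts its calls. The count suggests that reading a missing key triggers the initializer, while plain assignment to 'b' does not.

```python
from collections import defaultdict

factory_calls = 0

def counting_int():
    """Hypothetical stand-in for int that also counts its calls."""
    global factory_calls
    factory_calls += 1
    return int()  # Same value a bare int() call returns: 0

d = defaultdict(counting_int)

d['a'] += 5           # Reads the missing 'a' (initializer runs), then stores 0 + 5
print(factory_calls)  # 1

d['b'] = d['c'] + 10  # Reads the missing 'c' (initializer runs); 'b' is only assigned
print(factory_calls)  # 2
print(dict(d))        # {'a': 5, 'c': 0, 'b': 10}
```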

Here is another example, this time with an initializer function we made ourselves:

def new_value():
    return 'foo'
x = defaultdict(new_value)
x[1]  # 'foo'
x['bar']  # 'foo'
x  # defaultdict(<function new_value at 0x7f2232cf5a60>, {1: 'foo', 'bar': 'foo'})

Ignoring the cryptic representation of the initializer function (its address will differ on your machine), we can see that every key we accessed now maps to the string 'foo'.
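The cryptic object in that output is simply the initializer, which defaultdict keeps in its default_factory attribute. Only square-bracket reads fill in missing keys; this sketch shows that `in` and get() leave the dictionary unchanged, and that a lambda can stand in for the named function.

```python
from collections import defaultdict

def new_value():
    return 'foo'

x = defaultdict(new_value)
x[1]  # A read of a missing key stores 'foo' under 1

print(x.default_factory is new_value)  # True: the initializer is kept here
print('bar' in x)    # False: membership tests do not create keys
print(x.get('bar'))  # None: get() does not call the initializer either
print(dict(x))       # {1: 'foo'}

y = defaultdict(lambda: 'foo')  # A lambda works just as well as a named function
print(y['baz'])      # foo
```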

The differences between `defaultdict` and `setdefault`

Why have both tools if they are so similar, I hear you ask. Let us compare these two lines:

a.setdefault(key, []).append…
# vs
b[key].append…

# b is a defaultdict(list)

The lines look alike, but they behave differently. In the first line, Python creates a new empty list on every call, even when the key already exists, because it evaluates the arguments before calling setdefault(key, []). The defaultdict in the second line creates a list only when the key is missing. An empty list is cheap to create, so in this case we can ignore the difference.
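To observe this eager evaluation, the sketch passes the result of a hypothetical make_list helper that counts how many lists it has built. The key already exists, yet a list is still created and then thrown away.

```python
created = 0

def make_list():
    """Hypothetical helper that builds an empty list and counts how many it made."""
    global created
    created += 1
    return []

a = {'key': [1]}
result = a.setdefault('key', make_list())  # make_list() runs before setdefault is called
print(created)  # 1: a list was built even though 'key' exists
print(result)   # [1]: setdefault returned the existing value and dropped the new list
```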

However, if creating the default value were expensive, for example if it required a database query, the defaultdict option would be much preferable, because it calls the initializer only for missing keys.
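Here is a rough comparison under an assumed workload of five lookups over two distinct keys, with a hypothetical expensive_default function standing in for the costly operation. setdefault builds a default on every lookup, while defaultdict builds one only per missing key.

```python
from collections import defaultdict

calls = {'setdefault': 0, 'defaultdict': 0}

def expensive_default(counter_name):
    # Hypothetical stand-in for something costly, such as a database query
    calls[counter_name] += 1
    return []

keys = ['x', 'y', 'x', 'x', 'y']  # 5 lookups, only 2 distinct keys

a = {}
for key in keys:
    a.setdefault(key, expensive_default('setdefault')).append(key)

b = defaultdict(lambda: expensive_default('defaultdict'))
for key in keys:
    b[key].append(key)

print(calls)  # {'setdefault': 5, 'defaultdict': 2}
```

Both dictionaries end up with the same contents; only the number of default values built differs.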

Why use setdefault at all? Because it lets you initialize different keys with different values. Since we pass the default value on each call, we can even store different types of data under different keys. With defaultdict, we have no control over which value goes to which key: the dictionary calls the same initializer for every missing key and does not pass the key to it.
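For instance, in this sketch (key names invented for illustration) one plain dictionary gets a list, a dict, and a set as defaults for three different keys. A single defaultdict initializer could not do this, since it is called without the key.

```python
profile = {}

# Each call chooses its own default, so one dict can hold values of different types
profile.setdefault('tags', []).append('python')
profile.setdefault('scores', {})['math'] = 5
profile.setdefault('visited', set()).add('home')
profile.setdefault('tags', []).append('dicts')  # The existing list is reused

print(profile)
# {'tags': ['python', 'dicts'], 'scores': {'math': 5}, 'visited': {'home'}}
```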

Finally, there are rare cases where defaultdict is unsuitable because you need to initialize the values differently, but setdefault does not help either: the new values are immutable, so we cannot change them through the returned reference. Here is how to handle missing keys in such a case:

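x = {}  # A plain dict, not a defaultdict
dir_name = 'home'  # Hypothetical directory name; avoids shadowing the built-in dir()
x['count'] = x.get('count', 0) + 1
x['path'] = x.get('path', '') + '/' + dir_name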

Yes, you have to repeat each key twice, but the code reads well, and in this situation it is the best option.
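The sketch below shows why setdefault falls short here: for an immutable value like an int, it returns the number itself, and incrementing that number only rebinds a local name. The get() pattern then works for both the counter and the path (the directory names are made up).

```python
x = {}

# setdefault stores 0, but returns the int itself, not something we can change in place
count = x.setdefault('count', 0)
count += 1  # Rebinds the local name only; x['count'] is still 0
print(x)    # {'count': 0}

# Reading with get() and writing the new value back does the job
for _ in range(3):
    x['count'] = x.get('count', 0) + 1
print(x)    # {'count': 3}

for dir_name in ['usr', 'local', 'bin']:
    x['path'] = x.get('path', '') + '/' + dir_name
print(x['path'])  # /usr/local/bin
```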
