Python: Building data abstractions
Theory: Invariants
Abstraction lets us ignore the details of the implementation and focus on how it is used. Moreover, the implementation of an abstraction can be rewritten, if necessary, without fear of breaking the code that uses it. But there is one more reason to use abstractions: maintaining invariants.
In programming, an invariant is a logical expression that defines the consistency of a state or data set.
Let's look at an example. When we described the constructor and selectors for rational numbers, we implicitly assumed the following invariant:
If we pass a numerator and a denominator to the rational number constructor, we expect the selectors applied to the resulting rational number to return those same numbers. This is exactly how we check that the abstraction works correctly, and in practice it is how tests for it are written.
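A minimal sketch of that invariant, assuming a plain dictionary as the internal representation (the course's actual representation may differ):

```python
def make_rational(numer, denom):
    # Assumed representation: a plain dictionary (hypothetical)
    return {"numer": numer, "denom": denom}


def get_numer(rat):
    return rat["numer"]


def get_denom(rat):
    return rat["denom"]


# The invariant linking the constructor and the selectors:
# whatever goes into the constructor comes back out of the selectors
numer, denom = 3, 8
rat = make_rational(numer, denom)
assert get_numer(rat) == numer
assert get_denom(rat) == denom
```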
Invariants exist for every operation, and they can be tricky. For example, rational numbers can be compared with each other, but not directly, because the same fraction can be written in different ways, such as 1/2 and 2/4.
Code that ignores this and simply compares numerators and denominators will treat 1/2 and 2/4 as different numbers.
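A minimal sketch of the problem, assuming the same dictionary representation and a hypothetical is_equal helper that compares the results of the selectors:

```python
def make_rational(numer, denom):
    # No normalization: values are stored exactly as passed
    return {"numer": numer, "denom": denom}


def get_numer(rat):
    return rat["numer"]


def get_denom(rat):
    return rat["denom"]


def is_equal(rat1, rat2):
    # Hypothetical helper: naive part-by-part comparison
    return get_numer(rat1) == get_numer(rat2) and get_denom(rat1) == get_denom(rat2)


print(is_equal(make_rational(1, 2), make_rational(2, 4)))  # False
```

Both fractions equal one half, yet the comparison fails because their stored parts differ.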
Bringing a fraction to a single canonical form, for example by dividing the numerator and denominator by their greatest common divisor, is called normalization. We can do it in several ways. The most obvious is to normalize the fraction when it is created, inside the make_rational function.
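A sketch of normalization in the constructor, using math.gcd from the standard library. The dictionary representation and the is_equal helper are assumptions for illustration. Moving the sign into the numerator is an extra step so that 1/-2 and -1/2 compare equal; a zero denominator is not checked:

```python
from math import gcd


def make_rational(numer, denom):
    # Divide both parts by their greatest common divisor
    divisor = gcd(numer, denom)
    # Keep the sign in the numerator so that 1/-2 and -1/2 look the same
    if denom < 0:
        divisor = -divisor
    return {"numer": numer // divisor, "denom": denom // divisor}


def get_numer(rat):
    return rat["numer"]


def get_denom(rat):
    return rat["denom"]


def is_equal(rat1, rat2):
    # Hypothetical helper: part-by-part comparison is now safe
    return get_numer(rat1) == get_numer(rat2) and get_denom(rat1) == get_denom(rat2)


print(is_equal(make_rational(1, 2), make_rational(2, 4)))  # True
rat = make_rational(-6, -8)
print(get_numer(rat), get_denom(rat))  # 3 4
```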
Another is to normalize the fraction when it is accessed through the get_numer and get_denom functions. This second approach has a drawback: normalization runs on every call. Memoization, which caches the result of the first computation, avoids this.
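A sketch of the second approach, where the constructor stores the values as passed and the selectors normalize them. Memoization here is done with functools.lru_cache on a hypothetical normalize helper, which caches the result for each distinct (numer, denom) pair. Storing the normalized values inside the data structure on first access would be another option:

```python
from functools import lru_cache
from math import gcd


def make_rational(numer, denom):
    # The constructor stores the values exactly as passed
    return {"numer": numer, "denom": denom}


@lru_cache(maxsize=None)
def normalize(numer, denom):
    # Hypothetical helper, memoized: computed once per distinct (numer, denom) pair
    divisor = gcd(numer, denom)
    if denom < 0:
        divisor = -divisor
    return numer // divisor, denom // divisor


def get_numer(rat):
    return normalize(rat["numer"], rat["denom"])[0]


def get_denom(rat):
    return normalize(rat["numer"], rat["denom"])[1]


rat = make_rational(2, 4)
print(get_numer(rat), get_denom(rat))  # 1 2
print(normalize.cache_info().hits)  # 1: get_denom reused the result cached by get_numer
```

The cache is keyed by argument values, so equal pairs share one entry; with maxsize=None it also grows without bound.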
With this in mind, it becomes clear that the invariant linking the constructor and the selectors needs to change. The functions get_numer and get_denom should return not the values that were passed in, but the values after normalization. If the fraction was already in lowest terms, these are the same values.
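The updated invariant can be written as a check. This sketch assumes a dictionary representation, normalization in the constructor, and, for brevity, positive inputs only:

```python
from math import gcd


def make_rational(numer, denom):
    # Normalize in the constructor (positive inputs assumed)
    divisor = gcd(numer, denom)
    return {"numer": numer // divisor, "denom": denom // divisor}


def get_numer(rat):
    return rat["numer"]


def get_denom(rat):
    return rat["denom"]


# Updated invariant: the selectors return the normalized values
for numer, denom in [(2, 4), (3, 8), (10, 5)]:
    rat = make_rational(numer, denom)
    divisor = gcd(numer, denom)
    assert get_numer(rat) == numer // divisor
    assert get_denom(rat) == denom // divisor

# A fraction already in lowest terms comes back unchanged
assert (get_numer(make_rational(3, 8)), get_denom(make_rational(3, 8))) == (3, 8)
```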
The abstraction hides the implementation from us and becomes responsible for preserving invariants. Any work that bypasses the abstraction is risky because it doesn't take these internal transformations into account.
In other words, working directly with the data and bypassing the abstraction can easily break the invariants maintained by the extra logic in the constructor or selectors. That is why code should be used the way its authors intended.
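A sketch of how bypassing the constructor breaks the invariant, under the same assumptions as before (dictionary representation, normalization in make_rational, a hypothetical is_equal helper):

```python
from math import gcd


def make_rational(numer, denom):
    # Normalize in the constructor (positive inputs assumed)
    divisor = gcd(numer, denom)
    return {"numer": numer // divisor, "denom": denom // divisor}


def get_numer(rat):
    return rat["numer"]


def get_denom(rat):
    return rat["denom"]


def is_equal(rat1, rat2):
    # Hypothetical helper: relies on both fractions being normalized
    return get_numer(rat1) == get_numer(rat2) and get_denom(rat1) == get_denom(rat2)


half = make_rational(1, 2)

# Bypassing the constructor: the dictionary is built by hand,
# so it never goes through normalization
two_quarters = {"numer": 2, "denom": 4}

print(is_equal(half, two_quarters))  # False: the invariant is broken
print(is_equal(half, make_rational(2, 4)))  # True
```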
Looking at the examples above, you may reasonably ask: is it possible to make bypassing the abstraction impossible? In general, yes. This is called data hiding. Languages usually provide special syntax for it. However, data can also be hidden without any special syntax, by means of higher-order functions. This technique builds abstractions from anonymous functions, closures, and message passing. Try our Python: Composite Data course to learn more about it.
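A minimal sketch of data hiding with a closure and message passing. Here make_rational returns a function (a hypothetical dispatch) instead of a dictionary, so the numerator and denominator are reachable only by sending it a message. This is one possible design, not necessarily the one used in the course:

```python
from math import gcd


def make_rational(numer, denom):
    # Normalize once, inside the constructor (positive inputs assumed)
    divisor = gcd(numer, denom)
    numer, denom = numer // divisor, denom // divisor

    # Hypothetical dispatcher: the data lives only in this closure
    def dispatch(message):
        if message == "numer":
            return numer
        if message == "denom":
            return denom
        raise ValueError(f"Unknown message: {message}")

    return dispatch


def get_numer(rat):
    return rat("numer")


def get_denom(rat):
    return rat("denom")


rat = make_rational(2, 4)
print(get_numer(rat), get_denom(rat))  # 1 2
```

There is no dictionary to edit by hand: the only way to reach the values through ordinary code is to send a message.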
We want to warn you against turning this into a cargo cult. Data protection looks like a reasonable idea, but these mechanisms are easy to get around with reflection tools, and sometimes even without them, simply because data is passed by reference. This makes the protection of limited practical value.
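For instance, a closure like the one used for message passing can be inspected and even modified from outside. This sketch uses standard CPython function attributes (`__code__.co_freevars` and `__closure__`); assigning to `cell_contents` requires Python 3.7 or later. The dispatch function is hypothetical, as before:

```python
from math import gcd


def make_rational(numer, denom):
    # Normalize once, inside the constructor (positive inputs assumed)
    divisor = gcd(numer, denom)
    numer, denom = numer // divisor, denom // divisor

    # Hypothetical dispatcher that "hides" the data in a closure
    def dispatch(message):
        if message == "numer":
            return numer
        if message == "denom":
            return denom
        raise ValueError(f"Unknown message: {message}")

    return dispatch


rat = make_rational(2, 4)

# Introspection: pair each free variable name with its closure cell
cells = dict(zip(rat.__code__.co_freevars, rat.__closure__))
print(cells["numer"].cell_contents)  # 1

# Cell contents are writable, so the "hidden" state can be changed
cells["numer"].cell_contents = 4
print(rat("numer"), rat("denom"))  # 4 2
```

After the assignment the fraction is 4/2, which is no longer in lowest terms: the invariant is broken even though the data was "hidden".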
The second point is that many languages work fine with abstractions without enforcing data privacy, and nothing terrible has happened. JavaScript had no syntax for private fields for most of its history, and in Python a leading underscore in a name is still only a convention. In practice, people who use abstractions do not deliberately try to break them. So we believe the importance of enforced privacy is greatly exaggerated.