Register to get access to free programming courses with interactive exercises

Hash tables Python: Dictionaries and Sets

An associative array is an abstract data type used to store key-value pairs. Different languages have different data types for it:

  • In Python — Dictionary
  • In Ruby — Hash
  • In Lua — Table
  • In JavaScript — Object
  • In Elixir and Java — Map

Associative arrays are popular in applied programming. They are convenient for representing compound data that contain many different parameters.

In a usual indexed array, we arrange values by indexes, which means that it can be put into memory as is. Things work differently with associative arrays. They do not have indexes to determine the order — so there's no easy way to get to the values. Associative arrays often use a hash table — a specific data structure to implement them.

A hash table allows you to organize the data of an associative array for convenient storage. A hash table uses an indexed array and a function to hash the keys. That said, a hash table isn't just a way to place data in memory; it involves logic.

In this lesson, we will discuss hash tables and learn how associative arrays work internally. You'll understand how many different processes go on in a program when we run this kind of simple code:

data = {}
data['key'] = 'value'

How do hash functions work

Any operation inside a hash table starts with converting the key somehow into an index of a regular array. To obtain an index from a key, you need to perform two actions:

  • Find the hash, that is, hash the key
  • Convert a key to an index — for instance, through the remainder of a division

Hashing is an operation that converts any input data into a string or a number of a fixed length. The function implementing the conversion algorithm is called a hash function. The result of hashing is called a hash or hash sum.

The most famous are CRC32, MD5, SHA, and many other types of hashing:

# Python has the zlib library, which contains the crc32 hashing algorithm
# This algorithm is convenient for clarity
import zlib

# Any data we want to hash is represented as a byte string
data = b'Hello, world!'
hash = zlib.crc32(data)

# hash is always the same for the same data
print(hash) # => 3957769958

We often encounter hashing in development. For example, the commit id in git 0481e0692e2501192d67d7da506c6e70ba41e913 is a hash resulting from hashing commit data.

When hashing, you first need to get a hash. It can then be converted into an array index — for example, to calculate the remainder of a division:

Sign up

Programming courses for beginners and experienced developers. Start training for free

  • 130 courses, 2000+ hours of theory
  • 1000 practical tasks in a browser
  • 360 000 students
By sending this form, you agree to our Personal Policy and Service Conditions

Our graduates work in companies:

<span class="translation_missing" title="translation missing:">Bookmate</span>
<span class="translation_missing" title="translation missing:">Healthsamurai</span>
<span class="translation_missing" title="translation missing:">Dualboot</span>
<span class="translation_missing" title="translation missing:">Abbyy</span>
Suggested learning programs
Developing web applications with Django
10 months
from scratch
under development
Start at any time

Use Hexlet to the fullest extent!

  • Ask questions about the lesson
  • Test your knowledge in quizzes
  • Practice in your browser
  • Track your progress

Sign up or sign in

By sending this form, you agree to our Personal Policy and Service Conditions
Toto Image

Ask questions if you want to discuss a theory or an exercise. Hexlet Support Team and experienced community members can help find answers and solve a problem.