An associative array is an abstract data type used to store key-value pairs. It often has other names, such as "dictionary" or "map" It corresponds to different data types in different languages. In JavaScript it's an object, in other languages, it's...
- Ruby — Hash
- Lua — Table
- Python — Dictionary
- Elixir/Java — Map
What is it for? Associative arrays are extremely popular in applied programming. Their use makes it convenient to represent compound data containing many different parameters. In fact, all the previous lessons on objects in JavaScript were about how to use objects as associative arrays.
An associative array, unlike a regular one (usually called an indexed array because its values are ordered by indexes), cannot be put into memory "as is". It has no indexes to determine the order, nor does it have an easy way to get to the values. A special data structure called a hash table is often used to implement associative arrays. It allows you to organize the data in an associative array in a convenient way to store it. The hash table applies two things to do this, an indexed array, and a function that hashes the keys. Note that a hash table is not just a way to place data in memory, it involves logic.
Below, we'll discuss more about how associative arrays are arranged internally. This information is essential for developers who want to really understand what they are doing. It removes the "magic" from what is going on inside the language and gives an understanding of the price one has to pay for objects to be useful.
So, this is what roughly happens when we execute the code:
const data = {};
data['key'] = 'value';
Hashing
Any operation inside a hash table starts with somehow converting the key into an index of a regular array. To get an index from a key, you need to perform two actions; find a hash (hash a key) and reduce it to an index (for example, using the remainder of a division).
Hashing is an operation that converts any input data into a string (rarely a number) of fixed length. The function that implements the conversion algorithm is called a "hash function", and the result is called a "hash" or "hash sum". The best known are CRC32, MD5, and SHA (that has many varieties).
// JavaScript has no built-in support for the crc32 hashing algorithm (which is convenient for clarity)
// Therefore, a third-party library is used
import crc32 from 'crc-32';
const data = 'Hello, world!'; // Any data we want to hash
const hash = crc32.str(data);
// The hash will always be the same for the same data!
console.log(hash); // => -337197338
We often encounter hashing in development. For example, if we look at a commit id in git, e.g.,0481e0692e2501192d67d7da506c6e70ba41e913 , it's actually nothing more than a hash resulting from hashing commit data.
Once the hash is obtained, it can be converted into an array index, for example, by obtaining the remainder of division:
// This is done so that the indexes don't get too large
// The bigger the array, the more memory it occupies
const index = Math.abs(hash) % 1000; // modularly
console.log(index); // => 338
Behind the scenes
Consider the process of adding a new value to an associative array (remember, in JavaScript it is represented by the Object data type). A programmer writes:
const data = {};
data['key'] = 'value';
Such a simple line, at first glance, triggers a whole process. Below is an abridged description of it:
// shown in JavaScript for simplicity, although in reality all of this happens at a lower level
// 1. Creating an associative array results in the initialization of an indexed array within the interpreter
const internal = [];
// While assigning the value `data['key'] = 'value'`, the interpreter performs several actions:
// 2. Hashes the key. The result of hashing is a number
const hash = crc32.str('key');
// 3. The number obtained in the previous step is converted to an array index
const index = Math.abs(hash) % 1000;
// Another array is written to the value of the internal indexed array, using the index found,
// the first element of the index is the `'key'`, and the second value `'value'`
internal[index] = ['key', 'value'];
Why such a strange structure for storage? Why is there a key in there? The answer to this question will be below, when we talk about collisions.
Now let's read the value:
const data = {};
data['key'] = 'value';
console.log(data['key']); // => "value"
// shown in JavaScript for simplicity, although in reality all of this happens at a lower level
// 1. The key is hashed. The result of hashing is a number
const hash = crc32.str('key');
// 2. The number obtained in the previous step is converted into an array index
const index = Math.abs(hash % 1000);
// 3. If the index exists, it retrieves the array that was inside and returns it to the outside
return internal[index]; // ['key', 'value']
Collisions
Any string (of any length and content) can be a key in an associative array. In other words, the set of all possible keys is infinite. In turn, the result of a hash function is a fixed-length string, which means that the set of all output values is finite.
It follows that not all input data will have a unique hash. At some point, there may be duplicates (when you get the same hash for different values). Such a situation is commonly known as a collision. There are several ways to resolve collisions, and each corresponds to a different type of hash table.
// Example of a collision
// Two different strings return the same hash!
// As if there were two elements under one index in an array at once
crc32.str('aaaaa0.462031558722291'); // 1938556049
crc32.str('aaaaa0.0585754039730588'); // 1938556049
Collisions are not as rare as they may seem. You can see this by examining the birthday problem.
Are there any more questions? Ask them in the Discussion section.
The Hexlet support team or other students will answer you.
For full access to the course you need a professional subscription.
A professional subscription will give you full access to all Hexlet courses, projects and lifetime access to the theory of lessons learned. You can cancel your subscription at any time.