Working in high-level languages such as JavaScript, you can get through everyday tasks without knowing how arrays work under the hood. Still, this kind of understanding makes the code less magical and lets you see a little further.
Arrays in C
Real arrays are best studied in C, which is fairly simple and straightforward, yet very close to the hardware and hides almost nothing from us. With primitive data types, such as numbers or single characters, everything is intuitively clear: for each value, a certain amount of memory is allocated (depending on the type), and the value itself is stored there. But how should memory be allocated for an array? And what is an array in terms of memory? At the hardware level, there is no such thing as an array. An array is represented as a contiguous block of memory whose size is calculated with the following formula: number of elements * amount of memory for each element. Two interesting conclusions follow from this:
- The size of the array is a fixed value. The dynamic arrays (which change their size at runtime) that we deal with in many languages are implemented within the language, not at a hardware level.
- All array elements have the same type and occupy the same amount of memory. This makes it possible to get the address of the cell holding the element we need with one multiplication and one addition: start address + index * size of one element. This is what happens under the hood when you access an array element by its index.
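The size formula can be checked directly with sizeof. The sketch below is only an illustration; the names array_bytes, ARRAY_LEN and check_size_formula are made up for this example and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a real array (not a pointer):
   total bytes divided by the bytes of one element. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: number of elements * memory per element. */
size_t array_bytes(size_t count, size_t elem_size) {
    return count * elem_size;
}

/* Checks that the compiler's own sizeof agrees with the formula. */
void check_size_formula(void) {
    int numbers[] = {19, 10, 8, 17, 9};

    assert(ARRAY_LEN(numbers) == 5);
    assert(sizeof(numbers) == array_bytes(ARRAY_LEN(numbers), sizeof(int)));
}
```

Calling check_size_formula() from main passes on any platform, because it uses sizeof(int) instead of assuming a particular size for int.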
In fact, the index in an array is an offset relative to the beginning of the block of memory containing the array data. The offset of the element at a particular index is calculated as follows: index * the amount of memory occupied by one element (for a given data type on a given architecture). Adding this offset to the start address of the array gives the address of the element. Example in C:
// Initialize an array of five int type elements
// Assume that an int takes up 2 bytes (on most modern platforms it is 4)
// Total memory allocated for the array: sizeof(int) * 5 = 2 * 5 = 10 bytes
int numbers[] = {19, 10, 8, 17, 9};
numbers[3]; // 17
Assuming that the int type occupies 2 bytes in memory, the address of the element at index 3 is calculated as follows: start address + 3 * 2. The start address is the address of the memory cell where the array begins. It's determined when memory is allocated for the array. Below is an example of how memory addresses are calculated for different elements of the numbers array:
// First element
// Start address + 2 * 0 = start address
numbers[0]; // 19
// Start address + 2 * 1 = start address + 2
// I.e., the element is offset by 2 bytes
numbers[1]; // 10
// Start address + 2 * 2 = start address + 4
// I.e., the element is offset by 4 bytes
numbers[2]; // 8
// Last element
// Start address + 2 * 4 = start address + 8
// I.e., the element is offset by 8 bytes
// The element itself takes 2 more bytes, which adds up to 10
numbers[4]; // 9
Now it should be clear why the indexes in the array start with zero. Zero means there is no offset.
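The address calculation can be reproduced by hand and compared with what the compiler does for numbers[i]. This is a minimal sketch: element_offset, element_address and check_addressing are hypothetical names for this example, and sizeof(int) is used instead of the assumed 2 bytes so that the check holds on any platform.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the element at `index` when each element
   occupies `elem_size` bytes. */
size_t element_offset(size_t index, size_t elem_size) {
    return index * elem_size;
}

/* Computes the element address manually:
   start address + index * size of one element. */
const int *element_address(const int *start, size_t index) {
    const unsigned char *base = (const unsigned char *)start;
    return (const int *)(base + element_offset(index, sizeof(int)));
}

void check_addressing(void) {
    int numbers[] = {19, 10, 8, 17, 9};

    for (size_t i = 0; i < 5; i++) {
        /* The hand-computed address is the one numbers[i] refers to. */
        assert(element_address(numbers, i) == &numbers[i]);
    }
    assert(*element_address(numbers, 0) == 19); /* zero offset: the start itself */
    assert(*element_address(numbers, 3) == 17);
}
```

In everyday C the same thing is written as numbers + i or &numbers[i]; pointer arithmetic multiplies by the element size automatically.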
But not all data is the same size. How will an array of strings be stored? Strings have different lengths, which means they require different amounts of memory. One way to store strings in an array in C is to create an array of arrays (here it's important to understand that any string in C is an array of characters). Nested arrays must all be the same size, because it's impossible to get around the physical limitations of arrays. The trick is to make this size large enough to fit the longest string you need.
// An array of three elements, each of which is an array of 10 chars
// A string in C ends with a terminating '\0' character, so this array
// can store 3 strings of no more than 9 characters each
char strings[3][10] = {
"spike",
"tom",
"jerry"
};
strings[0]; // spike
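The layout of such an array of arrays can be inspected with sizeof and strlen. The following sketch uses made-up names (ROWS, ROW_SIZE, fits_in_row, check_rows) and only illustrates that every row occupies the same 10 bytes, no matter how short the string in it is.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 3, ROW_SIZE = 10 };

/* Returns 1 if `s` together with its terminating '\0' fits into one row,
   0 otherwise. */
int fits_in_row(const char *s) {
    return strlen(s) + 1 <= ROW_SIZE;
}

void check_rows(void) {
    char strings[ROWS][ROW_SIZE] = {"spike", "tom", "jerry"};

    /* The whole array is one block: 3 rows * 10 bytes. */
    assert(sizeof(strings) == ROWS * ROW_SIZE);
    /* "tom" uses 3 characters, but its row still takes 10 bytes. */
    assert(sizeof(strings[1]) == ROW_SIZE);
    assert(strlen(strings[1]) == 3);
    /* The rest of the row is filled with zero bytes. */
    assert(strings[1][3] == '\0');
    assert(strings[1][ROW_SIZE - 1] == '\0');
}
```

A helper like fits_in_row can be called before copying a new string into a row, since writing more than ROW_SIZE bytes would spill into the next row.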
Safety
Unlike high-level languages, where code is protected against out-of-bounds array access, C performs no bounds checking, so an overrun does not raise an error by itself (it may end in a segfault, but that doesn't matter here). Formally, the result of such an access is undefined behavior. In practice, accessing an element whose index is outside the array typically returns whatever data lies in the memory area that the formula above points to. No one knows what that data will be, but it will be interpreted according to the type of the array: if it is an array of int, a number will be returned. Array overruns are actively exploited by attackers to break into programs.
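Since C does not check indexes, the check has to be written by hand wherever an index comes from outside. Below is one possible sketch of such a guard; get_checked is a hypothetical helper for this example, not a standard library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores data[index] in *out and returns 1
   if the index is inside the array; returns 0 and leaves *out
   untouched otherwise. */
int get_checked(const int *data, size_t len, size_t index, int *out) {
    if (index >= len) {
        return 0; /* out of bounds: memory is not touched */
    }
    *out = data[index];
    return 1;
}

void check_bounds(void) {
    int numbers[] = {19, 10, 8, 17, 9};
    int value = 0;

    int ok = get_checked(numbers, 5, 3, &value);
    assert(ok == 1 && value == 17);

    /* Index 5 is one past the last element, so the read is refused. */
    ok = get_checked(numbers, 5, 5, &value);
    assert(ok == 0 && value == 17);
}
```

The caller has to pass the length explicitly, because a pointer to the first element carries no information about how many elements follow it.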
Arrays in dynamic languages
In dynamic languages such as JavaScript, the structure of arrays is much more complex than in C, since data types are determined automatically at runtime. An array in this environment cannot work like it would in C, because we don't know in advance what data types will end up inside it.
Arrays in these languages do not contain the data itself, but references (addresses in memory) to it. This makes it irrelevant what exactly is being stored: any value in the array is an address, which has the same size regardless of the data it points to. This approach makes arrays flexible, but it also makes them slower.
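The idea of an array of references can be modelled in C with pointers. The sketch below is a simplified model and not how any particular interpreter lays out its values; the Ref structure, the ValueType tag and the helper functions are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag telling how to interpret the referenced data. */
typedef enum { TYPE_NUMBER, TYPE_STRING } ValueType;

/* One array slot: it never holds the value, only its address. */
typedef struct {
    ValueType type;
    const void *data;
} Ref;

double as_number(Ref r) {
    assert(r.type == TYPE_NUMBER);
    return *(const double *)r.data;
}

const char *as_string(Ref r) {
    assert(r.type == TYPE_STRING);
    return (const char *)r.data;
}

void check_mixed_array(void) {
    static const double answer = 42.0;
    static const char name[] = "a much longer string value";

    /* Values of different types and sizes live elsewhere in memory. */
    Ref items[] = {
        {TYPE_NUMBER, &answer},
        {TYPE_STRING, name}
    };

    /* Every slot has the same size, so index arithmetic still works. */
    assert(sizeof(items[0]) == sizeof(items[1]));
    assert(sizeof(items) == 2 * sizeof(Ref));
    assert(as_number(items[0]) == 42.0);
    assert(strcmp(as_string(items[1]), name) == 0);
}
```

The extra step of following the address to reach the value is one source of the slowdown mentioned above.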
In addition, arrays in dynamic languages are themselves dynamic: their size can grow or shrink as the program runs. Technically, it works like this. If the references (remember, the data itself is not stored there) no longer fit in the array, the interpreter internally creates a new, bigger array (often twice the size) and moves all the references there. Dynamic arrays greatly simplify development, with speed as the trade-off.
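The growth strategy can be sketched in C on top of realloc. This is a minimal illustration under stated assumptions: the DynArray type and its functions are hypothetical, the array stores int values instead of references to keep the code short, and the doubling factor is just one common choice.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *items;  /* block of memory holding the elements */
    size_t len;  /* how many elements are in use */
    size_t cap;  /* how many elements fit in the current block */
} DynArray;

/* Appends a value; returns 0 on success, -1 if memory ran out. */
int dyn_push(DynArray *a, int value) {
    assert(a != NULL);
    if (a->len == a->cap) {
        /* No room left: get a block twice as large. realloc copies
           the existing elements over if the block has to move. */
        size_t new_cap = a->cap ? a->cap * 2 : 1;
        int *grown = realloc(a->items, new_cap * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        a->items = grown;
        a->cap = new_cap;
    }
    a->items[a->len++] = value;
    return 0;
}

void dyn_free(DynArray *a) {
    free(a->items);
    a->items = NULL;
    a->len = a->cap = 0;
}

/* Pushes n values and reports the resulting capacity (0 on failure). */
size_t capacity_after(size_t n) {
    DynArray a = {NULL, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (dyn_push(&a, (int)i) != 0) {
            dyn_free(&a);
            return 0;
        }
    }
    size_t cap = a.cap;
    dyn_free(&a);
    return cap;
}
```

With doubling, five pushes go through capacities 1, 2, 4 and 8, so most pushes are cheap and only an occasional one pays for copying the whole array.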