Creating functions is easy; creating them properly is harder. Poorly designed functions frequently have to be rewritten, are difficult to adapt to new requirements, and are rarely tested properly. In this article, we'll look at key methods for dividing responsibilities, building chains of functions, and designing function signatures. The article's content is based on common mistakes made by Hexlet students in their projects.
- Clean code and side effects
- Modular dependencies
- Composition instead of nesting (Pipeline)
- Additional materials:
Before we get into the details, let's first discuss the general concept of functions. In modern imperative languages, a function is a separate block of code that can return a result. Because of this, developers frequently believe that the primary purpose of functions is to avoid duplication. This point of view is shared by advanced code editors, which recommend moving repetitive code into a separate function when they notice it.
However, this is not the case. First and foremost, a function is a means of increasing abstraction. In other words, a function hides a set of operations behind a single name so that they can be treated as a whole, without spelling out their details. Eliminating code duplication is a consequence of that, not the goal. This way of thinking affects how functions are designed. Look at the following example:
const result = makeFriendship(user1, user2);
if (result) {
// …
}
The makeFriendship function exists not because its code is repeated throughout the program, but because it hides the specifics of the operation being performed. This type of code is much more “human”.
However, even this understanding is insufficient for designing good functions. Next, we'll look at specific approaches that help improve functions. The same task is used in all of the examples, so you won't have to learn a new context each time.
The task is to create a library function that compares two Hexlet courses and returns the name of the shorter (by duration) course. To complete this task, the library must request the HTML pages, find the course duration on each, compare the durations, and return the course name.
It is used as follows:
import compareCourses from 'course-comparator';
const courseLink1 = 'https://hexlet.io/courses/js-arrays';
const courseLink2 = 'https://hexlet.io/courses/intro_to_git';
// The function is asynchronous, since it executes http requests
const courseName = await compareCourses(courseLink1, courseLink2);
console.log(courseName); // Git fundamentals
If you implement this function straightforwardly, without building any abstractions, it will look like this:
const axios = require('axios'); // http client
const cheerio = require('cheerio'); // jquery's analog in node.js
const compareCourses = async (link1, link2) => {
// request course pages
const response1 = await axios.get(link1);
const response2 = await axios.get(link2);
// Obtaining the course time and name from the HTML page of the first course
const dom1 = cheerio.load(response1.data); // loading HTML into cheerio
const h1Element1 = dom1('h1'); // obtaining the header
const courseName1 = h1Element1.text();
const divElement1 = dom1('div.h3.mt-1').first(); // selecting a container with course time
const time1 = divElement1.text().split(' ')[0]; // parsing a string and extracting a time
// Obtaining the course time and name from the HTML page of the second course
const dom2 = cheerio.load(response2.data);
const h1Element2 = dom2('h1');
const courseName2 = h1Element2.text();
const divElement2 = dom2('div.h3.mt-1').first();
const time2 = divElement2.text().split(' ')[0];
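// The function's main logic: the course with the smaller duration wins
// (the durations are strings at this point, so they are converted to numbers first)
return Number(time1) < Number(time2) ? courseName1 : courseName2;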
};
You can check how it works on repl.it.
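The durations extracted with split(' ')[0] are strings, which is why it matters whether they are converted before being compared. A minimal sketch of the difference, using hypothetical duration values rather than real course data:

```javascript
// Hypothetical duration strings, as split(' ')[0] might return them
const time1 = '9';
const time2 = '14';

// Strings are compared character by character, so '9' sorts after '14'
const stringComparison = time1 < time2;

// Converting to numbers first compares the actual durations
const numericComparison = Number(time1) < Number(time2);

console.log(stringComparison, numericComparison); // false true
```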
This function is so small that in real life it would be left as it is. The code is straightforward, clear, and short, and almost any modification will bloat it. It's fine for a demonstration, though; a more complex function could make the article overwhelming for most readers.
Next, we will go over various ways of splitting up the code within this function and analyze bad solutions that are frequently encountered in practice. Finally, I'll show you how to do it better. Keep in mind that none of the changes made within this function affect those who use the library: its public interface does not change.
The first thing that strikes you is that the same logic is repeated for each page:
- Downloading the page
- Obtaining the course name and duration
You can eliminate duplicate code by doing the following:
const getPageInfo = async (link) => {
const response = await axios.get(link);
const dom = cheerio.load(response.data);
const h1Element = dom('h1');
const courseName = h1Element.text();
const divElement = dom('div.h3.mt-1').first();
const time = divElement.text().split(' ')[0];
return { time, courseName };
};
const compareCourses = async (link1, link2) => {
const data1 = await getPageInfo(link1);
const data2 = await getPageInfo(link2);
return Number(data1.time) < Number(data2.time) ? data1.courseName : data2.courseName;
};
This code looks quite good. It reads well and has no duplication. However, it is poorly designed.
Clean code and side effects
The main issue is that the code that interacts with the external environment (performs a side effect), axios.get(link), was moved from the top level down into an inner function, making that function asynchronous. Why is that bad? Side effects complicate any code in which they appear:
- Any function that has side effects is no longer pure. The function's behavior becomes less predictable, and it leads to a variety of errors. In our case, these are network-related errors (dozens of them!)
- Side effects make it much more difficult to reuse and test functions. A stable network is required for the getPageInfo(link) function to work. But that's not all; it also depends on Hexlet itself working without interruptions and responding quickly. Of course, this is almost always the case :), but things happen
- Any code that calls a function with side effects becomes "dirty" and suffers from the same drawbacks. The deeper the side effect is placed on the call stack, the worse it gets
This applies to all side effects, not just network requests. First, it covers any input/output (IO): reading and writing files, communicating with databases, sending emails, and so on. Second, it covers changes to the environment within the program, such as modifying global variables or any other shared state.
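The second kind of side effect, changing shared state, can be shown in a few lines. This is only a sketch: the names visitCount, registerVisitImpure, and registerVisit are hypothetical and are not part of the course comparator.

```javascript
// Impure: the result depends on, and changes, state outside the function
let visitCount = 0;
const registerVisitImpure = () => {
  visitCount += 1;
  return visitCount;
};

// Pure: the current state comes in as an argument, the new state is returned
const registerVisit = (count) => count + 1;

const first = registerVisitImpure();
const second = registerVisitImpure(); // same call, different result

const pureFirst = registerVisit(0);
const pureAgain = registerVisit(0); // same input, same result

console.log(first, second, pureFirst, pureAgain); // 1 2 1 1
```

The impure version cannot be called twice with the same outcome, which is exactly what makes it harder to test and reuse.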
"Isolate side effects from pure code," says the main architectural principle. Therefore, everything related to input/output should be at the top level rather than inside. Typically, when the program starts working, first the required data is read, then a large block of basic logic (pure code) is executed, and a side effect, such as writing to a file, is called again at the output. This is not always possible, but it's something we should aim for.
Let's rewrite the code according to this:
// The function is no longer asynchronous!
const getPageInfo = (response) => {
const dom = cheerio.load(response.data);
const h1Element = dom('h1');
const courseName = h1Element.text();
const divElement = dom('div.h3.mt-1').first();
const time = divElement.text().split(' ')[0];
return { time, courseName };
};
const compareCourses = async (link1, link2) => {
const response1 = await axios.get(link1);
const response2 = await axios.get(link2);
const data1 = getPageInfo(response1);
const data2 = getPageInfo(response2);
return Number(data1.time) < Number(data2.time) ? data1.courseName : data2.courseName;
};
Taking the request out of the getPageInfo(response) function made it pure. It is no longer asynchronous and does not produce network errors. It's easy to test, because the input arguments alone determine its behaviour (there is no external environment or network involved). However, it is still not as good as it could be.
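To illustrate why a pure function is easy to check, here is a minimal sketch. It assumes a simplified stand-in for getPageInfo in which regular expressions replace cheerio, so the snippet runs without dependencies; the markup in fakeResponse is hypothetical and only mimics the selectors used in the article.

```javascript
// Simplified stand-in for the cheerio-based getPageInfo(response):
// regular expressions replace cheerio so the sketch has no dependencies
const getPageInfo = (response) => {
  const [, courseName] = response.data.match(/<h1>(.*?)<\/h1>/);
  const [, time] = response.data.match(/<div class="h3 mt-1">(\d+)/);
  return { time, courseName };
};

// A hand-written object shaped like an HTTP response; no request is made
const fakeResponse = {
  data: '<h1>Git fundamentals</h1><div class="h3 mt-1">9 hours</div>',
};

const info = getPageInfo(fakeResponse);
console.log(info); // { time: '9', courseName: 'Git fundamentals' }
```

The check needs neither a network nor a running site: the same input always produces the same output.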
Modular dependencies
After the changes, the responsibility of getPageInfo was reduced to parsing the input HTML and returning the required data. Should its behavior depend on how the HTML was obtained? The correct answer is no: it shouldn't matter how exactly the HTML got into the program. In reality, though, it does matter: right now the function takes a response object, which is directly tied to the network and specific to the axios library.
This is worth emphasizing because developers make similar mistakes at each stage. You've probably noticed that we designed the compareCourses function according to a top-down approach. We started by defining its external interface before delving deeper. This is the right approach because it focuses on our library's users rather than its internals.
The issues arise when the developer adjusts the behavior and interface of the lower-level modules to suit the upper-level ones. Let me give you an example. In the first Hexlet project, we ask students to create a text game in which the player is given a number and must determine whether it is even. The player answers by typing yes or no into the console. This task has a clear separation of levels: there is a number and a way to check its parity, and there is a separate task of checking the user's input to see whether they guessed correctly. However, many developers do the following:
const isEven = number => (number % 2 === 0 ? 'yes' : 'no');
// userAnswer is either 'yes' or 'no'
if (isEven(number) === userAnswer) {
console.log('You won!');
}
Because the values yes or no are entered into the program, the developer "adjusted" the parity check to match them. Such code is said to have a "leaky abstraction", and it is a major violation of modularity. Lower-level modules should not be aware of the existence of upper-level modules; it is the upper level that must adapt to the lower level.
The correct implementation should look like this:
const isEven = number => number % 2 === 0;
const expectedAnswer = isEven(number) ? 'yes' : 'no';
if (expectedAnswer === userAnswer) {
console.log('You won!');
}
The isEven(number) function now fulfills its purpose as it should. It can be used in any context and even moved to a separate library for further reuse.
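A short sketch of what "any context" means in practice. The boolean version works as a predicate for filter, while the adjusted version (called isEvenAnswer here, a hypothetical name) silently misbehaves outside the game:

```javascript
// Boolean predicate: knows nothing about the game or its answers
const isEven = (number) => number % 2 === 0;

// The "adjusted" version from the game, kept only for comparison
const isEvenAnswer = (number) => (number % 2 === 0 ? 'yes' : 'no');

const numbers = [1, 2, 3, 4];

const evens = numbers.filter(isEven);

// Both 'yes' and 'no' are truthy strings, so nothing is filtered out
const notReallyEvens = numbers.filter(isEvenAnswer);

console.log(evens); // [ 2, 4 ]
console.log(notReallyEvens); // [ 1, 2, 3, 4 ]
```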
The OSI model is another excellent example that any developer should be familiar with. Because networking is built on layers like these, developers use HTTP without thinking about how the data is physically delivered to the user (over the air or by wire).
Let's go back to our code and change the getPageInfo function:
// The input receives html as a string
const getPageInfo = (html) => {
const dom = cheerio.load(html);
const h1Element = dom('h1');
const courseName = h1Element.text();
const divElement = dom('div.h3.mt-1').first();
const time = divElement.text().split(' ')[0];
return { time, courseName };
};
const compareCourses = async (link1, link2) => {
const response1 = await axios.get(link1);
const response2 = await axios.get(link2);
const data1 = getPageInfo(response1.data);
const data2 = getPageInfo(response2.data);
return Number(data1.time) < Number(data2.time) ? data1.courseName : data2.courseName;
};
The main idea behind our latest refactoring is called level design. Successful partitioning into levels makes the code more stable, easier to analyze, and less coupled. For example, if we decide to use a different library for requests or abandon HTTP entirely in favor of working with files, the code responsible for parsing will be unaffected. The same cannot be said of solutions that mix parsing and data gathering.
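As a rough illustration of that claim, the sketch below feeds the parser from an in-memory object instead of HTTP. Everything here is an assumption made for the example: the regex-based getPageInfo is a simplified stand-in for the cheerio version, and pages, readPage, compareStoredCourses, and the durations are hypothetical.

```javascript
// Simplified stand-in for the cheerio-based parser: takes HTML as a string
const getPageInfo = (html) => {
  const [, courseName] = html.match(/<h1>(.*?)<\/h1>/);
  const [, time] = html.match(/<div class="h3 mt-1">(\d+)/);
  return { time: Number(time), courseName };
};

// Hypothetical in-memory storage standing in for files or a cache
const pages = {
  'js-arrays.html': '<h1>JS: Arrays</h1><div class="h3 mt-1">14 hours</div>',
  'intro_to_git.html': '<h1>Git fundamentals</h1><div class="h3 mt-1">9 hours</div>',
};
const readPage = (name) => pages[name];

// Same parsing and comparison as before, but no HTTP anywhere
const compareStoredCourses = (name1, name2) => {
  const data1 = getPageInfo(readPage(name1));
  const data2 = getPageInfo(readPage(name2));
  return data1.time < data2.time ? data1.courseName : data2.courseName;
};

const shorter = compareStoredCourses('js-arrays.html', 'intro_to_git.html');
console.log(shorter); // Git fundamentals
```

Only the data-gathering line changed; the parser did not need to know where the HTML came from.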
Level design is a complex topic. It requires a deep understanding of the processes that occur in the code, as well as careful handling of side effects and good state management. SICP (Structure and Interpretation of Computer Programs) is the only book I'm aware of that will boost your knowledge of level design. For example, the book builds a graphics library in which level design is employed extensively.
Composition instead of nesting (Pipeline)
The code written in accordance with the preceding principles looks like a flat chain of functions through which data is passed. Each function returns new data that is further used in other functions or directly.
const compareCourses = async (link1, link2) => {
// A link as an input, a response as an output
const response1 = await axios.get(link1);
// Now response is an input, data is an output
const data1 = getPageInfo(response1.data);
const response2 = await axios.get(link2);
const data2 = getPageInfo(response2.data);
return Number(data1.time) < Number(data2.time) ? data1.courseName : data2.courseName;
};
In programming, this style of code execution is called a pipeline. It is only possible in well-structured code, where functions do not depend directly on one another and instead focus solely on their own tasks. This allows them to be chained and reused. Pipelines are common not only in programming but also on the command line:
# The | symbol is a pipe, and the chain itself is a pipeline
$ ls -l | grep package | sort
Pipelines are such a powerful tool that many languages support them at the syntax level, for example F#, OCaml, Elixir, Elm, Julia, and Hack. There is also a proposal to add a pipeline operator to JavaScript. Look at how the code changes with it:
// function chain before pipeline implementation (although you can create intermediate constants)
const result = exclaim(capitalize(doubleSay('hello'))); // 'Hello, hello!'
// pipeline at the syntax level
// The input line is passed through the chain of functions in the specified order
// The output of each function becomes the input in the next step
const result = 'hello'
|> doubleSay
|> capitalize
|> exclaim;
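Here's how our function's code would look with the proposed syntax (the placeholder token for the previous stage's value, written as # here, has changed between drafts of the proposal):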
// This function had a pipeline before the new syntax was introduced
// The new syntax is particularly useful in long chains
const compareCourses = async (link1, link2) => {
const data1 = await axios.get(link1)
|> getPageInfo(#.data); // # - the data from a previous stage, here it’s the response
const data2 = await axios.get(link2)
|> getPageInfo(#.data);
return Number(data1.time) < Number(data2.time) ? data1.courseName : data2.courseName;
};
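Since the pipeline operator is still a proposal, the |> snippets do not run in current JavaScript engines. The same left-to-right flow can be approximated today with a small helper. This is a sketch: pipe is a hypothetical helper, and the bodies of doubleSay, capitalize, and exclaim are assumptions, since the article shows only their names and the combined result.

```javascript
// Assumed implementations that reproduce the 'Hello, hello!' result
const doubleSay = (str) => `${str}, ${str}`;
const capitalize = (str) => str[0].toUpperCase() + str.slice(1);
const exclaim = (str) => `${str}!`;

// Hypothetical helper: passes a value through the functions from left to right
const pipe = (...fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

// Nested calls read from the inside out
const nested = exclaim(capitalize(doubleSay('hello')));

// The piped version reads in the order the steps are executed
const piped = pipe(doubleSay, capitalize, exclaim)('hello');

console.log(nested); // Hello, hello!
console.log(piped); // Hello, hello!
```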