In dynamic languages, code files can be either executable scripts or modules. Each kind of file has its own limitations; they are structured differently and behave differently depending on their role.
In Hexlet's projects, students have to write both. Along the way, they often make mistakes that complicate testing, maintenance, and extensibility. In this article, I will explain which is which and how to organize the code properly. As a bonus, I'll also cover the fundamentals of writing command-line utilities.
Although the examples in this article are given in JavaScript, the language itself matters little. The same ideas apply to other dynamic languages, such as PHP, Python, or Ruby.
Module
Let's start with some definitions. A module is a file containing definitions of functions, classes, and other entities (depending on the language). Modules are known by different names in different languages, but their essence remains the same. An example of a module containing the User class is shown below:
// This file should be named User.js
// according to JavaScript conventions
export default class User {
  constructor(name) {
    this.name = name;
  }

  getName() {
    return this.name;
  }
}
Modules are not complete programs. Running one directly, for example from the command line, is either impossible or pointless, because a module only defines things and does nothing on its own. Modules are intended for use by other modules (or scripts). For this, most languages use an import system, autoloading, or both. In JavaScript, you must import a module to access its definitions:
// Some module that uses the User class
import User from './User';

// Function definition
const create = (data) => {
  const user = new User(data);
  // save() is assumed to exist on User; the class shown above does not define it
  return user.save();
};
Beginners frequently make the mistake of calling a function in the same module right after defining it. For example:
// Function definition
const sayHi = () => {
  console.log('Boom!');
};

// Function execution!
sayHi();
When exactly will this function call occur? At the moment the file is loaded via import or autoloading.
// The call occurs only on the first import
// Later imports reuse the cached module, so the call is not repeated
import User from './User'; // Boom!
In some languages, it is simply impossible to write this way, for example, in Java. The compiler just won't allow it. In other languages, such code is forbidden by coding standards. A linter in PHP, for example, outputs the following warning:
A file SHOULD declare new symbols (classes, functions, constants, etc.) and cause no other side effects, or it SHOULD execute logic with side effects, but SHOULD NOT do both.
Why? Mostly because the behavior is unpredictable. Code written at the module level (outside of any function) is executed during autoloading or import. Moreover, it is not always possible to pinpoint where this happens and how often. Usually, a framework is in charge of loading the code.
Since the loading takes place outside of the application code, errors at this level cannot be intercepted by the application. Besides, it's simply unexpected: an ordinary-looking import triggers hidden work that the caller cannot control.
On top of that, such code frequently causes side effects by changing the internal state of the program, for example, global variables. This means that after loading such a module, the software abruptly changes its behavior. Ruby and JavaScript are particularly prone to this.
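The effect described above can be sketched in a single file. This is only an illustration: the method name shout and the file name shout.js are made up for the example, and the patch is written inline here, whereas in practice it would sit at the top level of a module and run on import.

```javascript
// State of the program before the "module" is loaded
const before = typeof 'hi'.shout;

// Imagine this statement at the top level of a hypothetical file, shout.js.
// It runs on import and changes every string in the program.
String.prototype.shout = function shout() {
  return `${this.toUpperCase()}!`;
};

// State after loading: unrelated code now sees a new method
const after = typeof 'hi'.shout;

console.log(before, after); // undefined function
console.log('hi'.shout()); // HI!
```

Nothing in the calling code hints that strings have changed; the only trace is the import line itself.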
To be completely honest, it is sometimes justified.
And last but not least, such calls can make the code untestable. For example, suppose there is a call like this at the module level:
// Calling outside of functions, at the file level
const element = document.querySelector('div');
Such a file cannot be used in an environment that has no document object. There is no way to substitute it: because the lookup happens at the module level rather than inside a function, you cannot pass in a replacement, so dependency inversion is not available.
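One possible way out, sketched under the assumption that the lookup can be moved into a function: the document is passed in as a parameter, so a test can supply a stand-in. The names findFirstDiv and fakeDocument are hypothetical and exist only for this illustration.

```javascript
// The document is received as an argument instead of being read from a global
const findFirstDiv = (doc) => doc.querySelector('div');

// In a test, a small stub stands in for the real browser document
const fakeDocument = {
  querySelector: (selector) => ({ tagName: selector.toUpperCase() }),
};

const element = findFirstDiv(fakeDocument);
console.log(element.tagName); // DIV
```

In the browser, the same function would be called as findFirstDiv(document), and importing the module itself would no longer touch the DOM.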
If the module is written correctly, it is safe to include it in other modules and test it.
Script
What is a script? A script is any file that is meant to be run from the command line. The file may have the executable permission set, but in most cases this is not required:
// date.js
console.log(new Date());
This script contains a function call that prints the current date:
$ node ./date.js
2019-09-17T17:04:10.267Z
The script itself is not run directly but through an interpreter. This approach is frequently used in development.
When a script is publicly “released”, the interpreter is usually hidden and the script is turned into an executable. In some languages, the extension is also removed from the executable file name, because the language it is written in is irrelevant to users. This is not necessary for JavaScript, since the command name is declared separately from the file name (in npm packages, through the bin field of package.json).
Running such a script looks as follows:
$ ./date
2019-09-17T17:04:10.267Z
You must complete two tasks to make our date file executable:
- Add the execution permission:
chmod +x date
- Add a shebang at the beginning of the file:
#!/usr/bin/env node
A line like this at the top of an executable file (script) tells the system which interpreter to use when the file is launched, for example from a shell such as bash.
The executable file is only for direct execution. It is a dead end: technically it can be imported into other modules, but it should not be, because it is not designed for that. Scripts can use modules, but modules must not depend on scripts.
In this regard, Python took its own path. Any Python module can be turned into a script by adding a special condition at the end of the file (if __name__ == "__main__":). The code under it runs only when the file is launched as a script. Meanwhile, the same file can be used as a module without worrying that this code will run on import.
What about testing? Scripts are a headache to test because of their design. You can’t work with them as you would with regular code. In tests, you have to run them as ordinary programs and observe what happens, for example, by analyzing what they print to STDOUT.
This leads to an important principle: any non-trivial script should do nothing but call library code.
Command-line utilities and libraries
Packages like eslint or babel can be used in two modes:
- As command-line utilities
- As libraries that can be added as dependencies, imported into code, and called
With this approach, the executable file (script) becomes a client of the library. It does not duplicate any of the work that the library does. Only then can the library be reused elsewhere.
This separation makes testing much easier. Because the logic lives in the library, you can test it just like regular code.
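As a minimal sketch of this split, here is the date example from earlier divided into a library part and a script part. Both parts are kept in one file so the snippet runs as is; in a real package the function would live in its own module and the script would import it. The name formatDate and the file names in the comments are assumptions made for the illustration.

```javascript
// --- Library part (would live in its own module, e.g. a hypothetical src/formatDate.js) ---
// A plain function: takes a Date, returns a string, prints nothing
const formatDate = (date) => date.toISOString();

// --- Script part (would live in the executable, e.g. a hypothetical bin/date) ---
// The only job of the script is to call the library and print the result
console.log(formatDate(new Date()));
```

A test can now call formatDate with a fixed date and compare the returned string, without launching a process or reading STDOUT.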
That said, the script may still contain logic that is unrelated to the library. One example is parsing command-line arguments, as in eslint --format json src. This part of the application exists only when the package is used as a utility. Parsing can be placed either in the script itself or in a separate module that is not directly tied to the library.
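To show what such script-side logic might look like, here is a deliberately small argument parser. It is a sketch only: parseArgs is a hypothetical helper, it understands nothing but the "--name value" form plus positional paths, and it is not how eslint parses its own arguments. Real utilities usually rely on a dedicated argument-parsing package.

```javascript
// Splits raw arguments into named options and positional paths.
// Handles only the `--name value` form, which is enough for this illustration.
const parseArgs = (argv) => {
  const options = {};
  const paths = [];
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    if (arg.startsWith('--')) {
      options[arg.slice(2)] = argv[i + 1];
      i += 1; // skip the value that was just consumed
    } else {
      paths.push(arg);
    }
  }
  return { options, paths };
};

// In a real script the input would be process.argv.slice(2)
const parsed = parseArgs(['--format', 'json', 'src']);
console.log(JSON.stringify(parsed)); // {"options":{"format":"json"},"paths":["src"]}
```

Because the parser is an ordinary function that receives an array, it can be tested on its own, and the script only has to hand its result to the library.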