This article comes from NetEase cloud community.
This article is translated from: ES modules: a cartoon deep-dive
ES modules bring an official, standardized module system to JavaScript. It took a while to get here, though: nearly 10 years of standardization work.
But the wait is almost over. With the release of Firefox 60 in May (currently in beta), all major browsers will support ES modules, and the Node modules working group is working on adding ES module support to Node.js. ES module integration for WebAssembly is underway as well.
Many JavaScript developers know that ES modules have been controversial. But few actually understand how ES modules work.
Let's look at what problem ES modules solve and how they differ from modules in other module systems.
What problem do modules solve?
When you think about it, coding in JavaScript is all about managing variables. You assign a value to a variable, add a number to a variable, or combine two variables and put the result into another variable.
Because so much of your code is about changing variables, how you organize these variables has a big impact on how well you can code and how well you can maintain that code.
Having only a few variables to think about at one time makes things easier. JavaScript has a way of helping you do this, called scope. Because of how scopes work in JavaScript, a function cannot access variables that are defined in other functions.
This is good. It means that when you write a function, you can focus on just that function. You don't have to worry about what other functions might do to your variables.
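As a small illustration of the scope rule described above (the function and variable names are made up for this sketch), a variable declared inside one function is simply not visible from another:

```javascript
// Each function gets its own scope, so `sheepCount` is private to countSheep.
function countSheep() {
  const sheepCount = 3; // only visible inside countSheep
  return sheepCount;
}

function canSeeSheepCount() {
  // `typeof` on an undeclared name yields 'undefined' instead of throwing,
  // which lets us probe for the variable safely.
  return typeof sheepCount !== 'undefined';
}

console.log(countSheep());       // 3
console.log(canSeeSheepCount()); // false
```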
It also has a downside, though: it makes it hard to share variables between functions.
What if you do want to share a variable outside of a scope? A common way to handle this is to put it in a scope above you, for example the global scope.
You probably remember this from the jQuery days. Before you could load any jQuery plugins, you had to make sure that jQuery was in the global scope.
This works, but it causes some annoying problems.
First, all of your script tags need to be in the right order, and you have to be careful that nobody messes up that order.
If you do mess up that order, your app throws an error while it is running. When a function goes looking for jQuery where it expects it, in the global scope, and does not find it, it throws an error and stops executing.
This makes maintaining code tricky. Removing old code or old script tags becomes a game of roulette: you don't know what might break. The dependencies between the different parts of your code are implicit. Any function can grab anything in the global scope, so you don't know which functions depend on which scripts.
The second problem is that because these variables are in the global scope, any code running in that scope can change them. Malicious code can change a variable on purpose to make your code do something you didn't intend, and non-malicious code can clobber your variable by accident.
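A rough sketch of both problems, using a made-up global named fakeLibrary in place of jQuery (nothing here is real jQuery API): the plugin only works if something else has already put the library in the global scope, and any other code can overwrite it.

```javascript
// A "plugin" that silently depends on a global set up by an earlier script.
function runPlugin() {
  if (typeof globalThis.fakeLibrary !== 'function') {
    throw new ReferenceError('fakeLibrary is not loaded yet');
  }
  return globalThis.fakeLibrary('plugin');
}

// Problem 1: order matters. Running the plugin before the library fails.
let failedBeforeLibrary = false;
try {
  runPlugin();
} catch (error) {
  failedBeforeLibrary = error instanceof ReferenceError;
}

// The "library script" finally runs and installs itself globally.
globalThis.fakeLibrary = (name) => `hello from ${name}`;
const greeting = runPlugin();

// Problem 2: any other code can overwrite the global, on purpose or by accident.
globalThis.fakeLibrary = () => 'clobbered';
const afterClobbering = runPlugin();

console.log(failedBeforeLibrary, greeting, afterClobbering);
```

Nothing in runPlugin itself says where fakeLibrary comes from, which is the implicit dependency the text describes.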
How do modules help?
Modules give you a better way to organize these variables and functions. With modules, you group together the variables and functions that make sense to go together.
This puts these functions and variables into a module scope. The module scope can be used to share variables between the functions in the module.
But unlike function scopes, module scopes can also make their variables available to other modules. They can say explicitly which of the variables, classes, or functions in the module should be shared.
When something is made available to other modules, it is called an export. Once you have an export, other modules can explicitly say that they depend on that variable, class, or function.
Because this is an explicit relationship, you can tell which modules will break if you remove another one.
Once you can export and import variables between modules, it becomes much easier to break your code up into small chunks that work independently of each other. You can then combine and recombine these chunks, like Lego blocks, to create different applications from the same set of modules.
Since modules are so useful, there have been multiple attempts to add module functionality to JavaScript. Today, two module systems are actively in use. CommonJS (CJS) is what Node.js has used historically. ESM (ECMAScript modules) is a newer system that has been added to the JavaScript specification. Browsers already support ES modules, and Node is adding support.
Let's take a closer look at how this new module system works.
How do ES modules work?
When you develop with modules, you build up a graph of dependencies. The connections between the different dependencies come from the import statements that you use.
These import statements are how the browser or Node knows exactly what code it needs to load. You give it a file to use as an entry point to the graph. From there it follows the import statements to find the rest of the code.
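To make the idea of following import statements from an entry point concrete, here is a toy sketch. The file names and contents are hypothetical, the "files" live in memory rather than on a server, and the regular expression is a deliberate simplification; a real engine uses a full parser.

```javascript
// Hypothetical in-memory "files"; a real loader would fetch these by URL or path.
const files = {
  'main.js': "import { count } from './counter.js';\nimport { render } from './display.js';",
  'counter.js': 'export let count = 5;',
  'display.js': "import { count } from './counter.js';\nexport function render() {}",
};

// Simplified pattern for `import ... from '...'` statements.
const IMPORT_PATTERN = /import\s+[^'"]*?\s+from\s+['"]([^'"]+)['"]/g;

function findSpecifiers(source) {
  return Array.from(source.matchAll(IMPORT_PATTERN), (match) => match[1]);
}

// Start from the entry file and follow import statements to discover the rest.
function buildGraph(entry, sources) {
  const graph = {};
  const queue = [entry];
  while (queue.length > 0) {
    const name = queue.shift();
    if (name in graph) continue; // already visited
    // Toy resolution step: './counter.js' -> 'counter.js'
    const dependencies = findSpecifiers(sources[name]).map((s) => s.replace(/^\.\//, ''));
    graph[name] = dependencies;
    queue.push(...dependencies);
  }
  return graph;
}

const graph = buildGraph('main.js', files);
console.log(graph);
```

The resulting object maps each file to the files it imports, which is the dependency graph in its simplest form.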
But the browser cannot use the files themselves. It needs to parse each of these files into a data structure called a module record. That way, it knows exactly what is going on in the file.
After that, the module record needs to be turned into a module instance. An instance combines two things: the code and the state.
The code is basically a set of instructions. It is like a recipe for how to make something. But by itself, the code can't do anything. You need raw materials to use with those instructions.
What is state? State gives you those raw materials. State is the actual values of the variables at any point in time. Of course, these variables are just names for the boxes in memory that hold the values.
So the module instance combines the code (the list of instructions) with the state (all of the variables' values).
What we need is a module instance for each module. Module loading is the process of going from this entry point file to a full graph of module instances.
For ES modules, this happens in three steps:
- Construction: find, download, and parse all of the files into module records.
- Instantiation: find boxes in memory to hold all of the exported values (but don't fill them in with values yet). Then make both exports and imports point to those boxes in memory. This is called linking.
- Evaluation: run the code to fill in the boxes with the variables' actual values.
People talk about ES modules being asynchronous. You can think of them as asynchronous because the work is split into these three phases (loading, instantiating, and evaluating), and those phases can be done separately.
This means the spec does introduce a kind of asynchrony that was not there in CommonJS. I'll explain more later, but in CJS a module and the dependencies below it are loaded, instantiated, and evaluated all at once, without any breaks in between.
However, the steps themselves are not necessarily asynchronous. They can be done synchronously. It depends on what is doing the loading. That is because not everything is controlled by the ES module spec. There are actually two halves of the work, which are covered by different specs.
The ES module spec says how you should parse files into module records, and how you should instantiate and evaluate those modules. However, it does not say how to get the files in the first place.
It is the loader that fetches the files, and the loader is specified in a different specification. For browsers, that is the HTML spec. But you can have different loaders depending on what platform you are using.
The loader also controls exactly how the modules are loaded. It calls the ES module methods: ParseModule, Module.Instantiate, and Module.Evaluate. It is kind of like a puppeteer controlling the JS engine's strings.
Now let's cover each step in more detail.
Construction
Three things happen for each module during the construction phase.
- Figure out where to download the file containing the module from (also known as module resolution)
- Fetch the file (by downloading it from a URL or loading it from the file system)
- Parse the file into a module record
Finding the file and fetching it
The loader takes care of finding the file and downloading it. First it needs to find the entry point file. In HTML, you tell the loader where to find it by using a script tag.
But how does it find the next bunch of modules, the ones that main.js directly depends on?
This is where import statements come in. One part of the import statement is called the module specifier (or module identifier). It tells the loader where it can find each next module.
One thing to note about module specifiers: they sometimes need to be handled differently between browsers and Node. Each host has its own way of interpreting the module specifier strings. To do this, it uses something called a module resolution algorithm, which differs between platforms. Currently, some module specifiers that work in Node do not work in the browser, but there is ongoing work to fix this.
Until that is fixed, browsers only accept URLs as module specifiers. They load the module file from that URL. But that does not happen for the whole graph at the same time. You don't know which dependencies a module needs you to fetch until you have parsed its file, and you can't parse the file until you have fetched it.
This means that we have to go through the tree layer by layer: parse one file, figure out its dependencies, and then find and load those dependencies.
If the main thread were to wait for each of these files to download, a lot of other tasks would pile up in its queue.
That is because when you are working in a browser, the downloading part takes a long time.
Blocking the main thread like this would make an app that uses modules too slow to use. This is one of the reasons that the ES module spec splits the algorithm into multiple phases. Splitting out construction into its own phase allows browsers to fetch files and build up their understanding of the module graph before getting down to the synchronous work of instantiating.
This approach, splitting the algorithm up into phases, is one of the key differences between ES modules and CommonJS modules.
CommonJS can do things differently because loading files from the file system takes much less time than downloading them across the Internet. This means Node can block the main thread while it loads the file. And since the file is already loaded, it makes sense to just instantiate and evaluate it right away (these are not separate phases in CommonJS). This also means that you walk down the whole tree, loading, instantiating, and evaluating every dependency, before you return the module instance.
The CommonJS approach has a few implications, and I'll explain more about those later. One is that in Node with CommonJS modules, you can use variables in your module specifier. You execute all of the code in this module (up to the require statement) before you look for the next module. That means the variable will have a value when you go to do module resolution.
But with ES modules, you build up the whole module graph beforehand, before you do any evaluation. This means you cannot have variables in your module specifiers, because those variables don't have values yet.
But sometimes it is really useful to use variables in module paths. For example, you might want to switch which module you load depending on what the code is doing or what environment it is running in.
To make this possible for ES modules, there is a proposal called dynamic import. With it, you can use an import statement like import(`${path}/foo.js`).
The way this works is that any file loaded using import() is handled as the entry point to a separate graph. The dynamically imported module starts a new graph, which is processed separately.
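A minimal sketch of how dynamic import lets a runtime value pick the module. The locale names and the ./locales/ path layout are hypothetical, and loadMessages is only defined here, not called, since those files do not exist.

```javascript
// Build the specifier at runtime; a static `import ... from` cannot do this.
function messagesPathFor(locale) {
  return `./locales/${locale}/messages.js`; // hypothetical file layout
}

// import() returns a promise that resolves to the module's namespace object.
async function loadMessages(locale) {
  const namespace = await import(messagesPathFor(locale));
  return namespace.default;
}

console.log(messagesPathFor('fr')); // ./locales/fr/messages.js
```

Each call to loadMessages would start its own graph with the chosen file as the entry point, as described above.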
One thing to note, though: any module that is in both of these graphs shares a single module instance. This is because the loader caches module instances. For each module in a particular global scope, there is only one module instance.
This means less work for the engine. For example, the module file is only fetched once, even if multiple modules depend on it. (That is one reason to cache modules. We'll see another in the Evaluation section.)
The loader manages this cache using something called a module map. Each global scope keeps track of its modules in a separate module map.
When the loader goes to fetch a URL, it puts that URL in the module map and makes a note that it is currently fetching the file. Then it sends out the request and moves on to start fetching the next file.
What happens if another module depends on the same file? The loader looks up each URL in the module map. If it sees fetching there, it just moves on to the next URL.
But the module map does not just keep track of which files are being fetched. It also serves as a cache for the modules, as we'll see next.
Parsing
Now that we have fetched this file, we need to parse it into a module record. This helps the browser understand what the different parts of the module are.
Once the module record is created, it is placed in the module map. This means that whenever it is requested from here on out, the loader can pull it from that map.
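To make the module map idea concrete, here is a simplified, synchronous sketch. The createLoader helper, the example.com URLs, and the shape of the record are all assumptions made for illustration; a real loader fetches asynchronously and stores real module records.

```javascript
function createLoader(fetchSource) {
  const moduleMap = new Map(); // url -> 'fetching' marker or a module record
  let fetchCount = 0;

  function load(url) {
    if (moduleMap.has(url)) {
      // Already being fetched or already parsed: reuse the map entry.
      return moduleMap.get(url);
    }
    moduleMap.set(url, 'fetching');
    fetchCount += 1;
    const { source, dependencies } = fetchSource(url);
    const record = { url, source, dependencies }; // stand-in for a module record
    moduleMap.set(url, record);
    dependencies.forEach((dependency) => load(dependency));
    return record;
  }

  return { load, moduleMap, getFetchCount: () => fetchCount };
}

// Hypothetical graph: both main.js and display.js depend on counter.js.
const base = 'https://example.com/';
const fakeNetwork = {
  [base + 'main.js']: { source: '/* main */', dependencies: [base + 'counter.js', base + 'display.js'] },
  [base + 'display.js']: { source: '/* display */', dependencies: [base + 'counter.js'] },
  [base + 'counter.js']: { source: '/* counter */', dependencies: [] },
};

const loader = createLoader((url) => fakeNetwork[url]);
loader.load(base + 'main.js');
console.log(loader.getFetchCount()); // 3
```

Even though counter.js is requested by two modules, it is fetched once, because the second request finds it in the map.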
There is one detail in parsing that may seem trivial but that actually has pretty big implications. All modules are parsed as if they had "use strict" at the top. There are also other slight differences. For example, the keyword await is reserved in a module's top-level code, and the value of this is undefined.
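The effect of strict parsing can be previewed without a module at all. In the sketch below, a strict-mode function is used as a stand-in for module code (an approximation, since it is the module's top level, not a function, that the text is describing); the function and variable names are made up.

```javascript
// Module code is always strict. A strict function shows the same two rules.
function behavesLikeModuleCode() {
  'use strict';
  const seen = { thisValue: this, assignmentThrew: false };
  try {
    undeclaredCounter = 1; // sloppy mode would silently create a global here
  } catch (error) {
    seen.assignmentThrew = error instanceof ReferenceError;
  }
  return seen;
}

const observations = behavesLikeModuleCode();
console.log(observations.thisValue);       // undefined
console.log(observations.assignmentThrew); // true
```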
This different way of parsing is called a "parse goal". If you parse the same file with different goals, you end up with different results. So before you start parsing, you want to know what kind of file you are parsing: whether it is a module or not.
In browsers this is pretty easy. You just put type="module" on the script tag. This tells the browser that the file should be parsed as a module. And since only modules can be imported, the browser knows that any imports are modules, too.
But in Node, you don't use HTML tags, so you don't have the option of using a type attribute. One way the community has tried to solve this is by using an .mjs extension. Using that extension tells Node, "this file is a module". You'll see people talking about this as the signal for the parse goal. The discussion is still ongoing, so it is unclear what signal the Node community will decide to use in the end.
Either way, the loader determines whether to parse the file as a module or not. If it is a module and there are imports, it then starts the process over again until all of the files are fetched and parsed.
And we're done! At the end of the loading process, you have gone from having just an entry point file to having a bunch of module records.
The next step is to instantiate these modules and link all of the instances together.
Instantiation
As I mentioned before, an instance combines code with state. That state lives in memory, so the instantiation step is all about wiring things up to memory.
First, the JS engine creates a module environment record. This manages the variables for the module record. Then it finds boxes in memory for all of the exports. The module environment record keeps track of which box in memory is associated with each export.
These boxes in memory don't have their values yet. It is only after evaluation that their actual values are filled in. There is one caveat to this rule: any exported function declarations are initialized during this phase. This makes things easier for evaluation.
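The special treatment of function declarations has a close analogue inside a single file. The sketch below is offered as an analogy rather than as module semantics: a function declaration is usable before the line that declares it runs, while a let variable has a reserved slot but no value until its line is evaluated. All names are made up.

```javascript
// The function declaration is initialized before any of this code runs...
const earlyCall = describeStatus();

// ...but a `let` binding cannot be read until evaluation reaches its declaration.
let readBeforeInitialization = 'no error';
try {
  readBeforeInitialization = String(moduleStatus);
} catch (error) {
  readBeforeInitialization = error.name;
}

let moduleStatus = 'evaluated';

function describeStatus() {
  return 'function is ready';
}

console.log(earlyCall);                // function is ready
console.log(readBeforeInitialization); // ReferenceError
```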
To instantiate the module graph, the engine does what is called a depth-first post-order traversal. This means it goes down to the bottom of the graph, to the dependencies that don't depend on anything else, and sets up their exports.
The engine finishes wiring up all of the exports below a module, that is, all of the exports that the module depends on. Then it comes back up a level to wire up the imports from that module.
Note that both the export and the import point to the same location in memory. Wiring up the exports first guarantees that every import can be connected to a matching export.
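The traversal order can be sketched in a few lines. The graph below is hypothetical, and the function only computes the order in which modules would be handled; it does not do any actual linking.

```javascript
// Hypothetical graph: each key lists the modules it imports.
const moduleGraph = {
  'main.js': ['counter.js', 'display.js'],
  'display.js': ['counter.js'],
  'counter.js': [],
};

// Visit a module's dependencies first, then the module itself (post-order).
function postOrder(graph, entry, visited = new Set(), order = []) {
  if (visited.has(entry)) return order; // also prevents endless loops on cycles
  visited.add(entry);
  for (const dependency of graph[entry]) {
    postOrder(graph, dependency, visited, order);
  }
  order.push(entry);
  return order;
}

const linkOrder = postOrder(moduleGraph, 'main.js');
console.log(linkOrder); // [ 'counter.js', 'display.js', 'main.js' ]
```

The module with no dependencies comes first and the entry point comes last, so every export exists before anything tries to import it.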
This differs from CommonJS modules. In CommonJS, the entire export object is copied on export. This means that any values that are exported, such as numbers, are copies.
So if the exporting module changes that value later, the importing module does not see the change.
In contrast, ES modules use something called live bindings. Both modules point to the same location in memory. This means that when the exporting module changes a value, that change shows up in the importing module.
Modules that export values can change those values at any time, but importing modules cannot change the values of their imports. That said, if a module imports an object, it can change property values on that object.
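The difference between a copied value and a live binding can be simulated in a single file. This is only a sketch: the getter below imitates a live binding, it is not how an engine implements one, and all of the names are hypothetical.

```javascript
// CommonJS-style: the exported object holds a copy of the number.
function makeCopiedExports() {
  let count = 5;
  return { count, increment() { count += 1; } };
}

// ES-module-style, simulated with a getter so reads always see the current value.
function makeLiveExports() {
  let count = 5;
  return Object.freeze({
    get count() { return count; },
    increment() { count += 1; },
  });
}

// The importing side cannot reassign a live binding.
function tryToReassign(namespace) {
  'use strict';
  try {
    namespace.count = 100;
    return 'assigned';
  } catch (error) {
    return error.name;
  }
}

const copied = makeCopiedExports();
copied.increment();
console.log(copied.count); // 5, the importer still sees the old copy

const live = makeLiveExports();
live.increment();
console.log(live.count); // 6, the importer sees the change

console.log(tryToReassign(live)); // TypeError
```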
The reason to have live bindings like this is that you can wire up all of the modules without running any code. This helps with evaluation when you have cyclic dependencies, as I'll explain below.
So at the end of this step, we have all of the instances and the memory locations for the exported and imported variables wired up.
Now we can start evaluating the code and filling in those memory locations with their values.
Evaluation
The final step is filling in these boxes in memory. The JS engine does this by executing the top-level code, that is, the code that is outside of functions.
Besides filling in these boxes in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.
Because of the potential for side effects, you only want to evaluate the module once. Linking, which happens during instantiation, can be done multiple times with exactly the same result. Evaluation, by contrast, can have different results depending on how many times you do it.
This is one reason to have the module map. The module map caches the module by canonical URL, so that there is only one module record for each module. That ensures each module is executed only once. Just as with instantiation, this is done as a depth-first post-order traversal.
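A small sketch of why evaluating only once matters. The modules object is hypothetical, each evaluate function stands in for a module's top-level code, and the serverCalls counter stands in for a side effect such as a network request.

```javascript
// Hypothetical modules: `evaluate` plays the role of the top-level code.
let serverCalls = 0;
const modules = {
  'main.js': { dependencies: ['counter.js', 'display.js'], evaluate: () => 'main done' },
  'display.js': { dependencies: ['counter.js'], evaluate: () => 'display done' },
  'counter.js': {
    dependencies: [],
    evaluate: () => { serverCalls += 1; return 'counter done'; }, // side effect
  },
};

const evaluationOrder = [];
const evaluated = new Set(); // plays the role of the cache keyed by URL

function evaluateModule(url) {
  if (evaluated.has(url)) return; // never run a module's code twice
  evaluated.add(url);
  modules[url].dependencies.forEach((dependency) => evaluateModule(dependency));
  modules[url].evaluate(); // dependencies have all run by this point
  evaluationOrder.push(url);
}

evaluateModule('main.js');
console.log(serverCalls);     // 1
console.log(evaluationOrder); // [ 'counter.js', 'display.js', 'main.js' ]
```

Without the evaluated set, counter.js would run twice, once for each module that imports it, and the side effect would happen twice.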
What about those cycles that we talked about before?
In a cyclic dependency, you end up having a loop in the graph. Usually, this is a long loop. But to explain the problem, I'm going to use a contrived example with a short loop.
Let's look at how this would work with CommonJS modules. First, the main module would execute up to the require statement. Then it would go to load the counter module.
The counter module would then try to access message from the exports object. But since this hasn't been evaluated in the main module yet, it returns undefined. The JS engine allocates space in memory for the local variable and sets the value to undefined.
Evaluation continues down to the end of the counter module's top-level code. We want to see whether we will get the correct value for message eventually (after main.js is evaluated), so we set up a timeout. Then evaluation resumes on main.js.
The message variable is initialized and added to memory. But since there is no connection between the two, it stays undefined in the counter module.
If the export were handled using live bindings, the counter module would see the correct value eventually. By the time the timeout runs, main.js's evaluation would have completed and filled in the value.
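The two outcomes can be reproduced with a toy simulation. Both functions below are simplifications written for this example: toyRequire is not Node's real loader, the getter only imitates a live binding, and a plain function call stands in for the timeout.

```javascript
// Toy CommonJS-style loader with a main <-> counter cycle.
function runCommonJsStyle() {
  const cache = {};
  const definitions = {
    main(moduleExports, toyRequire) {
      toyRequire('counter');                    // main pauses here
      moduleExports.message = 'Eval complete';  // runs after counter finishes
    },
    counter(moduleExports, toyRequire) {
      const message = toyRequire('main').message; // copied now, so undefined
      moduleExports.checkLater = () => message;   // stands in for the timeout
    },
  };
  function toyRequire(name) {
    if (!(name in cache)) {
      cache[name] = {}; // the partially filled exports object is cached first
      definitions[name](cache[name], toyRequire);
    }
    return cache[name];
  }
  toyRequire('main');
  return cache.counter.checkLater();
}

// Simulated live binding: counter keeps a reference to the box, not a copy.
function runLiveBindingStyle() {
  let message; // the shared box, set up before any code is evaluated
  const mainNamespace = { get message() { return message; } };
  const checkLater = () => mainNamespace.message; // counter is evaluated first
  message = 'Eval complete';                      // then main is evaluated
  return checkLater();
}

console.log(runCommonJsStyle());    // undefined
console.log(runLiveBindingStyle()); // Eval complete
```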
Supporting these cyclic dependencies is a big reason behind the ES module design. It is this three-stage design that makes it possible.
What is the status of ES modules?
With the release of Firefox 60 in early May, all major browsers support ES modules by default. Node is also adding support, with a working group dedicated to figuring out compatibility issues between CommonJS and ES modules.
This means that you can use the script tag with type=module, and use imports and exports. However, more module features are yet to come. The dynamic import proposal is at Stage 3 in the specification process, as is import.meta, which will help support Node.js use cases, and the module resolution proposal will also help smooth over differences between browsers and Node.js. So you can expect working with modules to get even better in the future.
Acknowledgements
Thank you to everyone who gave feedback on this article, or whose writing or discussions informed it, including Axel Rauschmayer, Bradley Farias, Dave Herman, Domenic Denicola, Havi Hoffman, Jason Weathersby, JF Bastien, Jon Coppeard, Luke Wagner, Myles Borins, Till Schneidereit, Tobias Koppers, and Yehuda Katz, as well as the members of the WebAssembly community group, the Node Modules Working Group, and TC39.
About Lin Clark
Lin is an engineer on the Mozilla Developer Relations team. She tinkers with JavaScript, WebAssembly, Rust, and Servo, and also draws code cartoons.
- Code-cartoons.com
- @linclark
This article is published by NetEase Cloud Community with the author's authorization; reproduction without permission is not allowed.