Series 3 | Into the Multi-Process Model of Node.js


Author: Are Dragon (Hujiang web front-end engineer)

This article is original; when reproducing it, please credit the author and source.

In the previous article, "Series 2 | Into the HTTP implementation of Node.js," you saw how Node.js handles HTTP requests; throughout that process it used only a single-process model. So how do you extend a web application to a multi-process model to take full advantage of CPU resources? The answer is cluster. This article walks you through an analysis of Node.js's multi-process model.

First, the classic Node.js master-slave service model code:

```javascript
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  require('http').createServer((req, res) => {
    res.end('hello world');
  }).listen(3333);
}
```

Typically, the master-slave model consists of one master process and multiple slave processes (workers). The master is responsible for accepting connection requests and distributing individual request tasks to the workers; each worker's responsibility is to keep responding to client requests until it enters a waiting state. Figure 3-1 below illustrates this:

Around this piece of code, this article addresses a few key questions:

    1. How the worker processes are created;

    2. On the same host address, if a port is already being listened on, another process attempting to listen on that port should get an error (EADDRINUSE, i.e., the port is occupied). How, then, can Node.js execute the listen method on the same port in both the master and the workers?

How is the process fork done?

In Node.js, cluster.fork differs slightly from POSIX's fork: although it still creates a child process, it does not directly reuse the main process's image; instead it calls the system function execvp so the child process gets a new process image. In addition, each worker process corresponds to a Worker object, which has the following states: none, online, listening, dead, and disconnected.

The ChildProcess object primarily provides process creation (spawn), destruction (kill), and process-handle reference-count management (ref and unref). Besides encapsulating the Process object (process_wrap.cc), it also handles some details itself. For example, in the spawn method, if an IPC pipe is required between the master and the worker, the environment variable NODE_CHANNEL_FD is used to tell the worker which file descriptor (fd) the IPC channel should be bound to; this particular environment variable will come up again later.

The reference relationships among the three objects mentioned above are as follows:

The main steps of cluster.fork are:

    1. Call child_process.spawn;

    2. Create a ChildProcess object and initialize its _handle property to a Process object. Process is the object exposed to JavaScript by process_wrap.cc and encapsulates libuv's process-manipulation capabilities. The C++ definition of the Process object is attached:

```cpp
interface Process {
  constructor(const FunctionCallbackInfo<Value>& args);
  void close(const FunctionCallbackInfo<Value>& args);
  void spawn(const FunctionCallbackInfo<Value>& args);
  void kill(const FunctionCallbackInfo<Value>& args);
  void ref(const FunctionCallbackInfo<Value>& args);
  void unref(const FunctionCallbackInfo<Value>& args);
  void hasRef(const FunctionCallbackInfo<Value>& args);
}
```

    3. Call the spawn method of ChildProcess._handle, which eventually calls uv_spawn in the libuv library.

When the main process executes cluster.fork, two special environment variables, NODE_CHANNEL_FD and NODE_UNIQUE_ID, are set, so the initialization of the worker process differs slightly from that of a normal Node.js process:

    1. bootstrap_node.js, the JavaScript entry file baked into the runtime, calls internal/process.setupChannel;

    2. If the environment contains NODE_CHANNEL_FD, call child_process._forkChild, then remove the variable;

    3. Call internal/child_process.setupChannel, listen for the internalMessage event on the child's global process object, and add the methods send and _send, where send is just a wrapper around _send. Typically, _send simply serializes the message to JSON and writes it to the pipe, and the data eventually reaches the receiving end.

    4. If the environment contains NODE_UNIQUE_ID, the current process is in worker mode, and workerInit is executed when the cluster module is loaded. This also affects the listen method of net.Server: in worker mode, listen invokes cluster._getServer, which essentially sends the message {"act": "queryServer"} to the master instead of actually listening on the port.
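The serialization in step 3 can be illustrated with a simplified sketch. This is not the actual internal/child_process code (which also handles passed handles, partial chunks, and special message types); it only shows the essential newline-delimited JSON framing:

```javascript
// Simplified sketch of newline-delimited JSON framing for plain IPC
// messages (the real logic lives in lib/internal/child_process.js).
function frame(message) {
  // sending side (_send): serialize the message and append the delimiter
  return JSON.stringify(message) + '\n';
}

function parseChunk(chunk) {
  // receiving side: split on the delimiter and parse each message
  return chunk.split('\n').filter(Boolean).map(JSON.parse);
}

const wire = frame({ act: 'queryServer' }) + frame({ act: 'online' });
console.log(parseChunk(wire)); // [ { act: 'queryServer' }, { act: 'online' } ]
```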

IPC Implementation Details

As mentioned above, the Node.js master and worker processes stay in contact only through IPC. This section dives into the implementation details of that IPC. First, a sample:

1-master.js

```javascript
const { spawn } = require('child_process');

let child = spawn(process.execPath, [`${__dirname}/1-slave.js`], {
  stdio: [0, 1, 2, 'ipc']
});

child.on('message', function (data) {
  console.log('received in master:');
  console.log(data);
});

child.send({
  msg: 'msg from master'
});
```

1-slave.js

```javascript
process.on('message', function (data) {
  console.log('received in slave:');
  console.log(data);
});

process.send({
  msg: 'message from slave'
});
```

Run it with:

```
node 1-master.js
```

Running it produces the following output:

Attentive readers may notice that the console output is not sequential: the master and slave logs are interleaved, because the execution order of parallel processes is unpredictable.

Socketpair

As mentioned earlier, starting a new Node.js instance for the worker actually goes through the system call execvp; that is, by default the Node.js master and worker do not share file descriptors. How, then, do they send messages to each other?

It turns out socketpair can create a pair of full-duplex anonymous sockets for inter-process messaging. Its signature is as follows:

```c
int socketpair(int domain, int type, int protocol, int sv[2]);
```

Normally we cannot pass a file descriptor through a socket, yet when the master establishes a connection with a client, it needs to tell the worker which connection descriptor to handle. What to do? In fact, by specifying AF_UNIX as socketpair's first parameter, an anonymous UNIX domain socket is created, which allows the system functions sendmsg and recvmsg to pass and receive file descriptors.

When the master calls cluster.fork, the relevant flow is as follows:

    1. Create a Pipe object (pipe_wrap.cc) with the parameter ipc set to true;
    2. Call uv_spawn; its options parameter is a uv_process_options_s structure, and the Pipe object is stored in the structure's stdio field;
    3. Call uv__process_init_stdio, which creates a full-duplex socket via socketpair;
    4. Call uv__process_open_stream, which sets the Pipe object's iowatcher.fd to one end of the full-duplex socket.

From then on, the master and worker can communicate in both directions. The flowchart is as follows:

Let's look back at the puzzling environment variable NODE_CHANNEL_FD: it always has the value 3. In a process's file-descriptor table, 0-2 are standard input (stdin), standard output (stdout), and standard error (stderr), so the first available file descriptor is 3; the socketpair end obviously consumes the first available descriptor in the child. Thus, when the worker writes data to the stream at fd = 3, the master can receive the message.

Reading messages from the IPC channel is primarily a stream operation; there may be an opportunity to explain it in detail later. The main steps are listed below:

    1. StreamBase::EmitData invokes the callback OnRead;

    2. StreamWrap::OnReadImpl calls StreamWrap::EmitData;

    3. The StreamWrap constructor calls set_read_cb to set OnReadImpl;

    4. StreamWrap::set_read_cb sets the attribute StreamWrap::read_cb_;

    5. StreamWrap::OnRead references the attribute read_cb_;

    6. StreamWrap::ReadStart calls uv_read_start, passing StreamWrap::OnRead as the 3rd argument:

```c
int uv_read_start(uv_stream_t* stream, uv_alloc_cb alloc_cb, uv_read_cb read_cb);
```

The class diagram relationships involved are as follows:

Server Master-Slave model

The above analyzed the creation of worker processes and their particularities. To implement a master-slave service model, one basic problem remains: how does the master hand the client connection descriptor to a worker? We start with the signature of process.send (only workers have the send method on the global process object; the master can access it via worker.process.send or worker.send):

```javascript
send(message, sendHandle, callback)
```

The meanings of message and callback are fairly obvious: the message object to send, and the callback invoked when the operation completes. But what is the second parameter, sendHandle, for?

As mentioned earlier, the system function socketpair creates a pair of bidirectional sockets that can carry JSON messages (this part is mainly stream operations). In addition, when sendHandle has a value, the sockets can also be used to pass file descriptors. That process is relatively complex, but ultimately the system functions sendmsg and recvmsg are called.

Passing the connection descriptor to the worker

Under the master-slave service model, the master is responsible for establishing connections with clients and then passing the connection descriptor to a worker via sendmsg. Let's walk through this process:

On the worker

    1. Call the http.Server.listen method (inherited from net.Server);

    2. Call cluster._getServer, which sends a message to the master:

```json
{
  "cmd": "NODE_HANDLE",
  "msg": {
    "act": "queryServer"
  }
}
```
On the master

    1. When the message is received, a new RoundRobinHandle object is created and stored in the variable handle. Each handle corresponds to one connection endpoint and to multiple worker instances, and it opens the TCP server socket for that endpoint.

"JS
Class Roundrobinhandle {
Construtor (key, address, port, AddressType, FD) {
Listens to the same end point from the process collection
This.all = [];

  // 可用的从进程集合  this.free = [];  // 当前等待处理的客户端连接描述符集合  this.handles = [];  // 指定端点的TCP服务socket  this.server = null;}add(worker, send) {  // 把从进程实例加入this.all}remove(worker) {  // 移除指定从进程}distribute(err, handle) {  // 把连接描述符handle存入this.handles,并指派一个可用的从进程实例开始处理连接请求}handoff(worker) {  // 从this.handles中取出一个待处理的连接描述符,并向从进程发起消息  // {  //  "type": "NODE_HANDLE",  //  "msg": {  //    "act": "newconn",  //  }  // }}

}
```

    2. Call the handle.add method to add the worker object to the handle.all collection;

    3. When handle.server starts listening for client requests, its onconnection callback is reset to RoundRobinHandle.distribute, so the master does not actually handle client connections; it only distributes them to workers. It stores the connection descriptor in the handle.handles collection and, when a worker is available, sends it the message {"act": "newconn"}. If the assigned worker does not reply with the acknowledgment {"ack": message.seq, "accepted": true}, the master tries to assign the connection to another worker.
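The distribute/handoff pairing just described can be boiled down to a toy sketch. This is a deliberate simplification of RoundRobinHandle: the real code also tracks acknowledgments and re-queues connections when a worker does not confirm, and it returns a worker to the free list only when the worker asks for more work.

```javascript
// Toy round-robin scheduler: queue incoming handles and hand each one
// to the next free worker.
class ToyRoundRobin {
  constructor() {
    this.free = [];    // workers waiting for a connection
    this.handles = []; // connections waiting for a worker
    this.log = [];     // records [worker, handle] assignments
  }
  add(worker) {
    this.free.push(worker);
    this.maybeHandoff();
  }
  distribute(handle) {
    this.handles.push(handle);
    this.maybeHandoff();
  }
  maybeHandoff() {
    while (this.free.length && this.handles.length) {
      const worker = this.free.shift();
      const handle = this.handles.shift();
      this.log.push([worker, handle]);
      // the real implementation sends { act: 'newconn' } plus the handle
      // here; in this toy the worker goes straight back on the free list
      this.free.push(worker);
    }
  }
}

const rr = new ToyRoundRobin();
rr.add('w1');
rr.add('w2');
['c1', 'c2', 'c3'].forEach((h) => rr.distribute(h));
console.log(rr.log); // [ ['w1','c1'], ['w2','c2'], ['w1','c3'] ]
```

Round-robin is the default scheduling policy on non-Windows platforms and can be selected explicitly with cluster.schedulingPolicy = cluster.SCHED_RR.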

The flowchart is as follows:

Calling listen in the worker

Client Connection Processing

How does a worker listen on the same port as the master?

There are two main reasons:

**I. The initialization of the Node.js runtime in the worker is slightly different**

    1. Because the environment variable NODE_UNIQUE_ID is present in the worker, the workerInit method is executed when the cluster module is loaded in bootstrap_node.js. It differs from the master's masterInit in three ways. First, there is no cluster.fork method in the worker, so a worker cannot create further descendant processes. Second, the Worker object's disconnect and destroy methods are implemented differently: taking worker.destroy as an example, the master cannot kill the worker directly; it notifies the worker to exit and then removes it from its collection, whereas the worker notifies the master and then exits itself. Third, the worker's cluster module has the extra method _getServer, which sends the message {"act": "queryServer"} to the master, notifying it to create the RoundRobinHandle object and actually listen on the specified address and port, and then continues execution with a simulated TCP descriptor;

    2. Call the cluster._setupWorker method, which mainly initializes the cluster.worker property and listens for internalMessage, handling two message types: newconn and disconnect;

    3. Send the message {"act": "online"} to the master;

    4. Because the process-level environment variable NODE_CHANNEL_FD is present in the worker, when internal/process.setupChannel is called, the worker connects to the bidirectional socket created by the system function socketpair and listens for internalMessage, handling the message types NODE_HANDLE_ACK and NODE_HANDLE.

**II. The listen method executes slightly different code in the master and the worker.**

In the listen method of net.Server (net.js): in the master, the standard port-binding flow is performed; in a worker, cluster._getServer is called, as described above.

Finally, attached is a C version of the master-slave service model implemented on top of libuv: GitHub address.

After starting the server, accessing http://localhost:3333 produces the following:

I believe that through this article's introduction, you now have a comprehensive understanding of Node.js cluster. Next time, the author will take you through an in-depth look at Node.js process-management availability issues in production environments. Stay tuned.

Related articles

Series 1 | Into the startup process of Node.js

Series 2 | Into the HTTP implementation of Node.js


