A review of Node's built-in cluster module
Node.js's single-threaded model is often considered a weakness: no matter how many CPU cores your machine has, Node.js runs on just one of them (some operations are offloaded to a thread pool under certain conditions). Since most programs only use a slice of the CPU's total time anyway, making better use of the available processing power often makes little difference.
That is why, starting with v0.8, Node.js added a built-in 'cluster' module. The cluster module lets you set up a master process that acts as a manager, with one or more worker processes doing the actual work.
One of its goals is to make creating a multi-process server something you can set up and forget. In an ideal world, you would be able to make an existing single-process program spawn any number of worker processes without changing a single line of code.
Of course, things are never quite that easy. But for programs that have no shared state, little shared state, or keep their shared state in an external resource such as a database or web service, the cluster module does make it straightforward. Turning such a program into a cluster usually takes only a few lines of code:
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // Spawn worker processes, one per CPU in the system
  for (var i = 0, n = os.cpus().length; i < n; i += 1) {
    cluster.fork();
  }
} else {
  // Start the program
  app();
}
This program does not need to know that it is running in a clustered environment. Say your app() looks like this:
var http = require('http');

function app() {
  var server = http.createServer(function (req, res) {
    res.end('OK');
  });
  server.listen(8080, 'www.example.com');
}
The most remarkable thing about the cluster module is that all worker processes can bind to the same address and port. Moreover, it ensures that incoming connections are distributed evenly over the listening workers... at least in theory.
The algorithm for distributing connections in Node.js v0.8 and v0.10 is simple. When a worker process calls http.Server#listen() or net.Server#listen(), Node.js sends a message to the master process asking it to create a server socket, bind it, and share it with the worker. If a bound socket already exists, the master skips the "create and bind" step and simply shares the existing socket.
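The master's bookkeeping can be sketched as a lookup table keyed by address and port. This is a toy model only; the function name getSharedHandle is made up for illustration and is not a Node.js internal:

```javascript
// Toy model of the master's handle table: the first listen() for an
// address/port pair creates and binds a handle; later requests reuse it.
var handles = {};
var created = 0;

function getSharedHandle(address, port) {
  var key = address + ':' + port;
  if (!handles[key]) {
    created += 1;                  // "create and bind" happens only once
    handles[key] = { key: key };   // stand-in for a bound server socket
  }
  return handles[key];             // shared with the requesting worker
}

var a = getSharedHandle('0.0.0.0', 8080);
var b = getSharedHandle('0.0.0.0', 8080);
console.log(a === b, created);     // true 1
```

The second call returns the very same handle object, which is why every worker ends up waiting on one socket.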
That means all worker processes are listening on the same socket. When a new connection comes in, the operating system wakes up one of the workers; the awakened worker accepts the connection and starts serving it.
So far, so good. The operating system collects a wealth of metrics about running processes, so it should be the best qualified to decide which process to wake up.
Testing the theory against reality
This is where theory meets messy reality, and where it slowly dawns on you that the operating system is not always the "best" judge that programmers imagine it to be. In particular, we observed, especially on Linux and Solaris, that most connections end up in just two or three processes.
From the operating system's point of view this is understandable: context switching (suspending one process and reactivating another) is a fairly expensive operation. If you have N processes all waiting on the same socket, it makes sense to wake the one that blocked most recently, as that minimizes context switches. (Of course, the scheduler is a complex and fickle beast; this is only a rough description of what actually happens.) The upshot is that the workers that get preferential treatment keep getting it.
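The bias can be illustrated with a toy model, an assumption-laden sketch rather than the real scheduler: keep the waiting workers on a stack and always wake the one that blocked most recently. One worker then ends up with every connection:

```javascript
// Toy model: N workers wait on one socket; the "OS" always wakes
// whichever worker blocked most recently (the top of the stack).
var waiting = [0, 1, 2, 3];      // worker ids; last element = most recent
var accepted = [0, 0, 0, 0];     // connections handled per worker

for (var c = 0; c < 100; c++) {
  var worker = waiting.pop();    // wake the most recently blocked worker
  accepted[worker] += 1;         // it accepts the connection...
  waiting.push(worker);          // ...and immediately blocks again, on top
}

console.log(accepted);           // one worker gets all 100 connections
```

A real scheduler is far less deterministic than this, but the same feedback loop explains why "those who receive preferential treatment will still be treated favourably."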
Not all programs are affected by this quirk; in fact, most are not. But those that are end up with very uneven loads.
Once the root cause was identified, mitigations could be applied, but none were particularly satisfying. For example, having a worker temporarily stop listening on the socket, so that other workers get a chance to accept new connections, helped, but not enough: the share of connections going to the "favored" workers dropped from 90% to 60-70%. An improvement, but not good enough, to say nothing of its dramatic impact on programs that handle lots of short-lived connections.
More importantly, it became clear to us that, like random number generation, the distribution of incoming connections is too important to leave to chance. After much discussion we reached a consensus: our last, best hope was simply to throw out the current approach and switch to something completely different. That is why the cluster module in Node.js v0.11.2 switched to a round-robin approach, in which the master process accepts new connections and then picks a worker process to hand each one off to.
The algorithm for selecting a worker is not particularly sophisticated. As the name suggests, it is round-robin: it simply picks the next available worker. Testing by core developers and users has shown that it works: connections are spread evenly across the workers. We are considering making the selection algorithm configurable or pluggable by developers.
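The selection itself can be sketched in a few lines. This is a simplified model of the idea, not the actual cluster internals:

```javascript
// Round-robin worker selection: hand each new connection to the next
// worker in the list, wrapping around at the end.
function RoundRobin(workers) {
  this.workers = workers;
  this.next = 0;
}

RoundRobin.prototype.pick = function () {
  var worker = this.workers[this.next];
  this.next = (this.next + 1) % this.workers.length;
  return worker;
};

var rr = new RoundRobin(['w1', 'w2', 'w3']);
var counts = { w1: 0, w2: 0, w3: 0 };
for (var i = 0; i < 9; i++) {
  counts[rr.pick()] += 1;        // distribute nine connections
}
console.log(counts);             // each worker gets exactly three
```

Unlike the old scheme, no worker can be "favored" here: the distribution is even by construction, regardless of what the OS scheduler prefers.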
If you prefer the old way of distributing connections, you can set cluster.schedulingPolicy in your program:
var cluster = require('cluster');

// Set this before calling any other cluster function
cluster.schedulingPolicy = cluster.SCHED_NONE;
cluster.fork();
Or adjust the scheduling policy through the NODE_CLUSTER_SCHED_POLICY environment variable:
$ export NODE_CLUSTER_SCHED_POLICY="none"  # "rr" means round-robin
$ node app.js
If you don't want to affect your shell environment, you can do it in a single command:
$ env NODE_CLUSTER_SCHED_POLICY="none" node app.js
Considerations on Windows
MS Windows is the only platform that uses the old approach by default. For optimal performance, Node.js uses IOCP on Windows. While that is good in most cases, it makes sending handle objects (connections) to other processes costly. It may be possible to work around this in libuv, but it is not clear that it is really necessary: the Windows port is rarely affected by the load-balancing issue, while the Linux and Solaris ports definitely are.