Keeping the socket server stable by monitoring its thread status
The socket server used in the cloud platform is a socket service for which we defined our own communication protocol and which we implemented in C#.
The service currently runs inside IIS together with the web service and listens on its port from a newly started thread that is never supposed to exit.
In the early stage of development, some exceptions in the service were not caught, for example malformed messages sent by a client or attempts to close a connection that had already been released, so the listening thread exited unexpectedly.
Later, as the system was used, these problems were fixed one by one and the socket service became much more stable. However, after running for more than a week, the socket service would still fail occasionally, and no exception could be found in the system logs. I looked up information about IIS online and found that IIS has a process recycling mechanism intended to improve server performance: sessions, caches, and running threads held in memory are all cleared when the process is recycled. Therefore, when IIS is used as the server, sessions, caches, and other resources that must stay available over the long term should be stored in a database or distributed to other servers. After the process is recycled, IIS starts a new process and listens again on the ports of the sites deployed in IIS. However, IIS does not restart threads that the user's code started earlier.
One solution suggested online is to configure the application pool in IIS 7 (Advanced Settings) as follows:
Recycling -- set Regular Time Interval (minutes) to 0.
-- set Virtual Memory Limit (KB) and Private Memory Limit (KB) to 0.
Process Model -- set Idle Time-out (minutes) to 0.
This disables IIS process recycling, but it may degrade server performance after the server has been running for a long time. Moreover, after trying this configuration many times, I found that IIS still recycled the process after a long period of running.
After IIS recycles the process, it resumes listening on the ports of the sites running on it. We can therefore run a separate service that checks whether the socket server thread is running normally and restarts the socket service if it is not. This service must run outside IIS.
The specific method is:
The web service provides an interface for obtaining the thread status:
/SocketServer.ashx?action=getThreadStatus
It also provides an interface for restarting the socket service:
/SocketServer.ashx?action=startSocketServer
A service started outside IIS by some other means calls the status interface at a fixed interval (for example, every 10 seconds). If the status is abnormal, it calls the interface that restarts the socket service.
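The check in the last step comes down to one small decision: given the body returned by the status interface, should the socket service be restarted? A minimal sketch follows, assuming the interface returns JSON with a `socketServer` field whose unhealthy values are `hung` and `stopped` (the values the monitoring script in this article tests for). The function name `needsRestart`, the sample value `running`, and the choice to ignore unparseable responses are assumptions made for illustration.

```javascript
// Hypothetical helper: decides whether the socket service should be restarted.
// Assumes the status interface returns JSON such as {"socketServer": "running"}.
function needsRestart(body) {
    var status;
    try {
        status = JSON.parse(body);
    } catch (e) {
        // Unparseable response (for example an IIS error page): do not restart
        // on this evidence alone; the next poll will check again.
        return false;
    }
    if (status === null || typeof status !== 'object') {
        return false;
    }
    return status.socketServer === 'hung' || status.socketServer === 'stopped';
}

console.log(needsRestart('{"socketServer":"stopped"}')); // true
console.log(needsRestart('{"socketServer":"running"}')); // false
console.log(needsRestart('<html>error</html>'));         // false
```

Keeping this decision in a pure function makes it easy to test without a running server.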
Here we use a Node.js service:
// This service monitors the socket service process of the cloud platform.
// If the process crashes or is restarted, it restarts the socket service,
// the ws service, and task timeout detection.
var http = require('http');
var moment = require('moment');

// var host = "http://xxx"; // local debugging
var host = "http://xxxxxx"; // intranet service
// var host = "http://xxxx"; // public network service
var statusCheck = "xxx";
var startSocket = "xxx";
var startWs = "xxx";
var taskTimeout = "xxx";
var interval;

// Current time, used as a prefix for log lines
function timestamp() {
    return moment().format('YYYY-MM-DD HH:mm:ss');
}

function logError(e) {
    console.log(timestamp() + " error: " + e.message);
}

function start() {
    // Poll the status interface every 20000 ms (20 seconds)
    interval = setInterval(checkStatus, 20000);
}

function end() {
    clearInterval(interval);
}

start();

function checkStatus() {
    http.get(host + statusCheck, function (res) {
        var body = '';
        res.setEncoding('utf8');
        // The response may arrive in several chunks, so collect them before parsing
        res.on('data', function (chunk) {
            body += chunk;
        });
        res.on('end', function () {
            try {
                var socketStatus = JSON.parse(body);
                if (socketStatus.socketServer === 'hung' || socketStatus.socketServer === 'stopped') {
                    console.log(timestamp() + " socket service unavailable, restarting");
                    // Restart the service
                    restartService();
                }
            } catch (e) {
                // The body was not valid JSON
                logError(e);
            }
        });
    }).on('error', logError);
}

function restartService() {
    // Records which restart requests have returned HTTP 200
    var status = { startSocket: false, startWs: false, taskTimeout: false };

    function statusCode(code, name) {
        if (code === 200) {
            status[name] = true;
        }
        if (status.startSocket && status.startWs && status.taskTimeout) {
            // start();
        }
    }

    // end();
    http.get(host + startSocket, function (res) {
        statusCode(res.statusCode, 'startSocket');
        console.log(timestamp() + " restart socketserver " + res.statusCode);
        res.resume();
    }).on('error', logError);
    http.get(host + startWs, function (res) {
        statusCode(res.statusCode, 'startWs');
        console.log(timestamp() + " restart wsserver " + res.statusCode);
        res.resume();
    }).on('error', logError);
    http.get(host + taskTimeout, function (res) {
        statusCode(res.statusCode, 'taskTimeout');
        console.log(timestamp() + " restart task status monitoring " + res.statusCode);
        res.resume();
    }).on('error', logError);
}
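The commented-out calls to the `end` and `start` functions in the script above hint at pausing the polling while a restart is in progress. One way to get a similar effect without stopping the timer is a guard flag, so that a poll arriving during a restart does not trigger a second one. The sketch below is an assumption about how this could be done, not part of the original script; `createRestartGuard` and `restartAll` are hypothetical names, and the restart is simulated with a timer instead of HTTP calls.

```javascript
// Hypothetical guard: lets restartAll run at most once at a time.
// restartAll(done) must call done() when all restart requests have finished.
function createRestartGuard(restartAll) {
    var restarting = false;
    return function tryRestart() {
        if (restarting) {
            return false; // a restart is already in progress; skip this one
        }
        restarting = true;
        restartAll(function done() {
            restarting = false; // allow future restarts
        });
        return true;
    };
}

// Simulated restart that takes 50 ms (stands in for the three HTTP requests).
var restartCount = 0;
var tryRestart = createRestartGuard(function (done) {
    restartCount += 1;
    setTimeout(done, 50);
});

console.log(tryRestart()); // true: restart started
console.log(tryRestart()); // false: skipped, still restarting
```

In a real script, `done` would also need to be called when a request fails; otherwise a single failed restart would block all later ones.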
This approach has two drawbacks:
1. Each time IIS recycles the process, the socket service is unavailable for several seconds.
2. The socket service runs on the web server, which makes it hard to scale out either the web server or the socket server later: a device connected to server A cannot be reached through server B.
The following improvement is planned:
Separate the socket server from the web server and redesign the communication between them.
The socket service will then no longer be affected by the IIS configuration, and the web server and the socket server can each be scaled independently.