Create a Real-Time Multiplayer Game Framework with Node.js

This article describes how to create a real-time multiplayer game framework using Node.js. Some time ago our team took part in a hackathon. For that event we wanted to build a game that would get the "heads-down" smartphone crowd interacting with one another; its core feature was real-time multiplayer play in the spirit of a LAN party. The hackathon lasted only 36 hours, so everything had to be done quickly and with agility. Under that premise, our initial preparation felt a little improvised. For the cross-platform client we chose node-webkit, which is simple enough and met our requirements.

As planned, development could be split into modules. This article describes how we developed SpaceRoom (our real-time multiplayer game framework), including a series of explorations and attempts, as well as the limitations we ran into on the Node.js and WebKit platforms and how we worked around them.

Getting started

SpaceRoom

From the very beginning, the design of SpaceRoom was demand-driven. We wanted the framework to provide the following basic features:

Users can be grouped into rooms (or channels)
Commands issued by the users in a room can be collected
Game data can be broadcast to all clients precisely at a specified interval
The impact of network latency is minimized
Later in the coding stage we gave SpaceRoom more features, such as pausing the game and generating consistent random numbers across clients (these could also be implemented in your own game logic if needed; it is not necessary for the SpaceRoom framework to do more than the communication layer).

APIs
SpaceRoom is divided into two parts: client and server. What the server needs to do is maintain the room list and provide the ability to create and join rooms. Our client API looks like this:

SpaceRoom.connect(address, callback) - connect to the server
SpaceRoom.createRoom(callback) - create a room
SpaceRoom.joinRoom(roomId) - join a room
SpaceRoom.on(event, callback) - listen for an event
......
After the client connects to the server, it receives various events. For example, a user in a room receives an event when a new player joins or when the game starts. We give the client a "lifecycle", so at any moment it is in one of a set of well-defined states.

You can obtain the current state of the client through SpaceRoom.state.
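As an illustration only, a minimal client might be wired up as follows; the address, the event names ('newPlayer', 'gameStart') and the callback signatures are assumptions made for this sketch, not necessarily SpaceRoom's real identifiers:

SpaceRoom.connect('ws://localhost:8080', function () {
  // Once connected, create a room and note its id.
  SpaceRoom.createRoom(function (roomId) {
    console.log('room created:', roomId, 'state:', SpaceRoom.state);
  });
});

SpaceRoom.on('newPlayer', function (player) {
  console.log('a new player joined:', player);
});

SpaceRoom.on('gameStart', function () {
  // From here on, send player commands and consume the broadcast buckets.
});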

Using the server side of the framework is much simpler: with the default configuration you can run it directly. We had one basic requirement: the server code must be able to run directly on a client machine, without a dedicated server; players who have used the ad-hoc multiplayer of a PS or PSP will know what I mean. Of course, running it on a dedicated server works just as well.

The implementation of the logic code is only described briefly here. The first generation of SpaceRoom does the work of a socket server: it maintains the room list (including each room's status) and handles in-game communication for each room (command collection, bucket broadcasting, and so on). For the specific implementation, see the source code.
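To make the idea concrete, here is a highly simplified sketch (not the actual source) of the bookkeeping the server performs; all names are assumptions:

var rooms = {};       // roomId -> room record
var nextRoomId = 1;

function createRoom() {
  var room = { id: nextRoomId++, status: 'waiting', players: [], bucket: [] };
  rooms[room.id] = room;
  return room;
}

function joinRoom(roomId, playerSocket) {
  var room = rooms[roomId];
  if (!room || room.status !== 'waiting') return null;  // missing or already playing
  room.players.push(playerSocket);
  return room;
}

// During a game, commands collected from the players are appended to the
// room's current bucket and broadcast at the end of each bucket interval.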

Synchronization Algorithm

So how can we keep what is displayed on all clients consistent in real time?

It sounds interesting. Think about what we actually need the server to deliver for us. Naturally, we think about what could make the logic diverge between clients: user commands. Since the code that processes the game logic is the same everywhere, running it under the same conditions produces the same results; the only difference is the player commands received during the game. So we need a way to synchronize these commands: if every client receives the same commands, then in theory every client produces the same result.

Online games use a variety of synchronization algorithms for different scenarios. The one used by SpaceRoom is similar in concept to frame locking. We divide the timeline into intervals; each interval is called a bucket. A bucket holds commands and is maintained by the server. At the end of each bucket period, the server broadcasts the bucket to all clients. When a client receives a bucket, it extracts the commands from it, verifies them, and executes them.

To reduce the impact of network latency, every command the server receives from a client is delivered into the appropriate bucket according to the following rule (a sketch of this rule appears after the list):

Let order_start be the command generation time carried by the command, and t the start time of the bucket that contains order_start.
If t + delay_time <= the start time of the bucket currently being collected, deliver the command to the bucket currently being collected; otherwise continue with step 3.
Deliver the command to the bucket that contains t + delay_time.
Here delay_time is an agreed-upon latency allowance, which can be set to the average latency between clients; the default in SpaceRoom is 80 ms, and the default bucket length is 48 ms. At the end of each bucket period the server broadcasts the bucket to all clients and starts collecting commands for the next bucket. Clients automatically align their logic to the received bucket interval, so the timing error stays within an acceptable range.
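The delivery rule above might be sketched like this (a simplified illustration under the stated defaults, not the framework's actual code):

var delayTime = 80;   // agreed latency allowance, ms
var bucketSize = 48;  // bucket length, ms

// Round a timestamp down to the start of the bucket that contains it.
function bucketStartOf(timeMs) {
  return Math.floor(timeMs / bucketSize) * bucketSize;
}

// orderStart: the command generation time carried by the command.
// collectingStart: the start time of the bucket currently being collected.
// Returns the start time of the bucket the command should be delivered to.
function targetBucketStart(orderStart, collectingStart) {
  var t = bucketStartOf(orderStart);           // step 1
  if (t + delayTime <= collectingStart) {      // step 2
    return collectingStart;                    // ship to the current bucket
  }
  return bucketStartOf(t + delayTime);         // step 3
}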

This means that, under normal circumstances, the client receives a bucket from the server every 48 ms and processes it when its scheduled time arrives. If the client runs at 60 FPS, it receives a bucket roughly every three frames and updates the logic from it. If a bucket has not arrived by the time it is needed, because of network fluctuation, the client suspends the game logic and waits. Logic updates within a bucket can be smoothed with interpolation (lerp).
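For illustration, the client side of this loop might look roughly like the following sketch (names and structure are assumptions, not SpaceRoom's actual code):

var pendingBuckets = [];   // buckets received from the server, in arrival order

// Called by the networking layer whenever a bucket arrives.
function onBucketReceived(bucket) {
  pendingBuckets.push(bucket);
}

// Called once per bucket interval by the game loop.
function updateLogic() {
  if (pendingBuckets.length === 0) {
    // The bucket is late because of network jitter:
    // suspend the game logic and wait for it.
    return false;
  }
  var bucket = pendingBuckets.shift();
  bucket.commands.forEach(function (cmd) {
    // Apply each synchronized command to the local game state here.
  });
  // Between two buckets, the renderer can interpolate (lerp) entity positions.
  return true;
}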

With delay_time = 80 and bucket_size = 48, a command is executed at most about 96 ms after it is issued: a command issued right at the start of a bucket lands in the bucket that ends 96 ms later. Tuning these two parameters changes the delay; with delay_time = 60 and bucket_size = 32, for example, a command is executed within at most about 64 ms.

Trouble caused by the timer

Overall, our framework needs a precise timer at runtime in order to broadcast buckets at a fixed interval. Naturally, we first thought of setInterval(), but a second later we realized how unreliable that idea is: setInterval() can drift by a sizeable margin and, worse, each error accumulates, so the consequences grow more and more serious.

So we immediately thought of setTimeout() with dynamic correction of the next delay, to keep our loop roughly stable around the target interval: for example, if one setTimeout() fires 5 ms off target, we compensate by 5 ms when scheduling the next one. However, the test results were unsatisfactory, and the approach is not elegant.
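For reference, the drift-correcting approach we tried looks roughly like this sketch (the 48 ms interval is just the bucket length used as an example):

var interval = 48;                      // target tick length, ms
var expected = Date.now() + interval;   // when the next tick should fire

function tick() {
  var drift = Date.now() - expected;    // how late (or early) this tick fired
  // ... run the per-tick logic here ...
  expected += interval;
  // Schedule the next tick earlier by the measured drift (never below 0).
  setTimeout(tick, Math.max(0, interval - drift));
}

setTimeout(tick, interval);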

So we changed our thinking: let setTimeout() expire as quickly as possible and simply check, on every wake-up, whether the current time has reached the target time. In other words, looping with setTimeout(callback, 1) and continuously checking the clock seemed like a good idea.

Disappointing Timer
We immediately wrote a piece of code to test the idea, and the results were disappointing. The following code was run on the then-latest stable Node.js (v0.10.32) on Windows:

The code is as follows:


var sum = 0, count = 0;
function test() {
  var now = Date.now();
  setTimeout(function () {
    var diff = Date.now() - now;
    sum += diff;
    count++;
    test();
  });
}
test();

After letting it run for a while, type sum/count in the console and you will see a result similar to this:

The output is as follows:


> sum / count
15.624555160142348

What?! We asked for a 1 ms interval, but the actual average interval is 15.625 ms! We ran the same test on a Mac and got 1.4 ms. So what on earth is going on? An Apple fan might simply conclude that Windows is rubbish and give up on it; but as a rigorous front-end engineer, I started to think about where that number comes from.

Wait. Why does this number look so familiar? 15.625 ms is suspiciously close to the maximum timer interval on Windows. We immediately downloaded ClockRes to check; running it in the console produced the following result:

The output is as follows:


Maximum timer interval: 15.625 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.001 ms

Sure enough! Checking the Node.js manual, we find the following description of setTimeout:

The actual delay depends on external factors like OS timer granularity and system load.
However, our test shows that the actual delay equals the maximum timer interval (note that the system's current timer interval was only 1.001 ms at the time). This is unacceptable in any case, and strong curiosity drove us into the Node.js source code for a closer look.

Bugs in Node.js

I believe most readers, like me, have some understanding of the Node.js event loop mechanism. Looking at the source code of the timer implementation gives a rough picture of how timers work, so let's start from the main cycle of the event loop:

The code is as follows:


while (r != 0 && loop->stop_flag == 0) {
  /* Update the global time */
  uv_update_time(loop);
  /* Check whether timers have expired and run their callbacks */
  uv_process_timers(loop);

  /* Call idle callbacks if nothing to do. */
  if (loop->pending_reqs_tail == NULL &&
      loop->endgame_handles == NULL) {
    /* Prevent event loop from exiting */
    uv_idle_invoke(loop);
  }

  uv_process_reqs(loop);
  uv_process_endgames(loop);

  uv_prepare_invoke(loop);

  /* Collect I/O events */
  (*poll)(loop, loop->idle_handles == NULL &&
                loop->pending_reqs_tail == NULL &&
                loop->endgame_handles == NULL &&
                !loop->stop_flag &&
                (loop->active_handles > 0 ||
                 !ngx_queue_empty(&loop->active_reqs)) &&
                !(mode & UV_RUN_NOWAIT));

  /* setImmediate() and friends */
  uv_check_invoke(loop);
  r = uv__loop_alive(loop);
  if (mode & (UV_RUN_ONCE | UV_RUN_NOWAIT))
    break;
}

The source code of the uv_update_time function is as follows (https://github.com/joyent/libuv/blob/v0.10/src/win/timer.c):

The code is as follows:


void uv_update_time(uv_loop_t* loop) {
  /* Obtain the current system time */
  DWORD ticks = GetTickCount();

  /* The assumption is made that LARGE_INTEGER.QuadPart has the same type */
  /* as loop->time, which happens to be. Is there any way to assert this? */
  LARGE_INTEGER* time = (LARGE_INTEGER*) &loop->time;

  /* If the timer has wrapped, add 1 to its high-order dword. */
  /* uv_poll must make sure that the timer can never overflow more than */
  /* once between two subsequent uv_update_time calls. */
  if (ticks < time->LowPart) {
    time->HighPart += 1;
  }
  time->LowPart = ticks;
}

Internally this function uses the Windows GetTickCount() function to set the current time. Simply put, after setTimeout is called, after a series of struggles the internal timer->due is set to the current loop time plus the timeout. In the event loop, uv_update_time updates the current loop time, and uv_process_timers then checks whether any timer has expired; if so, control enters the JavaScript world. Reading the whole flow, one iteration of the event loop looks roughly like this:

Update the global time.

Check the timers; if a timer has expired, run its callback.
Check the reqs queue and execute the waiting requests.
Enter the poll function to collect I/O events. If an I/O event arrives, its handler is added to the reqs queue to be executed in the next iteration of the event loop. Inside poll, a system call is made that collects I/O events and blocks the process until an I/O event arrives or a preset timeout is reached; that timeout is set to the expiry time of the nearest timer, so the call blocks at most until the next timer is due.
Here is the source of one of the poll functions used on Windows:

The code is as follows:


static void uv_poll(uv_loop_t* loop, int block) {
  DWORD bytes, timeout;
  ULONG_PTR key;
  OVERLAPPED* overlapped;
  uv_req_t* req;

  if (block) {
    /* Take the expiration time of the nearest timer */
    timeout = uv_get_poll_timeout(loop);
  } else {
    timeout = 0;
  }

  GetQueuedCompletionStatus(loop->iocp,
                            &bytes,
                            &key,
                            &overlapped,
                            /* Block at most until the next timer expires */
                            timeout);

  if (overlapped) {
    /* Package was dequeued */
    req = uv_overlapped_to_req(overlapped);
    /* Insert the I/O event into the pending queue */
    uv_insert_pending_req(loop, req);
  } else if (GetLastError() != WAIT_TIMEOUT) {
    /* Serious error */
    uv_fatal_error(GetLastError(), "GetQueuedCompletionStatus");
  }
}

Following these steps, suppose we set a timer with timeout = 1 ms: the poll function should block for at most 1 ms and then resume (assuming no I/O event arrives in the meantime). When the event loop continues, uv_update_time updates the time and uv_process_timers finds that our timer has expired and runs the callback. So the initial analysis was that either uv_update_time is wrong (the current time is not updated correctly), or the poll function does not really resume after 1 ms, i.e. the 1 ms wait itself is faulty.

Reading MSDN, we find the following description of the GetTickCount function:

The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.

The resolution of GetTickCount is that coarse! Assuming the poll function correctly blocks for 1 ms, the loop's current time is still not updated when uv_update_time runs next. So our timer is not judged to have expired, poll waits another 1 ms, and the loop repeats, until GetTickCount finally ticks over (once every 15.625 ms, as it were), the loop's current time is refreshed, and uv_process_timers at last decides that our timer has expired.

Seeking help from WebKit
The Node.js source code left us helpless: it uses a low-precision time function and does nothing to compensate. But we immediately realized that, since we use Node-WebKit, we also have Chromium's setTimeout in addition to Node.js's. We wrote a piece of test code and ran it in the browser and in Node-WebKit: http://marks.lrednight.com/test.html#1 (the number after # is the interval to be measured). The result is as follows:

According to the HTML5 standard, the theoretical result should be 1 ms for the first five iterations and 4 ms afterwards. The test case starts counting from the third call, so the table should theoretically show 1 ms for the first three rows and 4 ms for the rest. The measurements carry some error, and per the specification the smallest value we can expect is 4 ms. That is not fully satisfying, but it is far better than the Node.js result. Strong curiosity again: let's look at the Chromium source to see how it is implemented. (https://chromium.googlesource.com/chromium/src.git/+/38.0.2125.101/base/time/time_win.cc)

First of all, Chromium uses the timeGetTime() function to determine the loop's current time. MSDN tells us that the precision of this function is affected by the system's current timer interval, which on our test machine should theoretically be the 1.001 ms mentioned above. However, by default Windows keeps the timer interval at its maximum (15.625 ms on the test machine) unless an application changes the global timer interval.

If you follow IT industry news, you will have seen the reports that Chromium sets the system timer interval to a very small value. Does that mean we don't need to worry about the system timer interval at all? Don't celebrate too early: that behaviour drew plenty of criticism and was changed in Chrome 38. Should we therefore pin ourselves to an older Node-WebKit? That is clearly not elegant, and it would keep us from using a newer, higher-performance Chromium.

Looking further into the Chromium source, we find that when a timer exists whose timeout is less than 32 ms, Chromium raises the system's global timer interval in order to achieve better than 15.625 ms precision (see the source). When timers start up, something called HighResolutionTimerManager is created; this class calls the EnableHighResolutionTimer function according to the device's current power source. Specifically, when the device is running on battery it calls EnableHighResolutionTimer(false), and when it is on mains power it passes true. The EnableHighResolutionTimer function is implemented as follows:

The code is as follows:


void Time::EnableHighResolutionTimer(bool enable) {
  base::AutoLock lock(g_high_res_lock.Get());
  if (g_high_res_timer_enabled == enable)
    return;
  g_high_res_timer_enabled = enable;
  if (!g_high_res_timer_count)
    return;
  // Since g_high_res_timer_count != 0, an ActivateHighResolutionTimer(true)
  // was called which called timeBeginPeriod with g_high_res_timer_enabled
  // with a value which is the opposite of |enable|. With that information we
  // call timeEndPeriod with the same value used in timeBeginPeriod and
  // therefore undo the period effect.
  if (enable) {
    timeEndPeriod(kMinTimerIntervalLowResMs);
    timeBeginPeriod(kMinTimerIntervalHighResMs);
  } else {
    timeEndPeriod(kMinTimerIntervalHighResMs);
    timeBeginPeriod(kMinTimerIntervalLowResMs);
  }
}

Here kMinTimerIntervalLowResMs = 4 and kMinTimerIntervalHighResMs = 1. timeBeginPeriod and timeEndPeriod are Windows functions for changing the system timer interval. In other words, the smallest timer interval we can get is 1 ms on mains power and 4 ms on battery. Since our loop calls setTimeout continuously, the W3C specification clamps the interval to a minimum of 4 ms anyway, so this difference has little impact on us.

Another precision problem

Back at the beginning, our tests showed that the setTimeout interval does not hold steady at 4 ms but keeps fluctuating, and the http://marks.lrednight.com/test.html#48 test likewise showed the interval jumping between 48 ms and 49 ms. The reason is that, in the event loops of both Chromium and Node.js, the precision of the Windows call that waits for I/O events is itself affected by the current system timer. The game logic needs requestAnimationFrame (to keep redrawing the canvas), and that function keeps the system timer interval no coarser than kMinTimerIntervalLowResMs (because it needs a roughly 16 ms timer, which triggers the high-resolution timer path). When the test machine is on mains power the system timer interval is 1 ms, so the test result has an error of about ±1 ms. If your computer's system timer interval has not been lowered, running the #48 test above may show a maximum as high as 48 + 16 = 64 ms.

Using Chromium's setTimeout, the error of setTimeout(fn, 1) can be kept to about 4 ms, and the error of setTimeout(fn, 48) to about 1 ms. So a new plan took shape, and our loop ended up looking like this:

The code is as follows:


/* Get the maximum interval deviation */
var deviation = getMaxIntervalDeviation(bucketSize); // bucketSize = 48, deviation = 2

function gameLoop() {
  var now = Date.now();
  if (previousBucket + bucketSize <= now) {
    previousBucket = now;
    doLogic();
  }
  if (previousBucket + bucketSize - Date.now() > deviation) {
    // Wait about 46 ms; the actual delay will be less than 48 ms.
    setTimeout(gameLoop, bucketSize - deviation);
  } else {
    // Busy waiting. Use setImmediate instead of process.nextTick
    // because the former does not block I/O events.
    setImmediate(gameLoop);
  }
}

The code above first waits for slightly less than one bucket (bucket_size - deviation, i.e. 46 ms) rather than a full bucket_size, so even with the worst-case error the actual delay stays below 48 ms. The remaining time is covered by busy-waiting, which ensures that gameLoop runs at a sufficiently precise interval.

Although Chromium solved the problem for us to some extent, this is clearly still not elegant.

Remember our original requirement? The server code should run on any machine with a Node.js environment, without the Node-WebKit client. If you run the code above under plain Node.js, deviation has to be at least 16 ms; in other words, out of every 48 ms we busy-wait for 16 ms, and CPU usage shoots up immediately.

An unexpected surprise

We were rather indignant: how could such a big bug in Node.js go unnoticed? The answer was a genuine surprise: the bug had already been fixed in v0.11.3. You can look at the master branch of the libuv code to see the change. The fix is to add the timeout to the current loop time after the poll function finishes waiting, so that even if GetTickCount has not moved, the time spent waiting in poll is still accounted for and the timer expires as expected.

In other words, this long-standing problem was already solved in v0.11.3. Still, our efforts were not wasted: even with GetTickCount out of the picture, the poll function is still affected by the system timer. One possible solution is to write a Node.js native addon that changes the system timer interval.

In this game, however, our initial design has no dedicated server: the client that creates the room becomes the server, and the server code runs in the Node-WebKit environment, so the Windows timer issue is not the highest priority. The solution described above gives results that are good enough for us.

Ending

With the timer problem solved, there were basically no more obstacles to implementing the framework. We provide WebSocket support (for pure HTML5 environments) and, for higher performance, socket support with a custom protocol (in the Node-WebKit environment). Of course, SpaceRoom's features were fairly basic at first, but as requirements and time grow we keep improving the framework.

For example, when we found that the game needed consistent random numbers, we added that feature to SpaceRoom: at the start of a game, SpaceRoom distributes a random number seed, and the client-side SpaceRoom provides a method that uses the seed to generate random numbers based on MD5.
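As a rough illustration of the idea (not SpaceRoom's actual implementation), a seeded, MD5-based generator in Node.js could look like this:

var crypto = require('crypto');

function SeededRandom(seed) {
  this.seed = String(seed);
  this.counter = 0;
}

// Returns a deterministic pseudo-random number in [0, 1).
SeededRandom.prototype.next = function () {
  var hash = crypto.createHash('md5')
    .update(this.seed + ':' + (this.counter++))
    .digest('hex');
  // Use the first 13 hex digits (52 bits) so the value fits a double exactly.
  return parseInt(hash.slice(0, 13), 16) / Math.pow(16, 13);
};

// Every client seeded with the same value produces the same sequence.
var rng = new SeededRandom('room-42');
console.log(rng.next(), rng.next());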

It is all quite gratifying, and I learned a lot in the process of writing such a framework. If you are interested in SpaceRoom, you are welcome to get involved. We believe SpaceRoom will be useful in more and more places.
