Building a real-time multiplayer game framework with Node.js


Node.js can be used for all kinds of things these days. Some time ago, the author took part in a hackathon whose goal was to build a game that would let people interact more; the core feature was LAN-party-style real-time multiplayer. The hackathon was only 36 hours long, so everything had to be done quickly and decisively, and under that constraint the initial preparation had to rely on fairly "ready-made" pieces. For the cross-platform client we chose node-webkit, which is simple enough and meets our requirements.

Given the requirements, development could proceed module by module. This article describes the process of developing Spaceroom (our real-time multiplayer game framework), including a series of explorations and attempts, some limitations of Node.js and the WebKit platform itself, and our proposed solutions.

Getting Started
A glimpse of Spaceroom
From the start, Spaceroom's design was driven by our requirements. We wanted the framework to provide the following basic functionality:

Group a set of users into a room (or channel)
Collect instructions from the users in a room
Broadcast game data to the clients accurately at a specified interval
Eliminate the impact of network latency as much as possible
By the time coding finished, Spaceroom provided even more functionality, including pausing the game and generating random numbers that are consistent across clients (depending on your requirements, this can also be done inside the game's own logic; it does not have to live in Spaceroom, which is primarily a framework working at the communication level).

APIs
Spaceroom is divided into a client part and a server part. The server's job is to maintain the room list and provide the ability to create and join rooms. Our client APIs look like this:

spaceroom.connect(address, callback) – connect to a server
spaceroom.createRoom(callback) – create a room
spaceroom.joinRoom(roomId) – join a room
spaceroom.on(event, callback) – listen for events
…
After the client connects to the server, it receives various events. For example, a user in a room may receive an event when a new player joins or when the game starts. We give the client a "lifecycle": at any moment it is in exactly one of a set of defined states.

You can get the client's current state via spaceroom.state.
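A minimal client-side usage sketch, assuming the API shapes listed above (the module loading, event names, and callback signatures here are illustrative assumptions, not taken from Spaceroom's documentation):

var spaceroom = require('spaceroom'); // hypothetical client build of the framework

spaceroom.connect('ws://localhost:9999', function () {
  // Once connected, create a room and report the client's current state.
  spaceroom.createRoom(function (roomId) {
    console.log('room created:', roomId, 'state:', spaceroom.state);
  });
});

// Event names below are illustrative.
spaceroom.on('playerJoined', function (player) {
  console.log('a new player joined:', player);
});

spaceroom.on('gameStart', function () {
  console.log('the game has started');
});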

Using the server-side framework is much simpler: with the default configuration file you can simply run it directly. We had one basic requirement: the server code must be able to run directly on a client, with no dedicated server needed. Players who have played on a PS or PSP will know what I mean. Of course, it can also run on a dedicated server, which is even better.
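Starting the server side might then look roughly like the following sketch (the module path, entry point, and listen call are hypothetical, shown only to illustrate that the same code can run inside the node-webkit client or on a separate machine):

// server.js - runs under plain Node.js or inside the node-webkit client.
var spaceroomServer = require('./lib/spaceroom-server'); // hypothetical path

// With the default configuration file, starting the room server is one call.
spaceroomServer.listen(9999);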

The implementation of the logic code is omitted here. Spaceroom implements a socket server that maintains the room list, including each room's status and the communication for the game running in it (instruction collection, bucket broadcasting, and so on). See the source code for the details.

Synchronization algorithm

So how do we keep what is displayed on different clients consistent in real time?

This sounds interesting. Think about it: what do we need the server to deliver for us? It is natural to identify what could cause logical inconsistency between clients: user instructions. Since the code that handles the game logic is the same on every client, the same inputs produce the same results. The only thing that differs during a game is the stream of player instructions each client receives. So we need a way to synchronize these instructions: if every client receives the same instructions, then in theory every client will compute the same result.

There are many synchronization algorithms for networked games, suited to different scenarios. The algorithm Spaceroom adopts is similar to the concept of frame locking (lockstep). We divide the timeline into intervals, each of which is called a bucket. A bucket holds instructions and is maintained on the server side. At the end of each bucket period, the server broadcasts the bucket to all clients; a client takes the bucket, extracts the instructions, and validates and executes them.

To reduce the impact of network latency, the server posts each instruction received from a client into the appropriate bucket according to the following steps:

1. Let order_start be the timestamp carried by the instruction, and let T be the start time of the bucket that contains order_start.
2. If T + delay_time <= the start time of the bucket currently collecting instructions, post the instruction to that currently-collecting bucket; otherwise go to step 3.
3. Post the instruction to the bucket that contains the time T + delay_time.
Here delay_time is an agreed server-side delay, which can be taken as the average latency between clients; Spaceroom's default is 80ms, and the default bucket length is 48ms. At the end of each bucket period, the server broadcasts that bucket to all clients and starts collecting instructions for the next bucket. Based on the interval at which buckets arrive, the client automatically keeps the timing error of its logic within an acceptable range.
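As an illustration only, the posting rule above might be sketched like this in JavaScript (the bucket data structure and helper names are assumptions made for clarity, not Spaceroom's actual code):

var DELAY_TIME = 80;  // agreed server-side delay, in ms
var BUCKET_SIZE = 48; // bucket length, in ms

// buckets maps a bucket's start time to the array of instructions it holds;
// currentBucketStart is the start time of the bucket being collected right now.
function postInstruction(instruction, currentBucketStart, buckets) {
  // T is the start time of the bucket that contains the instruction's timestamp.
  var T = Math.floor(instruction.orderStart / BUCKET_SIZE) * BUCKET_SIZE;
  var target;

  if (T + DELAY_TIME <= currentBucketStart) {
    // Step 2: the delayed instruction is already "due", so put it into the
    // bucket that is currently collecting instructions.
    target = currentBucketStart;
  } else {
    // Step 3: put it into the bucket that contains T + DELAY_TIME.
    target = Math.floor((T + DELAY_TIME) / BUCKET_SIZE) * BUCKET_SIZE;
  }
  (buckets[target] = buckets[target] || []).push(instruction);
}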

This means that, under normal circumstances, the client receives a bucket from the server every 48ms and processes it once the time for that bucket arrives. Assuming the client runs at 60fps, a bucket arrives roughly every 3 frames and the logic is updated according to it. If a bucket has not arrived in time because of network jitter, the client pauses the game logic and waits. Within a single bucket interval, logic updates can be interpolated using a lerp.
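A rough sketch of how a client might consume buckets and interpolate between logic steps (applyInstruction, stepGameLogic, and render are placeholders standing in for the game's own code; this is an assumption-laden illustration, not Spaceroom's implementation):

var BUCKET_SIZE = 48;
var pendingBuckets = [];  // buckets received from the server, filled by the network layer
var lastLogicTime = null; // time at which the last bucket was applied

function applyInstruction(instr) { /* apply one player instruction to the game state */ }
function stepGameLogic(dt) { /* advance the deterministic simulation by dt ms */ }
function render(alpha) { /* draw, lerping between the last two logic states */ }

// Called from the render loop (e.g. requestAnimationFrame, ~60 fps).
function update(now) {
  if (lastLogicTime === null) lastLogicTime = now;
  // Roughly every 3 frames it is time to consume the next bucket.
  while (now - lastLogicTime >= BUCKET_SIZE) {
    var bucket = pendingBuckets.shift();
    if (!bucket) {
      return; // network hiccup: no bucket yet, pause the logic and wait
    }
    bucket.forEach(applyInstruction);
    stepGameLogic(BUCKET_SIZE);
    lastLogicTime += BUCKET_SIZE;
  }
  // Between two logic steps, render with linear interpolation (lerp).
  render((now - lastLogicTime) / BUCKET_SIZE);
}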

With delay_time = bucket_size = 48, any instruction is delayed by at least 96ms before it executes. Changing these two parameters changes the delay; for example, with delay_time = bucket_size = 32, any instruction is delayed by at least 64ms.

The trouble caused by timers

Looking at the whole picture, our framework needs a precise timer when it runs, to broadcast buckets at a fixed interval. Naturally, we first thought of setInterval(), but the next second we realized how unreliable that idea was: the naughty setInterval() has quite serious error, and worse, the error accumulates, causing more and more serious consequences.
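The accumulation is easy to observe with a small measurement sketch like the following (written for illustration here, not code from the project):

// Measure how far setInterval drifts from its nominal 48ms period.
var INTERVAL = 48;
var expected = Date.now() + INTERVAL;

var timer = setInterval(function () {
  var drift = Date.now() - expected; // positive means we fired late
  console.log('drift: ' + drift + 'ms');
  expected += INTERVAL;              // the drift keeps growing tick after tick
}, INTERVAL);

// Stop the measurement after five seconds.
setTimeout(function () { clearInterval(timer); }, 5000);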

So we immediately thought of using setTimeout() and dynamically correcting the next timeout so that our logic stays roughly on the prescribed interval. For example, if setTimeout() fires 5ms later than expected, we schedule the next one 5ms earlier. But the test results were unsatisfactory, and it was not elegant enough.
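For reference, that self-correcting idea looks roughly like this sketch (illustrative only; broadcastBucket is a placeholder for the real per-bucket work):

var INTERVAL = 48;
var next = Date.now() + INTERVAL;

function broadcastBucket() { /* collect and broadcast the current bucket */ }

function tick() {
  broadcastBucket();
  next += INTERVAL;
  // Schedule the next tick, subtracting however late (or early) we are right now.
  setTimeout(tick, Math.max(0, next - Date.now()));
}

setTimeout(tick, INTERVAL);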

So we had yet another idea: let setTimeout() fire as soon as possible, and each time check whether the current time has reached the target time. For example, using setTimeout(callback, 1) in a loop to keep checking the time looked like a good idea.

Disappointing timers
We immediately wrote some code to test the idea, and the result was disappointing. Run the following code under the then-current Node.js stable (v0.10.32) on Windows:


var sum = 0, count = 0;
function test() {
  var now = Date.now();
  setTimeout(function () {
    var diff = Date.now() - now;
    sum += diff;
    count++;
    test();
  }, 1);
}
test();

After letting it run for a while, typing sum/count in the console gives a result similar to:


> sum/count
15.624555160142348

What?! I asked for a 1ms interval, and you tell me the actual average interval is 15.625ms! The picture is just too beautiful. We ran the same test on a Mac and got about 1.4ms. So we wondered: what on earth is going on? If I were an Apple fan I might just declare Windows garbage and give up on it, but fortunately I am a rigorous front-end engineer, so I started thinking hard about that number.

Wait, why does this number look so familiar? Isn't 15.625ms suspiciously close to the maximum timer interval on Windows? We immediately downloaded ClockRes to check, and the console printed the following:


Maximum timer interval: 15.625 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.001 ms

Sure enough. Consulting the Node.js manual, we find this description of setTimeout:

The actual delay depends on external factors like OS timer granularity and system load.
However, the test results show that the actual delay equals the maximum timer interval (note that the system's current timer interval at that moment was only 1.001ms). That is unacceptable in any case, and strong curiosity drove us to look at the Node.js source.

A bug in Node.js

I believe most of you, like me, have some understanding of Node.js's event loop mechanism. Looking at the source code of the timer implementation gives a rough idea of how timers work. Let's start with the main loop of the event loop:




while (r != 0 && loop->stop_flag == 0) {
  /* Update the global time. */
  uv_update_time(loop);

  /* Check whether any timer has expired and run its callback. */
  uv_process_timers(loop);

  /* Call idle callbacks if needed. */
  if (loop->pending_reqs_tail == NULL &&
      loop->endgame_handles == NULL) {
    /* Prevent the event loop from exiting. */
    uv_idle_invoke(loop);
  }

  uv_process_reqs(loop);
  uv_process_endgames(loop);

  uv_prepare_invoke(loop);

  /* Collect IO events. */
  (*poll)(loop, loop->idle_handles == NULL &&
                loop->pending_reqs_tail == NULL &&
                loop->endgame_handles == NULL &&
                !loop->stop_flag &&
                (loop->active_handles > 0 ||
                 !ngx_queue_empty(&loop->active_reqs)) &&
                !(mode & UV_RUN_NOWAIT));

  /* setImmediate() and so on. */
  uv_check_invoke(loop);

  r = uv__loop_alive(loop);
  if (mode & (UV_RUN_ONCE | UV_RUN_NOWAIT))
    break;
}


The source code of the uv_update_time function is as follows (https://github.com/joyent/libuv/blob/v0.10/src/win/timer.c):


void uv_update_time(uv_loop_t* loop) {
  /* Get the current system time. */
  DWORD ticks = GetTickCount();

  /* The assumption is made that LARGE_INTEGER.QuadPart has the same type */
  /* as loop->time, which happens to be. Is there no way to assert this? */
  LARGE_INTEGER* time = (LARGE_INTEGER*) &loop->time;

  /* If the timer has wrapped, add 1 to its high-order DWORD. */
  /* uv_poll must make sure that the timer can never overflow more than */
  /* once between two subsequent uv_update_time calls. */
  if (ticks < time->LowPart) {
    time->HighPart += 1;
  }
  time->LowPart = ticks;
}

Internally, this function uses the Windows GetTickCount() function to set the current time. Simply put, when setTimeout is called, after a series of struggles the internal timer->due is set to the loop's current time + timeout. In the event loop, uv_update_time updates the loop's current time, and then uv_process_timers checks whether any timer has expired; if so, we enter the JavaScript world. Reading through it, the event loop is roughly this process:

Update the global time.

Check the timers; if a timer has expired, run its callback.
Check the reqs queue and execute the waiting requests.
Enter the poll function to collect IO events; when an IO event arrives, the corresponding handler is appended to the reqs queue to be executed in the next iteration of the event loop. Inside the poll function, a system call is used to collect IO events; it blocks the process until an IO event arrives or its timeout is reached, and that timeout is set to the time at which the nearest timer expires. In other words, collecting IO events blocks for at most the time remaining until the next timer is due.
The source code for one of the poll functions under Windows:




static void uv_poll(uv_loop_t* loop, int block) {
  DWORD bytes, timeout;
  ULONG_PTR key;
  OVERLAPPED* overlapped;
  uv_req_t* req;

  if (block) {
    /* Take the expiration time of the nearest timer. */
    timeout = uv_get_poll_timeout(loop);
  } else {
    timeout = 0;
  }

  GetQueuedCompletionStatus(loop->iocp,
                            &bytes,
                            &key,
                            &overlapped,
                            /* Block at most until the next timer expires. */
                            timeout);

  if (overlapped) {
    /* Package was dequeued. */
    req = uv_overlapped_to_req(overlapped);

    /* Insert the IO event into the queue. */
    uv_insert_pending_req(loop, req);
  } else if (GetLastError() != WAIT_TIMEOUT) {
    /* Serious error. */
    uv_fatal_error(GetLastError(), "GetQueuedCompletionStatus");
  }
}


Following these steps, suppose we set a timer with timeout = 1ms. The poll function blocks for at most 1ms and then returns (assuming no IO events arrive). As we re-enter the event loop, uv_update_time updates the time, uv_process_timers finds that our timer has expired, and the callback runs. So the initial suspicion was that either uv_update_time has a problem (it does not update the current time correctly), or the poll function's 1ms wait has a problem.

Checking MSDN, we were surprised to find this description of the GetTickCount function:

The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.

The precision of GetTickCount is that coarse! Suppose the poll function does block for exactly 1ms; the next time uv_update_time runs, the loop's current time is still not updated! So our timer is not judged to have expired, poll waits another 1ms, and we enter the next iteration. Only when GetTickCount finally ticks over (the 15.625ms update mentioned above) is the loop's current time updated, and only then does uv_process_timers decide that our timer has expired.

Turning to WebKit for help
This part of the Node.js source is disheartening: it uses a low-precision time function and does nothing to compensate. But then we remembered that since we use node-webkit, we have Chromium's setTimeout in addition to Node.js's setTimeout. We wrote a test page and ran it in a browser or in node-webkit: http://marks.lrednight.com/test.html (the number after the # is the interval to measure). The results were as follows:

According to the HTML5 specification, the first five nested timeouts should fire after 1ms and the rest after 4ms. The test page displays results starting from the 3rd call, so the first three rows of the table should theoretically be 1ms and the following ones 4ms. The measured results carry some error, and by the specification the smallest value we can get is 4ms. We were not entirely satisfied, but this is clearly better than the Node.js result. Strong curiosity drove us to look at Chromium's source code to see how it is implemented. (https://chromium.googlesource.com/chromium/src.git/+/38.0.2125.101/base/time/time_win.cc)

First, Chromium uses the timeGetTime() function to get the loop's current time. MSDN shows that the precision of this function is affected by the system's current timer interval, which on our test machine would in theory be the 1.001ms mentioned above. However, by default Windows keeps the timer interval at its maximum value (15.625ms on the test machine) unless an application changes the global timer interval.

If you follow IT news, you have probably seen reports about Chrome raising the Windows system timer frequency. It sounds like Chromium sets the timer interval to a very small value for us, so we would not need to worry about the system timer interval at all. But don't celebrate too early: that behaviour was treated as a defect and was fixed in Chrome 38. Would we have to stick with a node-webkit built on an earlier Chromium? That is clearly not elegant, and it would keep us from using newer, higher-performance Chromium versions.

Looking further into the Chromium source, we find that when a timer exists whose timeout is < 32ms, Chromium raises the system's global timer interval to obtain better than 15.625ms precision (view source). When timers are started, something called HighResolutionTimerManager is enabled; this class calls the EnableHighResolutionTimer function according to the current device's power state. Specifically, when the device is running on battery it calls EnableHighResolutionTimer(false), and when it is plugged in it passes true. The implementation of EnableHighResolutionTimer is as follows:




void Time::EnableHighResolutionTimer(bool enable) {
  base::AutoLock lock(g_high_res_lock.Get());
  if (g_high_res_timer_enabled == enable)
    return;
  g_high_res_timer_enabled = enable;
  if (!g_high_res_timer_count)
    return;
  // Since g_high_res_timer_count != 0, an ActivateHighResolutionTimer(true)
  // was called which called timeBeginPeriod with g_high_res_timer_enabled
  // with a value which is the opposite of |enable|. With that information we
  // call timeEndPeriod with the same value used in timeBeginPeriod and
  // therefore undo the period effect.
  if (enable) {
    timeEndPeriod(kMinTimerIntervalLowResMs);
    timeBeginPeriod(kMinTimerIntervalHighResMs);
  } else {
    timeEndPeriod(kMinTimerIntervalHighResMs);
    timeBeginPeriod(kMinTimerIntervalLowResMs);
  }
}


Here kMinTimerIntervalLowResMs = 4 and kMinTimerIntervalHighResMs = 1. timeBeginPeriod and timeEndPeriod are Windows functions for changing the system timer interval. In other words, when the machine is plugged in, the smallest timer interval we can get is 1ms; on battery it is 4ms. Since our loop keeps calling setTimeout, and the W3C rules put the minimum interval at 4ms anyway, this has little impact on us, which is a relief.

Another precision problem

Back to the beginning: our tests showed that the setTimeout interval does not settle at 4ms but keeps fluctuating, and the http://marks.lrednight.com/test.html test likewise shows the interval bouncing between 48ms and 49ms. The reason is that in the event loops of both Chromium and Node.js, the precision of the Windows call that waits for IO events is affected by the current system timer. The game logic needs requestAnimationFrame (to keep redrawing the canvas), and this function helps us get the system timer interval down to at most kMinTimerIntervalLowResMs (because it needs a 16ms timer, which triggers the request for a high-resolution timer). When the test machine is plugged in, the system timer interval is 1ms, so the test results carry an error of about ±1ms. If the system timer interval on your machine has not been changed, running the #48 test above may show a maximum as high as 48 + 16 = 64ms.

Using Chromium's setTimeout implementation, we can keep the error of setTimeout(fn, 1) at around 4ms, and the error of setTimeout(fn, 48) at around 1ms. So a new blueprint formed in our minds, which makes our code look like this:


/* Get the maximum interval deviation. */
var deviation = getMaxIntervalDeviation(bucketSize); // bucketSize = 48, deviation = 2;

function gameLoop() {
  var now = Date.now();
  if (previousBucket + bucketSize <= now) {
    previousBucket = now;
    doLogic();
  }
  if (previousBucket + bucketSize - Date.now() > deviation) {
    // Wait 46ms. The actual delay will be less than 48ms.
    setTimeout(gameLoop, bucketSize - deviation);
  } else {
    // Busy waiting. Use setImmediate instead of process.nextTick,
    // because the former does not block IO events.
    setImmediate(gameLoop);
  }
}

The code above makes us wait for (bucket_size - deviation) ms rather than a full bucket_size; even if the 46ms delay hits its worst-case error, the actual interval is still less than 48ms according to the theory above. The remaining time is spent busy-waiting, to make sure gameLoop runs at a sufficiently precise interval.

Although Chromium solves the problem for us to some extent, this is clearly still not elegant.

Remember our initial requirement? The server-side code should be able to run detached from the node-webkit client, directly on a machine with only a Node.js environment. If you run the code above there, the value of deviation is at least 16ms, which means that out of every 48ms we busy-wait for 16ms. CPU usage goes through the roof.

An unexpected surprise

How irritating: does nobody notice such a big bug in Node.js? The answer surprised us: the bug had already been fixed in v0.11.3. Looking at the master branch of the libuv code shows the same fix. The approach is to add the timeout to the loop's current time after the poll function finishes waiting. So even if GetTickCount has not ticked over yet, the time we waited is added after the poll wait, and the timer can expire on schedule.

In other words, the problem we had struggled with for half a day was already solved in v0.11.3. Still, our effort was not wasted, because even with the effect of GetTickCount removed, the poll function itself is still affected by the system timer. One possible solution is to write a Node.js addon that changes the system timer interval.

However, our game was designed without a dedicated server: after a client creates a room, it becomes the server. The server code can run in the node-webkit environment, so the Windows system timer issue is not the highest priority, and the solution given above is good enough for us.

Wrapping up

With the timer problem solved, there were basically no more obstacles to implementing the framework. We provide WebSocket support (for a pure HTML5 environment) as well as a custom communication protocol for higher-performance socket support (in the node-webkit environment). Of course, Spaceroom's functionality was rudimentary at first, but as requirements came up and time allowed, we gradually rounded out the framework.

For example, when we found that our game needed random numbers that are consistent across clients, we added that capability to Spaceroom. At the start of a game, Spaceroom distributes a random seed, and the client-side Spaceroom provides a method that uses MD5-based randomness, driven by that seed, to generate random numbers.
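As an illustration of the idea only (the class and method names below are hypothetical, not Spaceroom's actual API), a seeded, deterministic generator of this kind could be sketched with Node's crypto module like so:

var crypto = require('crypto');

// Hypothetical sketch: a deterministic PRNG that hashes seed + counter with MD5.
// Every client that receives the same seed produces the same sequence.
function SeededRandom(seed) {
  this.seed = String(seed);
  this.counter = 0;
}

// Returns a float in [0, 1), derived from the first 32 bits of MD5(seed:counter).
SeededRandom.prototype.next = function () {
  var hash = crypto.createHash('md5')
    .update(this.seed + ':' + (this.counter++))
    .digest('hex');
  return parseInt(hash.slice(0, 8), 16) / 0x100000000;
};

// Usage: every client seeded with 12345 by the server stays in sync.
var rng = new SeededRandom(12345);
console.log(rng.next(), rng.next());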

It has all been quite gratifying, and we learned a lot in the process of writing this framework. If you are at all interested in Spaceroom, you are welcome to get involved. We believe Spaceroom will get to flex its muscles in many more places.
