MMORPG Server Architecture
One. Summary
1. The overall server architecture of MMORPG online games, covering early, mid-period, and current mainstream designs
2. The network layer of online games, including network protocols, IO models, network frameworks, message encoding, and so on
3. Scene management, AI, scripting, and related topics in online games
4. Open source online game server engines
5. Reference books and blogs
Two. Keywords
Network protocols, network IO, message broadcast, synchronization, C/S, TCP/UDP, IP, cluster, load balancing, distributed
Gateway server (GateServer), heartbeat, multithreading/thread pool, open source network communication frameworks/models
Blocking/non-blocking/synchronous/asynchronous, Proactor/Reactor/Actor, select/poll/epoll/IOCP/kqueue
Design patterns/data structures in game development
Short and long connections, game security, caching, message encoding protocols, scripting languages
Socket, Nagle/sticky packets/truncated packets/TCP_NODELAY, AI/scene partitioning/sub-maps, open source MMORPG servers
Three. Main text: server architecture
1. Early MMORPG server architecture
Client <-> GameServer <-> DB: all business logic and data are processed in one central place
Advantages: simple, fast to develop
Disadvantages:
1. All operations run together, which greatly increases the load on the system. A single bug can crash the entire server, disconnecting every player or causing even more serious consequences.
2. At the moment a server opens, all players pile up in the same novice village, which causes lag: the client lags (too many characters to render on screen, broadcast storm) and the server lags (a flood of messages for the same scene, broadcast storm).
2. Mid-period: user-isolated cluster
         GameServer1
Client |             | DB
         GameServer2
As the number of players keeps growing, more lines (parallel game servers) are added. The program assigns each player to a line automatically, or the player chooses one manually.
Disadvantage: late in the game's life, as the number of players on each line shrinks, interaction between players drops sharply.
3. Mid-to-late period: cluster split by map server with separated data, the current mainstream
The novice village problem: "Tian Long Ba Bu" put forward a better solution. It sets up several parallel novice village maps, one main map and multiple copies, so that as many of the users who flood in at launch as possible can be accommodated at the same time. High-level players returning from other maps can only arrive at the main novice village.
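The one-main, multiple-copy idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the game's actual implementation: the class name NoviceVillagePool, the per-map capacity, and the rule of always picking the least crowded map are hypothetical.

```python
class NoviceVillagePool:
    """One main novice village plus parallel copies (hypothetical sketch)."""

    def __init__(self, copies, capacity):
        # Index 0 is the main village; 1..copies are the parallel copies.
        self.capacity = capacity
        self.population = [0] * (copies + 1)

    def assign_new_player(self):
        """Place a new player on the least crowded village map."""
        index = min(range(len(self.population)), key=self.population.__getitem__)
        if self.population[index] >= self.capacity:
            raise RuntimeError("all novice villages are full")
        self.population[index] += 1
        return index

    def assign_returning_player(self):
        """High-level players coming back from other maps land on the main map."""
        self.population[0] += 1
        return 0


pool = NoviceVillagePool(copies=2, capacity=2)
placements = [pool.assign_new_player() for _ in range(4)]
returning = pool.assign_returning_player()
print(placements, returning)
```

New players are spread across the main map and its copies, while a returning player is always sent to index 0, the main village.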
4. The current mainstream online game architecture
Note: GateServer and CenterServer are connected over TCP. The connection between GameServer and LogServer can be UDP. This is only a general diagram; many parts of it need refinement.
GateServer: the gateway server, also called AgentServer or ProxyServer
Advantages:
(1) As the transit point for network traffic, it keeps the intranet isolated from the external network. Outside clients cannot reach the internal servers directly, which protects the intranet servers and, to some extent, reduces attacks from cheat programs.
(2) The gateway server parses packets, handles encryption and decryption, processes timeouts, and performs some logic checks, so malformed and illegal packets are filtered out early.
(3) The client only needs one connection, to the gateway server, to enter the game. It does not need separate connections to the other game servers, which saves network resources on both the client and the servers.
(4) When a player moves between servers, the connection to the gateway stays open. Switching the player's data between game servers happens inside the intranet and completes almost instantly, so the player barely notices. This keeps the game smooth and the user experience good.
Disadvantages:
1. Under high load, the gateway server becomes a communication bottleneck.
2. The gateway is a single point of failure: if it goes down, the entire server group can no longer serve external clients.
Solution: multi-gateway technology. As the name implies, a server group runs several gateway servers; for example, one group can be configured with three gateways. When the load is heavy, adding gateway servers raises the total traffic the gateway layer can handle. When one gateway goes down, only the clients connected to that gateway are affected; the other clients are not.
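A minimal sketch of the multi-gateway idea, assuming clients are handed the gateway with the fewest connections. The class GatewayPool and the names gate-1 to gate-3 are hypothetical and only serve the illustration.

```python
class GatewayPool:
    """Tracks several gateway servers and hands out the least loaded one."""

    def __init__(self, names):
        self.connections = {name: 0 for name in names}  # live gateways only

    def pick(self):
        """Return the live gateway with the fewest client connections."""
        if not self.connections:
            raise RuntimeError("no gateway available")
        name = min(self.connections, key=self.connections.get)
        self.connections[name] += 1
        return name

    def mark_down(self, name):
        """Remove a failed gateway; return how many clients must reconnect."""
        return self.connections.pop(name, 0)


gateways = GatewayPool(["gate-1", "gate-2", "gate-3"])
first_three = [gateways.pick() for _ in range(3)]
dropped = gateways.mark_down("gate-2")   # only gate-2's clients are affected
after_failure = gateways.pick()          # new clients go to a surviving gateway
```

When gate-2 fails, only the clients counted against it need to reconnect; the remaining gateways keep serving their own clients and absorb the newcomers.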
DCServer: the data center server. Its main job is to cache player role data so that role data can be read and saved quickly.
CenterServer: the global server or hub server, also known as WorldServer. Its main responsibility is to relay data between GameServers and to broadcast data. Other game systems, such as the buddy system and the guild system, may also live on the center server.
Improvement: split the gateway server into a LoginGateServer and multiple GameGateServers.
5. Split-by-business clustering
An online game contains many kinds of business, such as chat, combat, movement, and NPCs, and some of them can be moved onto separate servers. Each server's program then becomes much leaner, and separating the high-traffic businesses effectively raises the player limit of the game servers.
Advantages:
1. Separating the business keeps each server's program simple, which lowers the chance of errors. Even if something does go wrong, the whole game is not affected, and the faulty server can be replaced by quickly starting a standby server.
2. Separating the business spreads the traffic, so response speed improves.
3. Because most businesses run on their own servers, servers can be added dynamically to raise the maximum number of players.
Improvement: even the login server can be split further, into a role creation server and a role selection server.
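Splitting by business amounts to routing each message to the server that owns that kind of work. The sketch below assumes messages carry a "type" field; the server names in ROUTES are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from message type to the server that owns that business.
ROUTES = {
    "chat": "ChatServer",
    "fight": "FightServer",
    "move": "MoveServer",
    "npc": "NpcServer",
}


def route(message):
    """Return the name of the server that should handle this message."""
    try:
        return ROUTES[message["type"]]
    except KeyError:
        raise ValueError(f"no server registered for {message.get('type')!r}") from None


def dispatch(messages):
    """Group a batch of messages by destination server."""
    queues = defaultdict(list)
    for message in messages:
        queues[route(message)].append(message)
    return dict(queues)


batch = [
    {"type": "chat", "text": "hello"},
    {"type": "move", "x": 3, "y": 4},
    {"type": "chat", "text": "anyone there?"},
]
queues = dispatch(batch)
```

Because the routing table is the only place that knows which server handles which business, adding a server for a new business means adding one entry.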
6. A simple and practical online game server architecture
Each box in the following illustration is a separate process or application component. If any one service process goes down, only some users are affected and the overall service is not completely interrupted. After the failed process restarts, it rejoins the group and all services continue.
GLS: Game Login Server, the server used to log in to the game. To some degree it is not a core component. GLS calls an external interface to perform basic user name and password authentication. It also has to implement many auxiliary functions: a login queue (very helpful when a server first opens), a GM super login channel (GMs enter the game without queuing), activated-user control during testing, login restrictions, client version control, and so on.
DB: essentially a large in-memory buffer in front of the backend SQL database. It isolates database operations, compares the in-memory data, and writes only the changed data to SQL in batches. This component places very high demands on algorithm design and on stability.
Center: all components register here, the session state of online players is stored here centrally, and it keeps a heartbeat connection with each component. All external interfaces also pass through here.
Role entry: where the player selects a role after logging in.
GS: Game Server, the most central component. All game logic for a given map is handled here.
Gate: maintains the persistent connections with users. It mainly forwards sockets, blocks malicious packets, and protects the GS. It also handles protocol encryption and decryption. One Gate is shared by multiple GS, which reduces the risk of a failed connection when a player jumps between maps.
IM, relationship, consignment: these stand for the other components, each responsible for the corresponding global game logic that spans maps.
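The DB component described above, which compares in-memory data and writes only the changes in batches, can be sketched as follows. This is an assumption-laden illustration: the class WriteBackCache and the save_batch callback are hypothetical, and a plain list stands in for the SQL backend.

```python
import copy


class WriteBackCache:
    """In-memory buffer in front of SQL that flushes only changed rows."""

    def __init__(self, save_batch):
        self.save_batch = save_batch   # callable that receives {key: row}
        self.current = {}              # latest in-memory state
        self.persisted = {}            # snapshot of what storage already holds

    def put(self, key, row):
        self.current[key] = row

    def flush(self):
        """Compare memory with the last snapshot and write the differences."""
        changed = {
            key: row
            for key, row in self.current.items()
            if self.persisted.get(key) != row
        }
        if changed:
            self.save_batch(changed)
            for key, row in changed.items():
                self.persisted[key] = copy.deepcopy(row)
        return len(changed)


written = []                            # stands in for the SQL backend
cache = WriteBackCache(written.append)
cache.put("role:1", {"level": 10, "gold": 50})
cache.put("role:2", {"level": 3, "gold": 7})
first = cache.flush()                   # both rows are new
cache.put("role:1", {"level": 11, "gold": 50})
second = cache.flush()                  # only role:1 changed
third = cache.flush()                   # nothing changed, nothing written
```

Each flush produces at most one batch, and a flush with no changes touches the backend not at all, which is the point of isolating database operations behind the buffer.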
7. Another architecture diagram
1-This is a WebService channel. When a user activates an account in a region or changes the account password, the user's account information is inserted and updated through this channel.
2-This is also a WebService channel. It is used to fetch and control the user's role information within the server group, and for update operations such as paying in store tokens.
3-This is a local TCP/IP connection. It is mainly used by the server group to register with the login server, by the login server to register an authenticated account's login information with the user server, to operate on the roles of a logged-in account (for example, kicking the currently logged-in role offline), and to update server group information (such as the number of players currently online).
4-This is also a local TCP/IP connection. It is used to authenticate clients that connect to the GameServer, to fetch role data, and to send back the changes made to role data on the GameServer.
5-This is also a local TCP/IP connection. It carries the interaction between the public information server and the game servers, which exchange world-level information (such as guild information, cross-server party messages, and cross-server chat channels).
6-The two connections here indicate that the Agent of the UserServer and the Agent of the GameServer can be shared; that is, after the player enters the server group there is no need to switch Agent. If you are not afraid of the added complexity, the login server can use the Agent too, so that the user never has to change Agent during the whole session. That reduces repeated connections and improves stability (with fewer connections, there are fewer chances for something to go wrong on the servers).
In this architecture, GameServer is really a composite of game logic. It can be expanded into several different logic servers, which exchange public data through PublicServer.
UserServer acts as the leader of a ServerGroup. It registers and updates the server group's information (name, current player count) with LoginServer, and it dispatches Agents, giving each player who selects this group the Agent with the fewest players. It also serves as the role management server: sending the current role list to the client and creating, deleting, and selecting roles all happen here. Finally, it is the authentication server for user information: GameServer relies on it to authenticate clients and to fetch the data of the role the player selected.
Games that use this architecture typically behave as follows.
1-The user must activate a region before their account can log in to that region.
2-When the user launches the client, a launcher pops up in which the user selects a region.
3-When the actual game client starts, it begins with the account and password login.
4-After the account is verified, the user selects a server within the region.
5-After the server is selected, the user enters role management. Roles cannot be shared across different servers.
Four. Main text: network communication
1. Network protocols
Choose TCP or UDP according to the game type: how strict its real-time requirements are and whether packet loss is acceptable.
A. TCP: connection-oriented, reliable, ordered delivery; slower, with added latency
TCP requires the receiver to acknowledge the data it receives, which is how the sender confirms that a packet reached the receiver across the Internet. If no acknowledgement arrives within a certain time, TCP retransmits the unacknowledged packet and slows down the sending of new data while it waits for the acknowledgement. This delays data transmission, and when the network is poor the sender may wait for quite a long time.
UDP: connectionless, unreliable, no ordering guarantee; fast
B. Long connections and short connections
A long connection is a TCP connection over which many packets can be sent in succession. While the connection is held open, if no data packets are being sent, a probe packet must be sent to keep the connection alive; usually the application has to handle this keep-alive itself.
Connect → transfer data → keep alive (heartbeat) → transfer data → keep alive (heartbeat) → ... → close connection
A short connection is a TCP connection that is established only when the two parties have data to exchange and is closed as soon as the data has been sent, as in HTTP without keep-alive.
Connect → transfer data → close connection
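On the server side, keeping a long connection alive usually means recording when each connection was last heard from and dropping the silent ones. The sketch below is one simple way to do that; the class HeartbeatMonitor and the 30-second timeout are assumptions, and timestamps are passed in explicitly so the logic is easy to follow (a real server would more likely read a monotonic clock).

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per connection and reports timed-out ones."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, conn_id, now):
        """Record a heartbeat (or any data packet) from a connection."""
        self.last_seen[conn_id] = now

    def expire(self, now):
        """Drop and return connections silent for longer than the timeout."""
        dead = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for conn_id in dead:
            del self.last_seen[conn_id]
        return dead


monitor = HeartbeatMonitor(timeout=30)
monitor.beat("player-1", now=0)
monitor.beat("player-2", now=0)
monitor.beat("player-1", now=25)      # player-1 keeps the connection alive
expired = monitor.expire(now=40)      # player-2 has been silent for 40 seconds
```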
2. IO model
The five IO models in UNIX:
1. Blocking IO (blocking I/O model)
2. Non-blocking IO (nonblocking I/O model)
3. IO multiplexing (I/O multiplexing model)
4. Signal-driven IO (signal-driven I/O model)
5. Asynchronous IO (asynchronous I/O model)
An IO operation has two phases:
1. Wait for the kernel to have the data ready.
2. Copy the data from the kernel buffer to the application buffer.
Based on these two phases, the IO types can be described as follows:
1. Blocking IO: blocks in both phases.
2. Non-blocking IO: in phase 1 the program keeps polling until the data is ready; phase 2 still blocks.
3. IO multiplexing: in phase 1 the program is notified when one or more IO channels are ready; phase 2 still blocks. Phase 1 is still implemented by polling, but all the IO is gathered in one place and that place does the polling.
4. Signal-driven IO: when the data is ready, a signal tells the program so; phase 2 blocks.
5. Asynchronous IO: neither phase 1 nor phase 2 blocks.
IO multiplexing: blocks on several I/O operations at once. It can watch multiple read and write operations at the same time, and the actual I/O function is called only when there is data to read or room to write. Java counterpart: Selector.
Signal-driven IO: enable signal-driven I/O on the socket and install a signal handler; the process keeps running and is not blocked. When the data is ready, the process receives a SIGIO signal and can call the I/O function from the signal handler to process the data.
Asynchronous IO: the system call returns immediately, and the process is notified once the IO operation has completed. Java counterpart: NIO2.
The first four models are synchronous IO; the last is asynchronous IO. The difference is that in the synchronous models the process itself must call recvfrom in phase 2, whereas asynchronous IO hands the whole IO operation to the kernel, which signals when it is done. In the meantime the user process does not need to check the status of the IO operation or copy the data itself.
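The two phases can be observed with a pair of connected sockets. The following sketch uses Python's standard socket and select modules; the variable names are illustrative only. It first shows non-blocking IO returning immediately when no data is ready, then IO multiplexing waiting in phase 1, and finally the process copying the data itself in phase 2.

```python
import select
import socket

# A connected pair of sockets stands in for client and server.
server_side, client_side = socket.socketpair()
server_side.setblocking(False)

# Non-blocking IO: with no data ready, recv returns at once with an error
# instead of blocking in phase 1.
try:
    server_side.recv(1024)
    ready_before_send = True
except BlockingIOError:
    ready_before_send = False

client_side.sendall(b"hello")

# IO multiplexing: select blocks in phase 1 until a socket is readable.
readable, _, _ = select.select([server_side], [], [], 5.0)

# Phase 2: the process itself copies the data out of the kernel buffer.
data = readable[0].recv(1024) if readable else b""

server_side.close()
client_side.close()
```

In both cases the process performs the recv call itself, which is why these models count as synchronous IO.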
3. Reasons for thread blocking:
1. Thread.sleep(): the thread gives up the CPU, sleeps for the requested time, and then resumes running.
2. The thread enters a synchronized block and blocks because it cannot obtain the lock. Once it acquires the lock, it can resume.
3. The thread calls an object's wait method and enters the blocked state. It wakes only after another thread calls notify or notifyAll on that object.
4. IO operations: the thread waits for the related resource.
What blocked threads have in common is that they give up the CPU and stop running, and resume only once the cause of the blocking is removed. Alternatively, if another thread interrupts it, the thread leaves the blocked state and throws InterruptedException.
4. Blocking/non-blocking/synchronous/asynchronous
Synchronous versus asynchronous is about the mechanism by which messages are notified. Blocking versus non-blocking is about how the caller behaves while the message is being handled. These are two entirely separate pairs of concepts.
5. Several common concepts
select, poll
epoll (Linux), kqueue (FreeBSD)
IOCP (Windows)
Reactor
The Reactor pattern has a dispatcher (also called a notifier). When an event arrives, the dispatcher hands it to the matching handler; the dispatcher maintains all the registered handlers. There is also a demultiplexer that sorts out the multiple synchronous events.
Proactor
Proactor and Reactor are both design patterns in concurrent programming, used to dispatch and demultiplex IO events. An IO event is an IO operation such as read or write; "dispatch and demultiplex" means notifying the upper-layer module of each individual IO event. The two patterns differ in that Proactor is used with asynchronous IO, while Reactor is used with synchronous IO.
What the two patterns share is event notification for IO events (that is, telling some module that an IO operation can be performed or has been completed). Structurally they are also alike: the demultiplexer submits the IO operation (asynchronous case) or queries whether the device is ready (synchronous case), and calls back the handler when the condition is met.
The difference is this: in the asynchronous case (Proactor), the handler callback means the IO operation has already completed; in the synchronous case (Reactor), the handler callback means the IO device is ready for an operation (readable or writable), and the handler starts the operation at that point.
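The Reactor side of this comparison can be sketched with Python's standard selectors module, which plays the role of the demultiplexer. The class Reactor and the handler on_readable are hypothetical names for illustration; note that the handler performs the read itself, which is exactly what distinguishes Reactor from Proactor.

```python
import selectors
import socket


class Reactor:
    """Minimal Reactor: waits for readiness, then dispatches to handlers."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()  # the demultiplexer

    def register(self, sock, handler):
        """Register a handler to call when sock becomes readable."""
        self.selector.register(sock, selectors.EVENT_READ, handler)

    def unregister(self, sock):
        self.selector.unregister(sock)

    def run_once(self, timeout=1.0):
        """Wait for ready sockets and call back their handlers."""
        events = self.selector.select(timeout)
        for key, _ in events:
            key.data(key.fileobj)     # the handler performs the read itself
        return len(events)


received = []


def on_readable(sock):
    # Reactor semantics: the socket is ready; the handler does the IO.
    received.append(sock.recv(1024))


left, right = socket.socketpair()
reactor = Reactor()
reactor.register(left, on_readable)
right.sendall(b"ping")
handled = reactor.run_once()

reactor.unregister(left)
reactor.selector.close()
left.close()
right.close()
```

In a Proactor, by contrast, the callback would receive the bytes that the system had already read, rather than a socket that is merely ready.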
6. Network communication frameworks
TCP server frameworks:
Apache MINA (Multipurpose Infrastructure for Network Applications) 2.0.4
Netty 3.5.0.Final
Grizzly 2.2
QuickServer: a free, open source Java library for quickly building robust, multithreaded, multi-client TCP server applications. With QuickServer, users can concentrate on the application's logic and protocol.
Cindy: a robust, scalable, and efficient asynchronous I/O framework
xSocket: a lightweight NIO-based server framework for developing high-performance, scalable, multithreaded servers. The framework encapsulates threading, asynchronous reads and writes, and so on.
ACE 6.1.0: the ADAPTIVE Communication Environment, a C++ framework
SmartFoxServer 2.X: a cross-platform socket server designed specifically for Adobe Flash
7. Message encoding protocols
AMF / JSON / XML / custom formats / Protocol Buffers
Whatever the network application, one problem always has to be solved: the application layer must split messages out of the byte stream. For TCP, this means the receiving application layer must be able to recognize, within the byte stream, each message the sender transmitted. There are two common approaches:
1. Use a special character or string as the message boundary. The application layer parses the received byte stream, and when it meets that character or string it considers a complete message received.
2. Give each message a defined length. Once the application layer has received that many bytes, it considers a complete message received.
Message parts: delimiter (separator), message header (header), message body (body)
| separator | header | body |
| len | message_id | data |
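A length-prefixed frame of the form len | message_id | data can be built with Python's standard struct module. The field widths below (a 4-byte length and a 2-byte message id, both big-endian) and the choice that len counts only the data bytes are assumptions made for this sketch; the source does not specify them.

```python
import struct

HEADER = struct.Struct("!IH")   # assumed layout: 4-byte len, 2-byte message_id


def encode(message_id, data):
    """Build one frame: len | message_id | data (len counts the data bytes)."""
    return HEADER.pack(len(data), message_id) + data


def decode_one(frame):
    """Parse a single complete frame back into (message_id, data)."""
    length, message_id = HEADER.unpack_from(frame)
    data = frame[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("incomplete frame")
    return message_id, data


frame = encode(7, b"move:3,4")
decoded = decode_one(frame)
```

The fixed-size header lets the receiver know how many bytes to wait for before it treats the message as complete.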
8. Sticky packets:
TCP is a byte stream and does not preserve message boundaries. A TCP sticky packet occurs when several packets sent by the sender arrive stuck together at the receiver: seen from the receive buffer, the head of one packet follows immediately after the tail of the previous one.
1. Sticky packets caused by the sender come from the TCP protocol itself. To improve transmission efficiency, TCP often waits until it has collected enough data before sending a segment. If the application sends small pieces of data several times in a row, TCP's optimization algorithm (the Nagle algorithm) usually merges them into a single segment, so the receiver gets them stuck together.
2. Sticky packets caused by the receiver arise because the receiving user process does not read the data in time. The receiver first places incoming data in the system receive buffer, and the user process fetches data from that buffer. If the next packet arrives before the user process has taken away the previous one, it is appended to the receive buffer right after the previous packet. The user process reads from the receive buffer according to its preset buffer size, so it picks up more than one packet at a time.
Solutions:
1. Sticky packets caused by the sender can be avoided by programming. TCP provides a push operation that forces data to be sent immediately; when the TCP software receives this instruction, it sends the data at once without waiting for the send buffer to fill.
TCP_NODELAY: turns off the optimization algorithm; not recommended
2. For sticky packets caused by the receiver, optimize the program design: lighten the workload of the receiving process, raise its priority, and take similar measures so that data is received promptly. Sticky packets may still appear when the sending frequency is high.
3. Let the receiver take control: receive each packet in several reads according to the fields of its structure, then merge the parts. This avoids sticky packets but is inefficient.
4. The receiver creates a preprocessing thread that preprocesses the received data and separates the packets that are stuck together.
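For reference, the TCP_NODELAY option mentioned above is set per socket. The sketch below uses Python's standard socket module on an unconnected socket, purely to show the call.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable the Nagle algorithm so small writes are sent without being merged.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

sock.close()
```

Even with the option set, the receiver may still read several messages in one call, so the application-level message framing is still needed.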
The idea of the packet-splitting algorithm:
The basic idea is to first interpret the received data (of length m) as the predefined structure, read the structure's length field n from it, and then work out the length of the first packet from n.
1) If n < m, the data stream contains more than one packet. Cut n bytes from its head into a temporary buffer and keep processing the remaining data in a loop until the end.
2) If n = m, the data stream is exactly one complete structure. Put it directly into the temporary buffer.
3) If n > m, the data stream is not enough to form a complete structure. Keep it and process it together with the next batch of data.
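The three cases can be sketched as a small stream splitter. The assumptions here are that each packet starts with a 4-byte big-endian length field n that counts only the body, and that m is the number of body bytes currently buffered after that field; the class PacketSplitter and the helper pack are hypothetical.

```python
import struct

LEN = struct.Struct("!I")   # assumed: each packet starts with a 4-byte length n


class PacketSplitter:
    """Splits a TCP byte stream into packets using the length field."""

    def __init__(self):
        self.buffer = b""   # data left over from earlier reads

    def feed(self, chunk):
        """Add newly received bytes and return every complete packet body."""
        self.buffer += chunk
        packets = []
        while len(self.buffer) >= LEN.size:
            (n,) = LEN.unpack_from(self.buffer)
            m = len(self.buffer) - LEN.size   # body bytes currently available
            if n > m:
                break                         # case 3: wait for more data
            # cases 1 and 2: cut n bytes off the head, loop on the remainder
            packets.append(self.buffer[LEN.size:LEN.size + n])
            self.buffer = self.buffer[LEN.size + n:]
        return packets


def pack(body):
    """Prefix a body with its length to form one packet."""
    return LEN.pack(len(body)) + body


stream = pack(b"hello") + pack(b"world") + pack(b"partial")
splitter = PacketSplitter()
first = splitter.feed(stream[:22])    # two whole packets plus a fragment
second = splitter.feed(stream[22:])   # the rest of the third packet
```

The leftover fragment stays in the buffer between calls, so a packet that arrives in pieces is delivered only once it is complete.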
Five. Main text: scene management, AI, scripting
AOI (Area of Interest): in the broad sense, an AOI system lets any object in the game world handle events that occur within a certain radius of it. Most MMORPG requirements, however, only involve the leave and enter events of objects within that radius. When you enter a game scene and can see other players, the AOI system is at work behind the scenes.
1. It is easy to see that the simplest AOI requirement is to sync every player in the world to every client. This scheme has O(n^2) complexity, which is not