Building a high-performance TCP/UDP server with PHP

Tags: epoll, unix domain socket, least privilege
If the web server connects directly to the DB, an attacker who compromises the web server can find the DB username and password in the code, putting the database at risk of being dumped wholesale. The DB also caps its number of connections, so when many CGI processes need to connect at once, the cap may be reached and service denied. It therefore becomes necessary to add an intermediate layer between the web server and the DB, with the middle layer holding long-lived connections to the DB. When a data request arrives, the web server and the middle-tier server interact over a private (non-SQL) protocol, improving both security and performance. This is the embryonic form of the middle-tier server.

As web businesses continue to diversify, the role of the middle-tier server has gone far beyond simply forwarding DB data: it now provides complete TCP and UDP services. Here is a look at the architecture.

1. TCP Server

Like most server architectures, the whole TCP server consists of a master process, listener processes, and worker processes. The master process watches for signals, monitors the health of the listener/worker processes, and restarts any that terminate unexpectedly. The listener processes hold the client connections, while the worker processes execute the real business logic. Since a listener is only responsible for routing and packet dispatch and makes no blocking calls, it never blocks. Listeners and workers communicate over UNIX domain sockets; because the communication is confined to related processes, we chose the anonymous flavour of the UNIX domain socket, created with socketpair().

In general there are fewer listeners than workers. To make the drawings and descriptions easier, the examples below assume 2 listeners and 5 workers.

1.1 Master Process

When the service starts, the current process (the master process) first:

  • Creates one socketpair per worker (5 in total), stored in a static variable (used to dispatch packets to the workers)
  • Creates one socketpair per listener (2 in total), stored in a static variable (used to collect reply packets from the workers)
  • Creates a network socket, then bind()s and listen()s on it, for communication with users
  • Then fork()s the children; a few details need attention:

  • Change the identity of each child process
  • Turn on the CPU affinity option for each child process
  • Since all the sockets live in a static variable, i.e. in the master process's data segment, they remain accessible in the children after fork(). And because these sockets are paired up two-by-two across processes, listeners and workers can communicate with each other.

UNIX domain sockets are used for communication between processes on the same machine. Although they are wrapped in the same interface as INET domain sockets, the internal implementation is completely different: data is simply copied, with no protocol processing, no network headers to add or strip, no checksums to compute, no sequence numbers to generate, and no acknowledgements to send, so they are more efficient. UNIX domain sockets offer both a stream (TCP-like) and a datagram (UDP-like) interface; which should we choose? The datagram one, of course: as a connectionless interface it keeps no connection state, which allows a purely asynchronous design. But doesn't UDP imply packet loss and out-of-order delivery? Not here: UNIX domain sockets are pipe-based and therefore reliable; no message is lost, corrupted, or reordered.
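
As a concrete illustration, here is a minimal PHP sketch of this initialization, assuming the sockets and pcntl extensions; the port, the counts, and the child_main() dispatch are illustrative, not the original implementation:

```php
<?php
// Minimal sketch of the master's initialization, assuming 2 listeners
// and 5 workers as in the text. Names and the port are illustrative.
const NUM_LISTENERS = 2;
const NUM_WORKERS   = 5;

$workerPairs   = [];  // listener -> worker channels
$listenerPairs = [];  // worker -> listener reply channels

for ($i = 0; $i < NUM_WORKERS; $i++) {
    // AF_UNIX + SOCK_DGRAM: the datagram flavour of the UNIX domain socket
    socket_create_pair(AF_UNIX, SOCK_DGRAM, 0, $workerPairs[$i]);
}
for ($i = 0; $i < NUM_LISTENERS; $i++) {
    socket_create_pair(AF_UNIX, SOCK_DGRAM, 0, $listenerPairs[$i]);
}

// The public-facing TCP socket: create, bind, listen.
$listenSock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_bind($listenSock, '0.0.0.0', 8000);
socket_listen($listenSock, 128);

// fork() the children; they inherit every socket created above.
for ($i = 0; $i < NUM_LISTENERS + NUM_WORKERS; $i++) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        // child: drop privileges, set CPU affinity, run its main loop
        child_main($i);   // hypothetical listener/worker entry point
        exit(0);
    }
}
```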

At this point the process structure is in place: the master holds the listen socket plus the 5 + 2 socketpairs.

The master process then completes its initialization and enters its main loop of listening for and handling signals. Its main job is to monitor the health of all child processes and react accordingly, and to receive system signals so the administrator's reload, restart, stop, and status-reporting requests can be served; master can also reconfigure the number of child processes dynamically (TODO).

If a child's status changes (SIGCHLD), master restarts it; if a system exit signal arrives, master sets a flag bit and exits gracefully once the remaining signals have been handled.
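
Here is a minimal sketch of that loop in PHP, assuming the pcntl extension; restart_child() is a hypothetical helper standing in for the real respawn logic:

```php
<?php
// Master main loop: monitor children via SIGCHLD and restart them;
// exit gracefully on SIGTERM.
$running = true;

pcntl_signal(SIGTERM, function () use (&$running) {
    $running = false;          // set the flag bit, then exit smoothly
});
pcntl_signal(SIGCHLD, function () {
    // reap every exited child and pull it back up
    while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
        restart_child($pid);   // hypothetical respawn helper
    }
});

while ($running) {
    pcntl_signal_dispatch();   // deliver queued signals to the handlers
    sleep(1);
}
```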

1.2 Listener and worker processes

Since all listeners and workers are children of the master process and hold the 1 + 5 + 2 sockets created by master, the first thing a listener or worker process does at startup is tell the kernel which of those sockets it cares about, by placing them into epoll.

Notice that this picture differs from the post-fork one above: something seems to be missing. The reason is that the sockets shown here are the ones placed into epoll, and each listener and worker cares about a different subset. For example, worker n only needs to put socketpair_n into its epoll (the pair drawn in the same color) to receive data sent by the listeners.

Once all the sockets are in epoll, we only need to call epoll_wait in the main loop to obtain the events we must handle, achieving pure asynchrony.
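
As a rough sketch, this is what worker n's registration and main loop might look like with the pecl event extension (a libevent binding; the text only says libevent is used through PHP, so the exact API here is an assumption). $pair is this worker's end of socketpair_n:

```php
<?php
// Register only the socket this worker cares about, then loop.
$base = new EventBase();

$ev = new Event($base, $pair, Event::READ | Event::PERSIST,
    function ($fd) {
        // a listener forwarded a packet on our socketpair
        $packet = socket_read($fd, 8192);
        // ... business logic ...
    });
$ev->add();      // register interest: notify us when $fd is readable

$base->loop();   // epoll_wait() under the hood: the pure-async main loop
```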

At this point all listener processes are listening on the same port, and when a user initiates a connection request, exactly one listener can accept() it successfully.

Once the accept succeeds, the new socket (the red squares in the figure) must in turn be placed into epoll so the user's data can be received.

When user data arrives, the listener selects a socketpair by round-robin and dispatches the packet to a specific worker.

epoll provides two event-triggering modes: ET (edge-triggered) and LT (level-triggered). The difference is that in ET mode the kernel notifies us of a socket-readable event only when data first appears in the buffer; if the data is not read then, the kernel will not notify again. In LT mode, a socket-readable event fires as long as there is data in the buffer. At the kernel level ET is more efficient, since the kernel only needs to notify once; Nginx uses ET-mode epoll, so an Nginx worker must drain the buffer completely every time a readable event fires. In our implementation, however, epoll runs in LT mode, partly because it is easier to code for, and partly for an important practical reason: the libevent PHP extension only supports LT mode. So every time the kernel signals a readable event, the listener reads 8 KB of data and forwards it.
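
To make the LT-mode handling concrete, here is a sketch of the listener's read callback. It also pins each client to one worker (chosen round-robin on first contact), anticipating the requirement described next; all names are illustrative:

```php
<?php
$assigned = [];   // client tag => worker index
$next     = 0;    // round-robin cursor

function on_client_readable($clientSock)
{
    global $assigned, $workerPairs, $next;

    // LT mode: the event fires again while data remains buffered,
    // so a bounded 8 KB read per event is safe.
    $data = socket_read($clientSock, 8192);
    if ($data === false || $data === '') {
        return;   // peer closed or error; teardown elided
    }

    socket_getpeername($clientSock, $ip, $port);
    $tag = "$ip:$port";

    if (!isset($assigned[$tag])) {
        $assigned[$tag] = $next;   // pin this client to one worker
        $next = ($next + 1) % count($workerPairs);
    }
    // index 0 is the listener's end of the socketpair in this sketch
    socket_write($workerPairs[$assigned[$tag]][0], $data);
}
```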

Since a TCP stream has no message boundaries, and we use datagrams to dispatch to the workers, a few issues arise:

  • All packets sent by the same client must be forwarded to the same worker, so the worker can reassemble a correct, meaningful request
  • When a worker receives packets, it must be able to group them by client, since packets from multiple clients may arrive at the same worker
  • After processing, the reply must find its way back to the right listener
  • The listener must then locate the correct client socket (red block), i.e. the correct connection

Our solution is to prepend a header to each packet's original data. The header carries the client's tag (implemented as ip + port), marked in the figures by color (yellow, violet, orange). The listener also maintains two connection pools: the first maps a tag to the client-listener socket, the second maps a tag to the listener-worker socket. In addition, when dispatching to a worker the listener must identify itself, so the worker can choose the correct socket for the reply; thus in the listener-to-worker direction the header contains the client's ip and port plus the listener's unique ID (number).
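
A sketch of that header in PHP might look like this; the exact layout (4-byte IPv4 address, 2-byte port, 1-byte listener ID, big-endian) is an assumption, since the text only says the header carries the client's ip, port, and the listener's unique ID:

```php
<?php
// Prepend the private header to the payload.
function wrap(string $payload, string $ip, int $port, int $listenerId): string
{
    return pack('NnC', ip2long($ip), $port, $listenerId) . $payload;
}

// Recover tag, listener ID, and payload on the worker side.
function unwrap(string $packet): array
{
    $h = unpack('Nip/nport/Clid', substr($packet, 0, 7));
    return [long2ip($h['ip']), $h['port'], $h['lid'], substr($packet, 7)];
}

// Worker side: classify by ($ip, $port), process $body, then reply on
// the socketpair that leads back to listener $lid.
[$ip, $port, $lid, $body] = unwrap($packet);
```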

With that, a simple asynchronous TCP server is built. On the business side, it is then easy to implement the worker's packet-processing logic.

2. UDP Server

Because UDP is connectionless and every packet is meaningful on its own, the design is much simpler. We use a message queue as the communication medium between listener and workers. This message queue is named, so every listener and worker can attach to it via the queue's unique key.

As before, master creates a network socket to accept packets sent by clients, and obtains a message-queue descriptor for communication between the listener and the workers. After one listener and a number of workers have been forked, the master process enters its signal-handling main loop.

When the listener starts, the user-facing socket is placed into epoll, and the kernel signals a state change whenever a user request arrives. Unlike the TCP server, this time epoll holds only the listener's socket: an IPC message queue is maintained purely in kernel memory and has no corresponding entry in the ordinary file system, so epoll cannot watch it. Reading from the message queue is therefore done actively by the workers when they are idle.

Each worker enters its data-processing loop, reading from the message queue.
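
A sketch of this plumbing, assuming PHP's sysvmsg extension; the key 0x5A4C, the message type 1, and handle_packet() are illustrative:

```php
<?php
// Both sides attach to the same named queue via its key.
$queue = msg_get_queue(0x5A4C);

// Listener side: push the packet (payload plus client ip/port tag).
msg_send($queue, 1, $packet);

// Worker side: pull packets actively -- a SysV queue has no file
// descriptor, so epoll cannot watch it.
while (true) {
    if (msg_receive($queue, 1, $msgType, 65536, $packet)) {
        handle_packet($packet);   // hypothetical business callback
    }
}
```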

After a worker has processed the data, it reuses the master's socket and replies directly to the client, based on the (ip, port) tag carried with each packet.

Consequently, when the system is congested, the first thing to overflow is the message queue.

3. Some details

If you only care about the architecture, you can stop here; what follows are some implementation details.

3.1 Changing the identity of a process

The master process makes many privileged system calls, so it runs as root. We know that after fork() the workers and listeners inherit the parent's identity, i.e. root permissions, which clearly violates the principle of least privilege (a program should hold only the minimum privileges needed to complete its task, reducing the damage if its security is compromised). Therefore each child's identity must be changed after fork.

Looking at the system API we find two candidates, setuid() and seteuid(); which should be used?

When a process tries to access a file, the kernel decides whether to allow it based on the process's identity and the file's permission bits. For each process the kernel maintains three identities:

  • Real identity: real UID, real GID
  • Effective identity: effective UID, effective GID
  • Saved identity: saved UID, saved GID

Permission checks use the effective identity. So intuitively, the master should call seteuid() to change each child's effective identity. But that does not solve the problem: the kernel keeps three sets of identities precisely because a process may need another user's permissions while running, and the extra sets let it escalate privileges temporarily at runtime.

While running, a process may copy its real or saved identity into its effective identity, thereby assuming the privileges of the real or saved identity. Therefore, changing only the listeners' and workers' effective identity would still leave them able to regain root privileges.

So the answer is clear: all three identities of the child processes must be changed to nobody.

Let's look at the semantics of setuid(uid):

  • If the process has root privileges, setuid sets the real UID, effective UID, and saved UID all to uid.
  • If the process lacks root privileges but uid equals the real UID or the saved UID, setuid sets only the effective UID to uid and leaves the other two unchanged.
  • If neither condition holds, it simply returns an error.

Given all this, our situation is the first case: in each child forked by the master, while it is still root, we perform the setuid() and setgid() operations.
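
A minimal sketch of this step in PHP, assuming the posix extension and the nobody account:

```php
<?php
// Drop privileges in a child right after fork(), while still root,
// so setuid()/setgid() replace all three identities at once.
$pw = posix_getpwnam('nobody');

// group first: once the UID is no longer root, setgid() would fail
posix_setgid($pw['gid']);
posix_setuid($pw['uid']);

// from here on, real, effective and saved UID/GID are all nobody's
```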

We can verify that everything matches expectations. One remaining puzzle: which of these identities does the USER column (the first column) of ps aux actually display?

3.2 The communication mechanism between master and listener/worker

As seen above, communication between listeners and workers is implemented with UNIX domain sockets in the TCP server and with a message queue in the UDP server. But we have not yet mentioned how master communicates with its children (listener & worker).

First, consider what the communication must achieve:

(1) When an administrator wants to stop, reload, or restart, master must notify the children

(2) When a listener's or worker's status changes (for example, an unexpected exit), master must be notified

For point (1), master only needs a tiny message, one that can be as small as a single integer, to tell the children what to do, so signals are the natural choice. The SIGUSR1 and SIGUSR2 signals the kernel provides solve this. For comparison: Nginx has more state to convey here, so it uses socketpairs to satisfy this requirement.

For point (2), since listeners and workers are all children of master, the kernel already does the work for us: it sends SIGCHLD to the parent whenever a child's state changes. There are thus two options: register a handler for SIGCHLD, or call wait or waitpid in the master process to capture the event.
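
A sketch of point (1), assuming the pcntl and posix extensions; the handler bodies and $childPids are illustrative:

```php
<?php
// Child side: register handlers for the control signals.
pcntl_signal(SIGUSR1, function () {
    // reload: set a flag, finish the current request, re-read config
});
pcntl_signal(SIGUSR2, function () {
    // graceful stop
});

// inside the child's main loop:
pcntl_signal_dispatch();

// Master side, e.g. when the administrator requests a reload:
foreach ($childPids as $pid) {
    posix_kill($pid, SIGUSR1);
}
```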

3.3 CPU affinity

On multiprocessor machines, the kernel's default scheduling may, roughly speaking, only bring CPU1 into play as CPU0's load approaches its limit, so execution is effectively serialized; and migrating a process between CPUs wastes resources copying its cached state. For an efficient service we want all the worker and listener processes to execute truly concurrently, not to end up scheduled onto the same CPU.

Fortunately, Linux exposes this part of the scheduler through sched_setaffinity(), which lets a process be bound to a set of CPUs.

So in the implementation we bind each listener and worker, by its index modulo the number of CPUs, to a specific CPU, improving performance through that affinity.
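
PHP has no built-in sched_setaffinity() binding, so this sketch shells out to taskset(1) as a stand-in; the original implementation presumably made the syscall from a C extension:

```php
<?php
// Pin the current child to one CPU, chosen by its index.
$cpuCount = (int) shell_exec('nproc');
$cpu      = $index % $cpuCount;   // $index: this listener/worker's number

// taskset -cp <cpu-list> <pid> sets the affinity of a running process
exec(sprintf('taskset -cp %d %d', $cpu, getmypid()));
```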

4. Issues that need improvement

The Solar server has kept growing, and along the way we have kept studying excellent architectures; in operating the Solar service we found that many aspects still need development and improvement.

4.1 Load balancing

As we saw, the listener selects workers with a simple round-robin (RR) algorithm, starting from worker 0.

That is, request packets are spread evenly across the workers. If every request took the same time to process, this algorithm would be fine; but the time a worker spends on a request is not controllable, so this structure can easily leave the load across workers completely unbalanced.

Because the worker's processing logic is implemented by the business side, a communication mechanism from worker back to listener is needed to report how busy each worker is, so the listener can select the worker with the lowest current load.

Let's look at how Nginx solves load balancing. The Nginx structure is relatively simple: only master and worker processes. Its master plays a similar role to Solar's master, responsible only for receiving signals and watching worker health; all connections and all work are handled by the worker processes.

Nginx's solution is actually quite crude: each worker checks whether its current connection count (the used entries in its connection pool) exceeds 7/8 of the configured maximum; above that threshold it stops accepting new connections.

4.2 The thundering herd problem

What is the thundering herd problem? Throw a piece of bread into a square and every pigeon swoops on it, yet only one gets it; for all the others, the effort is wasted.

For a Linux server, it is multiple processes listening on one port: when a new connection request arrives at that port, the kernel notifies every process that has the port open, but in the end only one can accept() successfully, wasting system resources.

But wait: how can multiple processes listen on the same port at all? Shouldn't bind() fail? There are two orders of operations:

  • First fork, then bind.
  • First bind and listen, then fork.

If two independent processes each create a socket and try to bind it to the same port on the same NIC, the two sockets are separate objects in the file system and the bindings inevitably clash, so bind() returns an error outright. But if we bind and listen first and fork afterwards, all the processes share the single underlying socket, of which there is only one image in the file system, so there is no conflict, only the thundering herd problem described above. (Thanks to Gexiaobaohelloworld for the picture.)

A classic solution to the thundering herd is a lock: only the process that obtains the lock proceeds to watch the listen socket, so only one process is listening at a time, and only that process can accept.
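
As a simplified sketch of the idea: Nginx uses a spinlock in shared memory (described below), but a portable PHP stand-in is an flock()-based mutex; the lock-file path is illustrative:

```php
<?php
// Only the process holding the lock watches the listen socket.
$lock = fopen('/tmp/accept.lock', 'c');

if (flock($lock, LOCK_EX | LOCK_NB)) {
    // we own the lock: accept a new connection, then release promptly
    $conn = socket_accept($listenSock);
    flock($lock, LOCK_UN);
} else {
    // lock not obtained: skip the listen socket this round and
    // handle only our existing connections
}
```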

Nginx uses exactly such a lock to ensure that only one process has the listen socket in its epoll at a time. Nginx chose a spinlock: since each worker is bound to a CPU, to exploit the system fully each worker should stay out of the blocked state as much as possible. Nginx implements the spinlock per CPU architecture, so a worker that fails to take the lock remains ready instead of blocking, avoiding unnecessary context switches. A basic rule for spinlocks, however, is that the holder must keep the lock only briefly, otherwise the waiters burn a lot of system resources. So releasing the lock as soon as possible becomes the problem. Nginx's answer is that after a worker acquires the lock, it does not process all of its epoll events at once: events are split into two kinds, new-connection events and ordinary events, kept in two linked lists in memory. In one loop iteration the worker first handles the new-connection events, releases the lock, and only then handles the ordinary events. This mechanism guarantees the lock is released promptly.

5. More

The mid-tier server is still evolving, and many features are under development, such as support for timed events, further asynchronous interaction between workers and the backend servers/DB, and timeout protection, among others.
