PHP sockets: blocking and non-blocking, and understanding synchronous and asynchronous calls

This article introduces the concepts of blocking and non-blocking sockets and of synchronous and asynchronous calls in PHP socket programming. I hope it will help you.


1. Concepts

In network programming, we often encounter four kinds of calls: synchronous, asynchronous, blocking, and non-blocking.
Synchronous:
A synchronous call does not return until its result has been obtained. In other words, you must finish one thing before you can start the next.

For example, in the normal B/S (browser/server) mode, which is synchronous: submit a request -> wait for the server to process it -> return after processing. During this period, the client browser cannot do anything else.

Asynchronous:
Asynchronous is the opposite of synchronous. When an asynchronous call is issued, the caller does not get the result immediately. Once the work has actually completed, the caller is informed through a status value, a notification, or a callback.

For example, an ajax request (asynchronous): a request is triggered by an event -> the server processes it (meanwhile the browser can still do other things) -> processing completes.

Blocking:
A blocking call suspends the current thread until the result is returned (the thread enters a non-runnable state in which the CPU allocates it no time slices, that is, the thread stops running). The function returns only after the result has been obtained.

Some people equate blocking calls with synchronous calls, but they are different. With a synchronous call, the current thread often remains active; it is only that, logically, the current function has not returned. For example, some socket wrappers provide a receive function that waits until data arrives when the buffer is empty, yet the calling thread goes on processing other messages in the meantime.

Non-blocking:
Non-blocking is the opposite of blocking: when the result cannot be obtained immediately, the function does not block the current thread but returns at once.

Blocking mode of an object versus blocking function calls:
Whether an object is in blocking mode is strongly correlated with whether a function call on it blocks, but the two do not correspond one-to-one. A blocking object can be used through non-blocking calls: certain APIs let us poll its state and call the blocking function only at the right moment, which avoids blocking. Conversely, a special function called on a non-blocking object can still block; select is an example.

1. Synchronous: I call a function and wait until it has finished and produced its result.
2. Asynchronous: I call a function and do not wait for its result. When the function has a result, it notifies me (callback notification).
3. Blocking: when I (the function) am called, I do not return until I have received the data or obtained the result.
4. Non-blocking: when I (the function) am called, I return immediately; the caller learns of readiness through a mechanism such as select.

The difference between synchronous I/O and asynchronous I/O is whether the process is blocked while the data is being copied.

The difference between blocking I/O and non-blocking I/O is whether the application's call returns immediately.

For a simple C/S (client/server) mode:

Synchronous: submit a request -> wait for the server to process it -> return the result. The client browser cannot do anything else during this period.
Asynchronous: a request is triggered by an event -> the server processes it (meanwhile the browser can still do other things) -> processing completes.

Synchronous and asynchronous describe only the behavior of the local socket.


Synchronous/asynchronous and blocking/non-blocking are often mixed up. In fact they are not the same thing, and they describe different objects.
Blocking and non-blocking refer to whether the process has to wait when the data it accesses is not ready. Put simply, this is a difference in how the function is implemented internally: when the data is not ready, does the function return directly or wait until it is ready?


Synchronous and asynchronous refer to the data access mechanism. Synchronous generally means actively requesting the I/O operation and waiting for it to complete; once the data is ready, the process is necessarily blocked while reading or writing it. (Readiness and the actual read/write are two separate stages, and a synchronous read/write always blocks.) Asynchronous means that after actively requesting the data, the process can go on with other tasks and then wait for the notification that the I/O operation has completed, so the process is not blocked while the data is read or written.

2. Five I/O models in Linux


1) blocking I/O
2) non-blocking I/O
3) I/O multiplexing (select and poll)
4) signal-driven I/O (SIGIO)
5) asynchronous I/O (the POSIX aio_ functions)



The first four are synchronous, and the last one is asynchronous IO.


Blocking I/O model:


Introduction: the process is blocked until the data copy is complete.


The application calls an I/O function, which blocks the application while it waits for the data to be prepared. If the data is not ready, it keeps waiting. Once the data is ready, it is copied from the kernel to user space, and the I/O function returns success.


In the blocking I/O model, when recv()/recvfrom() is called, both waiting for the data and copying the data take place in the kernel.



When recv() is called, the system first checks whether data is ready. If not, the system waits. When the data is ready, it is copied from the system buffer to user space, and the function then returns. In a socket application, the data may not yet be available when recv() is called, in which case recv() remains in the waiting state.
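
The waiting behavior described above can be observed with a small experiment. The following is a minimal sketch in Python, whose socket module wraps the same recv() call; the local socket pair, the DELAY value, and the helper delayed_send are assumptions made for illustration, standing in for a real network peer.

```python
import socket
import threading
import time

DELAY = 0.2  # seconds the sender waits before writing (arbitrary demo value)

def delayed_send(sock, payload, delay):
    """Hypothetical helper: send payload after a delay, like a slow peer."""
    time.sleep(delay)
    sock.sendall(payload)

# socketpair() returns two connected sockets; both are blocking by default.
reader, writer = socket.socketpair()
sender = threading.Thread(target=delayed_send, args=(writer, b"hello", DELAY))

start = time.monotonic()
sender.start()
data = reader.recv(1024)            # the calling thread waits here for data
elapsed = time.monotonic() - start

sender.join()
reader.close()
writer.close()
print(data, f"waited about {elapsed:.2f}s")
```

The main thread spends roughly DELAY seconds inside recv(), which is the "waiting state" the text refers to.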



A socket created with the socket() or WSASocket() function is blocking by default. This means that when a Windows Sockets API call cannot complete immediately, the thread waits until the operation completes.


Not every Windows Sockets API call that takes a blocking socket as a parameter will block. For example, bind() and listen() return immediately even when called with a socket in blocking mode. The Windows Sockets API calls that may block fall into the following four categories:


1. Input operations: the recv(), recvfrom(), WSARecv(), and WSARecvFrom() functions. These are called with a blocking socket as the parameter to receive data. If no data is readable in the socket buffer, the calling thread sleeps until data arrives.


2. Output operations: the send(), sendto(), WSASend(), and WSASendTo() functions. These are called with a blocking socket as the parameter to send data. If the socket buffer has no free space, the thread sleeps until space becomes available.


3. Accepting connections: the accept() and WSAAccept() functions. These are called with a blocking socket as the parameter to wait for a connection request from a peer. If no connection request is pending, the thread goes to sleep.


4. Outgoing connections: the connect() and WSAConnect() functions. For TCP, the client calls one of these with a blocking socket as the parameter to initiate a connection to the server. The function does not return until a response is received from the server, which means that establishing a TCP connection always takes at least one round trip to the server.


Network programs are relatively easy to develop with sockets in blocking mode. When you want data to be sent and received immediately and only a small number of sockets are involved, blocking mode is an appropriate choice.


The weakness of blocking-mode sockets is that communication among a large number of socket threads becomes difficult. When a network program is built on the "producer-consumer" model and each socket is assigned its own reading thread, data-processing thread, and synchronization event, system overhead undoubtedly increases. The biggest drawback is that this approach cannot handle a large number of sockets at the same time, so it scales poorly.


Non-blocking I/O model:

Introduction: the process calls the non-blocking I/O function repeatedly (multiple system calls, each returning immediately). While the data is being copied, the process is blocked.

When we set a socket to non-blocking, we tell the kernel: when a requested I/O operation cannot be completed, do not put the process to sleep, but return an error instead. The I/O function then keeps testing whether the data is ready, and if not, tests again until it is. This continuous testing consumes a lot of CPU time.

Setting a socket to non-blocking mode tells the system kernel not to put the thread to sleep during a Windows Sockets API call, but to let the function return immediately with an error code. Consider a socket in non-blocking mode that calls recv() several times. During the first three calls the kernel data is not ready, so the function immediately returns the WSAEWOULDBLOCK error code. On the fourth call the data is ready and is copied into the application's buffer; recv() reports success, and the application starts processing the data.
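
The repeated-call pattern just described might look like the following minimal Python sketch. It is an illustration under assumptions, not a recommended design: the local socket pair, the 0.1 second timer acting as a slow peer, and the helper recv_by_polling are all hypothetical. In Python the "would block" condition (EWOULDBLOCK/EAGAIN, the POSIX counterpart of WSAEWOULDBLOCK) surfaces as BlockingIOError.

```python
import socket
import threading
import time

def recv_by_polling(sock, size=1024, pause=0.01):
    """Hypothetical helper: call recv() until it succeeds.

    Returns (data, number_of_failed_attempts).
    """
    failed = 0
    while True:
        try:
            return sock.recv(size), failed
        except BlockingIOError:      # no data yet: the call returned at once
            failed += 1
            time.sleep(pause)        # brief pause so the loop does not spin flat out

reader, writer = socket.socketpair()
reader.setblocking(False)            # recv() now returns immediately

# Stand-in for a slow peer: writes after 0.1 s.
timer = threading.Timer(0.1, writer.sendall, args=(b"ready",))
timer.start()
data, failed_attempts = recv_by_polling(reader)
timer.join()
reader.close()
writer.close()
print(data, "after", failed_attempts, "failed attempts")
```

The number of failed attempts depends on timing, so it varies from run to run; each one is a system call that did no useful work.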

A socket created with socket() or WSASocket() is blocking by default. After creating the socket, call the ioctlsocket() function to switch it to non-blocking mode. On Linux, the corresponding function is fcntl().
Once the socket is in non-blocking mode, Windows Sockets API calls return immediately. In most cases the call "fails" and returns the WSAEWOULDBLOCK error code, which indicates that the requested operation did not have time to complete during the call. The application usually has to call the function repeatedly until it returns success.
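
For the Linux side, switching a descriptor to non-blocking mode with fcntl() can be sketched as follows, using Python's fcntl module (Unix only), which wraps the same system call. The helper names set_nonblocking and is_nonblocking are made up for this illustration.

```python
import fcntl
import os
import socket

def set_nonblocking(fd):
    """Hypothetical helper: add O_NONBLOCK to the descriptor's status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def is_nonblocking(fd):
    """Hypothetical helper: report whether O_NONBLOCK is currently set."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

a, b = socket.socketpair()
before = is_nonblocking(a.fileno())   # sockets start out blocking
set_nonblocking(a.fileno())
after = is_nonblocking(a.fileno())

a.close()
b.close()
print(before, after)
```

Reading the flags first and OR-ing in O_NONBLOCK preserves any other status flags already set on the descriptor.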

Note that not every Windows Sockets API call made in non-blocking mode returns the WSAEWOULDBLOCK error. For example, bind() does not return this error code when called with a non-blocking socket. Neither does WSAStartup(), since it is the first function an application calls.

Besides ioctlsocket(), the WSAAsyncSelect() and WSAEventSelect() functions can also put a socket into non-blocking mode: calling either of them switches the socket to non-blocking mode automatically.

Because function calls on a non-blocking socket often return WSAEWOULDBLOCK, you should always check the return code carefully and be prepared for "failure". The simplest approach is for the application to call the function over and over until it returns success, for example a while loop that keeps calling recv() to read 1024 bytes of data. This wastes system resources.

To avoid this, some people call recv() with the MSG_PEEK flag to check whether the buffer contains readable data. This method is not good either: it imposes a high overhead on the system, and the application must call recv() at least twice to actually read the data. A better practice is to use one of the socket "I/O models" to determine whether a non-blocking socket is readable or writable.

Compared with blocking sockets, non-blocking sockets are harder to use. More code is needed to handle the WSAEWOULDBLOCK error that any Windows Sockets API call may return.

However, non-blocking sockets have a clear advantage when managing multiple connections, when the volumes of data sent and received are uneven, and when the timing is unpredictable. They are difficult to use, but once those difficulties are overcome they are very powerful. In general, consider using one of the socket "I/O models", which help an application manage communication on one or more sockets asynchronously.

I/O multiplexing model:

Introduction: mainly select and epoll. For a single I/O port, the two calls and two returns offer no advantage over blocking I/O; the key benefit is the ability to monitor multiple I/O ports at the same time.

The I/O multiplexing model uses the select, poll, and epoll functions. These functions also block the process, but unlike blocking I/O they can block on multiple I/O operations at once, watching several read and write operations at the same time. The actual I/O function is called only when data is readable or writable.
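
As a rough illustration of watching several sockets with one call, here is a minimal Python sketch using select.select(). The three local socket pairs are an assumption standing in for three client connections; only one of them is given data.

```python
import select
import socket

# Three hypothetical connections, modelled as local socket pairs.
pairs = [socket.socketpair() for _ in range(3)]
readers = [r for r, _ in pairs]
writers = [w for _, w in pairs]

# Only the second connection has data pending.
writers[1].sendall(b"ping")

# A single select() call watches all three sockets (timeout: 1 second).
readable, _, _ = select.select(readers, [], [], 1.0)

# recv() is called only on sockets that select() reported as readable.
ready_indexes = [readers.index(sock) for sock in readable]
received = [sock.recv(1024) for sock in readable]

for r, w in pairs:
    r.close()
    w.close()
print(ready_indexes, received)
```

The process blocks in select() rather than in recv(), so one thread can serve whichever connection becomes ready first.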

Signal-driven I/O

Introduction: two calls and two returns.

First we enable signal-driven I/O on the socket and install a signal handler; the process continues to run without blocking. When the data is ready, the process receives a SIGIO signal and can call the I/O function from the signal handler to process the data.

Asynchronous I/O model:

Introduction: the process does not need to block while the data is being copied.

When an asynchronous call is issued, the caller does not get the result immediately. After the operation has actually completed, the caller is informed of the finished input/output operation through a status value, a notification, or a callback.
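
The call-then-get-notified pattern can be imitated in a few lines of Python. Note the assumption loudly: this sketch does not use kernel asynchronous I/O (the POSIX aio_ functions); it emulates the completion callback with a worker thread from concurrent.futures, purely to show the shape of the control flow. The names on_complete, results, and done are made up for the example.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

results = []
done = threading.Event()

def on_complete(future):
    """Completion callback: invoked once the read has finished."""
    results.append(future.result())
    done.set()

reader, writer = socket.socketpair()
with ThreadPoolExecutor(max_workers=1) as pool:
    # Issue the read and return at once; a worker thread does the waiting
    # (an emulation, not kernel-level asynchronous I/O).
    future = pool.submit(reader.recv, 1024)
    future.add_done_callback(on_complete)

    other_work = sum(range(1000))        # the caller is free to do other work
    writer.sendall(b"payload")           # the data arrives later
    finished = done.wait(timeout=2.0)    # the demo waits only to collect the result

reader.close()
writer.close()
print(results, other_work)
```

The caller never blocks in recv() itself; it learns about the result only when the callback runs.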


Synchronous I/O blocks the process until the I/O operation completes.
Asynchronous I/O does not block the process.
I/O multiplexing blocks first in the select call.

[Figure: comparison of the five I/O models]


3. Introduction to select, poll, and epoll

Both epoll and select provide I/O multiplexing, and current Linux kernels support both. epoll is specific to Linux, whereas select is specified by POSIX.

Select:

In essence, select works by setting and checking a data structure that holds the fd flags. Its disadvantages are:

1. The number of fds a single process can monitor is limited, that is, the number of ports it can listen on is limited.

For select, this limit is the FD_SETSIZE constant, which is commonly 1024 on Linux. The system-wide limit on open files depends heavily on system memory and can be checked with cat /proc/sys/fs/file-max, but it is a separate limit.

2. Sockets are scanned linearly, that is, by polling, which is inefficient:

When there are many sockets, each select() call traverses up to FD_SETSIZE sockets, regardless of which ones are active. This wastes a lot of CPU time. If a callback function could be registered for each socket and invoked automatically when the socket becomes active, the polling would be avoided. This is exactly what epoll and kqueue do.

3. A data structure holding a large number of fds must be maintained, and copying it between user space and kernel space is expensive.
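
The idea in point 2 of attaching a callback to each socket can be sketched at the application level with Python's selectors module, which picks epoll, kqueue, poll, or select depending on the platform. This is only an illustration of the registration pattern, not of the kernel mechanism; the callback on_readable and the three local socket pairs are assumptions.

```python
import selectors
import socket

received = []

def on_readable(sock):
    """Hypothetical callback: runs only for a socket that is ready."""
    received.append(sock.recv(1024))

selector = selectors.DefaultSelector()   # epoll/kqueue/poll/select, per platform
pairs = [socket.socketpair() for _ in range(3)]
for reader, _ in pairs:
    reader.setblocking(False)
    # Register once, attaching the callback as the key's data.
    selector.register(reader, selectors.EVENT_READ, data=on_readable)

pairs[2][1].sendall(b"event")            # only the third connection becomes active

for key, _events in selector.select(timeout=1.0):
    key.data(key.fileobj)                # dispatch to the registered callback

selector.close()
for reader, writer in pairs:
    reader.close()
    writer.close()
print(received)
```

Only the active socket appears in the result of select(), so the application code never loops over idle connections.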

Poll:

poll is essentially no different from select. It copies the array passed in by the user into kernel space and then queries the device state of each fd. If a device is ready, an entry is added to the device wait queue and the traversal continues. If no ready device is found after all fds have been traversed, the current process is suspended until a device becomes ready or the call times out; after being woken, it traverses the fds again. This process involves many unnecessary traversals.

It has no limit on the maximum number of connections because it is stored as a linked list, but it has disadvantages as well:

1. Large fd arrays are copied in their entirety between user space and the kernel address space, whether or not the copy serves any purpose.
2. poll is "level-triggered": if an fd is reported but not handled, the next poll call reports it again.
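
The level-triggered (horizontal triggering) behavior of poll can be checked with a short Python sketch using select.poll() (available on Unix-like systems). The local socket pair and the 100 ms timeouts are arbitrary choices for the demonstration.

```python
import select
import socket

reader, writer = socket.socketpair()
writer.sendall(b"x")

poller = select.poll()
poller.register(reader.fileno(), select.POLLIN)

# Poll twice WITHOUT reading the pending byte (timeout is in milliseconds).
first = poller.poll(100)
second = poller.poll(100)
reported_twice = bool(first) and bool(second)

reader.recv(1)                 # now handle the fd by consuming the byte
third = poller.poll(100)       # nothing is left to report

reader.close()
writer.close()
print(reported_twice, third)
```

As long as the data stays unread, every poll() call reports the descriptor again; the report stops only after the data has been consumed.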

Epoll:

epoll supports both level triggering and edge triggering. Its most distinctive feature is edge triggering: it tells the process only which fds have just become ready, and it notifies only once. Another feature is that epoll uses event-based readiness notification: fds are registered through epoll_ctl, and once an fd is ready the kernel uses a callback-like mechanism to activate it, so that epoll_wait receives the notification.
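
The difference between level (horizontal) triggering and edge triggering can be made visible with Python's select.epoll, which wraps epoll_ctl and epoll_wait and is available on Linux only. This is a minimal sketch; the helper count_reports and the choice of three polls with a 0.1 second timeout are assumptions for the demonstration.

```python
import select
import socket

def count_reports(flags, polls=3):
    """Hypothetical helper: register one socket with the given epoll flags,
    make it readable once, and count how many events the polls report
    while the data is left unread."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    ep.register(reader.fileno(), flags)      # epoll_ctl(EPOLL_CTL_ADD, ...)
    writer.sendall(b"x")                     # a single readiness change
    reports = 0
    for _ in range(polls):
        reports += len(ep.poll(0.1))         # epoll_wait with a 0.1 s timeout
    ep.close()
    reader.close()
    writer.close()
    return reports

level = count_reports(select.EPOLLIN)                    # level-triggered (default)
edge = count_reports(select.EPOLLIN | select.EPOLLET)    # edge-triggered
print("level-triggered reports:", level, "edge-triggered reports:", edge)
```

In level-triggered mode the unread data is reported on every poll, whereas in edge-triggered mode it is reported only once, so an edge-triggered program must read until the socket is drained.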

Advantages of epoll:

1. No limit on the maximum number of concurrent connections. The upper bound on the number of fds that can be opened is far greater than 1024 (about 100,000 ports can be monitored with 1 GB of memory).
2. Improved efficiency. It does not poll, so efficiency does not drop as the number of fds grows. Only fds that are active and available invoke the callback function.
That is to say, the biggest advantage of epoll is that it concerns itself only with your "active" connections and is unaffected by the total number of connections. Therefore, in real network environments, epoll is much more efficient than select and poll.


3. Memory copy. epoll reduces copying between user space and kernel space. This is commonly attributed to mmap() file-mapped memory shared with the kernel, although the saving comes mainly from registering fds once with epoll_ctl instead of passing the whole set on every call.

Summary of differences between select, poll, and epoll:

1. Maximum number of connections a single process can open

Select: The maximum number of connections a single process can open is defined by the FD_SETSIZE macro, whose size is that of 32 integers (32*32 = 1024 on 32-bit machines; the same reasoning would give 32*64 on 64-bit machines, although 1024 is the usual value in practice). It can of course be changed by modifying the macro and recompiling the kernel, but performance may suffer; this needs further testing.

Poll: poll is essentially no different from select, but it has no limit on the maximum number of connections because it is stored as a linked list.

Epoll: There is an upper limit on the number of connections, but it is very large: about 100,000 connections can be opened on a machine with 1 GB of memory, and about 200,000 on a machine with 2 GB.

2. I/O efficiency as the number of fds grows sharply

Select: Because every call traverses the connections linearly, performance degrades linearly as the number of fds increases: the more fds, the slower the traversal.

Poll: Same as above.

Epoll: The epoll implementation in the kernel is based on a callback attached to each fd, and only active sockets invoke the callback. When few sockets are active, epoll therefore does not suffer the linear degradation of the other two. When all sockets are very active, however, performance problems may occur.

3. Message passing

Select: The kernel has to pass messages to user space, which requires copying by the kernel.

Poll: Same as above.

Epoll: epoll is commonly described as having the kernel share a region of memory with user space.

Summary:

To sum up, the choice among select, poll, and epoll should be based on the specific use case and the characteristics of each of the three.

1. On the surface, epoll performs best. But when there are few connections and all of them are very active, select and poll may outperform epoll, since epoll's notification mechanism requires many function callbacks.

2. select is inefficient because it polls on every call. But inefficiency is relative: it depends on the situation and can be mitigated through good design.
