Socket blocking and non-blocking, synchronous and asynchronous, I/O models

Source: Internet
Author: User
Tags: posix, socket, blocking

 

    1. Concept understanding
    2. Five I/O models in Linux
      1. Blocking I/O model
      2. Non-blocking I/O model
      3. I/O multiplexing model
      4. Signal-driven I/O model
      5. Asynchronous I/O model
      6. Comparison of the five I/O models
    3. Introduction to select, poll, and epoll

Socket blocking and non-blocking, synchronous and asynchronous

1. Concepts

 

In network programming we often encounter four calling modes: synchronous, asynchronous, blocking, and non-blocking.

Synchronous:
A synchronous call does not return until the result is obtained; in other words, you must finish one task before you can start the next.

 

For example, the B/S (browser/server) model is synchronous: submit the request -> wait for the server to process it -> return after processing. During this period the client browser can do nothing else.

Asynchronous:
Asynchronous is the opposite of synchronous. When an asynchronous call is made, the caller does not get the result immediately; after the call actually completes, the callee informs the caller through status, notification, or a callback.

For example, an Ajax request is asynchronous: the request is triggered by an event -> the server processes it (the browser can still do other things in the meantime) -> processing completes.

Blocking:
A blocking call means the current thread is suspended until the call returns its result (the thread enters a non-runnable state in which the CPU allocates it no time slice, i.e., the thread stops running). The function returns only after the result is obtained.

Some people equate blocking calls with synchronous calls, but they are different. In a synchronous call, the current thread may still be active; it is just that the current function has not logically returned yet. For example, when recv() is called on a socket and there is no data in the buffer, the function waits until data arrives before returning, yet the thread can still go on handling all kinds of other messages in the meantime.

Non-blocking:
Non-blocking is the opposite of blocking: even when the result cannot be obtained immediately, the function does not block the current thread but returns at once.

Object blocking mode versus blocking function calls:
Whether an object is in blocking mode and whether a function call blocks are strongly related, but not one-to-one. A blocking object can be accessed in a non-blocking way, for example by polling the API to avoid blocking; and a non-blocking object can still be blocked on by calling a special function, select() being an example.

 

1. Synchronous: I call a function and wait until the function finishes and gives me the result.
2. Asynchronous: I call a function and do not need to wait for its result; when the result is ready, the function notifies me (callback notification).
3. Blocking: you call me (the function), and I do not return until I have received the data or obtained the result.
4. Non-blocking: you call me (the function), and I return immediately; the caller is later told about readiness, for example through select.

 

The difference between synchronous I/O and asynchronous I/O: whether the process is blocked while the data is being copied!

The difference between blocking I/O and non-blocking I/O: whether the application's call returns immediately!

For a simple C/S (client/server) model:

 

Synchronous: submit a request -> wait for the server to process it -> return after processing completes; the client browser can do nothing else during this period.
Asynchronous: the request is triggered by an event -> the server processes it (the browser can still do other things) -> processing completes. Note that synchronous and asynchronous here describe only the local socket.

Synchronous/asynchronous and blocking/non-blocking are often mixed up. They are not the same thing, and they describe different objects.
Blocking versus non-blocking describes whether the process has to wait when the data it accesses is not yet ready. Simply put, it is a difference in how the function is implemented: return immediately, or wait until the data is ready.

Synchronous versus asynchronous describes the data-access mechanism. Synchronous generally means actively requesting the I/O and waiting for it to complete; even when the data is ready, the read or write itself still blocks (the two phases are readiness and read/write, and a synchronous read/write must block during the copy). Asynchronous means that after requesting the data the process can go on with other work and merely waits for notification that the I/O has completed, so it can read and write the data without blocking (it waits for a "notification").

2. Five I/O models in Linux

1) blocking I/O
2) non-blocking I/O
3) I/O multiplexing (select and poll)
4) signal-driven I/O (SIGIO)
5) asynchronous I/O (the POSIX aio_ functions)

 

The first four are synchronous; the last one is asynchronous I/O.

Blocking I/O model:

Introduction: the process blocks the whole time, until the data copy completes.

An application calls an I/O function, which blocks the application while the data is being prepared. If the data is not ready, it keeps waiting; once the data is ready, it is copied from the kernel into user space and the I/O function returns success.

Blocking I/O model diagram: when recv()/recvfrom() is called, both waiting for the data and copying the data take place in the kernel.

When recv() is called, the system first checks whether prepared data exists. If it does not, the system waits; when the data is ready, it is copied from the system buffer into user space and the function returns. In a socket application, the data may not yet be available when recv() is called, and recv() then stays in the waiting state.
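
To make the two phases concrete, here is a minimal sketch of the blocking model on a POSIX TCP socket. It is only an illustration: the address and port are made up, and error handling is abbreviated. The calling thread sleeps inside recv() through both the wait-for-data phase and the copy-to-user-space phase.

    /* Minimal sketch of blocking I/O on a POSIX TCP socket.
     * The address/port are illustrative; error handling is abbreviated. */
    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);   /* blocking by default */

        struct sockaddr_in srv = {0};
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(8080);               /* illustrative port */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

        if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
            perror("connect");
            return 1;
        }

        char buf[1024];
        /* The thread sleeps here until data arrives and has been copied
         * from the kernel buffer into buf (or the peer closes). */
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0)
            printf("received %zd bytes\n", n);

        close(fd);
        return 0;
    }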

 

When a socket is created with the socket() or WSASocket() function, it is blocking by default. This means that when a Windows Sockets API call cannot complete immediately, the thread waits until the operation finishes.

Not every Windows Sockets API call made with a blocking socket as a parameter will block. For example, bind() and listen() return immediately even when called on a blocking socket. The Windows Sockets API calls that may block on a blocking socket fall into four categories:

1. Input operations: recv(), recvfrom(), WSARecv(), and WSARecvFrom(). When one of these is called with a blocking socket to receive data and there is no readable data in the socket buffer, the calling thread sleeps until data arrives.

2. Output operations: send(), sendto(), WSASend(), and WSASendTo(). When one of these is called with a blocking socket to send data and the socket send buffer has no free space, the thread sleeps until space becomes available.

3. Accepting connections: accept() and WSAAccept(). Calling either with a blocking socket waits for a connection request from a peer; if there is none, the thread sleeps.

4. Outgoing connections: connect() and WSAConnect(). For a TCP connection, the client calls one of these with a blocking socket to initiate a connection to the server, and the function does not return until the server's response has been received. This means a TCP connect always waits at least one round-trip time to the server.

Sockets in blocking mode make it easy to develop network programs. When you want to send and receive data right away and only handle a small number of sockets, blocking mode is appropriate.

The shortcoming of blocking sockets shows up when a large number of them must communicate: in a "producer-consumer" design, each socket is given a read thread, a data-processing thread, and a synchronization event, which adds considerable system overhead. The biggest drawback is the inability to handle a large number of sockets at once, i.e., poor scalability.

Non-blocking I/O model

 

Introduction: the process calls the non-blocking I/O function repeatedly (many system calls, each returning immediately); during the data copy, the process is still blocked.

 


When we set a socket to non-blocking, we tell the kernel: when the requested I/O operation cannot be completed, do not put the process to sleep; return an error instead. Our I/O function then keeps testing whether the data is ready, over and over, until it is. This continuous testing wastes a lot of CPU time.

Setting the socket to non-blocking mode tells the system kernel: when a Windows Sockets API call cannot complete, do not let the thread sleep; have the function return immediately with an error code. For example, a non-blocking socket may call recv() four times: the first three times the kernel data is not ready, so the call immediately returns the WSAEWOULDBLOCK error code; on the fourth call the data is ready and is copied into the application's buffer, recv() returns success, and the application starts processing the data.

When a socket is created with socket() or WSASocket(), it is blocking by default. After creating it, call ioctlsocket() to set it to non-blocking mode; on Linux the corresponding function is fcntl().
Once the socket is in non-blocking mode, Windows Sockets API calls return immediately. In most cases these calls "fail" with the WSAEWOULDBLOCK error code, which means the requested operation could not be completed during the call. The application usually has to call the function repeatedly until it returns success.
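
On Linux the counterpart of ioctlsocket() is fcntl() with O_NONBLOCK, and the counterpart of WSAEWOULDBLOCK is EWOULDBLOCK/EAGAIN. Below is a rough sketch of the repeated-call pattern described above; the helper name recv_nonblocking is made up, and the busy-wait loop is deliberately naive to mirror the text, not something to use in production.

    /* Sketch: set a socket non-blocking with fcntl() and retry recv()
     * until data is ready. EWOULDBLOCK/EAGAIN plays the role that
     * WSAEWOULDBLOCK plays on Windows. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>

    /* fd is assumed to be an already-connected TCP socket. */
    ssize_t recv_nonblocking(int fd, char *buf, size_t len)
    {
        /* Switch the socket to non-blocking mode. */
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

        for (;;) {
            ssize_t n = recv(fd, buf, len, 0);
            if (n >= 0)
                return n;                  /* data copied (or peer closed) */
            if (errno == EWOULDBLOCK || errno == EAGAIN) {
                /* The kernel has no data yet: the call returned at once.
                 * Spinning like this burns CPU; real code would use
                 * select/poll/epoll instead of polling in a loop. */
                usleep(1000);
                continue;
            }
            perror("recv");
            return -1;                     /* real error */
        }
    }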

Note that not every Windows Sockets API call on a non-blocking socket returns the WSAEWOULDBLOCK error. For example, bind() called with a non-blocking socket does not return this error code, and neither does WSAStartup(), since it is the first function an application calls.

Besides ioctlsocket(), a socket can also be set to non-blocking mode with WSAAsyncSelect() or WSAEventSelect(); calling either of these automatically switches the socket to non-blocking mode.

Because calls on a non-blocking socket often return WSAEWOULDBLOCK, you should check the return code carefully at all times and be prepared for "failure". Having the application call the function over and over until it succeeds, with a while loop that keeps calling recv() to read 1024 bytes of data (as in the sketch above), wastes system resources.

To avoid this, some people call recv() with the MSG_PEEK flag to check whether there is readable data in the buffer. That approach is not good either: it imposes extra overhead on the system, and the application must call recv() at least twice to actually read the data. A better practice is to use one of the socket "I/O models" to determine whether a non-blocking socket is readable or writable.
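
For reference, the MSG_PEEK check mentioned above looks roughly like this on a POSIX socket. This is only a sketch; the helper name is made up, and MSG_DONTWAIT is a Linux-specific flag added here so the peek itself does not block.

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Peek at the socket buffer without consuming it; returns the number
     * of bytes a subsequent recv() could read right now. */
    ssize_t bytes_readable(int fd, char *buf, size_t len)
    {
        /* MSG_PEEK copies data out but leaves it queued in the kernel,
         * so recv() must be called again to actually consume it. */
        return recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
    }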

Non-blocking sockets are harder to use than blocking ones: more code is needed to handle the WSAEWOULDBLOCK error that may come back from every Windows Sockets API call, which is why non-blocking sockets are considered difficult to use.

Even so, non-blocking sockets have a clear advantage when handling many simultaneous connections, uneven send/receive volumes, and unpredictable timing. They are harder to use, but once those difficulties are dealt with they are very powerful. In general, consider using one of the socket "I/O models", which help an application manage communication on one or more sockets asynchronously.

I/O multiplexing model:

Introduction: mainly select and epoll. For a single I/O descriptor it still takes two calls and two returns, which is no better than blocking I/O; the key advantage is that it can monitor multiple I/O descriptors at the same time.

The I/O multiplexing model uses the select, poll, or epoll function. These calls can also block the process, but unlike blocking I/O they can block on multiple I/O operations at once, and they can test multiple read and write descriptors at the same time; the actual I/O function is called only once data is known to be readable or writable.
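
A minimal select()-based sketch of this model, assuming an array of already-connected sockets; the helper name serve_once and the single-pass structure are illustrative.

    /* Sketch: block in select() on several sockets at once and call
     * recv() only on the ones reported readable. */
    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    /* fds: array of connected sockets, nfds: how many of them. */
    void serve_once(int *fds, int nfds)
    {
        fd_set readfds;
        FD_ZERO(&readfds);

        int maxfd = -1;
        for (int i = 0; i < nfds; i++) {
            FD_SET(fds[i], &readfds);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        /* The process blocks here, but on all sockets at once. */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            perror("select");
            return;
        }

        char buf[1024];
        for (int i = 0; i < nfds; i++) {
            if (FD_ISSET(fds[i], &readfds)) {
                /* Data is ready: this recv() will not wait for it to arrive
                 * (the copy from kernel to user space still happens here). */
                ssize_t n = recv(fds[i], buf, sizeof buf, 0);
                if (n > 0)
                    printf("fd %d: %zd bytes\n", fds[i], n);
            }
        }
    }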

Signal-driven I/O

 

Introduction: two calls and two returns;

First, we enable signal-driven I/O on the socket and install a signal handler; the process keeps running without blocking. When the data is ready, the process receives a SIGIO signal and can call the I/O function in the signal handler to process the data.
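
A Linux-flavored sketch of setting this up with fcntl() and SIGIO. The helper names enable_sigio and wait_and_read are made up, and the actual read is done outside the handler (a common practice, rather than reading inside the handler as the text allows).

    /* Sketch: signal-driven I/O. The socket delivers SIGIO when data
     * arrives; the process keeps running until then. */
    #include <fcntl.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>

    static volatile sig_atomic_t data_ready = 0;

    static void on_sigio(int sig)
    {
        (void)sig;
        data_ready = 1;          /* keep the handler minimal; read later */
    }

    /* fd is assumed to be a connected (or bound UDP) socket. */
    void enable_sigio(int fd)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigio;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGIO, &sa, NULL);

        fcntl(fd, F_SETOWN, getpid());            /* deliver SIGIO to us  */
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_ASYNC);      /* enable signal-driven */
    }

    void wait_and_read(int fd)
    {
        char buf[1024];
        while (!data_ready)
            pause();                        /* free to do other work here */
        recv(fd, buf, sizeof buf, 0);       /* the copy phase still blocks */
    }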

Asynchronous I/O model

Introduction: The process does not need to be blocked during data copying.

When an asynchronous call is made, the caller does not get the result immediately. After the entire I/O operation actually completes, the caller is notified of the result through status, notification, or a callback.
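
As an illustration of this model, here is a sketch using the POSIX aio_* interface. It is shown on an arbitrary descriptor; socket support and linking requirements (e.g. -lrt on older glibc) vary by platform, and the helper name start_async_read is made up.

    /* Sketch of POSIX AIO: aio_read() returns immediately and the data is
     * copied in the background, so the process blocks in neither phase. */
    #include <aio.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    static char buf[1024];

    int start_async_read(int fd)
    {
        static struct aiocb cb;            /* must outlive the request */
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;

        if (aio_read(&cb) < 0) {           /* returns at once */
            perror("aio_read");
            return -1;
        }

        /* ... do other work here ... */

        /* Busy-poll for brevity; real code would use a completion
         * signal/callback or aio_suspend() instead. */
        while (aio_error(&cb) == EINPROGRESS)
            ;

        ssize_t n = aio_return(&cb);       /* bytes read; data already in buf */
        printf("async read completed: %zd bytes\n", n);
        return 0;
    }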

Synchronous I/O blocks the process until the I/O operation completes.
Asynchronous I/O does not block the process.
I/O multiplexing blocks first in the select call.

 

Comparison of five I/O models:


 

 

3. Introduction to select, poll, and epoll

Both select and epoll provide solutions for multiplexed I/O. In current Linux kernels, epoll is Linux-specific, while select is defined by POSIX.

 

Select:

In essence, select works by setting and checking a data structure that holds fd flags. Its disadvantages are:

1. The number of fds a single process can monitor is limited, i.e., the number of sockets that can be watched is limited.

Generally this number is closely related to system memory; the concrete value can be checked with cat /proc/sys/fs/file-max. The default on 32-bit machines is 1024 and on 64-bit machines 2048.

2. Sockets are scanned linearly, i.e., by polling, which is inefficient:

When there are many sockets, each select() call traverses FD_SETSIZE sockets regardless of which ones are active, which wastes a lot of CPU time. If a callback could be registered for each socket so that the relevant work happens automatically when it becomes active, the polling would be avoided; that is exactly what epoll and kqueue do.

3. select must maintain a data structure holding a large number of fds, and copying that structure between user space and kernel space is expensive.

Poll:

Poll is essentially no different from select: it copies the caller's array into kernel space and queries the status of the device behind each fd. If a device is ready, it adds an entry to the result and continues the traversal; if no ready device is found after traversing all fds, the process is suspended until a device becomes ready or the call times out, after which it traverses the fds again. This process involves many unnecessary traversals.

Because it stores the fds in a linked list, poll has no limit on the maximum number of connections, but it also has drawbacks:

1. Large fd arrays are copied wholesale between user space and the kernel address space, whether or not the copy is meaningful.
2. Another characteristic of poll is "level triggering": if an fd is reported but not handled, the next poll reports it again.
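
For comparison with the select sketch earlier, here is the same readiness check written with poll(). It is only a sketch; the helper name poll_once and the fixed-size array (which assumes at most 64 sockets) are illustrative.

    /* Sketch: readiness check with poll(). The pollfd array is passed
     * (copied) into the kernel on every call, which is the overhead the
     * text describes. */
    #include <poll.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* fds: array of connected sockets, count: how many (assumed <= 64). */
    void poll_once(int *fds, int count)
    {
        struct pollfd pfds[64];
        for (int i = 0; i < count; i++) {
            pfds[i].fd     = fds[i];
            pfds[i].events = POLLIN;      /* level-triggered readability */
        }

        int ready = poll(pfds, count, -1);   /* block until something is ready */
        if (ready < 0) {
            perror("poll");
            return;
        }

        char buf[1024];
        for (int i = 0; i < count; i++) {
            if (pfds[i].revents & POLLIN)
                recv(fds[i], buf, sizeof buf, 0);
        }
    }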

Epoll:

Epoll supports both level triggering and edge triggering. Its most distinctive feature is edge triggering: it tells the process only which fds have just changed to the desired state, and it notifies only once. Another feature is that epoll uses event-readiness notification: fds are registered through epoll_ctl(), and once an fd is ready the kernel uses a callback-like mechanism to activate it, so epoll_wait() receives the notification.
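
A minimal sketch of that flow with epoll_create1(), epoll_ctl(), and epoll_wait(), using the edge-triggered flag the text describes. The function name and the single wait round are illustrative; a real server would loop over epoll_wait() and register many fds.

    /* Sketch: register a socket with epoll and wait for readiness events. */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* conn_fd is assumed to be a connected, non-blocking socket. */
    void epoll_wait_once(int conn_fd)
    {
        int epfd = epoll_create1(0);

        struct epoll_event ev = {0};
        ev.events  = EPOLLIN | EPOLLET;          /* edge-triggered readability */
        ev.data.fd = conn_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);   /* register once */

        struct epoll_event events[64];
        char buf[1024];

        /* Only fds that actually became ready are returned; the cost does
         * not grow with the total number of registered fds. */
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].events & EPOLLIN) {
                /* With EPOLLET the socket must be drained until EAGAIN,
                 * or further data may never be reported. */
                while (recv(events[i].data.fd, buf, sizeof buf, MSG_DONTWAIT) > 0)
                    ;
            }
        }
        close(epfd);
    }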

Advantages of epoll:

 

1. There is no practical limit on the maximum number of concurrent connections; the upper bound on open fds is far greater than 1024 (roughly 100,000 connections can be monitored with 1 GB of memory).
2. Efficiency does not degrade: epoll does not poll, so efficiency does not drop as the number of fds grows; the callback is invoked only for active, ready fds. In other words, epoll's biggest advantage is that it only deals with your "active" connections and is unaffected by the total number of connections, so in a real network environment epoll is much more efficient than select and poll.
3. Memory copying: epoll uses mmap()'d memory shared with the kernel to speed up message passing, i.e., it uses mmap to reduce copy overhead.

Summary of differences between select, poll, and epoll:

 

1. Maximum number of connections a single process can open

Select

The maximum number of connections a single process can open is defined by the FD_SETSIZE macro, whose size is that of 32 integers (32*32 on a 32-bit machine, and 32*64 on a 64-bit machine). You can of course modify this and recompile the kernel, but performance may suffer; that would need further testing.
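
If you want to check the FD_SETSIZE value on your own platform rather than rely on the figures above, it can be printed directly; this is a trivial check, and the value depends on your C library headers.

    #include <stdio.h>
    #include <sys/select.h>

    int main(void)
    {
        /* Print the compile-time FD_SETSIZE limit on this platform. */
        printf("FD_SETSIZE = %d\n", FD_SETSIZE);
        return 0;
    }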

Poll

Poll is essentially the same as select, but because it stores fds in a linked list it has no limit on the maximum number of connections.

Epoll

Although there is still an upper limit on the number of connections, it is large: a machine with 1 GB of memory can open about 100,000 connections, and one with 2 GB about 200,000.

2. I/O efficiency as the number of fds grows sharply

Select

Because the connections are traversed linearly on every call, a growing number of fds leads to a "linear performance degradation": traversal becomes slow.

Poll

Same as above

Epoll

The epoll kernel implementation is based on a callback attached to each fd; only active sockets actively invoke the callback. So when there are few active sockets, epoll does not suffer the linear degradation of the previous two; but when all sockets are very active, performance problems can still appear.

3. How messages are passed between kernel and user space

Select

The kernel has to copy the messages into user space.

Poll

Same as above

Epoll

Epoll is implemented by having the kernel share a piece of memory with user space.

Summary:

To sum up, the choice among select, poll, and epoll should be based on the specific use case and the characteristics of each.

1. On the surface epoll performs best, but when there are few connections and all of them are very active, select and poll may actually outperform epoll, since epoll's notification mechanism involves many function callbacks.

2. select is inefficient because it has to poll every time; but inefficiency is relative and, depending on the situation, it can be mitigated by good design.
