Large-scale distributed C++ framework, Part Four: NetIO Request Packet Broker


This really should have been a single article, but it ended up split into parts. Lately my cervical spine has been very uncomfortable, and the flash-sale project has kept me busy, so I can't sit for long and the time I have for reading code keeps shrinking. I bought two books on treating cervical spondylosis and have been working through them, but it doesn't feel like it's helping. It suddenly scared me: if the cervical spondylosis keeps getting worse, what will the road ahead look like?

Now I force myself to get up and move around after every hour of sitting, and I've started playing badminton with a group of nearly 30 people. I've realized health is what really matters; nothing else is as interesting. Brothers, look after yourselves too!

Enough rambling. Let's get into NetIO.

NetIO's main job in the system is packet dispatching. NetIO itself does no business processing: it receives a packet, does some light handling, and then forwards it to the appropriate business-processing process according to the request's command word.
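To make that dispatch model concrete, here is a minimal sketch. The names (Packet, Dispatcher, PushToQueue) and the std::map routing table are purely illustrative and are not NetIO's real interfaces; as described later, NetIO actually hands packets off through shared-memory message queues plus UNIX domain notifications.

#include <cstdint>
#include <map>
#include <string>

struct Packet {
    uint32_t    cmd;    // command word parsed from the packet header
    std::string body;   // raw payload, forwarded untouched
};

class Dispatcher {
public:
    // Map a command word to the business process (queue) that owns it.
    void Register(uint32_t cmd, int queueId) { m_routes[cmd] = queueId; }

    // Forward a packet to the owning business process; no business logic here.
    bool Dispatch(const Packet& pkt) {
        std::map<uint32_t, int>::const_iterator it = m_routes.find(pkt.cmd);
        if (it == m_routes.end())
            return false;                        // unknown command word
        return PushToQueue(it->second, pkt);     // hand-off to the business process
    }

private:
    bool PushToQueue(int queueId, const Packet& pkt) {
        // Placeholder: the real NetIO pushes into a shared-memory message queue
        // and sends a UNIX domain notification (see the log walk-through later).
        (void)queueId; (void)pkt;
        return true;
    }
    std::map<uint32_t, int> m_routes;
};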

Part One: Multi-process socket epoll and the "thundering herd" phenomenon

2.1 How multiple processes listen on the same port

1) Suppose we want three processes to handle requests on the same port at the same time.

2) The parent process creates the socket first, then binds and listens. Note that only after this does the parent fork() two child processes; together with the parent that makes three processes.
3) Each process then calls epoll_create() separately, adds the listening socket to its own epoll instance, and calls epoll_wait() (a minimal sketch of this setup follows this list).
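Here is a minimal sketch of that three-process setup, with error handling omitted; the port number, backlog and buffer sizes are made up for illustration.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    // Parent creates, binds and listens on the socket first.
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8888);                    // example port
    bind(listen_fd, (sockaddr*)&addr, sizeof(addr));
    listen(listen_fd, 128);

    // Fork only after listen(); each child inherits listen_fd.
    for (int i = 0; i < 2; ++i)
        if (fork() == 0)
            break;                                  // children leave the fork loop

    // Every process (parent + 2 children = 3) creates its own epoll instance
    // and waits on the same inherited listening socket.
    int epfd = epoll_create(1024);
    epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, 10);   // 10 ms timeout, like NetIO
        for (int i = 0; i < n; ++i) {
            if (events[i].data.fd == listen_fd) {
                int conn = accept(listen_fd, NULL, NULL);
                // ... recv / process / send the reply ...
                close(conn);
            }
        }
    }
}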
That is how the platform listens on the same port from multiple processes. Let's explore why it is done this way.

2.1.1 Why fork children that inherit the parent's socket, instead of having multiple processes bind the same port?

If a port is in use, whether in TIME_WAIT, CLOSE_WAIT or ESTABLISHED state, it cannot be reused, which naturally includes not being usable for listen(). So if three processes each tried to bind() and listen() on the same port, all but the first would fail.

2.1.2 Why can't SO_REUSEADDR give us multiple listeners on the same port?

First, look at what SO_REUSEADDR is for. When a server is restarted, its old connections sit in TIME_WAIT, and bind() on the port fails. We cannot afford to wait for TIME_WAIT to pass before restarting the service, because TIME_WAIT can last more than a minute. Setting SO_REUSEADDR lets the port be listened on again while it is still in TIME_WAIT. Note that SO_REUSEADDR only helps in that restart-during-TIME_WAIT case. If multiple processes try to bind the same port with the same IP, the bind still fails even with SO_REUSEADDR set: SO_REUSEADDR allows multiple instances of the same server to start on the same port only as long as each instance binds a different local IP address. For TCP, we simply cannot start multiple servers that bind the same IP address and the same port number. A minimal sketch of setting the option follows.
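This is the plain sockets API, nothing NetIO-specific; SO_REUSEADDR is set between socket() and bind():

int listenFd = socket(AF_INET, SOCK_STREAM, 0);
int reuse = 1;
// Lets bind() succeed while old connections to this port sit in TIME_WAIT
// after a restart. It does NOT let two live processes bind the same IP:port.
setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse));
// ... then bind() and listen() as usual.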
2.1.3 What TIME_WAIT is for: why does TIME_WAIT exist at all?

Because TCP must reliably terminate both directions of the connection (a full-duplex shutdown), one side has to enter the TIME_WAIT state: it may need to retransmit the final ACK, and without TIME_WAIT an RST would be sent instead.

2.1.4 Why multiple processes can listen on the same port this way

When a child process is created, it receives a copy of the parent's socket resource. You can think of it like this: only the parent process actually has a socket bound to the port; the child processes are merely using duplicated references to that same socket.

2.1.5 What happens when epoll comes after the fork?
NetIO runs with 5 processes:
james   2356     1  0  pts/0    xx:xx:xx  ./netio netio_config.xml
james   2357  2356  0  pts/0    xx:xx:xx  ./netio netio_config.xml
james   2358  2356  0  pts/0    xx:xx:xx  ./netio netio_config.xml
james   2359  2356  0  pts/0    xx:xx:xx  ./netio netio_config.xml
james   2360  2356  0  pts/0    xx:xx:xx  ./netio netio_config.xml
Let's run a few experiments first and then analyze them.

a) Experiment one: normal requests. We slowly send 10 requests in sequence, each on a new connection, and look at which NetIO process handles each one. The PID turns out to be 2358 every time. I had expected the five processes to compete for the incoming socket, so that different requests would be handled by different child processes, but every request went to the same one.
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2358
b) Experiment two: we start with 2 concurrent requests; the handling process is still PID 2358:
after epoll_wait pid: 2358
after epoll_wait pid: 2358

Then we make the client fork 8 processes to hit the service concurrently, and 2357 and 2358 start alternating:

after epoll_wait pid: 2358
after epoll_wait pid: 2358
after epoll_wait pid: 2357
after epoll_wait pid: 2358
after epoll_wait pid: 2357
after epoll_wait pid: 2358
after epoll_wait pid: 2357
after epoll_wait pid: 2358
c) Experiment three: right after epoll_wait, before recv(), we add sleep(100000) and then send one request. Each process turns out to be woken but immediately blocks in the sleep, so the kernel wakes the next process, which blocks as well, until all 5 processes are blocked.
after epoll_wait pid: 2358
after epoll_wait pid: 2357
after epoll_wait pid: 2359
after epoll_wait pid: 2356
after epoll_wait pid: 2360
d) Experiment four: this time the sleep(100000) goes after epoll_wait and recv(), and we send one request. One process recv()s the packet and then sleeps; no other process is woken. (A rough sketch of the loop used in experiments three and four appears after the logs below.)
after epoll_wait pid: 2358

With two concurrent requests, we find that two processes are woken up:

after epoll_wait pid: 2357
after epoll_wait pid: 2359
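For reference, here is a rough reconstruction of the child-process loop used in experiments three and four. The real NetIO code differs; epfd and events are assumed to be set up as in the earlier sketch, includes are omitted, and the sleeps are shown as comments at the two positions the experiments used.

for (;;) {
    int n = epoll_wait(epfd, events, 64, 10);
    if (n <= 0)
        continue;                                // timeout: nothing to report
    printf("after epoll_wait pid:%d\n", getpid());
    // Experiment three: sleep(100000) HERE, before recv(). Each process is
    // woken in turn and then blocks, until all five are stuck in sleep().
    for (int i = 0; i < n; ++i) {
        char buf[4096];
        recv(events[i].data.fd, buf, sizeof(buf), 0);
        // Experiment four: sleep(100000) HERE, after recv(). Only the single
        // process that actually took the packet is woken; the rest stay idle.
    }
}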
Four experiments done; now let's analyze why.

1) Why does only one process handle everything in experiment one? All of our processes are calling epoll_wait on the same fd, so in principle all of them should be woken to handle it, yet each time only one process does the work. If the processes were competing for the request, the other children should also get a chance now and then, but they never do. A scramble like that, where every process woken on the same fd fights over one packet, is what is called the "thundering herd" phenomenon, and it simply did not happen. From what I could find, the kernel does not let the processes compete; it uses an assignment policy instead: when several child processes epoll_wait on the same fd, one child is picked to handle the event and the others are not woken at all.

2) Why do several children start handling requests once concurrency rises in experiment two? This part is fun. The kernel wakes the children listening on the fd in turn. If the chosen child finishes quickly, the kernel keeps handing the fd to that child. If the child has not finished yet, that is, it is not fast enough, the kernel wakes another child to handle the fd. In other words, when an event arrives on the fd, the kernel first checks whether process A is busy; if not, A keeps handling everything; if A is busy it checks whether B is busy, and if B is free B handles it, and so on through the remaining children. That is why multiple children only start processing once the concurrent load grows.

3) Why were all 5 processes woken in experiment three? Same reasoning as above: a process stuck in sleep() looks busy, so the kernel notifies the other processes one after another.

4) Why was only one process woken in experiment four? With the sleep placed after recv(), only one process was woken: that process has already taken the task and is handling it, so none of the others need to be notified.

To summarize so far: epoll after fork, used this way, does not produce a thundering herd. The kernel polls and picks one child process to handle the event; only if that child cannot keep up is another child notified. These conclusions come from experiments and from reading around, not from reading the kernel source, so if you have studied the relevant kernel code, please point out anything I got wrong.

2.1.6 What happens when epoll comes before the fork?

With epoll set up before the fork, sending a request still wakes only one process, exactly like epoll after fork. But there is one painful spot. Our system uses UNIX domain sockets for message notification: when container finishes processing a message, it sends a UNIX domain notification telling NetIO to handle the reply packet. On that reply path, for reasons that were not obvious at first, all 5 processes were woken up, and in the end three of them shut themselves down.

Here is my own understanding of what happens:

a) Suppose the request was handled by the NetIO process with PID 6000. When container returns the actual data, it notifies only that process's UNIX domain socket, for example f000_6000; with 5 processes there are 5 such UNIX domain sockets.

b) But because the processes share a single epoll instance, the other processes are woken as well, and each of them determines that the ready fd is a UNIX domain socket.
Each of them then reads its own UNIX domain socket and finds nothing there, because container only replied to f000_6000, so those processes keep getting recvfrom = -1. And since the f000_6000 process was not quick enough and had not consumed the message yet, epoll keeps reporting the event, so the other processes keep reading their own UNIX domain sockets and keep getting recvfrom = -1. I will not paste it all here, but every process except 6000 prints a pile of exactly that.

c) Finally, after process 6000 has read the data from its UNIX domain socket, the remaining processes happen to grab the fd just as the event disappears. Handling the fd at that point is undefined, and our code stops the process outright when it meets an undefined fd. That is why three processes ended up shutting themselves down.

Let's summarize. Again, I have not read the kernel code, so all of this is experiment plus guesswork, and guidance from anyone who knows the kernel is very welcome:

1) For a TCP listening fd the kernel seems to apply special handling, so whether epoll is created before or after the fork, only one process is woken as long as it handles events in time, and other processes are woken in turn only when it does not. There is no thundering herd (that is, no fight between multiple processes over a packet that only one of them can win); the experiments show it is not a competition at all.

2) For a UNIX domain fd shared through one common epoll instance there is no such special handling: it can produce a thundering herd, and several processes can end up holding the fd. The sketch below shows one way such spurious wakeups could be tolerated.
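This is only my own guess at a mitigation, not what NetIO actually does: treat an empty read as a spurious wakeup instead of a fatal error, so a process woken for someone else's notification just goes back to epoll_wait. OnUnixDomainReadable is a made-up name.

#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

void OnUnixDomainReadable(int fd) {
    char buf[4096];
    for (;;) {
        // MSG_DONTWAIT: never block, even if another process already consumed
        // the notification that woke us up.
        ssize_t n = recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT, NULL, NULL);
        if (n > 0) {
            // ... handle the back-packet notification ...
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return;   // spurious wakeup: the message was for another process
        return;       // real error: log it, but don't treat the fd as "undefined" and exit
    }
}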

Part Two: The NetIO timer

First, where does the NetIO timer sit? epoll_wait is called with a 10-millisecond timeout, so the loop polls every 10 milliseconds whether or not a request arrives. This guards against the case where a notification from container to NetIO is lost and the message would otherwise never be processed. The more important idea is that every service process has a timer of its own to drive scheduled tasks; a minimal sketch of this loop follows at the end of this subsection.

The NetIO timer check does two things. The first is to find which time events are due and call the concrete handler function of the subclass. The second is to re-arm the automatically repeating time events. At initialization NetIO registers a 60-second periodic time event, that is, one that runs every 60 seconds and does the following:
1) clear timed-out sockets
2) query the local command word list
3) periodically output NetIO statistics
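A minimal sketch of that loop, with hypothetical names: epfd and events are assumed to be set up already, and HandleEvent, CheckTimer and MAX_EVENTS stand in for NetIO's real names.

epoll_event events[MAX_EVENTS];
for (;;) {
    // 10 ms timeout: even with no network traffic and no container
    // notification, we fall through and check the timers.
    int n = epoll_wait(epfd, events, MAX_EVENTS, 10);
    for (int i = 0; i < n; ++i)
        HandleEvent(events[i]);   // TCP request or UNIX domain notification
    CheckTimer();                 // fire any CNode whose expiry time has passed
}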
3.1 The timer's data structure

The NetIO timer is a min-heap, so the scheduled task that is due soonest sits at the top.

a) An array of 65 CNode pointers is allocated:

const int DEFAULT_QUEUE_LEN = 1 + 64;
m_pNodeHeap = new CNode*[DEFAULT_QUEUE_LEN];
Why allocate 1 + 64 slots when only 64 are actually usable? The extra pointer assists the min-heap algorithm: m_pNodeHeap[0] always stays NULL, and the minimum of the heap lives at m_pNodeHeap[1].

b) The meaning of CNode's member variables:
struct CNode
{
    CNode(ITimerHandler* pTimerHandler = NULL, int iTimerID = 0)
        : m_pTimerHandler(pTimerHandler), m_iTimerID(iTimerID),
          m_dwCount(0), bEnable(true) {}

    ITimerHandler* m_pTimerHandler;
    int            m_iTimerID;
    CTimeValue     m_tvExpired;   // time value for first check
    CTimeValue     m_tvInterval;  // time check interval
    unsigned int   m_dwCount;     // counter for auto re-schedule
    bool           bEnable;
};
m_pTimerHandler: holds the base-class pointer. When the event fires, the derived class is reached through this pointer so that the concrete time event can be handled.

m_tvExpired: the expiry time. For example, if an event should fire in 10 seconds, m_tvExpired stores the current time plus 10 seconds. On every check, this value at the top of the min-heap is compared with the current time: if the current time is less than m_tvExpired, nothing is due yet; if the current time is greater, the time event needs to be handled.

m_dwCount: the number of automatic repetitions. Say we want an event to run three times at one-minute intervals; this is set to 3. When the event fires and the value is still greater than 0, the node goes back into the min-heap with m_tvExpired set to the current time plus one minute, and m_dwCount is decremented. The same happens on the next firing, until m_dwCount reaches 0, at which point the event is no longer re-armed automatically.

bEnable: whether the event is still valid. The trick here is quite nice: when a time event has finished, or is no longer needed, it is first set to false; on the next timer check, an event found to be invalid is deleted then.

m_tvInterval: the event's interval, for example 10 seconds for an event that should run every 10 seconds.

m_iTimerID: the unique ID of the time event. The value is self-incrementing: each new time event gets the previous m_iTimerID plus 1. Because the counter starts at 1 during initialization, the first real time event ends up with m_iTimerID = 2.
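Putting those fields together, the check-and-reschedule step could look roughly like the sketch below. This is a reconstruction based on the description above, not NetIO's actual code: CTimerQueue, m_iNodeCount, PopMin, PushNode and OnTimer are assumed names, and CTimeValue is assumed to support comparison and addition.

void CTimerQueue::CheckTimer(const CTimeValue& now)
{
    while (m_iNodeCount > 0) {
        CNode* pTop = m_pNodeHeap[1];               // slot 0 is the unused helper; 1 is the minimum
        if (now < pTop->m_tvExpired)
            break;                                  // the earliest event is not due yet
        PopMin();                                   // remove it from the heap
        if (!pTop->bEnable) {                       // lazily cancelled timer: drop it now
            delete pTop;
            continue;
        }
        pTop->m_pTimerHandler->OnTimer(pTop->m_iTimerID);   // dispatch to the derived handler
        if (pTop->m_dwCount > 0) {                  // auto re-schedule
            --pTop->m_dwCount;
            pTop->m_tvExpired = now + pTop->m_tvInterval;
            PushNode(pTop);                         // back into the min-heap
        } else {
            delete pTop;
        }
    }
}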

3.2 Periodic cleanup of dead connections

a) When a client connects, NetIO saves the connection's fd and the arrival time: m_mapTcpHandle[iTcpHandle] = (int)time(NULL);
b) That timestamp is refreshed every time data arrives on the connection.
c) NetIO has a configuration value for this, usually set to 10 seconds. On every timer check it walks m_mapTcpHandle and compares the current time with the saved time; a connection idle for more than 10 seconds is considered dead, so its fd is closed and the entry is deleted (a rough sketch of this cleanup follows this list).
d) So if you want to keep a long-lived connection, the client has to keep sending heartbeat packets to refresh this timestamp.
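A rough sketch of that cleanup: m_mapTcpHandle appears in the original code, while ClearTimeoutSocket, the free-function form and the map's exact type are my own guesses.

#include <ctime>
#include <map>
#include <unistd.h>

void ClearTimeoutSocket(std::map<int, int>& m_mapTcpHandle,
                        int iTimeoutSec /* e.g. 10, from the config file */)
{
    int now = (int)time(NULL);
    std::map<int, int>::iterator it = m_mapTcpHandle.begin();
    while (it != m_mapTcpHandle.end()) {
        if (now - it->second > iTimeoutSec) {   // fd -> last-active time; idle too long
            close(it->first);                   // drop the dead connection
            m_mapTcpHandle.erase(it++);         // erase while iterating (pre-C++11 idiom)
        } else {
            ++it;
        }
    }
}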
3.3 Periodic output of NetIO statistics

There is a very interesting detail here. At initialization we register a time event that outputs NetIO's status every 60 seconds. Remember m_dwCount, the counter that limits how many times an event repeats automatically? We want the status output to run every minute forever, so we pass dwCount = 0, which the system interprets as "repeat indefinitely" and replaces with the maximum value (unsigned int)-1, that is 4294967295. At one firing per minute it would take more than 8,000 years for m_dwCount to count down to 0.

if (0 != dwCount)
    m_l_pNode->m_dwCount = dwCount;
else
    m_l_pNode->m_dwCount = (unsigned int)-1;
Once the status information has been written out, some of NetIO's counters are reset to 0, because the statistics cover one minute: the number of request packets in that minute, the ratios, the number of dropped packets, and so on.

3.4 Periodic refresh of the command word list

This one matters. Suppose we add a new service: without a restart, NetIO would never learn about it. So every minute NetIO requests the full command word list again, and a newly added service is picked up on the next refresh, no restart needed.

Part Three: NetIO log analysis

4.1 netio_debug.log analysis

Here is a quick walk through a simple NetIO log. This is only a general introduction rather than a field-by-field explanation of every NetIO value; read a few of these and the meanings become clear. The example is again the "ask for directions" request: container finds that a parameter fails its check and directly returns a reply packet to NetIO's queue.

When the request arrives:
a) 192.168.254.128:58638: the requesting IP and port.
b) handle = 00700008: the connection handle; note it is not the raw socket fd but a processed value.
c) connnum = 1: how many connections are currently open.
d) timestamp = 1460881087: the request's timestamp.

Then:
a) SendMsgQ request start: the request content is dropped into the message queue.
b) _NotifyNext request start: the UNIX domain notification to container, "there is a message in your queue".
c) OnRecvFrom request start: the UNIX domain notification received from container, "I'm done, the reply is in your back-packet queue, take it from here".
d) OnEventFire request: 0: the reply is taken from the message queue, processed, and the packet is sent back to the client.
e) OnClose request start: the client sends a close signal, and after receiving it the server closes the socket.

4.2 netio_perform.log analysis

These are the statistics for one minute:

pkgrecv: packets received.
pkgsent: packets sent.
errpkgsent: error packets sent.
pkgpushfail: unused for now.
pkgsendfail: packets NetIO failed to send.
bytesrecv: bytes received.
bytessent: bytes sent.
maxconn: maximum simultaneous connections. Note this is not the maximum within one reporting window; it is the highest simultaneous connection count from startup to the moment the statistics are output.
tcpconntimeout: TCP connections NetIO closed automatically because of timeouts.
cmd[0x20630001]: the command word NetIO took from the back-packet queue.
count[15]: total packets received for this command word in the minute.
averagetime[0]: average processing time per packet in milliseconds, that is, the total netio-container-netio time of the 15 packets divided by 15.
maxtime[1]: the elapsed time of the slowest of the 15 packets.
averagersplen[89]: average bytes per reply packet sent back to the client.
maxrsplen[89]: largest reply packet in bytes.
ratio[100]: take the total number of packets NetIO accepted in the minute (requests from clients), divide the number of 0x20630001 packets by that total, and multiply by 100; this is the share of this command word in the minute's traffic. The string that follows shows how the 0x20630001 packets were distributed over time, and the last line aggregates the packets of all command words for the minute.
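To make the arithmetic behind those fields concrete, here is a tiny worked example; the variable names are mine and the totals are made up so that they reproduce the numbers shown in the log fields above.

// For cmd[0x20630001] in one minute:
unsigned count          = 15;    // count[15]
unsigned totalElapsedMs = 7;     // sum of netio -> container -> netio times
unsigned totalRspBytes  = 1335;  // sum of reply sizes sent back to clients
unsigned totalPkgRecv   = 15;    // all client request packets this minute

unsigned averageTime   = totalElapsedMs / count;      // 0   -> averagetime[0] (ms, integer division)
unsigned averageRspLen = totalRspBytes  / count;      // 89  -> averagersplen[89]
unsigned ratio         = 100 * count / totalPkgRecv;  // 100 -> ratio[100] (% of all packets)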

Finally:

1) For a large system, statistical logging matters a lot: it lets you see the state of the system at any moment.

2) Be sure to think through the relationships between all the processes.

3) And above all, take care of your body. Health is the foundation of everything!
