TCP connection liveness on the server backend
0. Background
The company's server backend is deployed at a remote site and serves the users' app. The network signal at that site is poor, and after the backend has been running for a while, user access starts to fail. Colleagues on site reported that netstat showed a large number of TCP connections on the system.
1. Problem Analysis
First, I deployed the service on the company's internal test server and stress-tested it with LoadRunner; it ran normally. Since the on-site colleagues reported that the signal in that area is poor, the likely cause of the access failures is that the access process exhausts its file descriptor (FD) resources, so accept fails. The reasoning: if a client cannot send a FIN to the server because of some failure, and the server has no liveness check configured, the server keeps the TCP connection open (how long it survives has not been tested yet).
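The FD-exhaustion hypothesis can be checked from inside a process. The following is a minimal sketch, assuming Linux (it lists /proc/self/fd); the helper names count_open_fds and fd_soft_limit are hypothetical and not part of the server program. Once the number of open descriptors reaches the soft limit, accept fails with EMFILE.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/resource.h>

/* Count the descriptors currently open in this process by listing
 * /proc/self/fd (Linux-specific). Returns -1 if it cannot be read. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')    /* skip "." and ".." */
            ++count;
    }
    closedir(dir);
    return count - 1;                   /* exclude the fd held by opendir() */
}

/* Soft limit on open descriptors for this process, or -1 on error. */
long fd_soft_limit(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    return (long)lim.rlim_cur;
}
```

Logging both values periodically in the access process would show whether the open count climbs toward the limit as dead connections accumulate.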
2. Experiment Test
I wrote a simple server program for the test. It is an echo server: it accepts a message (format: 2-byte packet length + packet content) and returns the packet content to the client unchanged.
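The message format can be sketched as a pair of helpers. This is an illustration only: the names frame_encode and frame_payload_len are hypothetical, and the big-endian byte order is taken from the way the server decodes the header (buf[0]*256 + buf[1]).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_LEN 2

/* Build a frame: 2-byte big-endian payload length, then the payload.
 * Returns the total frame size, or 0 if the payload does not fit in
 * 16 bits or the output buffer is too small. */
size_t frame_encode(const uint8_t *payload, size_t len,
                    uint8_t *out, size_t out_cap)
{
    if (len > 0xFFFF || out_cap < FRAME_HEADER_LEN + len)
        return 0;
    out[0] = (uint8_t)(len >> 8);       /* high byte first */
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + FRAME_HEADER_LEN, payload, len);
    return FRAME_HEADER_LEN + len;
}

/* Decode the payload length from a 2-byte frame header. */
uint16_t frame_payload_len(const uint8_t header[FRAME_HEADER_LEN])
{
    return (uint16_t)((header[0] << 8) | header[1]);
}
```

A test client would call frame_encode on its message and write the resulting bytes to the socket; the echo reply carries the payload only, without the length header.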
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>

int g_epfd;

// Create a TCP socket listening on the given port on all interfaces.
int InitServer( unsigned short port )
{
    int nServerFd = socket( AF_INET, SOCK_STREAM, 0 );
    if ( nServerFd < 0 )
    {
        return -1;
    }

    struct sockaddr_in addr;
    memset( &addr, 0, sizeof(addr) );

    addr.sin_family = AF_INET;
    addr.sin_port = htons( port );
    addr.sin_addr.s_addr = htonl( INADDR_ANY );

    if ( bind( nServerFd, (struct sockaddr *)&addr, sizeof(addr) ) < 0 )
    {
        printf("bind error\n");
        exit(-1);
    }

    if ( listen( nServerFd, 128 ) < 0 )
    {
        printf("listen error\n");
        exit(-1);
    }

    return nServerFd;
}

// Register nFd with epoll. Client sockets (nOneShot != 0) are edge-triggered
// and one-shot, so only one thread handles a socket at a time. The listening
// socket stays level-triggered, so a single accept() per event cannot strand
// a pending connection.
int AddFd( int epfd, int nFd, int nOneShot )
{
    struct epoll_event event;
    memset( &event, 0, sizeof(event) );

    event.data.fd = nFd;
    event.events = EPOLLIN | EPOLLRDHUP;

    if ( nOneShot ) event.events |= EPOLLET | EPOLLONESHOT;

    return epoll_ctl( epfd, EPOLL_CTL_ADD, nFd, &event );
}

// Re-arm a one-shot socket after a worker thread has finished with it.
int ResetOneShot( int epfd, int nFd )
{
    struct epoll_event event;
    memset( &event, 0, sizeof(event) );

    event.data.fd = nFd;
    event.events = EPOLLIN | EPOLLRDHUP | EPOLLONESHOT;

    return epoll_ctl( epfd, EPOLL_CTL_MOD, nFd, &event );
}

// Read exactly nLen bytes. Returns 0 on success, -1 on error or EOF.
static int ReadFull( int nFd, unsigned char * buf, int nLen )
{
    int nTotal = 0;
    while ( nTotal < nLen )
    {
        int nRead = read( nFd, buf + nTotal, nLen - nTotal );
        if ( nRead <= 0 )
        {
            return -1;
        }
        nTotal += nRead;
    }
    return 0;
}

void * ReadFromClient( void * arg )
{
    int nClientFd = (int)(intptr_t)arg;
    unsigned char buf[1024];
    int nDataLen;

    printf("ReadFromClient Enter\n");

    // 2-byte length header
    if ( ReadFull( nClientFd, buf, 2 ) < 0 )
    {
        printf("Read Data Len error\n");
        close( nClientFd );
        return NULL;
    }

    // The length is big-endian (network byte order)
    nDataLen = buf[0] * 256 + buf[1];
    printf("nDataLen [%d]\n", nDataLen);

    // Reject payloads larger than the buffer, then read exactly nDataLen bytes
    if ( nDataLen > (int)sizeof(buf) || ReadFull( nClientFd, buf, nDataLen ) < 0 )
    {
        printf("Read Data error\n");
        close( nClientFd );
        return NULL;
    }
    printf("nTotal [%d]\n", nDataLen);

    sleep(5);

    int nWrite = write( nClientFd, buf, nDataLen );
    printf("nWrite[%d]\n", nWrite);

    printf("ResetOneShot [%d]\n", ResetOneShot(g_epfd, nClientFd));

    return NULL;
}

int main(int argc, char const *argv[])
{
    int i;
    int nClientFd;
    pthread_t tid;
    struct epoll_event events[1024];

    int nServerFd = InitServer( 7777 );
    if ( nServerFd < 0 )
    {
        perror( "nServerFd" );
        exit(-1);
    }

    int epfd = epoll_create( 1024 );
    if ( epfd < 0 )
    {
        perror( "epoll_create" );
        exit(-1);
    }

    g_epfd = epfd;

    int nReadyNums;

    if ( AddFd( epfd, nServerFd, 0 ) < 0 )
    {
        printf("AddFd error\n");
        exit(-1);
    }

    while( 1 )
    {
        nReadyNums = epoll_wait( epfd, events, 1024, -1 );

        if ( nReadyNums < 0 )
        {
            printf("epoll_wait error\n");
            exit(-1);
        }

        for ( i = 0; i < nReadyNums; ++i )
        {
            if ( events[i].data.fd == nServerFd )
            {
                nClientFd = accept( nServerFd, NULL, NULL );
                if ( nClientFd < 0 )
                {
                    perror( "accept" );
                    continue;
                }

                AddFd( epfd, nClientFd, 1 );

            }else if ( events[i].events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR) )
            {
                // Close By Peer. Checked before EPOLLIN because a peer close
                // also sets EPOLLIN, which would otherwise hide this branch.
                printf("Close By Peer\n");
                close( events[i].data.fd );
            }else if ( events[i].events & EPOLLIN )
            {
                // Can be implemented by threadpool
                // Read data from client in a detached worker thread
                if ( pthread_create( &tid, NULL, ReadFromClient, (void *)(intptr_t)events[i].data.fd ) == 0 )
                {
                    pthread_detach( tid );
                }
            }else
            {
                printf("Something happened\n");
            }

        }
    }

    return 0;
}
Test content:
Note: client IP 192.168.10.108; server IP and port 192.168.10.110:7777.
A. The client sends a message to the server and then the network is disconnected. (I changed the program slightly here: for this experiment the write response is commented out so that the write cannot affect the test. A later experiment uses write.)
After disconnecting the client from the network, check with netstat whether the connection is still in the ESTABLISHED state on the client and on the server.
A. Experiment result:
The server does not detect that the client has been disconnected; the connection remains in the ESTABLISHED state.
B. The client sends a message to the server, the network is disconnected, the client is closed, and then the client is started and sends the message again.
In this test, the previous connection is detected as broken only when the program establishes a socket connection again.
B. Experiment conclusion:
The broken connection is not detected until the program establishes a new socket connection.
C. The client sends a message to the server and then the network is disconnected. (This experiment keeps the write response in order to observe the result of write.)
The write operation succeeded.
C. Experiment conclusion:
A successful write does not show that the peer is still connected. write only copies the data into the kernel's send buffer and returns; it does not wait for the peer to acknowledge it.
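The same behaviour can be reproduced locally. The sketch below is an illustration under an assumption: it uses an AF_UNIX socket pair as a stand-in for the TCP connection, and the helper name write_to_silent_peer is hypothetical. The peer end never calls read, yet write still reports the full byte count, because the kernel has accepted the data into its buffer.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to one end of a connected socket pair whose other end
 * never reads. Returns what write() returned, or -1 if the pair could
 * not be created. */
ssize_t write_to_silent_peer(const void *data, size_t len)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    ssize_t n = write(sv[0], data, len);    /* sv[1] never calls read() */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a real TCP connection over a dead link, the kernel keeps retransmitting the buffered data in the background, and an error only surfaces on a later socket call once those retransmissions time out.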
3. Solution
Temporary: enable the socket option SO_KEEPALIVE (via setsockopt) so that the server detects clients that have gone away abnormally.
Subsequent improvement: use application-level heartbeat packets to detect whether long-lived connections are still alive.
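The server-side bookkeeping for a heartbeat scheme can be sketched as follows. This is an assumption about one possible design, not the implementation used in the project: the struct peer_state and the functions peer_touch and peer_expired are hypothetical names, and the timeout value is chosen by the caller.

```c
#include <stdbool.h>
#include <time.h>

/* Per-connection liveness record (hypothetical). */
struct peer_state {
    time_t last_seen;   /* when any message, heartbeat included, last arrived */
};

/* Call whenever a message (data or heartbeat) is received from the peer. */
void peer_touch(struct peer_state *p, time_t now)
{
    p->last_seen = now;
}

/* True when the peer has been silent for more than timeout_s seconds,
 * which means the server should close the connection. */
bool peer_expired(const struct peer_state *p, time_t now, int timeout_s)
{
    return difftime(now, p->last_seen) > (double)timeout_s;
}
```

The server would call peer_touch in the read path and sweep all connections with peer_expired on a timer, closing the ones that have expired. The client sends a small heartbeat message at an interval shorter than the timeout.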
Note: the SO_KEEPALIVE test will be added tomorrow. At home I only have one laptop, with Ubuntu installed directly and no virtual machine, so I cannot run the test there.
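Pending that test, enabling keepalive could look like the sketch below. It assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT tune the probing per socket; the function name enable_keepalive and the example values are hypothetical. Without the three TCP_* options the system defaults apply, and on Linux the default idle time before the first probe is 7200 seconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive probing on a TCP socket (Linux option names).
 *   idle_s     seconds of idleness before the first probe is sent
 *   interval_s seconds between probes
 *   probes     unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

In the server program this would be called on nClientFd right after accept, for example enable_keepalive(nClientFd, 60, 10, 3). When all probes go unanswered, the kernel drops the connection and a pending or later read on the socket fails with ETIMEDOUT.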
4. Supplement
If anything here is wrong, or you have suggestions, please say so directly; more discussion is welcome.