HAProxy is especially well suited to websites under very high load, which usually need session persistence or layer-7 processing. It runs on current hardware and can support tens of thousands of concurrent connections. Its operating mode also lets it be integrated into your existing architecture easily and safely, while keeping your web servers from being exposed directly to the network.
1. Problem description
After the RMI service went live, the interfaces in the production environment kept reporting exceptions.
2. Problem analysis
From reading the RMI source code, we learned that the exception occurs when the RMI client has no available connection and needs to create a new one, but the connection attempt fails.
Packet capture is usually enough to locate network problems. The captures showed that the failed connections had one thing in common: HAProxy actively closed the connection after five seconds. Since the HAProxy configuration has a connectTimeout parameter set to 5 seconds, HAProxy was most likely failing to connect to the back-end RMI server.
The packet capture supports this: within those five seconds, no SYN packets were sent from HAProxy to the backend (apart from the handshakes caused by health checks).
We therefore suspected that the problem was in HAProxy itself; otherwise HAProxy would at least have tried to open a connection.
Our guess was that HAProxy was not obtaining an available server.
3. Locating the problem in HAProxy
3.1 connect(...)
My initial suspicion was that HAProxy was not getting an available server. So where should the investigation start?
To establish a connection to a remote server, HAProxy must call the C API connect(...).
In the function tcpv4_connect_server, you can see that connect(...) is called.
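For readers less familiar with the socket API, the following is a minimal sketch of what a connect(...) call looks like. It is not HAProxy code: the helper name connect_tcp is hypothetical, and it uses a blocking socket for simplicity, which a proxy handling many connections would not normally do. The point is only that the SYN packet looked for in the capture is sent when connect(...) is called.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not HAProxy code: open a TCP connection to ip:port.
 * Returns the connected socket descriptor, or -1 on failure. */
int connect_tcp(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    /* inet_pton returns 1 only for a valid dotted-quad IPv4 address. */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* connect() is the call that makes the kernel send the SYN packet. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If no SYN ever appears on the wire, the code path that reaches connect(...) was never executed, which is why the investigation works backwards from this call.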
3.2 tcpv4_connect_server
Look at where tcpv4_connect_server is referenced.
The reference is in the event_accept function. That is, when the client side of a session is established, the session's server connection function is set, and tcpv4_connect_server is triggered somewhere later.
3.3 Call of tcpv4_connect_server
Here it becomes clear that the function is triggered by a call to connect_server, which invokes the connection function that was set earlier. Because tcpv4_connect_server was set in 3.2, it is the function that runs, and it is the one that calls connect. The next step is therefore to trace connect_server.
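The mechanism described in 3.2 and 3.3 is an ordinary C function-pointer dispatch. The sketch below illustrates the idea only; the demo_ names and the struct layout are hypothetical stand-ins and do not reproduce HAProxy's actual data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified session: only the fields needed for the demo. */
struct demo_session {
    int (*connect_fn)(struct demo_session *s); /* chosen at accept time */
    int connect_calls;                         /* how often the hook ran */
};

/* Stand-in for tcpv4_connect_server: the real one ends up calling connect(). */
static int demo_tcpv4_connect_server(struct demo_session *s)
{
    s->connect_calls++;
    return 0;
}

/* Stand-in for event_accept: records which connect function to use later. */
void demo_event_accept(struct demo_session *s)
{
    s->connect_fn = demo_tcpv4_connect_server;
    s->connect_calls = 0;
}

/* Stand-in for connect_server: dispatches through the stored pointer. */
int demo_connect_server(struct demo_session *s)
{
    if (s->connect_fn == NULL)
        return -1;
    return s->connect_fn(s);
}
```

Because the pointer is assigned in one place and called in another, searching for direct calls to tcpv4_connect_server finds nothing; you have to trace the caller that goes through the pointer.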
3.4 Call of connect_server
Looking at the call stack, we tried to locate the problem by following the same calling mechanism.
The more detailed call stacks are not listed one by one here.
3.5 Modify the source code to add custom logs
To locate the problem, we modified the HAProxy [1.4.23] source code, added our own logs to session creation and every subsequent step, and put a unique session ID on each log line.
This makes it possible to trace the exact behavior of each session.
The log format is as follows:
3.6 srv_dynamic_maxconn
The logs showed that the problem was not that HAProxy could not find an available server. Instead, the server's current effective maxconn is computed dynamically by this function.
Tracing the code:
unsigned int srv_dynamic_maxconn(const struct server *s, struct session *session)
{
    unsigned int max;

    if (s->proxy->beconn >= s->proxy->fullconn)
    {
        /* No fullconn or proxy is full */
        max = s->maxconn;
    }
    else if (s->minconn == s->maxconn)
    {
        /* Static limit */
        max = s->maxconn;
    }
    else
    {
        /* Scale the limit with the backend's current load */
        max = MAX(s->minconn,
                  s->proxy->beconn * s->maxconn / s->proxy->fullconn);
    }

    /* During slowstart, ramp the limit up gradually, but never below 1 */
    if ((s->state & SRV_WARMINGUP) &&
        now.tv_sec < s->last_change + s->slowstart &&
        now.tv_sec >= s->last_change) {
        unsigned int ratio;
        ratio = 100 * (now.tv_sec - s->last_change) / s->slowstart;
        max = MAX(1, max * ratio / 100);
    }
    return max;
}
We therefore added logging to this function and found that, in the ant nest environment, it returned 1 every time.
That pinpoints the problem. Because the function returns 1,
only the first request to establish a connection is served, and the second, third, and subsequent requests are rejected.
Checking the logs again fully confirmed this.
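To see how the function can end up returning 1, here is a simplified, standalone sketch of the same arithmetic with the slowstart part left out. The function name demo_dynamic_maxconn and the sample values used below are assumptions for illustration; the article does not state the actual configuration values.

```c
/* Simplified, standalone version of the limit computation (no slowstart).
 * Parameter names mirror the fields used in srv_dynamic_maxconn. */
unsigned int demo_dynamic_maxconn(unsigned int minconn, unsigned int maxconn,
                                  unsigned int beconn, unsigned int fullconn)
{
    unsigned int scaled;

    if (beconn >= fullconn)      /* no fullconn, or backend is full */
        return maxconn;
    if (minconn == maxconn)      /* static limit */
        return maxconn;

    /* Integer division: a lightly loaded backend can scale down to 0 here. */
    scaled = beconn * maxconn / fullconn;
    return scaled > minconn ? scaled : minconn;
}
```

For example, with an assumed minconn of 1, maxconn of 100, fullconn of 1000, and 3 connections on the backend, the scaled value is 3 * 100 / 1000 = 0 in integer arithmetic, so the result falls back to minconn, which is 1. As long as the backend stays lightly loaded, the limit stays at 1.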
4. Solutions
Now that we know the cause, how do we fix it?
The fix has to work through the logic of this function.
Examining srv_dynamic_maxconn shows two possible fixes in the configuration:
1. Set minconn to a large value.
2. Set minconn equal to maxconn, which removes the minimum limit entirely so that concurrency is governed by maxconn alone.
For the second case, the code shows that when the two values are equal, the static-limit branch is taken and max is set to s->maxconn, so the problem does not occur.
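As a rough illustration of the two options, a backend section might look like the fragment below. The backend name, server name, address, port, and all numeric values are hypothetical; only the minconn, maxconn, and fullconn keywords come from the discussion above.

```
backend rmi_backend
    mode tcp
    fullconn 1000
    # Option 1: raise minconn so the dynamic limit never drops too low
    # server rmi1 192.0.2.10:1099 check minconn 50 maxconn 100
    # Option 2: make minconn equal to maxconn, which selects the static limit
    server rmi1 192.0.2.10:1099 check minconn 100 maxconn 100
```

Only one of the two server lines should be active at a time; the first is left commented out here.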
5. Email exchange with the author of HAProxy
Because HAProxy is open-source software, we could contact the author directly.
Below is the email exchange with the author.
5.1 Email describing the problem
5.2 The author's reply
5.3 Follow-up email to verify our answer
We then sent our own solutions to get the author's opinion on them, and did not forget to praise the popularity of the author's software.
5.4 The author's final response
In other words, the author thinks it is better to remove the minconn parameter altogether, so we removed it from our haproxy.cfg configuration. The logs then showed that the value of minconn equals that of maxconn, that is, the static-limit branch is taken.
The problem is now solved, and we understand HAProxy better than before.
Summary
When the source code is available, the first approach that comes to mind is to debug or read the source directly, and this generally solves the problem. On Linux, you can step through C code with gdb and Java code with jdb line by line, which is convenient and precise. For open-source software, common problems have usually been encountered by others already, so it is also worth searching the internet for an established solution.