HAProxy ten-million concurrent problem solution

Source: Internet
Author: User
Tags session id haproxy

HAProxy is especially suitable for websites under extremely high load that usually require session persistence or layer-7 processing. On current hardware, HAProxy can support tens of thousands of concurrent connections. In addition, its operating mode lets it be integrated easily and securely into your existing architecture, while keeping your web servers from being exposed to the network.


1. Problem description

After the RMI service went live, its interfaces on the production network kept reporting exceptions.


2. Problem analysis

Reading the RMI source code showed that the exception occurs when the RMI client has no available connection and has to create a new one, but the new connection fails.

Packet capture is usually a good way to locate network problems. The capture showed that the failed connections had something in common: HAProxy actively closed the connection after five seconds. Since the HAProxy configuration has a connectTimeout parameter of 5 seconds, the likely explanation was that HAProxy had failed to connect to the back-end RMI server.

The capture confirmed this: within those five seconds, no SYN packets were sent from HAProxy to the backend (apart from the handshakes caused by health checks).

We therefore suspected that the problem lay in HAProxy itself; otherwise HAProxy would at least have tried to initiate a connection.

Our guess at this point was that HAProxy was not obtaining an available server.

3. Locating the problem in HAProxy

3.1 connect (...)

We initially suspected that HAProxy was not obtaining an available server. So where should we start looking?

To establish a connection to the remote server, HAProxy has to call the C API connect(...).

In the tcpv4_connect_server function, you can see that connect(...) is called.


3.2 tcpv4_connect_server

View the call stack of tcpv4_connect_server



The assignment takes place in the event_accept function. That is, when the client side of a session is established, the session's server connection function is set to tcpv4_connect_server, and that function is triggered somewhere later.

3.3 call of tcpv4_connect_server


Here it becomes clear: the connect_server function is called first, and it then invokes the connection function that was specified earlier. Because tcpv4_connect_server was specified in 3.2, it is the function that gets triggered, and it in turn calls connect. So the next step is to trace connect_server.
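The indirection described here can be sketched in a few lines of C. This is only an illustration of the pattern, not HAProxy's code: the `dispatch_session` struct, its `connect_fn` field, and the `dispatch_*` functions are hypothetical names invented for the example, and no real socket is opened.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a proxied session. */
struct dispatch_session {
    int id;
    /* Chosen when the client side is accepted, invoked later. */
    int (*connect_fn)(struct dispatch_session *s);
    int connect_calls; /* how many times the connect function ran */
};

/* Stand-in for a protocol-specific routine such as
 * tcpv4_connect_server; a real one would call connect(). */
int dispatch_tcpv4_connect(struct dispatch_session *s)
{
    s->connect_calls++;
    return 0; /* 0 means the attempt was started */
}

/* Accept phase: only records which function to use later. */
void dispatch_accept(struct dispatch_session *s, int id)
{
    assert(s != NULL);
    s->id = id;
    s->connect_calls = 0;
    s->connect_fn = dispatch_tcpv4_connect;
}

/* Connect phase: calls whatever was stored at accept time. */
int dispatch_connect_server(struct dispatch_session *s)
{
    assert(s != NULL);
    if (s->connect_fn == NULL)
        return -1; /* nothing was assigned at accept time */
    return s->connect_fn(s);
}
```

Calling `dispatch_accept` and then `dispatch_connect_server` runs `dispatch_tcpv4_connect` exactly once. This is why tracing has to continue upward from the caller: the function that finally calls connect is only reached through the stored pointer.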

3.4 call of connect_server

View the call stack:


We tried to locate the problem by following the call chain in the same way.

The more detailed call stacks are not listed here.

3.4 modify source code to add custom logs

To locate the problem, we modified the HAProxy [1.4.23] source code, adding our own logs at session creation and at each subsequent step, and putting a unique session ID on every log line.


In this way, you can track the specific behavior of each session.
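As a rough idea of what such instrumentation might look like, here is a minimal sketch. It is not the patch applied to HAProxy: the `logged_session` struct, the counter, the helper names, and the `[sid=...]` line layout are all assumptions made for illustration, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical global counter used to hand out session IDs. */
static unsigned long next_session_id = 1;

/* Hypothetical per-session state carrying the unique ID. */
struct logged_session {
    unsigned long uid;
};

/* Assign the next unique ID when a session is created. */
void logged_session_init(struct logged_session *s)
{
    assert(s != NULL);
    s->uid = next_session_id++;
}

/* Format one log line prefixed with the session ID, so that all
 * lines belonging to one session can be filtered together.
 * Returns the value returned by snprintf. */
int logged_session_format(char *buf, size_t len,
                          const struct logged_session *s,
                          const char *event, unsigned int value)
{
    assert(buf != NULL && s != NULL && event != NULL);
    return snprintf(buf, len, "[sid=%lu] %s=%u", s->uid, event, value);
}
```

With a prefix like this on every line, filtering the log for one ID shows the full history of that session, from creation to the point where the connection attempt is abandoned.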

The log format is as follows:


3.5 srv_dynamic_maxconn

The logs showed that the cause was not that HAProxy could not find an available server. The cause was the server's current dynamic maxconn, which is computed by this function.

Trace the code:

unsigned int srv_dynamic_maxconn(const struct server *s, struct session *session)
{
    unsigned int max;

    if (s->proxy->beconn >= s->proxy->fullconn) {
        /* No fullconn or proxy is full */
        max = s->maxconn;
    }
    else if (s->minconn == s->maxconn) {
        /* Static limit */
        max = s->maxconn;
    }
    else {
        /* Scale the limit with backend load, never below minconn */
        max = MAX(s->minconn,
                  s->proxy->beconn * s->maxconn / s->proxy->fullconn);
    }

    /* During slowstart, ramp the limit up over time, never below 1 */
    if ((s->state & SRV_WARMINGUP) &&
        now.tv_sec < s->last_change + s->slowstart &&
        now.tv_sec >= s->last_change) {
        unsigned int ratio;
        ratio = 100 * (now.tv_sec - s->last_change) / s->slowstart;
        max = MAX(1, max * ratio / 100);
    }
    return max;
}

So we added logs to this function and found that, in the ant nest environment, it returned 1 every time.

That pinpoints the problem. Because 1 is returned, only the first request to establish a connection is served, and the 2nd, 3rd, and later requests are rejected.

Checking the logs again fully confirmed this.
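To see how the limit can collapse to 1, the load-dependent part of the function can be modelled on its own. The sketch below is a simplified stand-alone model, not the HAProxy source: it leaves out the slowstart branch, replaces the server and proxy structs with plain integers, and `model_dynamic_maxconn` is a hypothetical name. The article does not state the configured values, so any numbers fed into it are assumptions.

```c
#include <assert.h>

#define MODEL_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Simplified model of the dynamic limit, slowstart omitted.
 * minconn, maxconn: per-server settings
 * beconn:           current connections on the backend
 * fullconn:         backend load at which maxconn is reached */
unsigned int model_dynamic_maxconn(unsigned int minconn,
                                   unsigned int maxconn,
                                   unsigned int beconn,
                                   unsigned int fullconn)
{
    if (beconn >= fullconn)
        return maxconn;   /* no fullconn or proxy is full */
    if (minconn == maxconn)
        return maxconn;   /* static limit */

    /* beconn < fullconn here, so fullconn cannot be zero. */
    assert(fullconn > 0);

    /* Integer division: a lightly loaded backend yields 0 here,
     * so the result falls back to minconn. */
    return MODEL_MAX(minconn, beconn * maxconn / fullconn);
}
```

With the hypothetical values minconn 1, maxconn 100, and fullconn 1000, a backend carrying 5 connections gives 5 * 100 / 1000 = 0 by integer division, so the limit becomes minconn, that is, 1. At 500 connections the same settings give 50, and at 1000 connections the limit is maxconn.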

4. Solutions

Now that we know the cause, how can we fix it?

The fix has to work through the logic of this function.

Looking at srv_dynamic_maxconn, there are two possible solutions in the configuration:

1. Set minconn to a large value.

2. Set minconn to the same value as maxconn, which removes the dynamic lower limit completely. Concurrency is then governed by maxconn alone.

For the second case, the code shows that when the two values are equal, max is set to s->maxconn, so the problem does not occur.
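The effect of both options can be checked against a simplified model of the limit. This is a sketch under stated assumptions, not HAProxy code: slowstart is ignored, `sketch_limit` and `sketch_lowest_limit` are hypothetical helper names, and the values passed in are illustrative only.

```c
#include <assert.h>

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Simplified model of the per-server limit, slowstart omitted. */
unsigned int sketch_limit(unsigned int minconn, unsigned int maxconn,
                          unsigned int beconn, unsigned int fullconn)
{
    if (beconn >= fullconn || minconn == maxconn)
        return maxconn;
    return SKETCH_MAX(minconn, beconn * maxconn / fullconn);
}

/* Smallest limit seen while the backend load goes from 0 up to
 * fullconn. This is the worst case a client can run into. */
unsigned int sketch_lowest_limit(unsigned int minconn,
                                 unsigned int maxconn,
                                 unsigned int fullconn)
{
    unsigned int beconn;
    unsigned int lowest = maxconn;

    assert(minconn <= maxconn);
    for (beconn = 0; beconn <= fullconn; beconn++) {
        unsigned int cur = sketch_limit(minconn, maxconn, beconn, fullconn);
        if (cur < lowest)
            lowest = cur;
    }
    return lowest;
}
```

Taking maxconn 100 and fullconn 1000 as assumed values, a minconn of 1 gives a lowest limit of 1, a minconn of 80 (option 1) raises it to 80, and a minconn equal to maxconn (option 2) keeps it at 100 regardless of load.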

5. Email exchange with the author of HAProxy

Since it is open-source software, you can directly communicate with the author.

Below is the email exchange with the author.

5.1 email description


5.2 reply from the recipient



5.3 resend the verification answer

So I sent our own answers to get the author's comments on our solutions, not forgetting to praise how popular the software is.


5.4 Final response of the other party


That is to say, the author thinks it is better to remove the minconn parameter altogether, so we removed this parameter from the haproxy.cfg configuration. Log output shows that minconn then equals maxconn, that is, the static limit branch is taken.

The problem is now solved, and we understand HAProxy better than before.

Summary

When the source code is available, the first approach that comes to mind is to debug or read the source directly, and in general that solves the problem. On Linux, you can step through C code line by line with gdb and through Java code with jdb, which is convenient and accurate. For open-source software, common problems have often already been encountered by others, so it is also worth searching the internet for a mature solution.

