SLB Fault Diagnosis: An Unsolved Case Caused by an MSS Value

Source: Internet
Author: User

A Server Load Balancer (SLB) device sits between the client and the real server. When an access error occurs, a quick diagnosis usually makes the SLB device the prime suspect. Customers typically ask: why does it work when I access the server directly? The question is reasonable, but things are rarely that simple. Many factors interact, which can obscure the real cause.

One day a customer reported a fault: when a service was accessed through the SLB device, pages either failed to open or opened only partially after a long wait. When the client accessed the server directly, the pages opened without problems.

Clearly something was wrong in between. We logged on to the SLB device, checked its configuration and logs, and collected internal diagnostic information, but found no errors. The only remaining option was to capture packets at the customer's site and analyze them.

So we started capturing packets, collecting traffic both for the problematic service and for other services that worked normally.

The analysis showed that the captures did differ. The following is the capture of the problematic service (Packet Capture 1):

 

10.52.127.108 is the client address.

10.0.1.112 is the VIP.

10.0.1.99 is the real server address.

Because the SLB device is deployed in bypass mode, the client's source IP address must be translated: 10.0.1.123 is the SNAT address that the SLB device substitutes for the client address.

The VIP is configured in HTTP mode, which means the SLB device handles connections as a proxy: for each connection, the client first completes a three-way handshake with the SLB device, and the SLB device then performs a separate three-way handshake with the real server.

Access process:

1) 10.52.127.108 accesses 10.0.1.112.

2) The SLB device completes a three-way handshake with the client.

3) The SLB device translates the source IP address 10.52.127.108 to 10.0.1.123 and initiates a connection to the server 10.0.1.99.

4) The server 10.0.1.99 completes a three-way handshake with the SLB device.

The following is the capture of a normal service (Packet Capture 2):

 

10.0.76.2 is the client address.

10.0.1.113 is the VIP.

10.0.1.104 is the real server address.

Because the SLB device is deployed in bypass mode, the client source IP address is likewise translated to 10.0.1.123.

The access process is the same as packet capture 1.

After carefully comparing the two captures, we finally found where they differ: the MSS negotiation.

First, here is how the MSS is negotiated when a client accesses a server:

  • When the client sends a SYN to the server, it includes an MSS option giving the largest segment payload the client can accept. Each segment the server sends to the client must carry no more than this many bytes of data.
  • The server replies with a SYN/ACK carrying the largest MSS the server can accept. Each segment the client sends to the server must carry no more than this many bytes of data.
  • In practice, each side sends segments no larger than the smaller of the two values.
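The negotiation above can be modeled in a few lines. This is a simplified sketch, assuming each side caps the payload it sends at the smaller of its own MSS and the MSS its peer advertised; real TCP stacks also factor in Path MTU Discovery and TCP option overhead. The function names are hypothetical.

```python
from typing import Tuple


def effective_send_mss(local_mss: int, peer_mss: int) -> int:
    """Largest payload (bytes) one side puts in a segment.

    Assumption: a sender never exceeds the MSS its peer advertised,
    and never exceeds its own local MSS either.
    """
    return min(local_mss, peer_mss)


def negotiate(client_mss: int, server_mss: int) -> Tuple[int, int]:
    """Return (client->server, server->client) maximum payload per segment."""
    client_to_server = effective_send_mss(client_mss, server_mss)
    server_to_client = effective_send_mss(server_mss, client_mss)
    return client_to_server, server_to_client


if __name__ == "__main__":
    # Client leg of the captures: client advertises 1460, SLB advertises 1400.
    print(negotiate(1460, 1400))
```

With both directions capped at the smaller value, the client leg in both captures ends up at 1400 bytes per segment.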

With this process in mind, let us analyze the two captures:

Packet Capture 1:

The client sends a SYN indicating that the largest MSS it can accept is 1460. The SLB device responds with an MSS of 1400. After the handshake, segments in both directions carry no more than 1400 bytes of data.

The SLB device then sends the server an MSS of 1380, and the server responds with an MSS of 120. After the handshake, each segment the SLB device sends to the server can carry no more than 120 bytes of data.

The problem lies in the MSS of 120 negotiated with the server.

The first request the client sends to the SLB device is 905 bytes. Because this is below 1400, the SLB device receives it in a single segment. The SLB device then forwards the request to the selected server, 10.0.1.99. Because the server accepts no more than 120 bytes per segment, the SLB device has to split the client's request into eight segments before sending it to the server.

This introduces problems that are hard to control. After sending a request, the client waits for a response, but because the SLB device has split one segment into eight, the exchange between the SLB device and the server takes longer. Meanwhile the client may time out and retransmit the request, and the eight segments between the SLB device and the server are also exposed to loss, retransmission, reassembly, and similar problems. Most importantly, if the client sends an RST to close the connection after it has sent all of its requests, the connection is torn down even though data is still in flight. Because each request is split into so many segments, an RST from the client almost always means the transfer fails. This is why pages either fail to open or open only partially.
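The segment count above can be checked with simple arithmetic. The sketch assumes the 905 bytes are TCP payload (if they include 40 bytes of IP and TCP headers, the result at an MSS of 120 is still 8 segments) and ignores TCP options; `segments_needed` is a hypothetical helper.

```python
def segments_needed(payload_bytes: int, mss: int) -> int:
    """Minimum number of TCP segments needed to carry payload_bytes at a given MSS."""
    if mss <= 0:
        raise ValueError("MSS must be positive")
    # Integer ceiling division: a partial last segment still counts as one.
    return (payload_bytes + mss - 1) // mss


REQUEST_BYTES = 905  # first request in Packet Capture 1 (assumed to be payload)

if __name__ == "__main__":
    legs = [
        ("client -> SLB", 1400),
        ("SLB -> server (capture 1)", 120),
        ("SLB -> server (capture 2)", 1380),
    ]
    for leg, mss in legs:
        print(f"{leg}: MSS {mss}, {segments_needed(REQUEST_BYTES, mss)} segment(s)")
```

This prints 1, 8, and 1 segments respectively: the same request crosses the client leg in one segment but needs eight on the server leg when the server advertises 120.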

Packet Capture 2:

The client sends a SYN indicating that the largest MSS it can accept is 1460. The SLB device responds with an MSS of 1400. After the handshake, segments in both directions carry no more than 1400 bytes of data. This is the same as in Packet Capture 1.

The SLB device sends the server an MSS of 1380, and the server responds with an MSS of 1380. After the handshake, both sides therefore communicate with an MSS of 1380.

Whether between the client and the SLB device or between the SLB device and the server, requests are exchanged without being split into many small segments, so Packet Capture 2 shows no problem.

Problems caused by improper MTU settings are not uncommon in network communication. For example, when an ADSL link is in the path and a device's MTU is set to 1500, client access may fail. The PPPoE protocol used by ADSL takes up 8 bytes of the MTU, so the maximum MTU over ADSL is 1492. If a packet between the client and the server happens to exceed 1492 bytes, it cannot get through (for example, when fragmentation is not allowed and Path MTU Discovery fails). Software typically sets its MSS to the local MTU minus 40 (20 bytes each for the IP and TCP headers), and the MTU is usually 1500, so the largest MSS a device normally accepts is 1500 - 40 = 1460. Because PPPoE, VPNs, and similar technologies in the path may consume additional bytes of the MTU, network equipment vendors usually lower the MSS further; network devices typically use an MSS of around 1400.
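The MTU-to-MSS arithmetic above can be written out as follows. This sketch assumes IPv4 and TCP headers without options (20 bytes each) and the 8-byte PPPoE overhead mentioned in the text; the constant names are illustrative.

```python
IPV4_HEADER = 20     # bytes, IPv4 header without options
TCP_HEADER = 20      # bytes, TCP header without options
PPPOE_OVERHEAD = 8   # bytes, PPPoE header (6) + PPP protocol field (2)

ETHERNET_MTU = 1500


def mss_for_mtu(mtu: int) -> int:
    """MSS that fits in one packet of the given MTU (no IP/TCP options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


PPPOE_MTU = ETHERNET_MTU - PPPOE_OVERHEAD

if __name__ == "__main__":
    print("Ethernet MTU", ETHERNET_MTU, "-> MSS", mss_for_mtu(ETHERNET_MTU))
    print("PPPoE MTU", PPPOE_MTU, "-> MSS", mss_for_mtu(PPPOE_MTU))
```

An MSS of 1460 fits a plain Ethernet path, but only 1452 fits a PPPoE path, which is one reason network devices advertise a more conservative value such as 1400.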

An MSS of around 1400 bytes is therefore normal in network communication, while the 120-byte MSS returned by the server is clearly abnormal. The root cause of the problem is the unsuitable MSS value returned by the server. Since the server itself returned this value, we focused the diagnosis on the server.
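To spot a value like this in a capture, one can read the MSS option (kind 2, length 4) from the server's SYN/ACK. The sketch below parses the options of a raw TCP header and flags MSS values below 536 bytes, the default IPv4 MSS that RFC 1122 assumes when no MSS option is sent. Treating anything below that as suspicious is an assumption, not a rule from the original case, and the function names are hypothetical.

```python
import struct
from typing import Optional

DEFAULT_IPV4_MSS = 536  # RFC 1122 default when no MSS option is present


def parse_mss(tcp_header: bytes) -> Optional[int]:
    """Return the MSS option value from a raw TCP header, or None if absent."""
    if len(tcp_header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    data_offset = (tcp_header[12] >> 4) * 4  # header length in bytes
    options = tcp_header[20:data_offset]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation (single-byte padding)
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated option
        length = options[i + 1]
        if length < 2:
            break              # malformed option length
        if kind == 2 and length == 4 and i + 4 <= len(options):
            # Maximum Segment Size: 2-byte value in network byte order
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None


def is_abnormal_mss(mss: Optional[int], floor: int = DEFAULT_IPV4_MSS) -> bool:
    """Flag an advertised MSS below the chosen floor (assumed threshold)."""
    return mss is not None and mss < floor
```

Applied to the server's SYN/ACK in Packet Capture 1, a parser like this would return 120 and flag it, while the 1380 from Packet Capture 2 would pass.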

The next step is to find out why the server returns this value, which has nothing to do with the SLB device. It is still worth tracking, because the server does not always return 120: sometimes it negotiates 1380, and access works; at other times it returns 120, and access fails.

The customer's server runs the HP operating system, and the application software is Oracle EBS. After we traced the problem to the server, the customer also brought in HP engineers to investigate, but they could not identify the cause.

In my own analysis, the cause may lie in one of the following:

  • The MTU defined by the HP operating system or the NIC driver varies under some conditions, or
  • The underlying communication layer of Oracle EBS changes the MSS it advertises under certain conditions during negotiation.

The above is only speculation. Because no senior engineers from either vendor were involved, the root cause could not be pinned down, and the problem remains unsolved.

If you are familiar with the HP operating system and Oracle EBS, or have encountered a similar case, you are welcome to discuss it.

(Wyl)

This article is from the "ADC technology" blog
