Motivation:
In early 2011, when I was learning about link aggregation on switches, one question stuck in my mind:
When a switch receives a packet that has to leave through an aggregation port, how does it choose which member port to send it out of?
Is the choice random? Does it cycle through the members in ascending or descending order? If the source-IP algorithm is selected, how do different source IPs end up on different ports? I could not find an explanation online at the time. It was only at the end of 2011, when I started working as a test engineer, that a colleague in R&D gave me the answer. I have kept it in mind ever since and always wanted to find time to publish it, so that more people can see that the load-balancing algorithm really is this simple!
The main purposes of link aggregation are to increase bandwidth, improve reliability, and prevent Layer 2 loops. I will not discuss why the technology exists or when to use it; I will only describe how a member port is selected once a packet reaches the aggregation port.
An aside:
At the end of 2012 I was testing a switch project in which my company (a small-to-medium-sized enterprise) was competing against another company (a large enterprise). I was responsible for testing my company's switches. I will not name either company.
In the network-access test, the testers from the Ministry of Industry brought in real traffic, split it in two, and fed one half to each vendor's switch. Both switches were configured with the same load-sharing algorithm (for example, SIP hash on both) and the same number of aggregation group members (for example, 32). The result: after two minutes of real traffic at 10G, the packet counts on every member interface were exactly the same on both vendors' switches, which shows that the two vendors use exactly the same load-balancing algorithm. This made me realize that the big vendors' algorithm is nothing special. I had always assumed a large enterprise must be much better, but in the final test report they did not do as well as we did.
Hash Table Introduction:
Inside the switch, each time an aggregation group is created, the lower layer creates a hash table for that group. The table lives on the switching chip and looks as follows (simplified):
The index in the left column is determined by the chip hardware. Common sizes today are 256, 512, and 1024; I have not seen anything larger. The more index entries the table has, the more even the load sharing.
Here is an example with 3 members:
Index | Interface
0 | Eth0_0
1 | Eth0_1
2 | Eth0_2
3 | Eth0_0
4 | Eth0_1
5 | Eth0_2
... | ...
1022 | Eth0_0
1023 | Eth0_1
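The table above can be sketched in a few lines of Python. This is only an illustration, not the chip's actual logic: the name `build_hash_table` is made up for this example, and the strict round-robin fill (index modulo member count) is an assumption inferred from the first rows of the table. A real chip may fill the last rows differently.

```python
from collections import Counter


def build_hash_table(members, size=1024):
    """Return a list in which entry i is the egress port for hash index i."""
    if not members:
        raise ValueError("an aggregation group needs at least one member")
    # Assumption: strict round-robin fill (index modulo member count).
    return [members[i % len(members)] for i in range(size)]


members = ["Eth0_0", "Eth0_1", "Eth0_2"]
table = build_hash_table(members)
print(table[:6])

# Compare how evenly two table sizes spread the indexes over the members.
for size in (256, 1024):
    counts = Counter(build_hash_table(members, size))
    spread = (max(counts.values()) - min(counts.values())) / size
    print(size, dict(counts), f"imbalance {spread:.4%}")
```

With 3 members, neither 256 nor 1024 divides evenly, so one port ends up with one extra entry. The larger the table, the smaller the share of traffic that extra entry represents, which is the sense in which more indexes give a more even split.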
Hash Table maintenance:
The switch has a dedicated thread that monitors the active members of the aggregation group in real time and refreshes the hash table entries as soon as a member's state changes.
While on the subject, a few words about refreshing the hash table.
When an engineer brings a member port up or down, the lower layer must refresh the membership in real time (this is where vendors' technology is put to the test). The slower the refresh, the more packets are dropped while the member's state changes. The best implementations, such as Cisco's, can bring a member port up or down without losing any packets. My company's switches initially lost about a second's worth of packets (a flaw in the R&D design approach). After optimization, bringing a member port up or down cost only a small fraction of a second of packet loss, but we could not eliminate the loss entirely.
Up/down analysis: when an engineer brings an aggregation group member up or down from the command line, the lower layer needs a tiny amount of time to refresh the table entries. During this interval the interface that is already down still exists in the hash table while traffic keeps arriving, so packets that happen to hash to this invalid egress port are discarded.
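The loss window described above can be illustrated with a small simulation. It is a sketch under stated assumptions: `build_hash_table` and `forward` are hypothetical helpers, the table is filled round-robin, and one packet is sent to each of the 1024 indexes, which real traffic would not do.

```python
def build_hash_table(members, size=1024):
    # Assumption: round-robin fill; entry i is the egress port for index i.
    return [members[i % len(members)] for i in range(size)]


def forward(table, up_ports, index):
    """Return the egress port, or None if the packet is dropped."""
    port = table[index]
    return port if port in up_ports else None


members = ["Eth0_0", "Eth0_1", "Eth0_2"]
table = build_hash_table(members)
up_ports = set(members)

# Eth0_1 goes down, but the table has not been refreshed yet.
up_ports.discard("Eth0_1")
stale_drops = sum(forward(table, up_ports, i) is None for i in range(1024))

# The maintenance thread rebuilds the table from the remaining members.
table = build_hash_table([m for m in members if m in up_ports])
fresh_drops = sum(forward(table, up_ports, i) is None for i in range(1024))

print("dropped before refresh:", stale_drops)
print("dropped after refresh:", fresh_drops)
```

Before the refresh, every index that still points at the down port loses its packets. After the refresh nothing is dropped, so the total loss depends only on how long the stale table stays in place.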
Switch Load-Balancing Forwarding Principle:
Given that the lower layer holds this hash table, how exactly is it used?
1. The engineer configures the member ports and the hash algorithm, such as SIP, DIP, SIP+DIP, or SIP+DIP+SP+DP.
2. The switch generates the hash table from the members and extracts the fields required by the algorithm from each packet.
3. A specific hash calculation turns the extracted fields into a 10-bit value.
4. The switch looks up the port that corresponds to this value in the underlying hash table.
5. The packet is forwarded out of that port.
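The five steps above can be sketched as follows. Several parts are assumptions made for illustration rather than anything stated by the vendor: the names `ALGORITHM_FIELDS`, `extract_key`, `select_port`, and `low_bits_hash` are hypothetical, combining several fields by XOR is a guess, and `low_bits_hash` is only a placeholder that keeps the low 10 bits, not the real hash calculation.

```python
import ipaddress

# Step 1: the configured algorithm decides which packet fields are hashed.
ALGORITHM_FIELDS = {
    "SIP": ("sip",),
    "DIP": ("dip",),
    "SIP+DIP": ("sip", "dip"),
    "SIP+DIP+SP+DP": ("sip", "dip", "sp", "dp"),
}


def extract_key(packet, algorithm):
    """Step 2: pull out the configured fields (assumption: XOR them together)."""
    key = 0
    for field in ALGORITHM_FIELDS[algorithm]:
        key ^= packet[field]
    return key


def select_port(packet, algorithm, table, hash_fn):
    key = extract_key(packet, algorithm)
    index = hash_fn(key) & 0x3FF   # step 3: reduce to a 10-bit index
    return table[index]            # step 4: look up the egress port


def low_bits_hash(key):
    # Placeholder only: keeps the low 10 bits of the key.
    return key & 0x3FF


members = ["Eth0_0", "Eth0_1", "Eth0_2"]
table = [members[i % len(members)] for i in range(1024)]  # round-robin fill
packet = {
    "sip": int(ipaddress.IPv4Address("192.168.1.10")),
    "dip": int(ipaddress.IPv4Address("10.0.0.1")),
    "sp": 49152,
    "dp": 80,
}
port = select_port(packet, "SIP", table, low_bits_hash)
print("forward out of", port)      # step 5
```

Passing the hash in as `hash_fn` keeps the field selection and table lookup separate from the hash calculation itself, so a different calculation can be swapped in without touching the rest.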
Calculation method of hash value:
XOR is the exclusive-OR operation: if two bits differ, the result is true (1); if they are the same, the result is false (0).
1. SIP (source IP)
1) XOR the SIP with 0 to get a 32-bit value.
2) XOR the high 16 bits with the low 16 bits to get a 16-bit value.
3) In this 16-bit value, XOR bits 15-12 with bits 11-8 and put the 4-bit result in bits 11-8. This leaves a 12-bit value; shift it right by 2 bits to get the 10-bit hash value.
Note: the 10-bit value is always a number in the range 0-1023. The packet is forwarded out of the interface that this index maps to in the hash table. (The same IP always produces the same hash value.)
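The SIP calculation above can be written out as a short Python function. This follows my reading of the three steps; the function name `sip_hash10` is made up for this example, and real hardware may differ in details such as bit ordering.

```python
import ipaddress


def sip_hash10(sip: str) -> int:
    """Fold an IPv4 source address into a 10-bit hash (0-1023)."""
    # Step 1: SIP XOR 0 leaves the 32-bit address unchanged.
    v32 = int(ipaddress.IPv4Address(sip)) ^ 0
    # Step 2: XOR the high 16 bits with the low 16 bits.
    v16 = (v32 >> 16) ^ (v32 & 0xFFFF)
    # Step 3: XOR bits 15-12 with bits 11-8; the result becomes bits 11-8
    # of a 12-bit value whose low 8 bits are kept as they are.
    nibble = ((v16 >> 12) & 0xF) ^ ((v16 >> 8) & 0xF)
    v12 = (nibble << 8) | (v16 & 0xFF)
    # Shift right by 2 to leave 10 bits.
    return v12 >> 2


for ip in ("192.168.1.10", "10.0.0.1", "10.0.0.2"):
    print(ip, sip_hash10(ip))
```

Because the last step discards the two lowest bits, addresses that differ only in those bits after folding, such as 10.0.0.1 and 10.0.0.2 here, get the same index and therefore leave through the same member port.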