Classic case study of MAC address drift

Source: Internet
Author: User
Tags: network troubleshooting


The network topology is as follows:

[Figure: network topology]

The two carrier-access switches are virtualized with IRF (that is, the two physical switches operate as one logical switch), and the two server load balancers run VRRP as a hot-standby pair.

The network is a pure Layer 2 design, and the gateway for each uplink is located at the carrier.

Port GE2/0/11 on S5800-2 is connected to the test laptop (223.1.5.41).

Port GE1/0/3 on S5800-1 is connected to the mobile ISP gateway (223.1.5.1).

Problem:

Many users reported packet loss when pinging the mobile gateway (223.1.5.1) from servers on the mobile line. Connections dropped frequently and the network was unstable.

To locate the fault, start from the network node closest to the problem. First, we connected a laptop configured with a mobile-line IP address (223.1.5.41) to S5800-1 and pinged the mobile gateway: no loss, which shows that the fiber link from the mobile operator is fine.

Next, we moved the laptop to S5800-2. Pinging the mobile gateway now lost packets, while pinging the servers below was normal. So the loss occurs between S5800-2 and S5800-1. The only link between the two S5800s is a single pair of IRF fibers, so the fault seemed likely to be there, and I replaced the IRF fiber and the optical modules. Incredibly, the problem persisted.

C:\Users\Administrator> ping 223.1.5.1 -t

Pinging 223.1.5.1 with 32 bytes of data:
Reply from 223.1.5.1: bytes=32 time=1ms TTL=254
Request timed out.
Request timed out.
Request timed out.
Reply from 223.1.5.1: bytes=32 time=1ms TTL=254
Request timed out.

C:\Users\Administrator> arp -a

Interface: 223.1.5.2 --- 0xb
  Internet Address      Physical Address      Type
  223.1.5.1             00-00-5e-00-01-65     dynamic
  223.1.5.41            00-22-15-4c-5d-42     dynamic

This seemed impossible. A model built from known conditions can have only one result, so a logical contradiction like this should not occur. The two S5800s are connected only by one pair of IRF fibers, so all traffic between them must cross that link. If the fiber and the optical modules are fine, the only remaining explanation is that after the traffic crosses the IRF link into S5800-1, part of it is dropped inside the switch...

Good! Let's collect traffic statistics to verify this hypothesis:

telnet 10.10.10.12            \ IP address of the S5800
system-view
acl number 3876
 rule permit ip source 223.1.5.41 0 destination 223.1.5.1 0
 rule permit ip source 223.1.5.1 0 destination 223.1.5.41 0
 quit
traffic classifier aaa
 if-match acl 3876
 quit
traffic behavior aaa
 accounting packet
 quit
qos policy aaa
 classifier aaa behavior aaa
 quit
interface GigabitEthernet 2/0/11
 qos apply policy aaa inbound
 qos apply policy aaa outbound
 quit
interface GigabitEthernet 1/0/3
 qos apply policy aaa inbound
 qos apply policy aaa outbound
 quit

 

Test: from the laptop (223.1.5.41) we sent 100 pings with ping 223.1.5.1 -n 100, and only 64 replies came back.

 

[5800] display qos policy interface GigabitEthernet 2/0/11

  Interface: GigabitEthernet2/0/11

  Direction: Inbound

  Policy: aaa
   Classifier: aaa
     Operator: AND
     Rule(s): If-match acl 3876
     Behavior: aaa
      Accounting Enable:
        100 (Packets)

  Direction: Outbound

  Policy: aaa
   Classifier: aaa
     Operator: AND
     Rule(s): If-match acl 3876
     Behavior: aaa
      Accounting Enable:
        64 (Packets)

[5800] display qos policy interface GigabitEthernet 1/0/3

  Interface: GigabitEthernet1/0/3

  Direction: Inbound

  Policy: aaa
   Classifier: aaa
     Operator: AND
     Rule(s): If-match acl 3876
     Behavior: aaa
      Accounting Enable:
        64 (Packets)

  Direction: Outbound

  Policy: aaa
   Classifier: aaa
     Operator: AND
     Rule(s): If-match acl 3876
     Behavior: aaa
      Accounting Enable:
        64 (Packets)

So the packets are being lost inside the switch: 100 packets enter GE2/0/11 on S5800-2 (inbound), but only 64 leave GE1/0/3 on S5800-1 (outbound). Where did the other 36 go? Were they really dropped inside S5800-1? Good! Let me take you inside the switch to see where the 36 packets disappear.
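The counter comparison above is just subtraction between consecutive measurement points on the path. The sketch below, assuming the counter values reported in this case and using descriptive hop labels of our own (not device output), shows how the loss can be localized hop by hop; the function name is hypothetical and it is not part of any H3C tooling.

```python
# Sketch: localize packet loss by comparing traffic counters at consecutive
# measurement points along a forwarding path.


def loss_between_hops(counters):
    """counters: list of (label, packet_count) ordered along the path.

    Returns one (from_label, to_label, lost, loss_percent) tuple per
    consecutive pair of measurement points.
    """
    results = []
    for (src, before), (dst, after) in zip(counters, counters[1:]):
        lost = before - after
        pct = 100.0 * lost / before if before else 0.0
        results.append((src, dst, lost, pct))
    return results


# Counter values from this case; the labels are our own descriptions.
REQUESTS = [("S5800-2 GE2/0/11 inbound", 100), ("S5800-1 GE1/0/3 outbound", 64)]
REPLIES = [("S5800-1 GE1/0/3 inbound", 64), ("S5800-2 GE2/0/11 outbound", 64)]

if __name__ == "__main__":
    for path in (REQUESTS, REPLIES):
        for src, dst, lost, pct in loss_between_hops(path):
            print(f"{src} -> {dst}: {lost} lost ({pct:.0f}%)")
```

Run as a script, it reports 36 lost (36%) on the request path and 0 lost on the reply path, which is why the investigation focuses on the laptop-to-gateway direction.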

[5800-1] en_diag                  \ enter the hidden diagnostic mode
[5800-1] debug port mapping 1    \ show which internal chip port each interface maps to

[Interface] [Unit] [Port] [Name] [Combo?] [Active?] [IfIndex] [MID] [Link] [Attr]
=================================================================================
GE1/0/41 3 ge2 no no 0x0/10 down Bridge
GE1/0/41 2 ge1 no no 0x0/20 down Bridge
GE1/0/41 0 5 ge4 no no 0x900002 4 up Bridge
...
XGE1/0/41 26 xe0 no no 0xbc00184 up Bridge
XGE1/0/0 0 27 xe1 no no 0xbc00194 up Bridge
XGE1/0/42 28 xe2 no no 0xbc001a4 up Bridge
XGE1/0/41 29 hg0 no no 0xbc001b4 up Bridge

Internal port 5 is GE1/0/3, and internal port 27 is XGE1/0/26.

Because a Layer 2 switch forwards frames based only on the destination MAC address, let's check which port the mobile gateway's MAC address (0x00005e000165) is learned on. (If you are not familiar with how a Layer 2 switch learns MAC addresses and forwards frames, it is worth reviewing that first.)
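To see why a duplicated MAC address can make traffic disappear, here is a minimal sketch of a Layer 2 learning switch. It is a simplified model, assuming a single MAC table with no aging and no IRF, and the frames it replays are hypothetical: the gateway MAC and the laptop MAC come from the ARP output above, while the second device on XGE1/0/26 that sends with the gateway's source MAC is an illustrative assumption.

```python
# Simplified Layer 2 learning switch: one MAC table, no aging, no IRF.
# Used only to show how a duplicated source MAC makes a MAC entry drift.


class LearningSwitch:
    def __init__(self):
        self.mac_table = {}  # (vlan, mac) -> port the MAC was last learned on
        self.moves = []      # (vlan, mac, old_port, new_port), one per drift

    def receive(self, in_port, vlan, src_mac, dst_mac):
        """Learn src_mac on in_port, then choose where to send the frame.

        Returns the egress port, "flood" if dst_mac is unknown, or None if
        dst_mac was learned on the ingress port (the frame is filtered).
        """
        key = (vlan, src_mac)
        old_port = self.mac_table.get(key)
        if old_port is not None and old_port != in_port:
            self.moves.append((vlan, src_mac, old_port, in_port))
        self.mac_table[key] = in_port

        out_port = self.mac_table.get((vlan, dst_mac))
        if out_port is None:
            return "flood"
        if out_port == in_port:
            return None
        return out_port


# Hypothetical frames modeled on this case (VLAN 5).
GW_MAC = "0000-5e00-0165"      # mobile gateway MAC from the ARP table
LAPTOP_MAC = "0022-154c-5d42"  # test laptop 223.1.5.41
BROADCAST = "ffff-ffff-ffff"


def replay():
    """Return (egress before drift, egress after drift, recorded moves)."""
    sw = LearningSwitch()
    sw.receive("GE2/0/11", 5, LAPTOP_MAC, GW_MAC)  # first frame is flooded
    sw.receive("GE1/0/3", 5, GW_MAC, LAPTOP_MAC)   # gateway answers
    before = sw.receive("GE2/0/11", 5, LAPTOP_MAC, GW_MAC)
    # A second (hypothetical) device on XGE1/0/26 sends a broadcast
    # using the same source MAC as the gateway.
    sw.receive("XGE1/0/26", 5, GW_MAC, BROADCAST)
    after = sw.receive("GE2/0/11", 5, LAPTOP_MAC, GW_MAC)
    return before, after, sw.moves
```

In this model, replay() sends the laptop's frames out GE1/0/3 before the duplicate frame and out XGE1/0/26 after it, until the real gateway sends again and moves the entry back. That back-and-forth is one plausible reason the loss is intermittent rather than total.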

[5800-diagnose] bcm 1 0 l2/conflict/mac=0x00005e000165/vlan=5
(Slot1) (Layer 2/conflict/mac/vlan)
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5/ge4 SDHit Group=Learnt

[5800-diagnose] bcm 1 0 l2/conflict/mac=0x00005e000165/vlan=5
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5/ge4 SDHit Group=Learnt

[5800-diagnose] bcm 2 0 l2/conflict/mac=0x00005e000165/vlan=5
(Slot2) (Layer 2/conflict/mac/vlan)
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5 SDHit Group=Learnt

[5800-diagnose] bcm 2 0 l2/conflict/mac=0x00005e000165/vlan=5
Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=27 SDHit Group=Learnt

Note: four lookups were made in total. The first two were on slot 1, that is, S5800-1, and the MAC address stayed on port 5 without drifting.

The last two were on slot 2, that is, S5800-2, and the MAC address drifted: once it was on port 5 and once on port 27.

Port 5 is GE1/0/3 and port 27 is XGE1/0/26, so on S5800-2 the MAC address 0x00005e000165 alternates between GE1/0/3 (connected to the mobile gateway) and XGE1/0/26 (connected to Server Load Balancer-1).
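Repeating the lookup by hand works, but the check is easy to script once the output is captured. The sketch below assumes lines in the Conflict: mac=... vlan=... port=... format shown above (it also tolerates the spaced-out form "mac = 00: 00: ..."), and its port-to-interface table holds only the two mappings stated in this article; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Internal chip port -> interface, as stated in this article
# (only the two ports that matter in this case).
PORT_TO_IFACE = {5: "GigabitEthernet1/0/3", 27: "Ten-GigabitEthernet1/0/26"}

# Matches "mac=00:00:5e:00:01:65 vlan=5 ... port=27" and the spaced form.
CONFLICT_RE = re.compile(
    r"mac\s*=\s*(?P<mac>[0-9a-fA-F:\s]+?)\s+vlan\s*=\s*(?P<vlan>\d+)"
    r".*?\bport\s*=\s*(?P<port>\d+)"
)


def find_drifting_macs(lines):
    """Return {(mac, vlan): [ports in first-seen order]} for every MAC that
    was reported on more than one port across repeated lookups."""
    ports_seen = defaultdict(list)
    for line in lines:
        m = CONFLICT_RE.search(line)
        if m is None:
            continue
        mac = re.sub(r"\s", "", m.group("mac")).lower()
        key = (mac, int(m.group("vlan")))
        port = int(m.group("port"))
        if port not in ports_seen[key]:
            ports_seen[key].append(port)
    return {key: ports for key, ports in ports_seen.items() if len(ports) > 1}


def describe(drifts):
    """Render drift results, using interface names where the port is known."""
    out = []
    for (mac, vlan), ports in sorted(drifts.items()):
        names = [PORT_TO_IFACE.get(p, f"port {p}") for p in ports]
        out.append(f"{mac} vlan {vlan} seen on: {', '.join(names)}")
    return out


if __name__ == "__main__":
    slot2 = [
        "Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=5 SDHit Group=Learnt",
        "Conflict: mac=00:00:5e:00:01:65 vlan=5 modid=4 port=27 SDHit Group=Learnt",
    ]
    print("\n".join(describe(find_drifting_macs(slot2))))
```

Feed it the lookups from one slot at a time: the slot 1 lines produce no result, while the slot 2 lines report the gateway MAC on both GE1/0/3 and XGE1/0/26.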

How does MAC address 0x00005e000165 end up on Server Load Balancer-1? Did all 36 lost packets go to Server Load Balancer-1?

We logged in to Server Load Balancer-1 and found that the virtual MAC address of one of its VRRP groups (VRID 101) is exactly 0x00005e000165, the same as the mobile gateway's MAC address. What was puzzling is that the load balancer's configuration had not been changed for a year. So why had the mobile operator's gateway MAC address changed?

To stop the service impact, we immediately bound the mobile gateway's MAC address statically.

Solution:

Statically bind the MAC address of mobile gateway 223.1.5.1 to port GE1/0/3:

telnet 10.10.10.12            \ log in to the S5800
system-view
interface GigabitEthernet 1/0/3
 mac-address static 0000-5e00-0165 vlan 5
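The same address appears in three notations in this case: Windows arp -a prints 00-00-5e-00-01-65, the BCM shell uses 0x00005e000165 or 00:00:5e:00:01:65, and the H3C mac-address static command takes 0000-5e00-0165. A small sketch for converting between them, with hypothetical helper names, can help avoid typos when writing the static binding:

```python
import re


def normalize_mac(text):
    """Return the 12 lowercase hex digits of a MAC address written as
    00-00-5e-00-01-65, 00:00:5e:00:01:65, 0x00005e000165 or 0000-5e00-0165."""
    s = text.strip().lower()
    if s.startswith("0x"):
        s = s[2:]
    digits = re.sub(r"[^0-9a-f]", "", s)  # drop separators and spaces
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {text!r}")
    return digits


def to_h3c(mac):
    """H3C CLI notation, as used by mac-address static: 0000-5e00-0165."""
    d = normalize_mac(mac)
    return "-".join(d[i:i + 4] for i in range(0, 12, 4))


def to_windows(mac):
    """Windows arp -a notation: 00-00-5e-00-01-65."""
    d = normalize_mac(mac)
    return "-".join(d[i:i + 2] for i in range(0, 12, 2))


def to_bcm(mac):
    """Hex literal as used in the bcm lookup: 0x00005e000165."""
    return "0x" + normalize_mac(mac)
```

For example, to_h3c("00-00-5e-00-01-65") gives "0000-5e00-0165", the value used in the binding above.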

 

I called the mobile operator and learned that the previous night they had added another BRAS device in their data center and configured master/backup VRRP, and the VRID happened to be 101. The VRRP virtual MAC address is not random; it is derived from the VRID: VRID 101 uses MAC 0000-5e00-0165, VRID 102 uses 0000-5e00-0166, and so on.
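This derivation comes from the VRRP specifications (RFC 3768 for VRRPv2, RFC 5798 for VRRPv3): the IPv4 virtual router MAC is 00-00-5E-00-01-{VRID}, with the VRID as the last byte in hex, so VRID 101 (0x65) gives 0000-5e00-0165. A minimal sketch, printing in H3C notation (the function name is our own):

```python
def vrrp_virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Virtual router MAC for a VRRP group, in H3C notation.

    IPv4: 00-00-5E-00-01-{VRID}; IPv6 (VRRPv3): 00-00-5E-00-02-{VRID}.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    prefix = "0000-5e00-02" if ipv6 else "0000-5e00-01"
    return f"{prefix}{vrid:02x}"


if __name__ == "__main__":
    for vrid in (101, 102):
        print(vrid, vrrp_virtual_mac(vrid))
```

Because the MAC depends only on the VRID, any two VRRP groups that share a Layer 2 network and a VRID will claim the same address.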

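However, neither the BRAS nor the server load balancer had the real-MAC option (vrrp method real-mac) enabled, which makes VRRP answer with the real interface MAC address instead of the virtual MAC. Both VRRP groups therefore claimed the same virtual MAC on the same Layer 2 network, which caused the MAC address conflict...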
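A pre-deployment check could catch this kind of collision. The sketch below is a hypothetical helper, not a feature of any vendor's tooling. It treats each VRRP group (not each router, since a group's master and backup legitimately share one virtual MAC) as claiming either its derived virtual MAC or, if real-MAC mode is on, an interface MAC, and reports any MAC claimed by more than one group in the same VLAN. The group names and VLAN in the example are assumptions.

```python
from collections import defaultdict


def vrrp_virtual_mac(vrid):
    # IPv4 VRRP virtual router MAC 00-00-5E-00-01-{VRID}, H3C notation
    return f"0000-5e00-01{vrid:02x}"


def find_vrrp_mac_conflicts(groups):
    """groups: dicts with 'name', 'vlan', 'vrid' and optionally 'real_mac'
    (the interface MAC used when real-MAC mode is enabled).

    Returns {(vlan, mac): [group names]} for every MAC claimed by more
    than one VRRP group in the same VLAN.
    """
    claims = defaultdict(set)
    for g in groups:
        mac = g.get("real_mac") or vrrp_virtual_mac(g["vrid"])
        claims[(g["vlan"], mac)].add(g["name"])
    return {key: sorted(names) for key, names in claims.items() if len(names) > 1}


if __name__ == "__main__":
    groups = [
        {"name": "ISP BRAS group", "vlan": 5, "vrid": 101},  # hypothetical
        {"name": "SLB group", "vlan": 5, "vrid": 101},       # hypothetical
    ]
    print(find_vrrp_mac_conflicts(groups))
```

With both groups on VRID 101 in VLAN 5 the helper reports a conflict on 0000-5e00-0165; giving either group a distinct real MAC, or a different VRID, clears it.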

Many devices today run VRRP hot standby without the real-MAC option, either because it is not configured or because the device does not support it.

Careful readers may have noticed that this is a VRRP weakness that can cause large-scale network faults!

 

This article uses some switch configuration and debugging commands that are hard to find online, such as how to configure traffic statistics and the debugging commands in H3C's hidden mode. You may find them useful.

I wrote this article to share a network troubleshooting principle with my friends: a model built from known conditions has only one result, so when the result seems to defy logic, it is because one of the "known" conditions is actually wrong!
