Analysis of Linux Bridge


Original article: http://www.cnblogs.com/morphling/p/3458546.html

What is bridging?

Simply put, bridging "connects" several network interfaces on a single machine: a message received on one port is copied to the other ports and sent out, so that messages can be forwarded between the ports.
A switch is exactly such a device. It has a number of ports, and these ports are bridged together. As a result, the hosts connected to a switch can communicate with each other through the switch's message forwarding.

For example: a message sent by host A arrives at port eth0 of switch S1. Because eth0 is bridged with eth1 and eth2, the message is copied to eth1 and eth2 and sent out, where it is received by host B and by switch S2. S2 in turn forwards the message to hosts C and D.


The switch does not tamper with the message data during forwarding; it copies it as is. However, bridging is implemented not at the physical layer but at the data link layer. The switch understands the data link layer header of a message, so bridging is not pure message copying.
The switch pays attention to the MAC addresses (source and destination) in the data link layer header of each message. From the source addresses it learns where each host is, that is, which switch port the host is connected to. When forwarding, the switch then only needs to send a message out of one specific port, which avoids unnecessary network traffic. This is the switch's "address learning". If the switch encounters a destination address it has not learned, it does not know which port to forward the message from, so it has to forward the message to all ports except the one it was received on.
For example, host C sends a message to host A, and the message arrives at port eth2 of switch S1. Assuming S1 has just started and has not learned any addresses, it forwards the message to eth0 and eth1. At the same time, based on the message's source MAC address, S1 records "host C is reached through port eth2". Later, when host A sends a message to C, S1 only needs to forward it to eth2. When host D sends a message to C, suppose switch S2 forwards it to S1's eth2 port (in fact S2 probably would not, thanks to its own address learning); S1 then simply discards the message without forwarding it, because host C is reached through eth2 itself.
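The learning behaviour just described can be replayed with a toy model. This is a minimal Python sketch, not kernel code: the class LearningSwitch and its receive method are hypothetical, and hosts are identified by letters instead of real MAC addresses. It learns the source of every frame, floods frames for unknown destinations, and drops frames whose destination sits behind the port they arrived on.

```python
from typing import Dict, List


class LearningSwitch:
    """Hypothetical toy model of a learning switch (not kernel code)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.table: Dict[str, str] = {}  # address -> port it was last seen on

    def receive(self, in_port: str, src: str, dst: str) -> List[str]:
        """Return the ports the frame is sent out of; [] means it is dropped."""
        # Address learning: the sender is reachable through the ingress port.
        self.table[src] = in_port
        out = self.table.get(dst)
        if out is None:
            # Unknown destination: flood to every port except the ingress one.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination is behind the ingress port: no need to forward.
            return []
        return [out]


# Replaying the example with switch S1 (letters stand in for MAC addresses).
s1 = LearningSwitch(["eth0", "eth1", "eth2"])
print(s1.receive("eth2", src="C", dst="A"))  # ['eth0', 'eth1'] (flooded)
print(s1.receive("eth0", src="A", dst="C"))  # ['eth2']
print(s1.receive("eth2", src="D", dst="C"))  # [] (dropped)
```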

However, network topologies do not stay fixed. Suppose we swap the locations of host B and host C. The next time host C sends a message (to anyone), it arrives at port eth1 of switch S1, so S1 updates its learned address from "host C is reached through port eth2" to "host C is reached through port eth1".
But what if host C never sends a message? S1 will keep believing "host C is reached through port eth2", so messages from other hosts to C are forwarded out of eth2 and lost. Therefore address learning needs a timeout policy. For switch S1, if a certain time (5 minutes by default) has elapsed since the last message received from host C, S1 must forget that "host C is reached through port eth2". Messages destined for host C are then flooded to all ports, and the copy forwarded out of eth1 reaches host C.
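The timeout policy can be sketched as a table whose entries carry a "last seen" timestamp. This is a hypothetical Python model (AgingTable, learn and lookup are illustrative names): the current time is passed in explicitly as `now` so the behaviour is deterministic, whereas a real switch would read its own clock. An expired entry is treated exactly like an unknown one, so the caller floods.

```python
from typing import Dict, Optional, Tuple

DEFAULT_AGEING_TIME = 300.0  # seconds: the 5-minute default from the text


class AgingTable:
    """Hypothetical address table whose entries are forgotten when stale."""

    def __init__(self, ageing_time: float = DEFAULT_AGEING_TIME) -> None:
        self.ageing_time = ageing_time
        # address -> (port, time the address was last seen as a source)
        self.entries: Dict[str, Tuple[str, float]] = {}

    def learn(self, addr: str, port: str, now: float) -> None:
        # Each frame from addr refreshes both its port and its timestamp.
        self.entries[addr] = (port, now)

    def lookup(self, addr: str, now: float) -> Optional[str]:
        """Return the learned port, or None (flood) if unknown or expired."""
        entry = self.entries.get(addr)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.ageing_time:
            del self.entries[addr]  # forget the stale record
            return None
        return port


table = AgingTable()
table.learn("C", "eth2", now=0.0)
print(table.lookup("C", now=120.0))  # eth2
print(table.lookup("C", now=301.0))  # None: C has been silent for over 5 minutes
```

Note that a lookup does not refresh the timestamp: only frames sent by host C itself prove that C is still behind that port.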

Linux bridging implementation

Related model
The Linux kernel supports bridging network ports (currently only Ethernet interfaces are supported). A Linux machine differs from a simple switch, though. A switch is a layer-2 device that either forwards or discards each message it receives; a small switch needs only a switching chip and no CPU. A machine running the Linux kernel, by contrast, is itself a host and may be the destination of a message. Besides forwarding or discarding a received message, it may also pass it up to the upper layer of the protocol stack (the network layer) to be consumed locally.
The Linux kernel implements bridging through a virtual bridge device. This virtual device can bind several Ethernet interface devices and bridge them together, as shown below (excerpt from ULNI, Understanding Linux Network Internals):


The bridge device br0 binds eth0 and eth1. The upper layers of the protocol stack see only br0; because bridging is implemented at the data link layer, they need not care about bridging details. Packets the upper layers want to send are handed to br0, and the bridge device's processing code decides whether to forward them to eth0, to eth1, or to both. Conversely, messages received on eth0 or eth1 are handed to the bridge's processing code, which decides whether to forward them, discard them, or pass them up the protocol stack.
Sometimes eth0 or eth1 may also be the source or destination address of a message and send or receive it directly (bypassing the bridge).

Related data structures
To use bridging, enable the relevant option when compiling the kernel and have the kernel load the bridge module. Then create a bridge device with the command "brctl addbr {br_name}", and bind network interfaces to it with "brctl addif {br_name} {eth_if_name}". After these operations, the data structures in the kernel are related as shown below (excerpt from ULNI):


The leftmost net_device is the virtual device structure representing the bridge. It is associated with a net_bridge structure, a data structure specific to bridge devices.
In the net_bridge structure, a linked list hangs off the port_list member. Each node in the list (a net_bridge_port structure) is associated with the net_device of a real network port. That port's net_device in turn points back to its net_bridge_port through its br_port pointer (so, clearly, a port can be bound to only one bridge at a time).
The net_bridge structure also maintains a hash table used for address learning. When the bridge is about to forward a message, it uses the message's destination MAC address as the key. If a net_bridge_fdb_entry structure is found in the hash table, that entry leads to a port's net_device, and the message is forwarded from that port; otherwise the message is forwarded from all ports.
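The relationships above can be mirrored with a few Python dataclasses. This is a simplified, hypothetical model, not the kernel's definitions: the class names echo the C structures and the fields port_list, br_port and br follow the text, but a dict named `fdb` (an illustrative name) stands in for the kernel hash table, and timestamps are omitted. The single br_port back-pointer is what prevents a device from joining two bridges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class NetDevice:
    """Stand-in for struct net_device, reduced to the fields used here."""
    name: str
    br_port: Optional["NetBridgePort"] = field(default=None, repr=False)


@dataclass(eq=False)
class NetBridgePort:
    """Stand-in for struct net_bridge_port: links a bridge and a real device."""
    br: "NetBridge" = field(repr=False)
    dev: NetDevice


@dataclass(eq=False)
class FdbEntry:
    """Stand-in for struct net_bridge_fdb_entry: where a MAC was learned."""
    dst: NetBridgePort


@dataclass(eq=False)
class NetBridge:
    """Stand-in for struct net_bridge, attached to the bridge's net_device."""
    dev: NetDevice
    port_list: List[NetBridgePort] = field(default_factory=list)
    fdb: Dict[str, FdbEntry] = field(default_factory=dict)  # MAC -> entry

    def add_if(self, dev: NetDevice) -> NetBridgePort:
        # A device has one br_port pointer, so it can join only one bridge.
        if dev.br_port is not None:
            raise ValueError(f"{dev.name} is already bridged")
        port = NetBridgePort(br=self, dev=dev)
        dev.br_port = port
        self.port_list.append(port)
        return port

    def output_devices(self, dst_mac: str) -> List[NetDevice]:
        """Known destination: its learned port. Unknown: all ports (flood)."""
        entry = self.fdb.get(dst_mac)
        if entry is not None:
            return [entry.dst.dev]
        return [p.dev for p in self.port_list]


br0 = NetBridge(dev=NetDevice("br0"))
eth0 = br0.add_if(NetDevice("eth0"))
br0.add_if(NetDevice("eth1"))
br0.fdb["00:11:22:33:44:55"] = FdbEntry(dst=eth0)
print([d.name for d in br0.output_devices("00:11:22:33:44:55")])  # ['eth0']
print([d.name for d in br0.output_devices("00:aa:bb:cc:dd:ee")])  # ['eth0', 'eth1']
```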

Receive Process
In the article "Analysis of sending and receiving of Linux network messages", we saw that a message received by a network device is finally delivered to the protocol stack through the netif_receive_skb function.

netif_receive_skb(skb);
This function mainly does three things:
1. if any packet-capture process needs the skb, copy the skb to it;
2. handle bridging;
3. deliver the skb to the network layer.

Here we care only about step 2. How does the kernel decide whether an skb needs bridge processing? skb->dev points to the device that received the skb. If that net_device's br_port is not NULL (it points to a net_bridge_port structure), the net_device is bridged, and the bridge device can be found through the br pointer in the net_bridge_port structure. br_handle_frame is then called to let the bridge code process the message.

br_handle_frame(net_bridge_port, skb);
If the destination MAC address of the skb equals the MAC address of the port that received it, bridge processing ends (after returning to netif_receive_skb, the skb is eventually delivered to the network layer);
Otherwise, br_handle_frame_finish is called to forward the message, and the skb is then released (back in netif_receive_skb, the skb is not delivered to the network layer);
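The decision made in step 2 of netif_receive_skb and in br_handle_frame can be condensed into one function. This is a hypothetical Python sketch of the logic as described above, not the kernel's code: Device and Skb are illustrative stand-ins holding only the fields the decision needs, br_port holds a bridge name instead of a pointer, and MACs are assumed to be written in one canonical lowercase form so that string comparison is enough.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical minimal net_device: name, MAC, optional bridge port."""
    name: str
    mac: str
    br_port: Optional[str] = None  # bridge name here; the kernel uses a pointer


@dataclass
class Skb:
    """Hypothetical minimal sk_buff: receiving device and destination MAC."""
    dev: Device
    dst_mac: str


def bridge_step(skb: Skb) -> str:
    """Where the bridge step of netif_receive_skb sends skb (simplified)."""
    if skb.dev.br_port is None:
        return "network layer"  # device not bridged: normal delivery
    # br_handle_frame: a frame for the receiving port's own MAC leaves the bridge.
    if skb.dst_mac == skb.dev.mac:
        return "network layer"
    # Otherwise the bridge consumes skb; netif_receive_skb will not pass it up.
    return "br_handle_frame_finish"


eth0 = Device("eth0", "00:00:00:00:00:01", br_port="br0")
print(bridge_step(Skb(eth0, "00:00:00:00:00:01")))  # network layer
print(bridge_step(Skb(eth0, "00:00:00:00:00:02")))  # br_handle_frame_finish
```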

br_handle_frame_finish(skb);
First, the bridge device's address learning is updated through br_fdb_update: the hash-table record for the skb's source MAC address is refreshed (its timestamp and the net_bridge_port structure it points to);
If the destination address of the skb equals the MAC address of another port of this machine (but not of the port that received the skb), br_pass_frame_up is called. It replaces skb->dev with the bridge device's dev and then calls netif_receive_skb to process the message again. netif_receive_skb is thus called recursively, but this time it does not trigger the bridge processing, because skb->dev has been replaced and skb->dev->br_port is NULL. So this time netif_receive_skb eventually delivers the skb to the network layer;
Otherwise, __br_fdb_get looks up the dev for the skb's destination MAC address in the bridge device's address-learning hash table. If it is found (and its timestamp shows that the record has not expired), br_forward is called to forward the message to that dev. If it is not found, br_flood_forward is called; it traverses port_list in the bridge device, finds every bound dev (except the one equal to skb->dev), and calls br_forward on each;
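The steps of br_handle_frame_finish can be summarized as a pure decision function. This is a hypothetical Python sketch, not the kernel implementation: `fdb` maps an address to a port name, `local_macs` holds the MACs of this machine's other bridged ports, letters stand in for MAC addresses, and entry ageing and broadcast handling are left out. The "drop" branch is not spelled out in the paragraph above; it is included on the assumption that a known destination behind the ingress port is discarded, as in the earlier address-learning example.

```python
from typing import Dict, List, Set, Tuple


def handle_frame_finish(
    in_port: str,
    src_mac: str,
    dst_mac: str,
    fdb: Dict[str, str],
    local_macs: Set[str],
    ports: List[str],
) -> Tuple[str, List[str]]:
    """Simplified decision of br_handle_frame_finish; returns (action, out ports)."""
    fdb[src_mac] = in_port  # br_fdb_update: learn/refresh the source address
    if dst_mac in local_macs:
        return "pass_up", []  # br_pass_frame_up: re-deliver via the bridge device
    out = fdb.get(dst_mac)  # __br_fdb_get
    if out is None:
        # br_flood_forward: every bound port except the one it arrived on.
        return "flood", [p for p in ports if p != in_port]
    if out == in_port:
        return "drop", []  # assumption: destination is behind the ingress port
    return "forward", [out]  # br_forward


fdb: Dict[str, str] = {}
ports = ["eth0", "eth1", "eth2"]
print(handle_frame_finish("eth0", "A", "B", fdb, set(), ports))  # ('flood', ['eth1', 'eth2'])
print(handle_frame_finish("eth1", "B", "A", fdb, set(), ports))  # ('forward', ['eth0'])
```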

br_forward(net_bridge_port, skb);
It replaces skb->dev with the dev to forward to and then calls br_forward_finish, which calls br_dev_queue_push_xmit.
Finally, br_dev_queue_push_xmit calls dev_queue_xmit to send the message (see "Analysis of sending and receiving of Linux network messages"). Note that at this point skb->dev has been replaced with the outgoing dev, so the message is sent out of that port;

Send Process
In the article "Analysis of sending and receiving of Linux network messages", we saw that dev_queue_xmit(skb) is called when the upper layers of the protocol stack need to send a message. If the message is to be sent through a bridge device, skb->dev points to the bridge device. The bridge device does not use a send queue (dev->qdisc is empty), so dev_queue_xmit calls dev->hard_start_xmit directly, and the bridge device's hard_start_xmit is br_dev_xmit.

br_dev_xmit(skb, dev);
__br_fdb_get looks up the dev for the skb's destination MAC address in the bridge device's address-learning hash table. If it is found, br_deliver sends the message to that dev; if not, br_flood_deliver is called, which traverses port_list in the bridge device, finds every bound dev, and calls br_deliver on each (the logic closely mirrors the forwarding path above);

br_deliver(net_bridge_port, skb);
This function's logic is similar to br_forward, used on the forwarding path: it replaces skb->dev with the dev to send to and then calls br_forward_finish. As mentioned earlier, br_forward_finish calls br_dev_queue_push_xmit, which eventually calls dev_queue_xmit to send the message out.

The procedures above ignore broadcast and multicast MAC addresses: if the destination MAC address is a broadcast or multicast address, the message is forwarded to all bound devs.
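The port choice on the send path, including the broadcast/multicast rule, can be sketched as follows. This is a hypothetical Python model of the decision in br_dev_xmit, not kernel code; xmit_ports and is_group_address are illustrative names. It relies on the Ethernet convention that a MAC address whose first octet has its least significant bit set is a group (multicast) address, with the broadcast address ff:ff:ff:ff:ff:ff as a special case. Unlike forwarding, there is no ingress port to exclude, because the frame originates on this host.

```python
from typing import Dict, List


def is_group_address(mac: str) -> bool:
    """True for multicast MACs (I/G bit set), which includes broadcast."""
    first_octet = int(mac.split(":")[0], 16)
    return (first_octet & 0x01) == 0x01


def xmit_ports(dst_mac: str, fdb: Dict[str, str], ports: List[str]) -> List[str]:
    """Ports a frame sent by the local stack through the bridge leaves from.

    A unicast frame to a learned address goes to that port (br_deliver);
    anything else goes to every bound port (br_flood_deliver).
    """
    if not is_group_address(dst_mac):
        out = fdb.get(dst_mac)
        if out is not None:
            return [out]
    return list(ports)


fdb = {"00:00:00:00:00:02": "eth1"}
print(xmit_ports("00:00:00:00:00:02", fdb, ["eth0", "eth1"]))  # ['eth1']
print(xmit_ports("ff:ff:ff:ff:ff:ff", fdb, ["eth0", "eth1"]))  # ['eth0', 'eth1']
```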

In addition, a timer periodically calls br_fdb_cleanup to remove expired address-learning records.
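Such a periodic sweep can be sketched as a plain function over a table of (port, last-seen) records. This is a hypothetical Python model of what a cleanup pass does, not br_fdb_cleanup itself: the function name fdb_cleanup and the table layout are illustrative, the 300-second default reuses the 5-minute ageing time from the text, and the scheduling timer is left out (a real implementation would re-arm a timer after each pass).

```python
from typing import Dict, Tuple


def fdb_cleanup(
    fdb: Dict[str, Tuple[str, float]], now: float, ageing_time: float = 300.0
) -> int:
    """Remove entries not refreshed within ageing_time; return how many went."""
    expired = [mac for mac, (_, last_seen) in fdb.items() if now - last_seen > ageing_time]
    for mac in expired:  # collected first: a dict must not change size mid-iteration
        del fdb[mac]
    return len(expired)


table = {"A": ("eth0", 0.0), "B": ("eth1", 200.0)}
print(fdb_cleanup(table, now=350.0))  # 1 (A has been silent for 350 s)
print(table)                          # {'B': ('eth1', 200.0)}
```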

Spanning Tree Protocol

For a bridge, message forwarding and address learning are actually simple matters. In a simple network environment, this is enough.
In a complex network environment, some redundancy is needed in the data paths, so that when a switch, or one of a switch's ports, fails, the network as a whole keeps working normally.
So let's add a redundant connection to the network topology above and see what happens.


Assume switches S1 and S2 have both just booted (and have learned no addresses), and host C sends a message to B. Port eth2 of switch S2 receives the message, forwards it to eth0, eth1 and eth3, and records "host C is reached through eth2". Switch S1 receives the message on both its eth2 and eth3 ports: the copy received on eth2 is forwarded out of eth3 (and other ports), and the copy received on eth3 is forwarded out of eth2 (and other ports). So ports eth0 and eth1 of switch S2 receive the message again, still with host C as the source address. S2 therefore updates its learned address to "host C is reached through eth0", and then to "host C is reached through eth1". The message is then forwarded to S1 again, and S1 forwards it back to S2. A loop forms and the message circulates endlessly; each cycle also copies the message to other ports, eventually producing a network storm that can paralyze the entire network.
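The never-ending circulation can be reproduced with a small simulation. This is a hypothetical Python sketch on a simplified topology (two parallel links between S1 and S2, host C on S2.eth2, host B on S1.eth0), not the exact figure: because nobody has learned B's address, every switch floods the frame out of all ports except the ingress one. The simulation counts the copies crossing switch-to-switch links at each hop and records the port on which S2 sees host C's source address each time.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (switch name, port name)

# Hypothetical topology: two parallel links between S1 and S2 form a loop.
# Host C sits on S2.eth2 and host B on S1.eth0 (host ports have no wire entry).
PORTS: Dict[str, List[str]] = {
    "S1": ["eth0", "eth2", "eth3"],
    "S2": ["eth0", "eth1", "eth2"],
}
WIRES: Dict[Port, Port] = {}
for a, b in [(("S1", "eth2"), ("S2", "eth0")), (("S1", "eth3"), ("S2", "eth1"))]:
    WIRES[a] = b
    WIRES[b] = a  # links are bidirectional


def simulate_flood(start: Port, hops: int) -> Tuple[List[int], List[str]]:
    """Follow a frame from host C whose destination nobody has learned.

    Returns the number of copies crossing switch-to-switch links at each hop,
    and the sequence of ports on which S2 (re)learns host C's address.
    """
    arriving: List[Port] = [start]
    copies: List[int] = []
    s2_learned_c: List[str] = []
    for _ in range(hops):
        next_arriving: List[Port] = []
        for switch, in_port in arriving:
            if switch == "S2":
                s2_learned_c.append(in_port)  # the source MAC is always host C
            for out_port in PORTS[switch]:
                if out_port != in_port and (switch, out_port) in WIRES:
                    next_arriving.append(WIRES[(switch, out_port)])
        copies.append(len(next_arriving))
        arriving = next_arriving
    return copies, s2_learned_c


copies, learned = simulate_flood(("S2", "eth2"), hops=6)
print(copies)   # [2, 2, 2, 2, 2, 2]: the copies never die out
print(learned)  # ['eth2', 'eth1', 'eth0', 'eth1', 'eth0']: C's port keeps flapping
```

With two parallel links the number of circulating copies stays constant; with a third parallel link each switch would turn every arriving copy into two, so the count would double at every hop.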
Clearly, the switches discussed so far cannot be used in a topology that contains loops. But if we want some redundant connections in the network, loops are unavoidable. What then?
The IEEE 802.1D standard defines the Spanning Tree Protocol (STP). If the switches in a topology support it, they exchange BPDU (Bridge Protocol Data Unit) messages to coordinate with one another and temporarily block some switch ports, so that the topology contains no loops and becomes a tree. When some switches in the network fail, the temporarily blocked ports are re-enabled to keep the whole network connected.

The algorithm for deriving a tree from a graph with loops is very simple. But, as the saying goes, "you cannot see the true shape of the mountain while standing on it": no switch in the network knows the exact topology, and the topology may change dynamically. Building such a tree purely by passing information (BPDU messages) between switches is not a simple matter. Let's see how the Spanning Tree Protocol does it.

Determining the root
To build a tree, the first step is to determine the root. The protocol stipulates that only the switch acting as the root sends BPDU messages to coordinate the other switches. When a switch starts, it does not know which switch is the root, so it treats itself as the root and sends BPDU messages out of all its ports.
A BPDU message can be said to announce the sender's identity. It contains a "root_id", the ID of the switch the sender believes to be the root (at first, the sender itself). The ID consists of two parts: priority + MAC address. The smaller the ID, the more important the switch and the more suitable it is to be the root. The priority part of the ID is set by the network administrator; naturally, a better-performing switch should be given a higher priority (that is, a smaller value). When two switch IDs are compared, the priorities are compared first; if they are equal, the MAC addresses are compared, much like two people of equal rank being ordered by the stroke count of their surnames. A switch's MAC address is globally unique, so no two switch IDs are the same.
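The ID comparison rule maps directly onto tuple ordering. In this hypothetical Python sketch, BridgeId and bridge_id are illustrative names: a NamedTuple compares its fields in order, so the priority decides first and the MAC address only breaks ties. Storing the MAC as 6 bytes makes byte-wise comparison equal to comparing the addresses as unsigned numbers. The priority values are made up for the example; 32768 is the common default bridge priority in 802.1D.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    """Switch ID = priority + MAC; tuples compare field by field."""
    priority: int  # set by the administrator; smaller means more important
    mac: bytes     # 6 bytes; equal-length bytes compare like unsigned numbers


def bridge_id(priority: int, mac: str) -> BridgeId:
    """Build an ID from a priority and a colon-separated MAC string."""
    return BridgeId(priority, bytes.fromhex(mac.replace(":", "")))


fast = bridge_id(4096, "00:1b:00:00:00:99")
slow = bridge_id(32768, "00:1b:00:00:00:01")
twin = bridge_id(32768, "00:1b:00:00:00:02")
print(min(fast, slow, twin) == fast)  # True: priority wins over MAC
print(slow < twin)                    # True: equal priority, smaller MAC wins
```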

At first, every switch presumptuously believes it is the root and sends out BPDU messages announcing its identity. Each switch naturally also receives BPDUs from other switches. If it finds that another switch's ID is smaller (higher priority), it realizes "there is always someone better" and stops its foolish claim to be the root. It also forwards the higher-priority BPDU it received, so that others learn that such a high-priority switch exists.
Eventually, all switches agree that the switch with ID XXXX is the root.
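The convergence can be illustrated with a round-based simulation. This is a hypothetical Python sketch, not the protocol's actual state machine: it ignores timers, message loss and BPDU encoding, and simply lets each switch adopt any smaller root ID it hears from a neighbour until nothing changes. IDs are (priority, MAC) tuples; fixed-width lowercase MAC strings compare correctly as strings. The loop must terminate because a switch's belief only ever decreases within a finite set of IDs.

```python
from typing import Dict, List, Tuple

BridgeId = Tuple[int, str]  # (priority, fixed-width lowercase MAC string)


def elect_root(ids: Dict[str, BridgeId], links: List[Tuple[str, str]]) -> Dict[str, BridgeId]:
    """Return, for each switch, the root ID it ends up believing in."""
    believed = dict(ids)  # at first every switch claims to be the root itself
    neighbours: Dict[str, List[str]] = {name: [] for name in ids}
    for a, b in links:
        neighbours[a].append(b)
        neighbours[b].append(a)
    changed = True
    while changed:  # repeat until no switch learns of a better root
        changed = False
        for switch in ids:
            for nb in neighbours[switch]:
                # switch hears nb's BPDU and adopts a smaller root ID if offered.
                if believed[nb] < believed[switch]:
                    believed[switch] = believed[nb]
                    changed = True
    return believed


ids = {
    "S1": (32768, "00:00:00:00:00:03"),
    "S2": (32768, "00:00:00:00:00:01"),
    "S3": (4096, "00:00:00:00:00:09"),
    "S4": (32768, "00:00:00:00:00:02"),
}
links = [("S1", "S2"), ("S2", "S3"), ("S3", "S1"), ("S1", "S4")]
print(elect_root(ids, links))  # every switch ends up naming S3's ID as root
```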

Determining the upstream port
Once the root is determined, the top of the network topology is fixed. Every other switch must then pick one of its own ports as its upstream port, through which messages are forwarded upward (toward the root). If a switch had multiple upstream ports, the topology would inevitably contain loops, so each switch has exactly one upstream port.
How is this unique upstream port chosen? Among all the switch's ports, it is the one with the lowest cost to the root.

As noted above, the BPDUs sent by the root are forwarded by the other switches, and eventually some ports on every switch receive them. A BPDU carries three more fields: "cost to root", "switch ID" and "port ID". When a switch forwards a BPDU, it updates these three fields: "switch ID" becomes its own ID, "port ID" becomes the number of the port that forwards the BPDU, and "cost to root" is increased by some amount (the switch decides this itself based on the actual forwarding cost; it may be an approximation). In the BPDU originally sent by the root, "cost to root" is 0, and each forwarding adds the corresponding cost.
For a given BPDU sent by the root, the same port of a switch may receive several copies because of forwarding. These copies may have traveled along different forwarding paths, so they carry different "cost to root", "switch ID" and "port ID" values. The smaller these three fields (compared in that order), the lower the cost of reaching the root along that BPDU's path, and the higher the BPDU's priority (in fact, the last two fields serve only as tie-breakers, like ordering by surname strokes). Each switch records the highest-priority BPDU received on each of its ports, and it forwards a BPDU out of its other ports only if the BPDU just received on a port has higher priority than the one recorded for that port (that is, the best BPDU ever received there). Finally, the switch compares the BPDUs recorded on all its ports, and the port with the highest-priority BPDU becomes its upstream port.
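The per-port bookkeeping can be sketched with a BPDU modelled as a tuple of the three fields, compared in order. This is a hypothetical Python model following the text's simplified description, not the 802.1D frame format: the root ID is omitted because all switches already agree on it, and Bpdu, relay, record and upstream_port are illustrative names. The cost values in the example are made up.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Bpdu(NamedTuple):
    """The three fields discussed above, compared in order; smaller is better."""
    root_cost: int
    sender_id: Tuple[int, str]  # (priority, MAC) of the switch that sent it
    sender_port: int


def relay(bpdu: Bpdu, my_id: Tuple[int, str], out_port: int, cost: int) -> Bpdu:
    """Copy sent onward: own ID and port, cost increased by this hop's cost."""
    return Bpdu(bpdu.root_cost + cost, my_id, out_port)


def record(best: Dict[str, Bpdu], port: str, bpdu: Bpdu) -> bool:
    """Keep bpdu if it beats the best seen on port; True means 'relay it'."""
    current = best.get(port)
    if current is None or bpdu < current:
        best[port] = bpdu
        return True
    return False


def upstream_port(best: Dict[str, Bpdu]) -> Optional[str]:
    """The port whose recorded BPDU has the highest priority (smallest tuple)."""
    return min(best, key=best.__getitem__) if best else None


best: Dict[str, Bpdu] = {}
record(best, "eth0", Bpdu(38, (32768, "00:00:00:00:00:01"), 2))
record(best, "eth1", Bpdu(19, (4096, "00:00:00:00:00:09"), 1))
print(record(best, "eth0", Bpdu(57, (32768, "00:00:00:00:00:04"), 3)))  # False: worse
print(upstream_port(best))  # eth1
```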

Identifying the downstream ports to block
All of a switch's ports other than its upstream port are downstream ports. Because each switch has exactly one upstream port, the upstream paths contain no loops.
Downstream ports of different switches, however, may be interconnected and form loops. (These downstream ports are not necessarily connected directly; several downstream ports of several switches may be joined by a physical-layer forwarding device.) The final step of the Spanning Tree Protocol is to select, within each group of interconnected downstream ports, one port to forward messages and to block the others. This eliminates the remaining loops. Downstream ports that are not connected to any other downstream port need no attention: they cannot cause loops and forward as usual.
However, since loops arise when downstream ports are connected to each other, why not simply block every interconnected downstream port? As mentioned earlier, a single physical-layer device (such as a hub, although hubs are rarely used now) may connect several network segments at once, for example:


Suppose port eth2 of switch S2 and port eth1 of switch S3 are two interconnected downstream ports. If both ports were blocked, host E would be cut off from the network. Therefore one of the two ports must remain in service to forward messages.

So, within a group of interconnected downstream ports, which one should be chosen as the only port that forwards messages?
As described above, a switch forwards a BPDU when it receives one of higher priority, updating the "cost to root", "switch ID" and "port ID" fields. So within a group of interconnected downstream ports, the port that sends out the highest-priority BPDU is the one with the lowest cost to the root. That port continues to forward messages, and the other ports are blocked.
In terms of implementation, each port records the priority of the BPDU it sends out. If it never receives a BPDU with a higher priority than that (either because it is not connected to other downstream ports and receives no BPDU, or because the BPDUs it receives from connected downstream ports have lower priority), the port keeps forwarding; otherwise the port is blocked.
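The blocking rule can be sketched for one shared segment, where every downstream port hears the BPDUs sent by all the others. This is a hypothetical Python model of the rule as stated in the text (block a port only if it hears a BPDU better than its own), not the full 802.1D port-role machinery; the port labels follow the S2.eth2/S3.eth1 example, and the costs and IDs are made up.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Bpdu(NamedTuple):
    """Fields compared in order; smaller means higher priority."""
    root_cost: int
    sender_id: Tuple[int, str]  # (priority, MAC)
    sender_port: int


def port_state(sent: Bpdu, heard: List[Bpdu]) -> str:
    """Keep forwarding unless a BPDU better than our own is heard."""
    best_heard: Optional[Bpdu] = min(heard) if heard else None
    if best_heard is not None and best_heard < sent:
        return "blocked"
    return "forwarding"


def resolve_segment(sent_by_port: Dict[str, Bpdu]) -> Dict[str, str]:
    """Each downstream port on the segment hears every other port's BPDU."""
    return {
        name: port_state(bpdu, [b for other, b in sent_by_port.items() if other != name])
        for name, bpdu in sent_by_port.items()
    }


segment = {
    "S2.eth2": Bpdu(19, (32768, "00:00:00:00:00:02"), 3),
    "S3.eth1": Bpdu(38, (32768, "00:00:00:00:00:03"), 2),
}
print(resolve_segment(segment))  # {'S2.eth2': 'forwarding', 'S3.eth1': 'blocked'}
```

Exactly one port per segment stays in the forwarding state, so host E in the example remains reachable through S2.eth2.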

Through this series of BPDU exchanges between switches, the spanning tree is built. However, the network topology can also change, because of human factors such as network reconfiguration or non-human factors such as switch failures. The Spanning Tree Protocol also defines mechanisms to detect such changes and trigger a new round of BPDU exchange to build a new spanning tree. That process is not discussed here.
