Detailed introduction to the scalability of the BGP routing protocol for IP Backbone Networks

Last Update:2013-10-12 Source: Internet

Author: User

1. Several issues restricting BGP scalability

BGP is the inter-domain routing protocol used on the Internet today. It provides stable, secure routing for interconnection between operators and offers a wide range of routing control mechanisms. To control routing policy more effectively, most carriers deploy BGP on their backbone routers. As networks expand and the number of routers and route entries grows, BGP scalability becomes increasingly important. At present, BGP scalability faces the following problems.

1) The Full-Mesh problem of I-BGP

The BGP routing protocol is divided into two parts: I-BGP and E-BGP. I-BGP is used between routers within an autonomous system, and E-BGP is used between routers in different autonomous systems. To prevent routing loops, BGP requires that a router not re-advertise a route learned through I-BGP to its other I-BGP neighbors. Therefore, within an autonomous system, every router that participates in I-BGP must establish a session with every other such router, so that routing information is correctly advertised to each router. According to this principle, the total number of I-BGP sessions in an autonomous system is N × (N-1)/2 (where N is the number of routers running I-BGP). As N increases, this number grows quickly: for example, with 100 routers, the number of sessions is 4950. This is a huge burden on network devices and also makes the management and configuration of the network very complex. A backbone network is usually composed of a large number of routers running I-BGP, so whether this problem can be solved directly affects how large the network can grow.
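The growth of the full-mesh session count can be checked with a few lines of Python. This is only an illustrative sketch of the N × (N-1)/2 formula above; the function name is made up for this example.

```python
def ibgp_full_mesh_sessions(routers: int) -> int:
    """Number of I-BGP sessions needed to fully mesh `routers` routers."""
    if routers < 0:
        raise ValueError("routers must be non-negative")
    # Each pair of routers needs exactly one session: N * (N - 1) / 2.
    return routers * (routers - 1) // 2


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        print(f"{n} routers need {ibgp_full_mesh_sessions(n)} sessions")
```

Doubling the number of routers roughly quadruples the number of sessions, which is why the full mesh becomes the first scalability limit.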

2) Route oscillation when routing policy changes

BGP is an incremental-update routing protocol. When a new route is to be advertised, the router sends an Update message to its neighbors; when a route is to be deleted, it sends a Withdraw. A route flap is defined as a route being withdrawn and then advertised again. Because the withdrawal or update of any route triggers recalculation of the router's entire route table, frequent flapping puts heavy pressure on the router. In my practical experience, when a high-end router is computing BGP routes, CPU load is typically around 80% to 90% and can even reach 100%, consuming almost all CPU resources. Although most high-end routers place the route computation module and the forwarding module on separate hardware to limit the performance degradation caused by a busy main CPU, frequent changes to the route table can still affect the operation of the whole device. Moreover, as the route is withdrawn or advertised, this computation spreads into the autonomous system and causes the same problem on internal routers.

3) Other issues to be considered

Besides the two problems above, which can cause excessive consumption of router resources, other factors such as the number of routes, the size of the BGP route table, and the route calculation method also affect router performance.

In addition, the larger the network and the more route entries it carries, the more complicated configuration and management become. Network design should therefore simplify configuration, to reduce the workload of operations staff and avoid human error.

2. How to solve the problems restricting BGP scalability

In view of the above problems, we will introduce some related solutions.

1) Solving the I-BGP session count bottleneck

The problem of too many I-BGP sessions described above can be solved in two ways:

1) Confederations

A confederation works by dividing the original autonomous system into multiple sub-autonomous systems, while the original AS number is configured on each router as the confederation identifier. This has two advantages. First, the original I-BGP attributes, including Local Preference, MED, and NEXT_HOP, are retained across the sub-autonomous systems. Second, the confederation function automatically removes the internal AS numbers at the exit of the network, so the administrator does not need to configure filters for them.

2) Route reflectors (Route-Reflector)

Route reflectors are currently the most widely used method and scale better than confederations. The routers in an autonomous system are divided into several clusters, each consisting of reflectors and clients. The reflectors form a full mesh among themselves and run regular I-BGP; a client runs I-BGP only with its reflectors, and from the client's point of view the reflector is just an ordinary neighbor. The reflector acts as a route distribution center: it forwards I-BGP routes learned from other reflectors to its clients, and it forwards I-BGP routes learned from a client to the other clients in the cluster and to the reflectors outside the cluster, which in turn advertise them to their own clusters. In real networks, to improve redundancy, a client usually establishes neighbor relationships with multiple reflectors and is not limited to the reflectors of its own cluster.
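The forwarding behavior just described can be sketched as a small Python function. This is a simplified illustration only: the function and peer names are hypothetical, routes learned over E-BGP are not modeled, and the loop-prevention attributes defined for route reflection (ORIGINATOR_ID and CLUSTER_LIST) are left out.

```python
def reflection_targets(source: str, clients: set, non_clients: set) -> set:
    """Return the I-BGP peers to which a reflector re-advertises a route.

    `clients` are the reflector's own cluster clients; `non_clients` are
    the other reflectors it is fully meshed with (hypothetical names).
    """
    if source in clients:
        # Learned from a client: send to the other clients and to all
        # reflectors outside the cluster.
        return (clients | non_clients) - {source}
    if source in non_clients:
        # Learned from another reflector: send to the clients only.
        return set(clients)
    raise ValueError(f"{source!r} is not a known I-BGP peer")


if __name__ == "__main__":
    clients = {"c1", "c2", "c3"}
    other_reflectors = {"rr2", "rr3"}
    print(sorted(reflection_targets("c1", clients, other_reflectors)))
    print(sorted(reflection_targets("rr2", clients, other_reflectors)))
```

A route from another reflector is never passed on to the remaining reflectors, because the reflectors are already fully meshed and receive it directly.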

As a result, a client generally has only one or two I-BGP sessions. Compared with confederations, as long as reflector performance is high enough a cluster can be made very large, and the load on a client does not change as the cluster grows. With confederations, all routers in a sub-autonomous system must still be fully meshed, so the lowest-performance router determines how large a sub-autonomous system can be; with route reflectors, the one or few highest-performance routers determine the cluster size. Route reflectors therefore scale better.
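To make the comparison concrete, the sketch below counts sessions for a full mesh and for a route-reflector design. It assumes the reflectors are fully meshed and every client peers with a fixed number of reflectors (two by default); the function names and the 4-reflector split are illustrative choices, not figures from a real network.

```python
def full_mesh_sessions(routers: int) -> int:
    """Sessions in a full I-BGP mesh: N * (N - 1) / 2."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(reflectors: int, clients: int,
                             reflectors_per_client: int = 2) -> int:
    """Sessions when reflectors are fully meshed and each client peers
    only with `reflectors_per_client` reflectors (an assumed design)."""
    if not 1 <= reflectors_per_client <= reflectors:
        raise ValueError("each client needs between 1 and all reflectors")
    return full_mesh_sessions(reflectors) + clients * reflectors_per_client


if __name__ == "__main__":
    # 100 routers in total: 4 reflectors and 96 clients.
    print("full mesh:       ", full_mesh_sessions(100))
    print("route reflectors:", route_reflector_sessions(4, 96))
```

In this layout each client holds only two sessions regardless of how many routers are added, while each reflector carries the sessions of all its clients, which is why reflector performance sets the cluster size.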

In addition, a reflector can itself be a client of another reflector, forming a hierarchy. This suits networks built in layers particularly well and makes it easy to move from flat management to hierarchical management.

Of course, there are some points to note when using route reflectors. For example, a reflector does not simply forward routes: every received route goes through best-path selection on the reflector, and only the preferred route is advertised onward. Reflectors should therefore be chosen according to the network structure, so that I-BGP neighbor relationships correspond as closely as possible to the actual circuit topology.

2) Controlling route oscillation

Route flapping is currently controlled mainly by damping. A BGP router maintains a penalty value for each route received over E-BGP. Each flap increases the route's penalty; while the route is stable, the penalty decays over time. When the penalty exceeds the preset suppress limit, the route is no longer advertised. When the penalty falls below the reuse limit, the route is advertised again. Damping is applied mainly to E-BGP neighbors. In this way, when a network suffers internal route oscillation, the networks connected to it can use damping to shield themselves from the impact.
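The penalty mechanism can be illustrated with a small simulation. This is a sketch under stated assumptions: the penalty is assumed to decay exponentially with a fixed half-life, and the numeric values (1000 per flap, suppress limit 2000, reuse limit 750, half-life 900 seconds) are illustrative parameters chosen for this example, not values taken from the article or from any particular device.

```python
from dataclasses import dataclass


@dataclass
class DampedRoute:
    # All parameter values below are illustrative assumptions.
    penalty_per_flap: float = 1000.0
    suppress_limit: float = 2000.0
    reuse_limit: float = 750.0
    half_life: float = 900.0      # seconds for the penalty to halve
    penalty: float = 0.0
    last_update: float = 0.0      # time of the last penalty update
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        """Apply exponential decay for the time elapsed since last update."""
        elapsed = now - self.last_update
        if elapsed > 0:
            self.penalty *= 0.5 ** (elapsed / self.half_life)
            self.last_update = now

    def flap(self, now: float) -> None:
        """Record one flap (withdraw followed by re-advertisement)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        """Return whether the route is still suppressed at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed


if __name__ == "__main__":
    route = DampedRoute()
    for t in (0, 60, 120):        # three flaps, one minute apart
        route.flap(t)
    for t in (120, 1020, 1920):
        print(t, round(route.penalty), route.is_suppressed(t))
```

With these assumed values the route is suppressed after the third flap and is released only about half an hour later, which shows how damping protects neighbors at the cost of delayed recovery.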

In network maintenance, a circuit is more often interrupted for a period of time than constantly flapping. After a plain interruption, traffic is restored as soon as the circuit recovers. By contrast, after a flapping circuit recovers, it takes a while for traffic to return to normal, because other networks are still suppressing the flapping route. Damping therefore effectively protects the stability of the Internet, but it also delays recovery from some faults.

Currently, all devices support damping and provide configurable parameters to control it precisely. In general, the default configuration provided by the device meets the requirements of most networks. If there are special requirements, the parameters must be calculated carefully; otherwise damping is either ineffective or keeps a route unreachable for a long time after it has been suppressed.

3) Peer Group applications

In real networks it is very common for a router to have multiple BGP peers that belong to the same class, where "the same class" means they have the same or similar BGP policies. When there are many peers, the BGP configuration becomes bloated and the load on the router increases: whenever a route is updated, the router must run a policy calculation for each peer, even though the policy is the same. A Peer Group lets these peers share one policy definition, which keeps the configuration compact. It also reduces resource consumption on the router: because the same policy applies to every member of the group, the route computation for an update is performed only once, which greatly reduces CPU time.

In actual network maintenance, using Peer Groups is generally recommended even when a class has only a few peers, because it scales well. Peer Groups apply to E-BGP peers as well as I-BGP peers.
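The CPU saving can be illustrated by counting policy evaluations. The sketch below is a conceptual model only, not how any router is implemented; the function names, the peer names, and the string-based "policy" are all made up for the example.

```python
def advertise_per_peer(route, peers, policy):
    """Without a peer group: the policy is evaluated once per peer."""
    evaluations = 0
    updates = {}
    for peer in peers:
        updates[peer] = policy(route)
        evaluations += 1
    return updates, evaluations


def advertise_per_group(route, peer_groups):
    """With peer groups: the policy is evaluated once per group and the
    result is copied to every member.

    `peer_groups` maps a group name to a (policy, members) pair.
    """
    evaluations = 0
    updates = {}
    for policy, members in peer_groups.values():
        result = policy(route)    # single evaluation for the whole group
        evaluations += 1
        for peer in members:
            updates[peer] = result
    return updates, evaluations


if __name__ == "__main__":
    def outbound_policy(route):
        # Stand-in for a real outbound policy (hypothetical).
        return route + " community=no-export"

    peers = [f"peer{i}" for i in range(50)]
    _, per_peer = advertise_per_peer("10.0.0.0/8", peers, outbound_policy)
    _, per_group = advertise_per_group(
        "10.0.0.0/8", {"customers": (outbound_policy, peers)})
    print("evaluations per peer: ", per_peer)
    print("evaluations per group:", per_group)
```

Both approaches produce the same update for every peer; only the number of policy evaluations differs.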

4) Route refresh

During backbone network maintenance, BGP policies often need to be modified; for example, you may need to update the AS-path filter list for a peer. To make the change take effect after the configuration was modified, the earlier practice was to tear down the current BGP session and re-establish it with the following command:

clear ip bgp x.x.x.x

This is necessary because, after receiving the peer's BGP routes, the router first applies policy and then stores only the resulting routes in the local BGP route table; it does not keep all the original routes. When the policy changes, the session must be re-established to obtain all of the peer's routes again so that they can be recalculated under the new policy. The disadvantages are obvious: traffic is interrupted and router CPU consumption is high.

Currently, two methods are usually used to solve this problem. One is to save the original BGP routes so that no retransmission is needed for recalculation. The other is to ask the peer to resend its entire BGP route table, without interrupting the BGP session, when the new policy is enabled.

The first method is enabled through configuration. The configuration command is as follows:

neighbor 1.1.1.1 soft-reconfiguration inbound

When enabling the new policy, enter the following command:

clear ip bgp 1.1.1.1 soft [in | out]

With this setting, all original BGP routes are stored in a separate route table. When you modify the policy, the router derives the new best BGP routes by recomputing over this table, so the resource cost of the change is small. In addition, after the policy has been modified but before it is formally enabled, the original route table can be used to check the effect of the change. However, this method needs additional memory to store the extra route table.
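The idea behind the first method can be modeled in a few lines of Python. This is a conceptual sketch, not router code: the class, its method names, and the AS-path-based policies are hypothetical, and a policy is simplified to a function that accepts or rejects a route.

```python
class InboundNeighbor:
    """Keeps an unfiltered copy of received routes so that a new inbound
    policy can be applied without resetting the session."""

    def __init__(self, policy):
        self.policy = policy
        self.received = {}   # prefix -> AS path, before policy (extra memory)
        self.accepted = {}   # prefix -> AS path, after policy

    def receive(self, prefix, as_path):
        """Store the raw route, then apply the current policy to it."""
        self.received[prefix] = as_path
        if self.policy(prefix, as_path):
            self.accepted[prefix] = as_path
        else:
            self.accepted.pop(prefix, None)

    def preview(self, candidate_policy):
        """Show which prefixes a candidate policy would add or remove,
        without enabling it."""
        would_accept = {p for p, path in self.received.items()
                        if candidate_policy(p, path)}
        current = set(self.accepted)
        return would_accept - current, current - would_accept

    def soft_clear_in(self, new_policy):
        """Enable a new policy by recomputing over the stored routes."""
        self.policy = new_policy
        self.accepted = {p: path for p, path in self.received.items()
                         if new_policy(p, path)}


def deny_as(asn):
    """Hypothetical policy: reject any route whose AS path contains `asn`."""
    return lambda prefix, as_path: asn not in as_path


if __name__ == "__main__":
    nbr = InboundNeighbor(deny_as(65010))
    nbr.receive("10.1.0.0/16", (65001, 65010))
    nbr.receive("10.2.0.0/16", (65001, 65020))
    print("accepted:", sorted(nbr.accepted))
    print("preview: ", nbr.preview(deny_as(65020)))
    nbr.soft_clear_in(deny_as(65020))
    print("accepted:", sorted(nbr.accepted))
```

The `received` dictionary is the extra memory cost mentioned above: every route is held twice, once as received and once after policy.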

The second method depends on the BGP capabilities of the router, that is, on functions built into the system software. When a BGP session is established, the two routers exchange this capability in the BGP OPEN message. You can run the following command to check whether a device supports the BGP Route Refresh capability:

show ip bgp n x.x.x.x

If both routers in the BGP session have this capability, no configuration is required. When clear ip bgp n x.x.x.x in is executed, the local BGP process does not interrupt the BGP session but instead requests the peer to resend its entire BGP route table. Compared with the first method, this saves memory. The disadvantages are that the network administrator cannot inspect the original routes sent by the peer, and that because the whole route table is resent it is less efficient than the first method. Currently, all devices in the backbone network support this function.

3. Conclusion

This article has discussed the issues affecting BGP scalability and the corresponding measures. In practice, deployment must still be adapted to the specific situation. The general idea and principles are the same, however: reduce the resource consumption of devices, simplify maintenance and management, and improve network scalability in both hardware and software.
