Route: rapid recovery

Last Update: 2013-10-14 Source: Internet

Author: User


As is well known, achieving high network uptime requires a primary router and a backup router with hot failover and diverse network paths. But that is not enough. The routers themselves, especially those at the vulnerable WAN edge, should have redundant internal hardware components such as switching fabrics, line cards, power supplies, and route processors (RPs). High-availability routers must also support fast software recovery techniques.

Routers that separate the control plane from the forwarding plane and use a graceful restart mechanism (known as Cisco Nonstop Forwarding, or NSF) can greatly extend the uptime of networks and applications. This software recovery technique keeps packets flowing while an RP process is interrupted, which limits the impact of the interruption on the network.

Dual RPs and continuous packet forwarding

The RP is the "brain" of the router. It stores the database of best-route information, maintains adjacencies with peer routers, and handles specific management functions. Redundant hardware improves the availability of network components when a fault occurs. Cisco devices with dual RPs include the high-end 12000, 10000, and 7600 series routers, as well as the Cisco 7500 and 7300 series routers.

How quickly a router can restart or recover from a fault depends on how closely the state of the two RPs is synchronized. This calls for a balance between two extreme forms of backup. At one extreme is a "cold" standby RP, which holds no state information about Layer 2 connections, adjacencies, or the best-route table. All of this information must then be rebuilt, which can lead to a very long recovery time. At the other extreme is continuous synchronization of all information between the two RPs, which can consume too many processing resources and hurt network scalability and performance.

The common approach strikes a balance between these two extremes by loading most, but not all, of the recovery information into the standby RP. With this level of synchronization, Layer 3 packet forwarding can continue while the RP and the centralized routing table are switched over.

Cisco IOS® Software Release 12.0(22)S and later support Cisco NSF. It shortens router downtime when the primary RP is taken out for scheduled maintenance or fails unexpectedly. In most cases, Cisco NSF requires the restarting router and its peer routers to preserve the forwarding information for all networks reachable through the restarting router. On the restarting router, the control plane and the forwarding plane must be separated during the switchover from the primary RP to the standby RP so that the forwarding plane can keep forwarding data traffic.
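The separation of control and forwarding planes described above can be illustrated with a minimal Python sketch. The class names, the `switchover` method, and the next-hop labels are hypothetical and do not correspond to any Cisco software interface; the sketch only shows that a forwarding table kept apart from the RP can keep answering lookups while the RP role changes.

```python
import ipaddress


class ForwardingPlane:
    """Hypothetical forwarding plane: holds its own copy of the forwarding table."""

    def __init__(self):
        self.fib = {}  # ip_network -> next hop label

    def install(self, prefix, next_hop):
        self.fib[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address):
        """Longest-prefix match over the installed entries."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.fib[best]


class Router:
    """Hypothetical router with one forwarding plane and two RPs."""

    def __init__(self):
        self.forwarding = ForwardingPlane()
        self.active_rp = "primary"

    def switchover(self):
        # Only the control plane moves; the forwarding table is left untouched.
        self.active_rp = "standby"


router = Router()
router.forwarding.install("10.0.0.0/8", "peer-a")
router.forwarding.install("10.1.0.0/16", "peer-b")
before = router.forwarding.lookup("10.1.2.3")
router.switchover()
after = router.forwarding.lookup("10.1.2.3")
print(before, after, router.active_rp)
```

In this sketch the lookup result is the same before and after the switchover because nothing in `switchover` touches the forwarding table, which is the property NSF relies on.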

Routing protocol extensions

To implement NSF, some Cisco routers can use extensions to common routing protocols, including the Border Gateway Protocol (BGP), IS-IS, and Open Shortest Path First (OSPF). These extensions keep packets forwarding and network connectivity stable while it is determined whether the primary RP can be restored quickly.

Most Cisco NSF/graceful restart deployments require that the peers of the restarting router also support these extensions, for two reasons. First, an RP switchover does not represent a topology change; it only indicates that an RP is recovering. A peer can use the graceful restart extensions to avoid announcing to the rest of the network that the restarting router has gone down and come back up, which prevents unnecessary advertisements and route changes. Second, the extensions let the peer keep forwarding packets to the restarting router during recovery, which is what provides NSF. The peer must also know which messages and information to exchange to help the restarting router recover quickly.


Figure 1: Peers that support Cisco NSF can help each other recover quickly from failures, minimizing downtime, keeping packets forwarding, and maintaining the stability of the entire network.


Figure 2: Regardless of the protocol, user data packets continue to flow between peers throughout the recovery process.

Basic steps for graceful restart

All of these protocols follow the same basic steps, which are listed below:

• Determine that the peer router supports NSF.

• Send and receive restart bits so that the peer router knows that recovery is in progress.

• Temporarily preserve adjacency information so that packets can continue to be forwarded along the last known routes.

• Exchange new routing database information after recovery is complete.

However, each protocol uses different mechanisms to establish peer or neighbor relationships between routers and to exchange routing information, so the graceful restart steps and messages differ from protocol to protocol.

For example, BGP runs over an underlying TCP connection, whereas OSPF and IS-IS use Hello messages to establish the relationship. The routing protocols also differ in how long a peer keeps forwarding packets before it concludes that the restarting router cannot recover and falls back to traditional recovery with full network convergence.
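The four basic steps listed in this section can be sketched from the peer's point of view. This is a protocol-neutral illustration only: the `Peer` class, the `handle_neighbor_restart` function, and the dictionary-based route table are hypothetical names chosen for the example, and real protocols add timers and message formats that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer of a restarting router."""
    nsf_capable: bool
    routes: dict = field(default_factory=dict)  # prefix -> next hop
    neighbor_restarting: bool = False


def handle_neighbor_restart(peer, restart_bit_seen, refreshed_routes=None):
    """Return the routes the peer forwards on after each stage of a restart."""
    # Step 1: graceful restart only applies if NSF is supported and signaled.
    if not peer.nsf_capable or not restart_bit_seen:
        peer.routes = {}  # traditional recovery: flush and reconverge
        return peer.routes
    # Step 2: the restart bit tells the peer that recovery is in progress.
    peer.neighbor_restarting = True
    # Step 3: keep forwarding on the last known routes until new data arrives.
    if refreshed_routes is None:
        return peer.routes
    # Step 4: recovery is complete, so adopt the freshly exchanged routes.
    peer.routes = dict(refreshed_routes)
    peer.neighbor_restarting = False
    return peer.routes


peer = Peer(nsf_capable=True, routes={"10.0.0.0/8": "r1"})
during = dict(handle_neighbor_restart(peer, restart_bit_seen=True))
final = handle_neighbor_restart(
    peer, True, {"10.0.0.0/8": "r1", "172.16.0.0/12": "r1"}
)
print(during, final)
```

A peer that does not support NSF falls through the first branch and flushes its routes, which corresponds to the traditional recovery with full convergence mentioned above.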

BGP graceful restart

Because a BGP restart can have far-reaching effects, BGP is an important target for high-availability improvements. BGP can carry a very large number of routes, so after a BGP software failure, network convergence usually takes longer than with protocols that carry fewer routes. In addition, because BGP is an interdomain routing protocol, the effects of a failed BGP process can spread to multiple networks rather than staying confined to a single domain.

The BGP graceful restart extension comes into play when the initial BGP connection is established. The restarting router and its peers signal support for Cisco NSF by exchanging BGP capability code 64 in the initial BGP OPEN messages during session setup.

Normally, when a router restarts its BGP process, the TCP connection to each peer is torn down, and the peer removes all routes associated with the restarting router. During a BGP graceful restart this does not happen. Instead, the peer marks these routes as "stale" and keeps using them to forward packets, on the expectation that the restarting router will quickly re-establish its BGP session. Likewise, the restarting router keeps forwarding packets while the protocol reconverges.

When the restarting router brings up its new BGP process, it again sends BGP capability code 64 to its peers. This time, however, a flag set in the graceful restart capability tells the peer that the BGP process has restarted.
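As an illustration of capability code 64 and its restart flag, the following Python sketch encodes the capability as it would appear inside a BGP OPEN message. The field layout is an assumption taken from RFC 4724 rather than from this article: a 4-bit flags field whose top bit is the Restart State bit, a 12-bit restart time in seconds, and one (AFI, SAFI, flags) entry per address family. The function name and the 120-second restart time are hypothetical.

```python
import struct

GRACEFUL_RESTART_CODE = 64  # BGP capability code named in the text


def encode_graceful_restart(restarted, restart_time, families):
    """Encode the capability as code, length, and value bytes.

    families is a list of (afi, safi, forwarding_preserved) tuples.
    Layout assumed from RFC 4724; not taken from the article itself.
    """
    if not 0 <= restart_time < 4096:
        raise ValueError("restart time must fit in 12 bits")
    # Top bit of the 16-bit field is the Restart State flag.
    flags_and_time = (0x8000 if restarted else 0) | restart_time
    value = struct.pack("!H", flags_and_time)
    for afi, safi, forwarding_preserved in families:
        family_flags = 0x80 if forwarding_preserved else 0
        value += struct.pack("!HBB", afi, safi, family_flags)
    return struct.pack("!BB", GRACEFUL_RESTART_CODE, len(value)) + value


ipv4_unicast = [(1, 1, True)]  # AFI 1 = IPv4, SAFI 1 = unicast
initial_open = encode_graceful_restart(False, 120, ipv4_unicast)
after_restart = encode_graceful_restart(True, 120, ipv4_unicast)
print(initial_open.hex())   # 4006007800010180
print(after_restart.hex())  # 4006807800010180
```

The two encodings differ only in the third byte, where the Restart State bit is set after the restart. That single bit is what lets the peer distinguish a restarted BGP process from a first-time session.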

While packet forwarding continues, the peer sends its initial routing updates to the restarting router. The peer uses an End-of-RIB (EOR) marker to indicate that the updates are complete. This marker is actually an empty BGP UPDATE message. Once the restarting router has received the EOR from all of its peers, it knows that it can run best-path selection again using the new routing information.

Similarly, the restarting router sends all of its updates to its peers and then uses the EOR marker to indicate that the updates are complete. The peers can then replace their stale routes with the updates received from the restarting router.
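The stale-route handling on the peer side can be sketched as follows. The `StaleRouteTable` class and its method names are hypothetical, and the table holds only the routes learned from one restarting neighbor; the point is simply that stale routes stay usable for forwarding until the EOR marker arrives, at which point any route that was not refreshed is dropped.

```python
class StaleRouteTable:
    """Hypothetical table of routes learned from one BGP neighbor."""

    def __init__(self):
        self.routes = {}  # prefix -> (next_hop, is_stale)

    def learn(self, prefix, next_hop):
        # A new or refreshed route is never stale.
        self.routes[prefix] = (next_hop, False)

    def neighbor_restarting(self):
        # Keep every route, but mark it stale instead of deleting it.
        self.routes = {p: (nh, True) for p, (nh, _) in self.routes.items()}

    def is_stale(self, prefix):
        return self.routes[prefix][1]

    def next_hop(self, prefix):
        entry = self.routes.get(prefix)
        return entry[0] if entry else None

    def end_of_rib(self):
        # EOR received: routes that were not refreshed are removed.
        self.routes = {p: e for p, e in self.routes.items() if not e[1]}


table = StaleRouteTable()
table.learn("10.0.0.0/8", "r1")
table.learn("192.0.2.0/24", "r1")
table.neighbor_restarting()
still_forwarding = table.next_hop("192.0.2.0/24")  # stale but still usable
table.learn("10.0.0.0/8", "r1")  # refreshed by the restarted neighbor
table.end_of_rib()
print(still_forwarding, sorted(table.routes))
```

After `end_of_rib`, only the refreshed prefix remains, which mirrors the replacement of stale routes described above.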

IS-IS features

The Internet Engineering Task Force (IETF) is designing, in the form of an Internet-Draft, a similar graceful restart procedure for IS-IS, a link-state intradomain routing protocol. Mike Shand of Cisco, the designer of the IS-IS extensions, indicated that the IETF would publish the final version of the draft around the time this issue of Packet® was published. As mentioned above, IS-IS uses the Hello protocol to discover neighboring routers and to establish and maintain adjacencies. A restarting router signals its peers with a Restart Request (RR) bit in a protocol data unit. In an IS-IS network, a peer can send database information directly to the restarting router without waiting for acknowledgement.

After a router restarts, it sends a Hello packet with the RR bit set to let its peers know that it has restarted. Each peer acknowledges the restart signal by setting a Restart Acknowledgement (RA) bit in its own Hello message. The peer then sends a summary list of all its link-state packets (LSPs), followed by the LSPs named in the list. Once everything in the summary list has been received, the restarting router updates its database. In this respect, the summary list plays a role similar to the EOR in BGP graceful restart.
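The RR/RA exchange can be sketched in a few lines of Python. The `Hello` class, the `peer_respond` and `restart_complete` functions, and the use of a dictionary of LSP identifiers to sequence numbers are all hypothetical simplifications; they model only the order of events described above, not the actual IS-IS packet formats.

```python
from dataclasses import dataclass


@dataclass
class Hello:
    """Hypothetical Hello message carrying only the two restart bits."""
    sender: str
    rr: bool = False  # Restart Request
    ra: bool = False  # Restart Acknowledgement


def peer_respond(peer_name, hello, lsp_database):
    """Return the peer's reply Hello, its summary list, and the LSPs it sends."""
    if not hello.rr:
        # Ordinary Hello: no restart handling is triggered.
        return Hello(peer_name), [], {}
    reply = Hello(peer_name, ra=True)  # acknowledge the restart signal
    summary = sorted(lsp_database)     # list of LSP identifiers
    return reply, summary, dict(lsp_database)


def restart_complete(summary, received_lsps):
    """The restarting router may update its database once the list is satisfied."""
    return all(lsp_id in received_lsps for lsp_id in summary)


database = {"lsp-a": 3, "lsp-b": 7}  # LSP id -> sequence number
reply, summary, lsps = peer_respond("peer-1", Hello("r1", rr=True), database)
print(reply.ra, summary, restart_complete(summary, lsps))
```

Checking the received LSPs against the summary list gives the restarting router a clear completion point, which is the same role the EOR marker plays for BGP.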

Cisco provides an alternative method, configured for IS-IS from the command-line interface (CLI), that saves all adjacency and LSP information to the standby RP. After the switchover, the new primary RP maintains its neighbor relationships using the saved data and can quickly rebuild its routing table.

The RP switchover takes only a few seconds. Within the next few seconds, IS-IS can rebuild its routing table and resynchronize with the network. IS-IS then waits for a specified interval before it will attempt a second Cisco NSF restart. During this interval, the new standby RP boots and synchronizes its configuration with the primary RP.

After the synchronization is complete, the IS-IS adjacency and LSP data are saved to the standby RP. However, IS-IS will attempt a new Cisco NSF restart only after the interval has elapsed. In addition, the restarting router uses the first summary list it receives to verify the validity of its cached LSPs, so that the state of the IS-IS protocol is maintained.

How OSPF works

When a router that supports OSPF NSF performs an RP switchover, it carries out two tasks to resynchronize its link-state database (LSDB) with its OSPF neighbors. First, it must relearn the available OSPF neighbors on the network without resetting the neighbor relationships. Second, it must reacquire the contents of the network's LSDB.

After the RP switchover, the restarting router sends Hello packets at very short intervals to neighbors that support Cisco NSF, with the restart signal bit set in the Extended Options type-length-value (TLV) field of the packet. The peers thereby recognize that they do not need to reset their relationship with the restarting router. Once the restarting router receives a Hello in response to its own Hello message, it begins to synchronize its database with its peers.

After database synchronization is complete, the router updates its routing information base (RIB) and forwarding information base (FIB). If the network or link-state information differs from what the restarting router received during database synchronization, the restarting router sends its information to the peers.
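The final comparison step can be sketched as a simple dictionary difference. The `resynchronize` function and the link-name-to-state dictionaries are hypothetical stand-ins for real OSPF link-state advertisements; the sketch assumes that the restarting router's own view of its links takes precedence wherever it disagrees with what the peers sent.

```python
def resynchronize(own_links, received_lsdb):
    """Merge the received LSDB with local link state after a restart.

    own_links:     link -> state as the restarting router currently sees it
    received_lsdb: link -> state as learned from peers during synchronization
    Returns the merged LSDB and the entries that must be sent to the peers.
    """
    # Anything the peers hold differently (or not at all) must be advertised.
    to_advertise = {
        link: state
        for link, state in own_links.items()
        if received_lsdb.get(link) != state
    }
    lsdb = dict(received_lsdb)
    lsdb.update(to_advertise)
    return lsdb, to_advertise


own = {"link-1": "up", "link-2": "down"}
from_peers = {"link-1": "up", "link-2": "up", "link-3": "up"}
lsdb, to_send = resynchronize(own, from_peers)
print(lsdb, to_send)
```

Here only `link-2` is sent to the peers, because it is the only entry where the restarting router's information differs from what it received.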

High-availability networks must be reinforced in many respects, including redundant network topology design. In addition, routers with redundant components and software intelligence need to be deployed. This software intelligence supports fault recovery and mitigates the impact of temporary interruptions on the network.

Many Cisco routers support these capabilities through internal design and software intelligence in the form of routing protocol extensions, so they can help service providers and enterprises reinforce their networks.

Calculating router uptime

System availability is measured using the router's mean time between failures (MTBF), the total time the device operates normally, and its mean time to repair (MTTR), the time during which the system cannot process and forward packets. Divide the MTBF by the sum of the MTBF and the MTTR, then multiply by 100 to obtain the availability percentage of a specific system.
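The calculation described above can be written as a short Python function. The function name and the sample figures (9,999 hours between failures, 1 hour to repair) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def availability_percent(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR) * 100."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must not be negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total * 100


# Hypothetical router: fails once every 9,999 hours and takes 1 hour to repair.
example = availability_percent(9999, 1)
print(f"{example:.2f}%")  # 99.99%
```

Because the MTTR appears only in the denominator, shortening recovery time raises availability even when the failure rate stays the same, which is the case that fast software recovery addresses.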

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
