How can we improve the network-layer reliability of MAN routers?

Source: Internet
Author: User

The rapid development of broadband services has brought profound changes to the traditional telecom and IT industries, and the convergence of multiple services and networks has become an irreversible trend. As the main network entity in the metropolitan area, the broadband metropolitan area network (MAN) will become the bearer platform for 3G, NGN, and other emerging value-added services. Real-time voice and video applications such as 3G and NGN require the MAN to provide quality-of-service guarantees and carrier-grade reliability comparable to the 99.999% availability of traditional telecom technology. Meanwhile, fierce competition pushes carriers to offer services backed by SLAs with quality-of-service guarantees, for which network reliability is the primary and most important indicator. By improving network reliability, operators can provide differentiated services, gain a favorable position against competitors, and establish and consolidate their brand image.
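To put the 99.999% figure in perspective, the short sketch below converts an availability target into the downtime it allows per year. The function name is illustrative, and the calculation assumes a 365-day year.

```python
# Convert an availability target into the downtime it allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: {allowed_downtime_minutes(availability):.2f} minutes per year")
```

Under these assumptions, 99.999% availability leaves a budget of a little over five minutes of downtime per year, which is why recovery times of seconds or minutes per fault quickly become a problem.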

The reliability of MAN routers is reflected in two aspects: device-layer reliability and network-layer reliability.

Network-layer reliability is an important part of MAN router reliability assurance. Because traditional routing protocols converge slowly (seconds for IGP, minutes for BGP), they cannot meet the requirements for carrying real-time services. Network-layer reliability is therefore an active area of new technology for MAN routers.

Currently, the new network-layer reliability technologies mainly include fast convergence of IP routes, end-to-end LSP backup, MPLS fast re-routing, graceful restart, and RPR IPS.

Fast convergence of IP routes

IP dynamic routing is the most basic network-layer reliability mechanism and is inherent in IP routed networks. A dynamic routing protocol calculates the IP forwarding path at the network layer. When a link or node fails and the original forwarding path is interrupted, the routing protocol dynamically recalculates the path. Different routing protocols use different mechanisms and their response times vary, but on average recovery takes seconds. This is acceptable for traditional IP services, but carrier-grade IP networks that carry real-time services need millisecond-level recovery, so there is a large gap between traditional dynamic IP routing and this requirement.
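The recalculation step can be illustrated with a small shortest-path sketch. It assumes a link-state style view in which a router knows the whole topology. The four-router topology, its link costs, and the function names are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path), or (inf, []) if unreachable."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


def remove_link(graph, a, b):
    """Return a copy of the topology with the bidirectional link a-b removed."""
    pruned = {node: dict(neighbors) for node, neighbors in graph.items()}
    pruned[a].pop(b, None)
    pruned[b].pop(a, None)
    return pruned


# Hypothetical four-router topology with symmetric link costs.
topology = {
    "R1": {"R2": 1, "R3": 5},
    "R2": {"R1": 1, "R4": 1},
    "R3": {"R1": 5, "R4": 1},
    "R4": {"R2": 1, "R3": 1},
}

before = shortest_path(topology, "R1", "R4")
# The R2-R4 link fails, so the path is recomputed on the pruned topology.
after = shortest_path(remove_link(topology, "R2", "R4"), "R1", "R4")
print(before, after)
```

The computation itself is fast. In practice most of the recovery time is spent detecting the failure and distributing the updated topology before this calculation can even start.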

Improvements to the traditional routing protocols can shorten their fault response time, mainly by accelerating convergence. Convergence can be accelerated in three areas: link failure detection, route recalculation, and route information update. By sending Hello messages more frequently, speeding up the SPF calculation, and giving route update messages a high priority, the routing protocol can detect and handle faults quickly and update routes accurately. With an optimized IGP, convergence can take less than 1 s.
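A rough way to see where the time goes is to add up the contribution of each stage. The sketch below is a simplified model, not a description of any particular protocol implementation. The class, its field names, and all timer values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceTimers:
    hello_interval_ms: int   # how often Hello messages are sent
    dead_multiplier: int     # missed Hellos before the neighbor is declared down
    flooding_ms: int         # time to propagate the route update
    spf_ms: int              # delay before SPF runs plus the SPF run itself
    fib_update_ms: int       # time to install the new routes

    def detection_ms(self) -> int:
        """Worst-case failure detection time based on Hello messages alone."""
        return self.hello_interval_ms * self.dead_multiplier

    def total_ms(self) -> int:
        """Sum of detection, update propagation, recalculation and installation."""
        return self.detection_ms() + self.flooding_ms + self.spf_ms + self.fib_update_ms


# Illustrative values only: conservative timers versus aggressively tuned ones.
conservative = ConvergenceTimers(10_000, 4, 100, 5_000, 500)
tuned = ConvergenceTimers(200, 3, 20, 100, 200)

print(f"conservative: {conservative.total_ms()} ms, tuned: {tuned.total_ms()} ms")
```

With these assumed numbers, failure detection dominates the conservative case, which is why the Hello frequency is the first thing to tune.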

Another way to accelerate convergence is to use IGP and EGP to plan a reasonable network hierarchy. The IGP carries routes for devices within the domain and the EGP (BGP4) carries external routes. The two sets of routes are kept isolated and are not redistributed into each other. This division between IGP and BGP forms a hierarchical routing structure in which intra-domain and inter-domain routing protocols converge independently without affecting each other, which gives the fastest convergence.

LSP protection switching

Protection switching is a term used by the ITU-T. The technology is critical for improving the availability and stability of MPLS networks. Protection switching generally pre-computes the route of the protection LSP and pre-allocates its resources, so that network resources are available immediately after the working LSP fails or is interrupted.

Currently, the technology supports only point-to-point LSP protection switching. Two protection methods are available: 1+1 protection and 1:1 protection.

1+1 protection uses a dedicated backup LSP to protect the primary LSP. At the ingress LSR, the primary and backup LSPs are bridged together, so traffic on the primary LSP is copied to the backup LSP and both copies are sent to the egress LSR. The egress LSR selects traffic from the primary or the backup LSP based on the value of the fault indication parameter.

1:1 protection also uses a dedicated backup LSP to protect the primary LSP, but the two LSPs do not carry the same traffic at the same time. The backup LSP can carry other traffic while the primary LSP is working normally. The protection switching decision is made at the ingress LSR.
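The difference between the two modes comes down to where the selection happens. The following is a minimal sketch with hypothetical function names and packet labels. It assumes that in 1:1 protection the extra traffic on the backup LSP is dropped once the protected traffic switches over, which the text above does not state.

```python
def egress_select_1plus1(primary_copy, backup_copy, primary_failed):
    """1+1: the ingress sends every packet on both LSPs; the egress picks one copy."""
    return backup_copy if primary_failed else primary_copy


def ingress_forward_1to1(protected, extra, primary_failed):
    """1:1: the ingress decides which LSP carries the protected traffic.

    Returns a dict mapping each LSP to the packets it carries.
    """
    if primary_failed:
        # Assumption: extra traffic is preempted so the backup LSP can take over.
        return {"primary": [], "backup": list(protected)}
    return {"primary": list(protected), "backup": list(extra)}


voice = ["v1", "v2"]
best_effort = ["b1"]

# 1+1: both LSPs always carry the voice packets, so the egress switches on its own.
received = egress_select_1plus1(voice, voice, primary_failed=True)

# 1:1: the backup LSP carries other traffic until the ingress switches over.
normal = ingress_forward_1to1(voice, best_effort, primary_failed=False)
failed = ingress_forward_1to1(voice, best_effort, primary_failed=True)
print(received, normal, failed)
```

1+1 consumes backup bandwidth all the time, while 1:1 lets the backup LSP do useful work during normal operation at the cost of an ingress-side switching decision.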

MPLS fast re-routing (FRR)

To support real-time applications such as video conferencing and television services, this traffic needs millisecond-level LSP protection similar to traditional SDH APS.

LSP protection switching requires the involvement of the signaling protocol. Carrying the failure indication from the fault point to the recovery point adds unnecessary recovery latency. MPLS fast re-routing lets the node that detects the fault redirect traffic from the faulty link onto a preset protection path without signaling, so the recovery point is the fault detection point itself. Most fast re-routing solutions rely on pre-established backup paths. When a recovery point detects a network failure, it only needs to update its label forwarding table to switch traffic from the LSP on the faulty port to the LSP established in advance on a working port.

In addition to speeding up protection and recovery, fast re-routing lets operators configure protection only on less reliable links, which avoids redundant protection on already reliable parts of the network and unnecessary consumption of core network resources. MPLS fast re-routing provides protection switching within 50 ms and can serve as an alternative to the SDH APS protection mechanism.

MPLS fast re-routing is configured as follows:

First, at LSR1, the ingress of the LSP, a user command activates the MPLS protection switching function. LSR1 then signals all LSRs along the LSP path, and each LSR calculates a backup LSP that bypasses its next LSR. This completes the fast re-routing configuration. When an LSR on the LSP path detects a downstream fault, it switches traffic locally onto the backup LSP.

The IETF has several fast re-routing proposals. The two mainstream protection methods are link protection and node protection, which differ in approach and in complexity. At the time of writing, this technology had not yet been published as a formal RFC.
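The configuration process above can be sketched as a small simulation. It is a simplified model, not a signaling implementation, and it assumes node protection in which each bypass rejoins the LSP two hops downstream. The topology, router names, and function names are hypothetical.

```python
from collections import deque


def find_path(graph, source, target, avoid):
    """Breadth-first search from source to target that never visits `avoid`."""
    queue = deque([[source]])
    seen = {source, avoid}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def precompute_bypasses(graph, lsp):
    """For each LSR on the LSP, precompute a bypass around its next LSR."""
    bypasses = {}
    for i in range(len(lsp) - 2):
        merge_point = lsp[i + 2]  # where the bypass rejoins the LSP
        bypasses[lsp[i]] = find_path(graph, lsp[i], merge_point, avoid=lsp[i + 1])
    return bypasses


def local_repair(lsp, bypasses, failed_node):
    """Splice the precomputed bypass in at the LSR just upstream of the failure.

    Only transit LSRs can be bypassed; the ingress and egress are not covered.
    """
    i = lsp.index(failed_node)
    detour = bypasses[lsp[i - 1]]
    return lsp[: i - 1] + detour + lsp[i + 2:]


# Hypothetical topology: the LSP runs LSR1-LSR2-LSR3-LSR4, with LSR5 and LSR6 as detours.
graph = {
    "LSR1": ["LSR2", "LSR5"],
    "LSR2": ["LSR1", "LSR3", "LSR5"],
    "LSR3": ["LSR2", "LSR4", "LSR5", "LSR6"],
    "LSR4": ["LSR3", "LSR6"],
    "LSR5": ["LSR1", "LSR2", "LSR3", "LSR6"],
    "LSR6": ["LSR3", "LSR4", "LSR5"],
}
lsp = ["LSR1", "LSR2", "LSR3", "LSR4"]

bypasses = precompute_bypasses(graph, lsp)          # done once, at configuration time
repaired = local_repair(lsp, bypasses, "LSR3")      # done locally when LSR2 sees the fault
print(bypasses, repaired)
```

The point of the design is that `local_repair` does no path computation and no signaling at failure time. It only swaps in a path that was already calculated.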

Graceful restart

Possible causes of a control plane restart include software upgrades, software bugs, and hardware failures. While the control plane restarts, the data plane can continue forwarding without interruption. However, when the control plane fails, peer routers recalculate their routes to bypass the faulty router, which makes the uninterrupted forwarding on the data plane pointless and spreads the route change throughout the network. If this happens on an MPLS VPN PE router, the result is disastrous.

Control plane graceful restart effectively solves this problem. When the control plane fails, a router using this technology notifies its neighbors to continue using the original path for data forwarding. Meanwhile, the restarting router reconnects with its neighbors and rebuilds its routing state. This keeps services available during the restart and minimizes the impact of a single device restart on the entire network.

During a graceful restart, the router does not preserve the related protocol state, so a software fault that caused the restart is not carried over after the restart.

Graceful restart is a new feature that many older devices do not support. It can therefore be deployed within a local subnet on devices that do support it.

At the network edge, carrier edge routers face numerous customers and generally have no redundancy, so graceful restart is the best choice there. The network core usually relies on redundant paths for protection, and a graceful restart there can easily cause routing loops, so it is not recommended in the network core.
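The neighbor's side of this behavior can be sketched as a small state machine. This is a simplified model rather than any protocol's actual procedure. The class, its method names, the grace period, and the example routes are all illustrative assumptions.

```python
class HelperNeighbor:
    """A neighbor that keeps forwarding through a restarting router for a grace period."""

    def __init__(self, routes, grace_period_s):
        self.routes = dict(routes)   # prefix -> next-hop router
        self.grace_period_s = grace_period_s
        self.restarting = {}         # router -> time its restart notice arrived

    def on_restart_notice(self, router, now_s):
        # The restarting router asked us to keep using the existing paths.
        self.restarting[router] = now_s

    def on_session_reestablished(self, router):
        # The router is back and has rebuilt its routing state; routes stay in place.
        self.restarting.pop(router, None)

    def on_tick(self, now_s):
        # If the grace period runs out, fall back to normal failure handling.
        for router, started_s in list(self.restarting.items()):
            if now_s - started_s > self.grace_period_s:
                self.routes = {p: nh for p, nh in self.routes.items() if nh != router}
                del self.restarting[router]


routes = {"10.0.0.0/8": "PE1", "192.168.0.0/16": "PE2"}

# Case A: PE1 comes back within the grace period, so nothing is withdrawn.
helper_a = HelperNeighbor(routes, grace_period_s=120)
helper_a.on_restart_notice("PE1", now_s=0)
helper_a.on_tick(now_s=60)
helper_a.on_session_reestablished("PE1")
helper_a.on_tick(now_s=200)

# Case B: PE1 never comes back, so its routes are removed after the grace period.
helper_b = HelperNeighbor(routes, grace_period_s=120)
helper_b.on_restart_notice("PE1", now_s=0)
helper_b.on_tick(now_s=121)
print(helper_a.routes, helper_b.routes)
```

The grace period bounds how long neighbors forward on possibly stale information, which is the trade-off between continuity and correctness during the restart.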
