Examples of Clos architectures: network-level Clos, device-level Clos, fat tree networks, and the Clos-related round-robin (RR) scheduling algorithm

Source: Internet
Author: User
Tags: switches

1. Overview

The Clos concept comes from traditional circuit switching. It is a very old concept, and in today's data communication networks its meaning has changed. This article mainly discusses the newer meaning, which differs slightly from the original.

The term Clos architecture is fairly broad: it covers both device-level Clos and network-level Clos.

Network-level Clos

This concept is very similar to the fat tree network structure currently popular in Silicon Valley, and the two can even be considered the same thing. The idea is to interconnect devices into a very large-scale network structure, and the essential goal is to be non-blocking.

A network-level Clos is generally a non-strict Clos (strict versus non-strict is described further below).

Device-level Clos

The device level is also divided into strict Clos and non-strict Clos.

Clos also comes with different degrees of non-blocking, such as nominally non-blocking (non-strict Clos) and completely non-blocking (strict Clos). In earlier discussions, test engineers turned out to be quite receptive to this distinction, because in each case you can immediately imagine or construct a theoretical test method to prove it.

2. Fat tree networks and network-level Clos

To explain the fat tree concept, we can start with the architecture of InfiniBand [Note 1] switches. InfiniBand switches commonly reach 288 ports (port rate 10G/20G/40G), built by connecting switch chips in a fat tree to form a high-density port switch.

Note 1: The InfiniBand architecture is a switched fabric technology that supports multiple concurrent links, where each link can run at 2.5 Gbps. With one link the speed is 500 MB/s, with four links it is 2 GB/s, and with 12 links it can reach 6 GB/s.

InfiniBand technology mainly addresses server-side connectivity. It is therefore applied to communication between servers (such as replication and distributed work), between servers and storage devices (such as SAN and direct-attached storage), and between servers and networks (such as LANs, WANs, and the Internet).

For example, suppose an InfiniBand chip has 24 ports (10G/20G/40G), as shown in Figure 1. How would you construct a 288-port rack switch from it (with the ports spread across several slots)?

Figure 1 24-port InfiniBand chip schematic

As shown in Figure 2, the switch uses 36 identical chips internally, and its high-density, high-speed ports are implemented with a fat tree structure. A chip vendor whose core products were high-density 10-gigabit chips once proposed a similar scheme for Ethernet switches (at the time I did not fully understand the model, but thought it was very good).

Figure 2 Internal fat tree architecture of an InfiniBand switch
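The chip count can be checked with a little arithmetic. The sketch below assumes a two-level fat tree in which every leaf chip uses half of its ports as external ports and the other half as uplinks, one to each spine chip. The function name and the returned field names are illustrative, not taken from any vendor material.

```python
def two_level_fat_tree(radix: int) -> dict:
    """Size a non-blocking two-level fat tree built from identical chips.

    Assumption: each leaf chip splits its ports evenly, half external
    and half uplinks, with exactly one uplink to every spine chip.
    """
    if radix <= 0 or radix % 2:
        raise ValueError("radix must be a positive even number")
    uplinks_per_leaf = radix // 2   # one uplink per spine chip
    spines = uplinks_per_leaf       # so there are radix/2 spine chips
    leaves = radix                  # each spine port connects one leaf
    return {
        "leaf_chips": leaves,
        "spine_chips": spines,
        "total_chips": leaves + spines,
        "external_ports": leaves * (radix // 2),
    }


sizing = two_level_fat_tree(24)
print(sizing)
```

Under these assumptions a 24-port chip gives 24 leaf chips plus 12 spine chips, that is 36 chips and 288 external ports, which matches the numbers quoted for the switch above.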

In many high-performance computing environments, InfiniBand switches are also networked in this way, often using identical switches throughout, with the same topology and principle as Figure 2 (converting between different InfiniBand bandwidths increases latency and hurts high-performance computing efficiency).

At the beginning of 2010, I discussed network design with a colleague from Cisco, who proposed the concept of a Clos network structure.

Figure 3 "CLOS" Network structure

Figure 3 shows a cluster network for 5,000 gigabit servers. It can be built from four 12500 switches plus 125 5800 switches: each 5800 connects 40 gigabit servers, and its four 10-gigabit uplinks go to four different 12500s. At that time the 12500 did not support a four-unit IRF, so OSPF routing had to be enabled across the whole network. Today that no longer matters: with the four 12500s in one IRF, the whole network can be a Layer 2 network with non-blocking bandwidth.

Note the wording "non-blocking bandwidth": one requirement of Clos is that communication between any two servers sees no difference in bandwidth, always 1G. However, he also required that the access switches share traffic across the four uplinks by round robin [Note 2], that the packets sent by the servers have a fixed length, such as 1500 bytes, and that the servers correct any packet reordering. With all of these conditions added, this particular network is a completely non-blocking Clos.
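The bandwidth claim can be checked with a short calculation. This is a minimal sketch that reads the paragraph above as 125 access switches, each with 40 gigabit server ports and four 10-gigabit uplinks, one to each of the four core switches. The function and field names are hypothetical.

```python
GBPS_PER_SERVER_PORT = 1   # gigabit server ports
GBPS_PER_UPLINK = 10       # 10-gigabit uplinks


def access_layer_check(access_switches, servers_per_switch, uplinks_per_switch):
    """Compare downlink and uplink bandwidth of one access switch."""
    down = servers_per_switch * GBPS_PER_SERVER_PORT
    up = uplinks_per_switch * GBPS_PER_UPLINK
    return {
        "servers": access_switches * servers_per_switch,
        "downlink_gbps": down,
        "uplink_gbps": up,
        "oversubscription": down / up,
        # Each access switch sends one uplink to every core switch,
        # so each core needs one port per access switch.
        "ports_needed_per_core": access_switches,
    }


check = access_layer_check(access_switches=125, servers_per_switch=40,
                           uplinks_per_switch=4)
print(check)
```

With these inputs the access layer has 40G down and 40G up, an oversubscription ratio of 1.0, which is what "non-blocking bandwidth" means here. It says nothing yet about how traffic is actually spread over the four uplinks.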

Note 2: The round-robin scheduling algorithm dispatches requests to the different servers in turn, that is, each scheduling step executes i = (i + 1) mod n and selects the i-th server. The advantage of the algorithm is its simplicity: it does not need to record the state of all current connections, so it is a stateless scheduler.
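The update rule in Note 2 can be sketched in a few lines. The class name and the choice to start so that the first pick is index 0 are assumptions made for illustration.

```python
class RoundRobin:
    """Pick targets 0..n-1 in turn using i = (i + 1) mod n."""

    def __init__(self, n: int):
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.i = n - 1  # start one step before index 0

    def next(self) -> int:
        # The only state kept is the last index, not per-connection state.
        self.i = (self.i + 1) % self.n
        return self.i


rr = RoundRobin(4)
picks = [rr.next() for _ in range(6)]
print(picks)  # [0, 1, 2, 3, 0, 1]
```

In the network discussed here the "servers" of Note 2 correspond to the four uplinks of an access switch, so each packet would simply go to the next uplink in turn.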

Because our 12500 was already built on the Clos concept at the time, these network requirements did not surprise us. However, essentially no access switch was able to do round-robin load sharing.

A quiz for you: if you add an aggregation layer built from 24-port 10-gigabit box switches, how would you build a larger network with non-blocking bandwidth (10,000-21,000 gigabit ports)? (See [Note 3].)

Note 3: The short answer: add one layer of 10-gigabit box switches, with half of their ports connected down to access switches and half connected up to core switches (one core switch per uplink port).
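The answer in Note 3 can be turned into a rough capacity estimate. This is only a sketch: it assumes the access switches are the same as before (40 gigabit ports and four 10-gigabit uplinks each), and the number of 10-gigabit ports per core switch, `core_ports`, is a hypothetical parameter that the text does not state.

```python
def three_tier_capacity(core_ports, agg_radix=24,
                        access_uplinks=4, servers_per_access=40):
    """Estimate the size of a three-tier network with 1:1 bandwidth per tier."""
    agg_down = agg_radix // 2         # half of the ports face access switches
    agg_up = agg_radix - agg_down     # half of the ports face core switches
    cores = agg_up                    # one core switch per uplink port
    agg_switches = core_ports         # each core port feeds one aggregation box
    access_switches = agg_switches * agg_down // access_uplinks
    return {
        "core_switches": cores,
        "aggregation_switches": agg_switches,
        "access_switches": access_switches,
        "gigabit_ports": access_switches * servers_per_access,
    }


estimate = three_tier_capacity(core_ports=125)
print(estimate)
```

With a hypothetical 125 usable ports per core switch this gives 12 core switches, 125 aggregation boxes, 375 access switches and 15,000 gigabit ports, which falls inside the range asked for in the quiz.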

At the end of last year, HP discussed the fat tree network structure with us: a scheme that uses 64-port gigabit devices to build a non-blocking network of 2,000 10-gigabit ports, similar to Figure 2 but with the interconnected chips replaced by box switches.

So what are the shortcomings of such a network, and why is it not a strict Clos?

The core spirit of Clos is to be completely, or strictly, non-blocking. Such a network structure, whether the "Clos" structure or the fat tree structure, is non-blocking only in terms of bandwidth, for example with equal uplink and downlink bandwidth at the access layer. But how is the data traffic distributed across the different uplinks?

If round robin is used, differences in packet size make the actual traffic on the different links uneven, and a test case can be constructed to prove it;

If a hash algorithm is used, whether for link aggregation or for equal-cost routes, the traffic cannot be distributed completely evenly. A single flow can only hash to one path, so if there is a large flow, such as gigabit TCP, the uplink traffic will be uneven.

Therefore, with current technology, a network with a Clos structure is not a strict Clos.
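Both distribution problems can be demonstrated with constructed traffic. The sketch below is illustrative only: the packet pattern, the flow sizes, and the use of CRC32 as the hash function are assumptions, not a description of any real switch.

```python
import zlib


def round_robin_bytes(packet_sizes, links):
    """Send packet k to link k mod links and count bytes per link."""
    load = [0] * links
    for k, size in enumerate(packet_sizes):
        load[k % links] += size
    return load


def hash_bytes(flows, links):
    """Pin every flow to one link by hashing its identifier."""
    load = [0] * links
    for flow_id, size in flows:
        link = zlib.crc32(flow_id.encode()) % links
        load[link] += size
    return load


# Constructed worst case: one large packet followed by three small ones,
# repeated, so the large packets always land on the same link.
packets = [1500, 64, 64, 64] * 1000
rr_load = round_robin_bytes(packets, 4)
print(rr_load)  # [1500000, 64000, 64000, 64000]

# One large flow plus 99 small ones: the large flow cannot be split.
flows = [("elephant", 1_000_000_000)]
flows += [(f"mouse-{k}", 1_000_000) for k in range(99)]
hash_load = hash_bytes(flows, 4)
print(hash_load)
```

In the round-robin case every link carries the same number of packets but very different byte counts. In the hash case, whichever link receives the large flow carries at least ten times the combined traffic of all the small flows, regardless of the hash function chosen.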

3. Device-level Clos

Device-level Clos is likewise divided into strict (strictly non-blocking) Clos and non-strict Clos.

The specific architecture of the S12500 has already been covered in related IP Pilotage articles and is not repeated here.

Besides the Clos architecture, the S12500 also uses cell-level switching: packets are sliced inside the switch and switched as equal-length cells, and the ports inside the switch communicate through a credit mechanism [Note 4] to achieve internal losslessness. So the 12500 is a strict Clos, completely non-blocking. (People often focus on the Clos architecture and the very large buffers and overlook these key details.) This has also been through a very rigorous Tolly test.

Note 4: Reference http://blog.sina.com.cn/s/blog_61bd83dc0100yir3.html
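Why equal-length cells matter can be seen with a small sketch. It assumes a hypothetical 64-byte cell size and four switch fabric planes, with cells sent to the planes in turn. These values and the function names are chosen for illustration and are not the S12500's actual parameters.

```python
def cells_needed(packet_len: int, cell_size: int) -> int:
    """Number of equal-length cells for one packet (the last cell is padded)."""
    return -(-packet_len // cell_size)  # ceiling division


def spray_cells(packet_sizes, planes, cell_size=64):
    """Slice packets into cells and send the cells to the planes in turn."""
    cells_per_plane = [0] * planes
    nxt = 0
    for size in packet_sizes:
        for _ in range(cells_needed(size, cell_size)):
            cells_per_plane[nxt] += 1
            nxt = (nxt + 1) % planes
    return cells_per_plane


# A packet mix that is badly unbalanced under per-packet round robin.
packets = [1500, 64, 64, 64] * 1000
cell_load = spray_cells(packets, planes=4)
print(cell_load)
print(max(cell_load) - min(cell_load))  # never more than one cell
```

Because every unit sent to a fabric plane has the same length, the planes can never differ by more than one cell, whatever the packet sizes are. The credit mechanism and the reassembly of cells into packets at the egress are not modeled here.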

Figure 4 Multi-stage architecture of the 12500

Another question: a netizen asked whether, in practice, a Clos architecture means a main control board plus independent switch fabric boards and service boards.

Broadly speaking, Clos is a multi-stage, multi-path way of switching, and it has no direct connection with main control boards or independent switch fabric boards. However, when designing such a Clos system, separating control from switching helps improve the reliability of the system as well as the future scalability of the architecture. So in this respect, the practical implementation really is a main control board plus independent switch fabric boards and service boards.

But separating control from switching is not enough. You also have to look at whether completely non-blocking switching can be achieved, how traffic is distributed over multiple switch fabrics, how traffic is scheduled internally (internal losslessness), and whether VOQ is supported.

In addition, a multi-stage crossbar switch fabric can also be used to build a Clos architecture, but this is not a strict Clos, because a crossbar switches per flow. For example, Cisco explicitly stated that the N7000 uses a crossbar architecture (see the Cisco Live 2010 N7000 architecture material; this page was removed in 2011). Its line cards are very capable, however: unicast packets are sent across the switch fabrics in round-robin mode, and multicast is sent across the switch fabrics by hash. The switch fabric crossbar provides 23G of bandwidth per port, and overall each slot is allocated very high bandwidth, so the probability of blocking is very small. It also uses superframing, which combines multiple packets, to improve forwarding efficiency. Its behavior is close to that of the 12500.

Different vendors implement device-level Clos in different ways; H3C's implementation is one of them.

Therefore, when we say that a product is "not Clos", the correct statement should be "non-strict Clos" or "not strictly completely non-blocking".

In fact, Clos versus strict Clos, and non-blocking versus completely non-blocking, are all relative concepts. They only emphasize differences in how well a product's architecture adapts, and there is no need to separate them too sharply as concepts. For a particular application scenario, especially a heavily loaded network environment, a Clos architecture clearly gives the equipment a significantly stronger service capacity.
