Facebook: an innovative data center network topology
Aerial view of Facebook's data center in Altoona, Iowa
Facebook's data centers receive billions of user requests every day, and as the company adds members and introduces new features, the request volume keeps growing. That is basically good news for Facebook, but it is a challenge for the company's network staff: a data center topology that met requirements five months ago can be overwhelmed today.
As a result, in addition to building large data centers (such as the one in Altoona, Iowa), Facebook engineers are constantly optimizing their data center network designs. Even so, "adjustments" and "changes" may be the wrong words for what the engineers proposed and implemented at the Altoona data center; it is more as if they rewrote the network design guide.
Old Facebook network
Before the Altoona data center was built, Facebook engineers arranged a data center's server racks into clusters, similar to the architecture shown in Figure A. In a real environment Facebook has hundreds of racks, not just the three shown. The figure also shows the top-of-rack (ToR) switch in each rack; the ToR switch acts as the intermediary between the servers and the upstream aggregation switches.
Figure A: Top-of-rack (ToR) switch network connection architecture
This architecture served its purpose, but it caused Facebook engineers several difficulties. Alexey Andreyev, a Facebook network engineer, explained: "First, the cluster size is limited by the port density of the cluster switch. To build the largest clusters, we need the largest network devices, which are sold by only a limited number of vendors. In addition, the need for so many ports in one device is at odds with the desire to provide the highest-bandwidth infrastructure. Harder still is maintaining a long-term optimal balance between cluster size, rack bandwidth, and bandwidth out of the cluster."
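To illustrate the port-density constraint Andreyev describes, here is a minimal sketch; the port counts are hypothetical, chosen only to show why a bigger cluster demands an ever bigger (and harder to buy) cluster switch.

```python
# Hypothetical illustration of the cluster-size limit described above.
# In the old design every rack's ToR switch needs an uplink into each cluster
# switch, so a cluster can hold at most as many racks as a cluster switch has
# ports left after reserving some for links out of the cluster.

def max_racks_per_cluster(cluster_switch_ports: int, reserved_ports: int) -> int:
    """Ports not reserved for out-of-cluster links are all that remain for racks."""
    return cluster_switch_ports - reserved_ports

# Made-up numbers: a very large 1,152-port chassis with 96 ports reserved for
# out-of-cluster links caps the cluster at 1,056 racks; growing past that means
# buying an even larger switch from a very short vendor list.
print(max_racks_per_cluster(cluster_switch_ports=1152, reserved_ports=96))
```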
Fabric: New Network Topology
The engineers took the billions of requests arriving each day as the impetus to abandon the complicated, bandwidth-hungry top-down network hierarchy and replace it with a new design named Fabric. The slide in Figure B shows the new unit of server racks, called a pod. A single pod consists of 48 racks with their top-of-rack switches, and each ToR switch connects up to four fabric switches. "Each ToR switch currently has four 40 Gbit/s uplinks, providing 160 Gbit/s of total bandwidth for a rack of 10 Gbit/s-connected servers."
Figure B
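The uplink arithmetic in that quote is easy to restate; the short sketch below simply spells it out (the aggregate per-pod figure is derived from the same numbers, not quoted from Facebook).

```python
# Restating the uplink arithmetic from the quote above.
TOR_UPLINKS = 4                  # each ToR switch has four uplinks...
UPLINK_SPEED_GBPS = 40           # ...at 40 Gbit/s each
RACKS_PER_POD = 48               # a pod is 48 racks, one ToR switch per rack

per_rack_uplink_bw = TOR_UPLINKS * UPLINK_SPEED_GBPS      # 160 Gbit/s per rack
pod_uplink_bw = per_rack_uplink_bw * RACKS_PER_POD        # 7,680 Gbit/s per pod

print(f"Per-rack uplink bandwidth: {per_rack_uplink_bw} Gbit/s")
print(f"Aggregate pod uplink bandwidth: {pod_uplink_bw} Gbit/s")
```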
This design method has the following advantages:
• Smaller, 48-node pods are easy to deploy
• Scaling is simplified and, for practical purposes, unrestricted
• Each pod is identical and uses the same connectivity
The next step is to connect all of the fabric switches; the slide in Figure C shows how that is done. Andreyev said this part is relatively simple (one can only imagine what it looked like before).
Figure C
Andreyev explained that when adding spine switches, Facebook engineers stuck to the principle of 48 nodes. "For building-wide connectivity, we created four independent 'planes' of spine switches, each scalable up to 48 independent devices within a plane. Each fabric switch of each pod connects to each spine switch within its local plane."
The numbers Andreyev cites next are astonishing. "Together, the pods and planes form a modular network topology capable of accommodating hundreds of thousands of 10 Gbit/s-connected servers, scaling to multi-petabit bisection bandwidth, and covering our data center buildings with non-oversubscribed rack-to-rack performance."
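To make the pod-and-plane structure concrete, here is a minimal sketch that enumerates the fabric-to-spine links implied by those numbers; the pod count and the servers-per-rack figure are assumptions used only for illustration, not Facebook's actual deployment.

```python
# Minimal model of the fabric described above:
#  - each pod has 48 racks (ToR switches) and 4 fabric switches,
#  - there are 4 spine planes, each with up to 48 spine switches,
#  - fabric switch i of every pod connects to every spine switch in plane i.
RACKS_PER_POD = 48
FABRIC_PER_POD = 4               # also the number of spine planes
SPINES_PER_PLANE = 48            # maximum per plane, per the quote above

def fabric_to_spine_links(num_pods: int) -> list[tuple[str, str]]:
    """Enumerate (fabric switch, spine switch) links for a fabric of num_pods pods."""
    links = []
    for pod in range(num_pods):
        for i in range(FABRIC_PER_POD):          # fabric switch i sits in plane i
            fsw = f"pod{pod}-fsw{i}"
            for spine in range(SPINES_PER_PLANE):
                links.append((fsw, f"plane{i}-ssw{spine}"))
    return links

# Hypothetical build-out, for illustration only: 96 pods and 24 servers per rack.
num_pods, servers_per_rack = 96, 24
servers = num_pods * RACKS_PER_POD * servers_per_rack       # 110,592 servers
links = fabric_to_spine_links(num_pods)                      # 18,432 links
print(f"{servers:,} servers, {len(links):,} fabric-to-spine links")
```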
Network Operation
From the top-of-rack switches to the network edge, the Fabric design operates at Layer 3, supports both IPv4 and IPv6, and uses equal-cost multi-path (ECMP) routing. Andreyev added: "To prevent the occasional 'elephant flow' from consuming bandwidth and degrading end-to-end path performance, we made the network multi-speed: all switches are interconnected with 40G links, while servers connect through 10G ports on the ToR switches. We also have server-side mechanisms to route around trouble if a problem occurs."
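ECMP spreads traffic by hashing each flow onto one of several equal-cost next hops, so packets of a single flow stay on one path while different flows fan out across the planes. A minimal sketch of that idea, assuming a simple software hash (this is not Facebook's implementation):

```python
import hashlib

def ecmp_next_hop(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  proto: str, next_hops: list[str]) -> str:
    """Pick a next hop by hashing the flow's 5-tuple, as ECMP routing does.

    All packets of one flow hash to the same path (preserving packet order),
    while different flows spread across the equal-cost paths.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

# With four spine planes there are at least four equal-cost paths to choose from.
paths = ["plane0", "plane1", "plane2", "plane3"]
print(ecmp_next_hop("10.0.1.5", "10.0.9.7", 49152, 443, "tcp", paths))
```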
Physical Layout
Andreyev wrote that the layout of the new building, shown in Figure D, is not very different from Facebook's previous designs. One difference is that Fabric's new spine switches and edge switches are placed on the first floor between data hall X and data hall Y, alongside the connection to the external network (the minimum point of entry, MPOE).
Figure D
Overcoming the challenges
Facebook's engineers appear to have overcome the challenges they faced. Hardware limitations are no longer a problem, which not only reduces the number of different parts but also reduces complexity. Andreyev said the technical team adhered to the "KISS" (keep it simple, stupid) principle, adding at the end of his article: "Our new fabric is not an exception to this approach. Despite the large scale and seemingly complex topology, it is actually a highly modular system with many repeating elements. It is easy to automate and deploy, and it is simpler to operate than a smaller collection of customized clusters."