Editor's note: The report Data Center 2013: Hardware Refactoring and Software Definition had a big impact, and we have been paying close attention to the launch of the Data Center 2014 technical report. In communication with its author, Zhang Guangbin, a senior data center expert who is currently starting a business, he said it would still take some time. Fortunately, Zhang Guangbin has just released a solid fifth chapter, which mainly introduces Facebook's data center practice and the founding and main results of the Open Compute Project (OCP). We share it here.
The following is the text:
Secrecy is standard practice in the data center industry. In November 2014, I went alone to the south of Las Vegas to visit the SUPERNAP data center. After getting out of the car, I tried several times to photograph the building with my phone and was quickly stopped by guards patrolling in a Hummer. The guards in the gatehouse, ready at any moment to deal with robbers, left a deep impression on me, even though guns are commonplace in the United States. Not letting visitors take pictures is the rule at data centers, but the data centers I had visited before were all toured with an escort, and I had never received such heavily armed treatment.
Note: The reception room of the SUPERNAP 7 data center, where I waited for more than 20 minutes and could observe the guard station through the small window. Pictures from the SUPERNAP official website, as below.
This is not unrelated to the nature of a colocation data center, which must keep its tenants confidential. A customer like Google regards infrastructure as one of its core competencies, which can also be felt in the company's consistent emphasis on infrastructure. As a result, Google has long been secretive about its data centers and custom hardware designs, and employees must sign confidentiality agreements that bar them from revealing details for a year or two after leaving Google.
Note: The SUPERNAP 7 data center at night, a typical large single-story American structure
So how to explain the data center photos and locations that Google has made public?
In March 2009, Facebook hired away Amir Michael, a hardware engineer who had spent nearly six years at Google (and interned at Cisco before that), to take charge of hardware design. On April 1, 2010, Facebook announced the appointment of Ken Patchett to run its first data center, in Prineville, Oregon. Ken Patchett started his career at Compaq and accumulated nearly six years of data center and network operations experience at Microsoft. He then ran Google's data center in The Dalles, Oregon, and before going to Facebook spent more than a year in Asia managing Google's own and leased data centers, before turning around and returning to Oregon.
Note: Security room in SUPERNAP data center
From server design to data center operations, Facebook kept chipping away at Google's corners. Going to court over this is awkward, because it would mean exposing more details. Facebook went further still: in April 2011, riding the momentum of the Prineville data center, it announced the Open Compute Project (OCP), opening up a series of hardware designs covering the data center and custom servers.
Two big moves in three years: first poach the people, then open up the designs. In data center scale Facebook is often compared with Google, one of the big three (alongside Microsoft and Amazon), and through OCP even the hum of the server fans (no derogatory meaning intended) has come out into the open.
Facebook raised the data center PR battle to a new level. Google disclosed some of its data center technology in October 2012, including press visits and nearly a hundred high-definition photos on its website. But on IT equipment (servers and networking) and related technology, Google remains tight-lipped, even about servers it has already retired. The two editions of the book that Urs Hölzle co-authored likewise stay at the level of macro concepts and data-center-level design principles.
Note: Google's data center in The Dalles, Oregon, on the Columbia River; team members can enjoy rafting, windsurfing, fishing and hiking. Note the upper left corner of the mountainside (source: Google official website)
Interestingly, James Hamilton has also written analysis of the information Google made public. AWS, once thought to rival Google in data center technology, is now the most secretive of all.
Overall, Google has revealed its long early history and its recent situation, but not much about the expansion process in between; Facebook's history may serve as a useful reference.
From One Server to Multiple Data Centers
Mark Zuckerberg put Facebook online from his Harvard dorm in February 2004 with just one server. Only five years later, the world's largest social networking site had more than 300 million active users, handling 3.9 trillion feed actions, more than 1 billion chat messages and 100 million search requests every day, and more than 200 billion page views per month...
In the early days of few users, a few photos and no video, running all services on one server was not a problem. The Facebook of 2009 was a different story: loading a user's home page, a seemingly simple action, required accessing hundreds of servers within a second, processing tens of thousands of scattered data items and assembling the required information.
The growth in servers is not hard to imagine; various indications give Facebook's server count as:
April 2008: about 10,000; 2009: about 30,000; June 2010: at least 60,000...
Even today this number would rank near the front among Tier 2 Internet customers (up to 100,000 servers; above that is Tier 1, of which Facebook is one of the ten or so members), so energy efficiency has to be considered. Conservatively assuming 200W per server, annual power consumption already exceeds 100 million kWh. If data center PUE (Power Usage Effectiveness) could be reduced from 1.5 to 1.1, about 42 million kWh of electricity would be saved every year.
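A quick sanity check of those figures, as a minimal Python sketch (the 60,000-server count and the 200W average draw are the assumptions already stated above):

# Back-of-the-envelope check of the energy figures above.
servers = 60_000          # assumed fleet size (from the text)
avg_watts = 200           # assumed average draw per server, in watts
hours_per_year = 8760

it_energy_kwh = servers * avg_watts / 1000 * hours_per_year
print(f"IT load: {it_energy_kwh / 1e6:.0f} million kWh/year")        # ~105 million kWh

for pue in (1.5, 1.1):
    total = it_energy_kwh * pue
    print(f"PUE {pue}: {total / 1e6:.0f} million kWh/year")

saving = it_energy_kwh * (1.5 - 1.1)
print(f"Saving from PUE 1.5 -> 1.1: {saving / 1e6:.0f} million kWh/year")  # ~42 million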
Until 2009, Facebook still relied on leased data center space and had no data center of its own. The advantage of leasing space (and deploying your own servers, network and other IT facilities) is faster delivery, achievable within about five months; building your own data center takes about a year and more upfront investment, but power and cooling can be customized to your own needs, which works out cheaper for hyperscale users. Google, Microsoft and Amazon had already built their own data centers.
Note: The architecture of the two Prineville data centers (Source: Facebook website, 2014)
In January 2010, Facebook announced it would build its own first data center in Prineville, Oregon, with a planned area of about 14,000 square meters and a PUE target of 1.15. In July of that year, the social giant decided to double the Prineville data center to about 30,000 square meters. It was completed in December 2010, and thanks to a series of energy-efficient designs, including outside-air cooling and no chillers, its PUE can be as low as 1.073. Against the "industry average" of 1.51, the energy savings are slightly better than the assumption we just made.
Note: The Altoona data center construction site at sunset at the end of August 2013, covering about 194 acres. By mid-November 2013, more than 200 people were working on site every day, putting in nearly 100,000 hours in total (Source: Facebook website)
Having tasted the benefits of building its own data centers, Facebook went on to build data centers in Forest City, North Carolina (announced November 2010), Luleå, Sweden (announced October 2011) and Altoona, Iowa (announced April 2013). Each data center has been expanded after completion: Prineville and Forest City each added a data center (building) for cold storage, and the second phases of Luleå and Altoona got under way in 2014.
The Origins of OCP: Outdoing the Teacher?
Without open source there would be no Internet industry as we know it, but that is mainly from the software perspective. Google has done a great deal of open source software work, and the famous Hadoop can be seen as the result of Google's unintentional "open source". In February 2015, Google announced the open-sourcing of MapReduce for C (MR4C), a MapReduce framework developed in C++ that came with a June 2014 acquisition and lets users run native C and C++ code in their Hadoop environments, which is good news for the Hadoop community.
What supports the Internet infrastructure is open hardware technology, which is not the same thing as open source. Intel defeated IBM and the other RISC vendors by opening up hardware technology (ARM does likewise), but at least before OCP it was unthinkable that Dell or HP would disclose the detailed design materials of their servers. Moreover, "open source plus openness" does not mean the results must be transparent: Google built proprietary data centers on top of open source software and open hardware technology.
It should be said that Zuckerberg realized early on that Facebook and Google would one day go to war, and that day arrived much faster than the familiar script would suggest. Google sells advertising on the web, Facebook sells advertising on its social network; just as Tencent does not let Baidu's search into WeChat, Facebook had to develop its own search. Facebook launched Graph Search in 2013, updated it to Facebook Search in 2014, and in early December removed Microsoft Bing's web results from its search.
One important difference: Tencent is no smaller than Baidu, whereas Facebook alone cannot match Google. From servers to data centers, Google started early, operates at enormous scale and is entirely self-contained. To quickly narrow the infrastructure gap with Google, Facebook came up with a masterstroke to expand the ecosystem through open source: the Open Compute Project (OCP).
Note: The logo of the Open Compute Project; on the left is an "f" spelled out with server motherboards (Source: Zhang Guangbin, 2013)
As an open source hardware project, OCP not only publishes the details of Facebook's "home-made" data centers and servers, down to the CAD drawings of racks and motherboards, but also invites the open source community and other partners to use and improve them. In other words, it takes two steps: first release the specifications and mechanical drawings, then improve them together with the community.
Considering that Facebook and Google build their hardware from components of similar vendors, it is clear that even players at the core of the ecosystem, such as Intel, have little reason to resist this communal approach. Indeed, the last company to pull off something like this was Google itself: to fight Apple's iOS it open-sourced Android, successfully building a huge ecosystem, a pack of wolves besieging a tiger.
In this capital- and talent-intensive industry, open source is a good way to compete for talent, and it also has a considerable advertising effect. The more customers use hardware based on OCP specifications, the larger the purchase volumes grow, helping Facebook lower its costs through scale, and everybody wins.
OpenStack was just emerging at the time, and OCP adopted several similar practices, such as holding a Summit every half year; at the second OCP Summit on October 27, 2011, the establishment of the OCP Foundation (Open Compute Project Foundation) was announced. Hardware design cycles are longer, however, so from 2012 the Summit became annual; the sixth Summit was held March 9-11, 2015.
Note: Facebook's infrastructure organization (source: Zhang Guangbin, 2013)
At the fifth OCP Summit, held at the end of January 2014, Mark Zuckerberg and Facebook vice president Jay Parikh announced that in the three years since OCP was founded, the open source hardware program had saved Facebook 1.2 billion dollars.
By that point OCP had close to 200 members (including heavyweight traditional vendors such as Microsoft and VMware, which joined in 2014), seven solution providers represented by Quanta, a large number of validated designs, and adoption by Facebook and Rackspace... Next, the structure and main results of this open source hardware organization are introduced from two angles: the board of directors and representative projects.
Board of Directors: The Legacy of Experience
Setting up a foundation, so that OCP is not solely in Facebook's hands, is obviously important to its development. The OCP Foundation operates under the management of a board of directors, initially with five directors from five different companies.
Frank Frankovsky represents Facebook and serves as chairman and president of the OCP Foundation. He joined Facebook in October 2009 as director, and later vice president, of hardware design and supply chain operations. Before that he was a director in Dell's Data Center Solutions (DCS) unit, responsible for the custom server business, and in the 1990s he spent nearly four years as a product manager at Compaq.
Note: A corner of the Facebook hardware lab. For a hardware lab, this is pretty tidy. (Source: Zhang Guangbin, 2013)
Mark Roenigk is Rackspace Hosting's COO. He worked for nine years at Microsoft, mostly in OEM and supply chain operations, after seven years as an engineer at Compaq. Rackspace is a well-known hosting provider with a wealth of data center construction, operations and hardware experience; having launched OpenStack together with NASA, it is the only company present at the creation of both a software and a hardware open source organization.
Jason Waxman is general manager of Intel's high-density computing business in the Data Center Group, covering Internet data centers, blade servers and technologies related to future dense data center architectures. He also leads Intel's cloud computing work and holds board-level positions at Blade.org and the Server System Infrastructure Forum (SSI Forum). He previously served as director for Intel Xeon processors, related chipsets and platform products and their customer relationships.
Note: Facebook's Silicon Valley campus formerly belonged to Sun Microsystems, a great company worth remembering (source: Zhang Guangbin, 2013)
Andy Bechtolsheim comes from Arista Networks, though his more resounding title is "co-founder of Sun Microsystems". Bechtolsheim was Sun's chief system architect, was the first investor in Google, and also chairs the flash startup DSSD, which EMC acquired in a high-profile deal in May 2014.
Apart from Goldman Sachs's Don Duet, whose career has been on the CIO side, the four above all have strong backgrounds in the hardware industry, covering everything from products and technology to the supply chain. Being well informed and experienced is critical to steering the development of an open source hardware project.
As mentioned earlier, OCP has a large number of projects, from servers to data centers, as well as racks, storage, networking and hardware management, and in 2014 it started an HPC (High Performance Computing) project.
Servers: Started from Google, Became a School of Their Own
Facebook did not start customizing hardware especially early, and its early servers came from OEMs. Facebook's head of infrastructure engineering, Jay Parikh, said at the GigaOm Structure Europe conference in mid-October 2012 that the data center in Luleå, Sweden would be Facebook's first to use no OEM server hardware at all.
Note: Facebook's data center clusters (2014 public data). A front-end (FE) cluster includes a large number of web servers and some ad servers, plus a relatively small number of Multifeed servers; a service cluster (SVC) includes servers for search, photos, messages and so on; a back-end (BE) cluster is mainly database servers. This configuration may change with the subsequent deployment of the "6-pack" core switch.
This is clearly tied directly to Amir Michael, mentioned at the start of this chapter, who joined Facebook six months before Frank Frankovsky and is one of OCP's co-founders; he has served as vice chair of the OCP Incubation Committee (IC) since January 2013. The startup Coolan, launched in April, has deep roots in Facebook and OCP, with Amir Michael as a co-founder.
Figure note: Infrastructure redundancy across regional data centers. An FE (front-end cluster), SVC (service cluster) and BE (back-end cluster) form a whole that is made redundant with a data center in another region (source: Facebook)
Transcendence often begins with learning and imitation, even if that is not quite what Newton meant by "standing on the shoulders of giants". When OCP was founded, the first generation of OCP servers contributed by the Facebook data center team was largely based on Google's design, the most obvious mark being the 1.5U (66mm) server chassis. The benefit is that larger 60mm low-speed fans can be used, which are more energy efficient than the 40mm fans of a 1U server. The 450W power supply unit (PSU) accepts 277V AC and 48V DC inputs: the former cuts out unnecessary voltage conversions compared with 208V, and the latter comes from the battery cabinet that provides short-term backup power, avoiding energy loss as far as possible. Cooling and power are tackled together to keep electricity costs down (saving OPEX).
Figure note: Comparison of power conversion stages and their losses at the Prineville data center (source: Facebook)
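To illustrate why cutting out conversion stages matters, here is a rough sketch; the per-stage efficiencies below are assumed, illustrative values, not figures published by Facebook:

# Illustrative comparison of cascaded power-conversion efficiency.
# All per-stage efficiencies are assumptions for the sake of the example.
def chain_efficiency(stages):
    eff = 1.0
    for stage in stages:
        eff *= stage
    return eff

# Legacy path: double-conversion UPS -> PDU/transformer to 208V -> server PSU
legacy = chain_efficiency([0.94, 0.97, 0.90])
# Prineville-style path: 277V AC fed straight to a high-efficiency server PSU,
# with the 48V battery cabinet standing by instead of an inline UPS
direct = chain_efficiency([0.95])

print(f"Legacy chain: {legacy:.1%}")   # ~82%
print(f"Direct 277V:  {direct:.1%}")   # 95%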
Another point is the removal of the front panel and the BMC; there is no VGA port either, embodying Facebook's "vanity-free" philosophy. The goal is to minimize acquisition cost (CAPEX), even if the workmanship looks a little rough. As Jay Parikh put it, OCP servers have far fewer functions than standard servers and should need as few parts as possible.
Figure note: Power delivery path of the 48V battery cabinet (Source: Facebook)
The OCP V1 server comes in AMD (12-core Opteron 6100) and Intel (6-core Xeon 5600) dual-socket versions, with a 13x13-inch motherboard manufactured by Quanta. The chassis width (480mm, slightly under 19 inches) and the height unit (Rack U, or RU; 1RU = 1.75 inches = 44.45mm) follow the industry's "old rules"; the rear has 3 hard drive bays, and the motherboard can be removed without tools.
Note: The OCP server V1 (left) and V2 (right) use the same 1.5U chassis; four 60mm fans sit behind the motherboard, and the drive bays on the right get their cooling airflow from the power supply module. V2 improvements include front-serviceable hard drives and two motherboards for higher compute density, at the cost of fewer possible drives, plus improved CPU performance (Source: Facebook)
Before the third OCP Summit, held in San Antonio in early May 2012, AMD and Intel each contributed a second-generation OCP motherboard design; thanks to the overwhelming dominance of the Xeon E5-2600, Intel's prevailed. The Intel OCP v2.0 motherboard, code-named "Windmill", uses dual Intel Xeon E5-2600 CPUs on a long, narrow board (6.5x20 inches, about 165x508mm). The OCP V2 server is still 1.5U, but the motherboard is only half as wide as the first generation, so the same chassis holds two compute nodes, doubling density.
To support two motherboards, the V2 server's power supply module is upgraded to 700W and swaps places with the hard drives, so the drives can be serviced directly from the front.
Two generations of server exploration exposed several problems:
Poor PSU redundancy. Compared with the 1+1 redundant power supplies of industry-standard servers, both generations have only one PSU. On the OCP V1 server this can be excused by the "cattle" model (if a key component fails, replace the whole server), but on the OCP V2 server a PSU failure takes down two compute nodes, which rather overshoots the correction. Facebook therefore also designed a high-availability (HA) server option that adds a PSU in place of one motherboard, giving back the density that had been gained. A rack-level centralized power scheme of the kind covered in the previous chapter could be used (as China's Project Scorpio racks do), but within a 19-inch chassis, even removing the remaining PSU leaves no room for a third motherboard (6.5x3 = 19.5 inches);
Compute and storage are coupled. This is especially obvious on the OCP V1 server, whose 3 drive bays can hold 6 hard drives while a compute node may use only a single boot disk, wasting a lot of space to retain flexibility that still is not sufficient; OCP V2 looks better only because the extra motherboard squeezed out 2 of the drive bays;
The 60mm fans are still not big enough;
Both generations keep USB ports to varying degrees, yet have no BMC (Baseboard Management Controller). Which of the two is more valuable for management goes without saying.
Apart from the last point, addressing the others requires changes to the chassis and even the rack design.
Open Rack: Redefining the Data Center Rack
Facebook originally designed a 19-inch triplet rack named Freedom Triplet, 1713mm wide, slightly narrower than three EIA 310-D racks side by side (600mm x 3). The two outer columns each carry a top-of-rack (ToR) switch, and each column holds 30 Open Compute servers, 90 in total. A triplet fully loaded with 90 servers weighs 2600 lbs (about 1179 kg), and two triplets share one backup battery cabinet.
Figure note: The Freedom Triplet used with the first two generations of servers. Ganging three racks together saves a little material and is more stable, and the height is slightly greater than a common 19-inch rack, accommodating 30 x 1.5U servers (45U) plus a switch per column (Source: OCP specification)
Facebook soon realized that the EIA 310-D standard, which took shape in the 1950s, did not meet its requirements. EIA 310-D standardizes the width between the rack's internal rails (19 inches) but leaves height, depth, mounting and cabling schemes, and connectors to the vendors. Facebook argued that this leads to unnecessary fragmentation of server and rack design and ties customers to particular suppliers and their implementations.
Figure note: One DC UPS battery cabinet supports two triplets, a system of 180 servers in total (source: Facebook, 2010)
A more critical problem is that in a traditional 19-inch rack, once the sides and rails are accounted for, only 17.5 inches are available to the IT equipment (servers, storage), which cannot fit three 6.5-inch-wide motherboards or five 3.5-inch hard drives side by side. Wider racks already existed: IBM mainframes and EMC's high-end storage use racks wider than 60cm, for instance the EMC Symmetrix VMAX, whose system and storage racks are wider than 75cm (30.2 inches, 76.7cm), in order to accommodate larger servers (storage controllers) or more hard drives.
Simply widening the outside does not necessarily improve efficiency, however, and the two examples above are low-volume, high-end niche products; very few buyers order them by the thousands of racks. Facebook's approach keeps the 600mm (about 24-inch) outer width unchanged, widens the internal space from 483mm to 538mm (21 inches), a gain of 55mm (about 2.2 inches), and removes the expensive rails, jumping space utilization from 73% (17.5 inches) to 87.5% (21 inches), a pioneering move.
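The utilization numbers follow directly from the widths quoted above (a small sketch; the 24-inch outer width is the nominal 600mm rounded):

# Width utilization: traditional 19-inch rack vs. Open Rack.
outer_width_in = 24.0       # ~600 mm outer width in both cases
traditional_it_in = 17.5    # usable width between rails, EIA 310-D
open_rack_it_in = 21.0      # 538 mm usable width, Open Rack

print(f"Traditional rack: {traditional_it_in / outer_width_in:.1%}")  # ~72.9%
print(f"Open Rack:        {open_rack_it_in / outer_width_in:.1%}")    # 87.5%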
Note: Open Rack top view (rear at the top, front at the bottom), showing the widened interior and considerations such as front-end maintenance and rear-end power delivery (source: OCP specification)
Since this important dimension had changed, the unit height was simply redefined as well: from the traditional Rack U (RU) of 44.45mm it grew slightly to 48mm, named the OpenU, or OU, and the rack was named Open Rack. For compatibility with existing equipment, 0.5 OU was kept as the smallest increment, although no products based on non-integer OUs seem to have appeared since.
Next comes centralized power. The rack is divided into three power zones, each with a 3 OU power box holding seven of the 700W PSUs from the OCP V2 server in an N+1 arrangement, giving 4.2kW per zone and 12.6kW of power capacity for the whole rack. Each rack has two PDUs: 200-277V AC at the rear left and 48V DC at the rear right. Servers draw power from three equally spaced bus bars at the rear of the rack; the PSUs output 12.5V, just meeting the servers' 12V input requirement.
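Assuming the N+1 arrangement means six active PSUs and one spare per zone (an interpretation consistent with the stated totals), the arithmetic works out as follows:

# Open Rack V1 power-zone arithmetic.
psu_watts = 700
active_psus_per_zone = 6    # assumed split of the 7 PSUs: 6 active + 1 spare (N+1)
zones = 3

zone_kw = active_psus_per_zone * psu_watts / 1000
rack_kw = zone_kw * zones
print(f"Per power zone: {zone_kw} kW")   # 4.2 kW
print(f"Whole rack:     {rack_kw} kW")   # 12.6 kW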
The Open Rack v0.5 specification was released on December 15, 2011 and formally introduced at the third OCP Summit. That version recommended that each power zone serve 12 OU of IT equipment, with another 2 OU reserved for ToR switches and a total height of no less than 2300mm (which looks like a remnant of the triplet's vertical space allocation). On September 18, 2012, the Open Rack 1.0 specification was published, mainly clarifying the following points:
Focus on single-column rack design (rather than triplets);
Inlet temperature raised to 35 degrees Celsius, reflecting other Open Compute designs and real data center temperatures;
More flexible switch placement, no longer limited to the top of a power zone;
Compute (server/storage) chassis are 1-10 OpenU high and are carried directly on L-shaped brackets. Compared with traditional server rails, the L brackets obviously save space and cost, and allow tool-free installation at 0.5 OpenU (24mm) increments;
The maximum height depends on the number of power zones, but no more than 2100mm is recommended, to maintain stability. A common practice is three power zones, each with its 3 OU power box and a section of IT space, plus 2 OU of switches at the top;
Newly designed clips make it easy to mate the chassis power connectors with the bus bars.
Note: Open Rack V1 front and side views (front view on the left, side view on the right), showing the allocation of vertical space (source: OCP specification)
Taken together, the main features of Open Rack are:
Expanded space. A groundbreaking increase in internal utilization: for the IT equipment the usable width grows considerably and the unit height slightly, while compatibility with the original rack standard is maintained (same outer width, similar height);
Centralized power. Power is shared and made redundant at rack level, and servers and other IT equipment plug straight into the bus bars, eliminating manual cabling when gear is installed or removed;
The rear is reserved for power and cooling, so maintenance staff can do their daily work from the cold aisle without entering the hot aisle. Working from both sides not only doubles the effort, it also makes equipment hard to identify from the rear, which easily leads to mistakes.
There are, of course, side effects: the supports on both sides become thinner just as the IT equipment inside may grow heavier (the Open Rack V1.1 specification already allows up to 950 kg, close to the triplet mentioned at the start of this section), challenging the rack's strength. This is especially true for whole-rack delivery, which is why early Open Racks were reinforced with diagonal braces at the rear to prevent deformation.
In the current Open Rack V2 specification, however, the basic rack configuration supports 500 kg of IT equipment in a dynamic (transport) environment, and the heavy rack configuration (Heavy Rack Config) can carry 1400 kg of IT equipment with the help of added fastening bolts. For comparison, James Hamilton revealed at the re:Invent 2014 conference that AWS's storage-optimized rack holds 864 (3.5-inch) hard drives and weighs 2350 pounds (about 1066 kg); how to achieve that density is something worth studying.
Note: Early Open Racks braced with rear diagonal beams, or stabilized in a way similar to the triplet (Source: OCP UB Workshop)
Note: Open Rack V2 also brings important improvements such as a restructured power supply layout and removal of the separate battery cabinet, described later in this chapter.
Open Vault: Decoupling Storage from Servers
Thanks to Open Rack, the third-generation OCP server (code-named Winterfell), unveiled at the fourth OCP Summit, made a qualitative leap in design:
The motherboard is still v2.0, but the server is now 2 OU high, emphatically not 1.5 OU. The 80mm fans are even more efficient, and the larger vertical space accommodates full-size GPGPUs, with support for two full-height PCIe cards and a 3.5-inch drive bay serviced from the front. There is no PSU in the server chassis: three servers (each with two 80mm fans) sit side by side, drawing power from the bus bars at the rear, so density rises further (3 per 2 OU) and the nodes are independent of each other. Visually the workmanship is much finer, the exposed parts are better finished, and overall it no longer looks inferior to standard commercial servers.
Figure: The OCP server (Winterfell) for Open Rack V1, top view and a set of three (2 OU of rack space) (Source: composite of web images)
The OCP server motherboard has since evolved to v3.1: the dimensions are unchanged, it supports the Intel Xeon E5-2600 v3 and 16 DIMM/NVDIMM slots, adds the BMC, and supports Open Rack V1 and V2. Three 75W PCIe x8 slots squeeze out the hard drive's spot, which is replaced by onboard mSATA/M.2 (type 2260, 60mm long); previously only mSATA was supported, and via an adapter at that.
Hard drives were marginalized first, and then even the job of carrying the operating system was taken over by SSDs. What, then, about mass storage?
Note: Of Facebook's six server types, most carry little or no storage; Type II has been merged into the Type VI configuration (poor AMD), and the storage configurations of Type IV and Type V look a lot like a 2U "storage server" (Source: Facebook)
We often say that Internet companies do not buy storage (devices), meaning traditional enterprise arrays such as SAN and NAS, not that they have no need for mass storage. The AWS storage-optimized rack mentioned just now is one example.
The OCP V1 server supports up to six 3.5-inch hard drives. Filled up, that is not much; with only one or two installed, the remaining space is of no use. To stay flexible you pay the price of wasted space, and the flexibility you get is still limited.
Amir announced a design project for storage-intensive applications: something that looks like a 4U device, supports 50 hard drives, is allocated two controllers, can connect to multiple servers, and offers a variable ratio of compute to storage.
At the third OCP Summit, a frustrated AMD launched a project code-named Roadrunner based on its dual-socket Opteron 6200 motherboard, covering four form factors: 1U (HPC option), 1.5U (general purpose), 2U (cloud option) and 3U (storage-and-compute option). The 2U supports 8 x 3.5-inch or 25 x 2.5-inch drives, and the 3U supports 12 x 3.5-inch or 35 x 2.5-inch drives, a hard drive density no better than OEM vendors' servers. Once Open Rack became practical this project faded from view, AMD took refuge in the ARM camp, and within OCP it now mainly maintains a presence with micro-server cards.
Broadly speaking, the idea of separating compute and storage (decoupling, disaggregation) came from Amir. With the efforts of hardware engineering manager Per Brashers and Chinese engineer Yan Yong, Facebook succeeded with Open Vault (code-named Knox), shown at the same Summit. This is a JBOD (Just a Bunch Of Disks: a simple collection of hard drives with no processing capability, meant to be paired with compute nodes) whose width and height (2 OU) match Open Rack. It holds 30 3.5-inch hard drives in two trays of 15 drives each, with a pair of redundant "controllers" per tray. The circuit logic is much simpler than a server motherboard's; it is essentially Facebook's own design, initially manufactured by Quanta, and after it was contributed to OCP, versions from other providers (such as Hyve Solutions and Wiwynn) appeared, just as with the OCP servers.
Figure note: Open Vault with one tray of 15 hard drives pulled out; the 2 OU device above it, against the rack's power zone in the background, is JBR, also a JBOD (source: Zhang Guangbin, 2013)
Open Vault is a classic design, and a later chapter is devoted to analyzing it.
Note: Beyond the natural updates to CPU, memory and hard drive configurations, by 2013 Facebook's Hadoop (Type 4) and Haystack (Type 5) servers all used Open Vault for their storage, and the cold storage rack became a new server type (Type 7), which in hardware architecture terms can be understood as a low-performance storage system of eight JBODs behind a single controller (Source: table based on Facebook data)
Today, Facebook servers that need mass storage, such as Type IV (for Hadoop) and Type V (for Haystack, Facebook's photo application), have their storage handled by Open Vault, and a cold storage type has been added in which one OCP server drives eight Open Vaults (240 hard drives), 18U in total, occupying half a rack.
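The cold storage building block adds up as follows (a small sketch using the figures above):

# One cold-storage unit: a single OCP server plus eight Open Vaults.
open_vaults = 8
drives_per_vault = 30       # 2 trays x 15 drives
vault_height_ou = 2
server_height_ou = 2

total_drives = open_vaults * drives_per_vault
total_height_ou = open_vaults * vault_height_ou + server_height_ou
print(f"{total_drives} drives in {total_height_ou} OU")  # 240 drives in 18 OU, about half a rack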
Data Center: RDDC and Water
As described earlier, OCP's gestation was inseparable from data center construction. Facebook's contribution of electrical and mechanical design specifications, based on the Prineville data center practice, is one of OCP's earliest documents, and the OCP design specification Facebook contributed for cold storage hardware includes recommendations for the floor layout of a cold storage data center, which is the configuration described above.
Note: Doesn't this picture of Facebook's Luleå data center on the edge of the Arctic Circle look a bit like Google's Hamina, Finland data center introduced in the previous chapter? The Maevaara wind farm that supplies electricity to the Hamina data center is not far north of Luleå... (Photo source: Facebook)
In early March 2014, Marco Magarelli, a design engineer on Facebook's data center design team, wrote on the OCP website that the second data center building on the Luleå campus (Luleå 2) would be built using the modular "Rapid Deployment Data Center" (RDDC) concept. RDDC comprises two methods. The second, "flat pack", claims to imitate IKEA, but the real "localization" lies in coping with Sweden's cold weather (Luleå is less than 100 km from the Arctic Circle). Veerendra Mulay, a mechanical and thermal engineer at Facebook, told me in an exchange that building a data center the traditional way takes 11-12 months (see Prineville), while RDDC can shorten this to 3-8 months and thereby dodge Luleå's snow season (construction of Tencent's Tianjin data center was once held up by a blizzard).
Figure note: Modules of different types in the chassis approach (source: Facebook)
The first method, "chassis", is based on a preformed steel frame 12 feet wide and 40 feet long, similar in concept to a car chassis: build the frame, then fit the parts to it on an assembly line. Cable trays, power distribution lines, control panels and even lighting are pre-installed at the factory. The modular approach, correspondingly, is like building with Lego bricks.
Figure note: Segmented assembly in flat pack mode (source: Facebook)
As the names suggest, the essence of both methods is the shift from a traditional engineering project to factory prefabrication and on-site modular assembly. By deploying pre-installed assemblies and prefabricated unit modules and delivering predictable, reusable products, RDDC can achieve site-agnostic design, reduce site impact, improve execution and process control, speed up data center construction, raise utilization and be replicated easily in other regions. Greater efficiency, after all, is in service of business needs.
Figure note: Thermal design of the first Prineville data center. The upper story (compare the earlier frame-structure photo of the Altoona data center) treats outside cold air and returned hot air, mixing them in a set proportion.
RDDC owes a great deal to Facebook's embrace of fresh air cooling: with no chillers and no chilled-water piping, the data center is easy to modularize, and another benefit is a low PUE (about 1.07). By comparison, Google's data centers are also highly modular, but the chilled-water piping gets in the way and the PUE is slightly worse (about 1.12). On the other hand, because Facebook's data centers rely on a water mist to regulate temperature and humidity, they are somewhat less well protected.
Figure note: Inside Google's data center in The Dalles, Oregon: the blue pipes supply cold water and the red pipes return warm water for cooling. Laying water pipes is a typical engineering project, time-consuming, laborious and hard to modularize (source: Google official website)
In the summer of 2011, when the Prineville data center had just entered service, the building management system mistakenly delivered cold (80 degrees Fahrenheit) outside air laden with moisture (95% humidity); "the room was like a rain cloud", and many servers rebooted after getting wet or shut down automatically because of short circuits. In late June, Facebook planned for Prineville's second data center to raise the server inlet temperature from 80F (26.7C) to 85F (about 29C), as at the Forest City data center in North Carolina, raise the upper limit of relative humidity from 65% to 90%, and widen the temperature rise (ΔT) from 25F to 35F, with the goal of reducing environmental impact and cutting air-handling hardware by 45%. It now appears that the latter two targets reached only 80% and 22F, and that only the Forest City data center runs at 90% relative humidity; whether this is directly related to that accident is unclear.
Note: Basic design parameters of Facebook's three main data centers (Prineville, Forest City, Luleå) (Source: Facebook)
Network: From the edge to the core
In its reference platform for the Xeon E5-2600, Intel used a mezzanine card design, notably for the NIC, combining the density of onboard devices with flexibility close to that of a standard (PCIe) card. The idea is well represented on the OCP Intel v2.0 motherboard, also based on the Xeon E5-2600: the NIC follows the OCP Mezzanine Card 1.0 specification and sits at the front of the motherboard (cold aisle side) for easy maintenance.
For standard rack servers the case for mezzanine cards is not urgent and they add cost, so the response from OEM vendors has not been enthusiastic. Supporters such as Dell pitch flexibility as the main selling point of their Broadcom or Intel NIC modules, hoping to nudge traditional enterprise users toward 10 Gigabit Ethernet faster. OCP servers use Mellanox 10GbE mezzanine cards in large numbers, whose features, such as latency-reducing RoCE (RDMA over Converged Ethernet) and the hardware virtualization technology SR-IOV (Single Root I/O Virtualization), are also selling points. Even domestic OEM server vendors such as Lenovo use this mezzanine NIC in their Scorpio 2.0 server nodes; this take-what-works pragmatism has helped extend OCP's reach.
Figure note: The 10GbE OCP mezzanine card in Lenovo's Scorpio 2.0 rack server node is the CX341A, a single-port 10GbE NIC from Mellanox's ConnectX-3 EN family, made in Israel (Source: Zhang Guangbin)
The OCP Intel v3.0 motherboard adds support for OCP Mezzanine Card 2.0. The 2.0 mezzanine card gains an optional second connector to meet the needs of future high-speed networking (such as 100GbE); the more significant change is the enlarged board area, with interface module support expanding from the 2 x SFP+ of version 1.0 to options of 2 x QSFP, 4 x SFP+ or 4 x RJ45/10GBASE-T.
Figure note: The three major improvements of OCP mezzanine card 2.0: the added connector B, the larger board area and the optional I/O region (Source: OCP UB Workshop)
It should be noted that the mezzanine card belongs to the server project. OCP's networking project started relatively late: standardized output began only in 2013 and grew gradually through 2014.
According to OCP, the initial goal of the networking project was to develop edge (leaf, i.e. ToR) switches, followed by spine (roughly, aggregation) switches and other hardware and software.
Figure note: The correspondence between access (ToR)/aggregation in a traditional three-tier network and leaf/spine (Source: Cumulus Networks)
Network equipment is not as close a cousin of the server as storage equipment is: the switch-to-server ratio and the densities are not on the same level, so expanding space is not a priority. Some of the existing OCP custom switches are entirely conventional in size (standard RU, installable in 19-inch racks), with very traditional power and fan layouts as well, which helps them win acceptance in the enterprise market. What OCP network hardware pursues is a server-like experience, even life cycle: a highly modular control plane and data plane, software decoupled from hardware, customization flexibility (DIY), and freedom from vendor lock-in.
Figure note: The phased goals of the OCP networking project: first move from the traditional monolithic switch to decoupled hardware and software, then modularize further (source: Facebook)
The core of the data plane is an ASIC (from Broadcom, for example) or an FPGA, with support for 40GbE; the control plane CPU can be x86 (such as an AMD embedded SoC or Intel Atom), PowerPC (such as Freescale multi-core PPC), MIPS (such as Broadcom MIPS) or ARM. By the end of February 2015, OCP had published the designs of six switches (one each from Accton, Broadcom/Interface Masters, Mellanox and Intel, and two from Alpha Networks); half of them can be configured as ToR or aggregation switches as needed.
For decoupling software from hardware, ONIE is the key, and it was an early focus of the OCP networking project. ONIE, the Open Network Install Environment, defines an open "install environment" for bare-metal network switches. A traditional Ethernet switch ships with a preinstalled operating system: usable out of the box and directly manageable, but it locks in the user. So-called white-box switches offer freedom of hardware choice, but differing CPU architectures and the like make the management subsystems heterogeneous, which in turn creates difficulties for the network operating system above.
ONIE combines a boot loader with a modern Linux kernel and BusyBox, providing an environment in which any network operating system can be installed. It helps automate the provisioning of the thousands of switches in a large data center and lets users manage switches the way they manage Linux servers.
The intuitive embodiment of these results is the OCX1100 switch released in early December 2014 by Juniper Networks, which runs the Linux-based Junos operating system on hardware derived from Alpha Networks' SNX-60x0-486F and was expected to ship in the first quarter of 2015. The SNX-60x0-486F is an OCP switch designed by Alpha Networks, offering 48 x 10GbE (SFP+) and 6 x 40GbE (QSFP+) ports from a Broadcom Trident II (BCM56854) chip, with a CPU subsystem built on a Freescale P2020 or Intel C2558; it can serve as a ToR or aggregation switch. Dell's collaboration with Cumulus Networks (which supplies the network OS) follows the same pattern, for example on the Z9500-ON data center core and aggregation switch.
Figure note: The Wedge switch hardware design made public in June 2014, with dual redundant power supply units and 4 fans (source: Facebook)
Yes, Facebook is moving toward the core switch. In June 2014, Facebook unveiled its new ToR switch (code-named Wedge), with up to 16 x 40GbE ports, support for Intel, AMD and ARM CPUs, and a Linux-based operating system (code-named FBOSS).
Figure note: The 6-pack hardware platform. Thanks to the centralized PSUs the Wedge switch becomes narrower, and two are placed side by side (Source: Facebook)
On February 11, 2015, Facebook announced "6-pack", the first open hardware modular switch: a 7RU chassis housing 8 Wedge switches and 2 fabric cards in six layers, with a layer of power supplies and fans underneath. As the core of Facebook's data center fabric, 6-pack lets Facebook build larger clusters rather than splitting resources into multiple clusters whose size is limited by the network links between them.
Figure Note: 6-pack Internal network data path topology (Source: Facebook)
Both Wedge and 6-pack will have their designs opened up through OCP.
Change: Support from Traditional Vendors
2014 was a year of change for OCP. Despite some confusion along the way, the ecosystem grew markedly, above all in its pull on traditional hardware and software vendors.
At the end of January, at the fifth OCP Summit, Microsoft joined OCP with great fanfare, alongside IBM, Yandex, Cumulus Networks, Box, Panasonic, Bloomberg, IO and LSI (since acquired by Avago). Compared with IBM, which looked more as if it had come to gather intelligence, Microsoft showed real sincerity, contributing as its "admission ticket" the Open Cloud Server design used in its global cloud services (such as Windows Azure, Office 365 and Bing).
At data center scale Microsoft should be bigger than Facebook, and if it and the fast-advancing IBM/SoftLayer (likewise Internet customers with 100,000+ servers) merely used OCP hardware for new purchases, that would already be great news. As for contributing an entire set of hardware design specifications plus management software source code: had Satya Nadella opened the floodgates of goodwill before even taking office?
Clearly, Microsoft has similar ideas to Facebook.
On the current OCP server specs and designs page, the Open Cloud Server material sits at the top, and it was also the featured server content at the 2014 UB Workshop. The OCS 12U chassis is designed for EIA 310-D 19-inch racks, with half-width compute and storage blades at two nodes per U (1U2), plus centralized fans, PSUs and a chassis manager: very unlike Open Rack, and much more like the 12U Scorpio 1.0 cabinet (described in the next chapter). So bringing the Scorpio project into OCP is really not a technical problem, as long as BAT is willing... before the Open Data Center Committee was established, of course.
The OCS V2 specification was unveiled at the end of October 2014 at the European Summit in Paris. The V2 compute blade upgrades the CPUs from V1's dual Intel Xeon E5-2400 v2 (10 cores per CPU) to the latest dual Intel Xeon E5-2600 v3 (14 cores per CPU; there is no 2400 line in v3), the memory from 12 x DDR3-1333 to 16 x DDR4-2133, and the supported capacity from 64-192GB to 128-512GB. Compute power is greatly enhanced, but the CPU TDP also rises from 95W (presumably the E5-2470 v2) to 120W (presumably the E5-2683 v3), so each blade's power consumption inevitably climbs from 250W to 300W or more.
Figure note: The chassis components of the Open Cloud Server; the chassis management card plays a role similar to the RMC of a Scorpio rack, and a notable feature is that Microsoft has open-sourced the chassis management software code (Source: OCP UB Workshop)
The OCS V2 chassis was therefore upgraded as well. First, the six PSUs go from 1400W to 1600W: 8kW of capacity in N+1 configuration, supporting 24 compute blades, or 4.8kW in N+N configuration. The PSU hold-up time requirement also doubles from 10 milliseconds to 20 milliseconds, and the new fans are matched to the blades' power consumption.
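A quick check of how the chassis power budget fits together (the roughly 300W per blade is the estimate from the text above):

# OCS V2 chassis power budget.
psu_watts = 1600
psus = 6

n_plus_1_kw = (psus - 1) * psu_watts / 1000   # one PSU held as spare -> 8.0 kW
n_plus_n_kw = (psus // 2) * psu_watts / 1000  # fully mirrored -> 4.8 kW
blades, watts_per_blade = 24, 300             # blade count and assumed draw (from the text)

print(f"N+1 capacity: {n_plus_1_kw} kW, N+N capacity: {n_plus_n_kw} kW")
print(f"24 blades at ~{watts_per_blade} W: {blades * watts_per_blade / 1000} kW")  # 7.2 kW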
With better blade performance comes demand for more I/O bandwidth: per-tray I/O in OCS V2 is upgraded from V1's dual 10GbE and dual 6Gb SAS (x4) to dual 10/40GbE and dual 12Gb SAS (x4), and a PCI Express 3.0 x16 mezzanine card is added.
Note: Server racks inside Microsoft's ITPAC (IT Pre-Assembled Components) from 2011, seemingly a predecessor of the Open Cloud Server; by eye, the rack height should be above 50U
The storage blade is a JBOD of ten 3.5-inch hard drives; V2 upgrades it from V1's 6Gb SAS to 12Gb SAS, and the JBODs alone give a density of up to 800 hard drives per rack. The V1 JBOD can still be used in the V2 chassis. Each compute blade carries four 3.5-inch hard drives (V1 also supports 2 x 2.5-inch SSDs, increased to 4 in V2, plus 8 x 110mm M.2 PCIe NVMe modules) and can attach 1-8 JBODs, for a total of 14-84 hard drives.
Note: Facebook's petabyte-scale Blu-ray archive storage system (Source: The Register, 2014)
The fifth OCP Summit also showed Facebook's Blu-ray disc archive storage system, which can hold 10,000 triple-layer 100GB discs in 42U of space, up to 1PB, and reportedly preserves data for 50 years. Its forerunner Google has long used tape, which offers higher capacity than disk; Facebook says the optical disc represents the future.
Note: The tape backup system in Google's Berkeley County, South Carolina data center; this photo was once misidentified as showing Google's servers (source: Google website)
From the standpoint of offline storage, tape and optical discs each have their strengths, and the contest will not be settled soon. Shortly afterwards, in late March 2014, Frank Frankovsky announced he was leaving Facebook to found a disc-based cold storage startup, though he stays on the OCP Foundation board in an independent capacity and continues to serve as the Foundation's chairman and president. The board had to have a Facebook voice, so Facebook's director of infrastructure Jason Taylor and Microsoft's corporate vice president for cloud and enterprise Bill Laing were added, expanding it to seven members.
Figure Note: Adjusted OCP organization structure (Source: OCP official website)
Veteran storage vendor EMC announced its membership at the fourth OCP Summit in January 2013, but then kept a low profile. So when it introduced the ECS (Elastic Cloud Storage) appliance, based on commodity x86 server hardware, at EMC World 2014, it was asked whether the hardware had anything to do with OCP. By contrast, EMC's subsidiary VMware has been far more forthcoming, announcing at VMworld 2014 (held at the end of August) that it had joined OCP; EVO: RACK, still in technology preview, is explicitly based on OCP hardware; after all, VMware carries no hardware baggage of its own.
Summary: Model, Inheritance and Integration
OCP has more and more sub-projects, and with the sixth Summit about to open, a short chapter can hardly do them justice; a whole book could be written about them. This chapter has quickly surveyed a number of key projects; the main impressions are as follows:
Model. OCP's most successful creation is the open source hardware model. Although the customization is driven by Facebook, with the community's help it has pushed hardware standardization and commoditization forward; the open source software model can be borrowed, but it cannot simply be copied by those who do not know hardware well.
Inheritance. Many professionals who accumulated years of data center technology at hardware vendors and even at Google have come over, making it possible for OCP to catch up and surpass.
Integration. OCP reflects the impact of the third platform, reaching from the Internet into the traditional enterprise market, on the second platform; Microsoft, VMware, Juniper and others hold software assets built up in the traditional enterprise market, and whether to disrupt the existing hardware order or to ride the trend, they have joined. The character of the open source community and the work of these newcomers will in turn shape the direction of OCP's development.
Further installments are being written; if you have questions, feel free to leave a comment for discussion, and we will invite Zhang Guangbin to respond. (Editor: Guo Shemei)