Uncovering the software behind Facebook, the world's largest website

Source: Internet
Author: User
Tags cassandra varnish




In June 2010, Google published its list of the world's top 1,000 websites. Facebook took first place.


At Facebook's current scale, many traditional server technologies would collapse or simply fail to cope. So how do Facebook's engineers keep the site running smoothly for 500 million active users? This article looks at the software they use to accomplish this daunting task.

The challenge of Facebook-level scale

Before we dive into the details, here are some of the numbers Facebook has to deal with, to give a sense of the scale involved.

·  Page views per month: 630,000,000,000 (630 billion)

·  Facebook hosts more images than all other image sites combined (including sites such as Flickr)

·  More than 3 billion images are uploaded to Facebook each month

·  Facebook's systems serve 1.2 million images per second. This does not include images served by Facebook's CDN.

·  More than 25 billion pieces of content are shared per month (status updates, comments, etc.)

·  Facebook has more than 30,000 servers (this figure dates from 2009)

The software that Facebook uses

In some ways, Facebook is still a LAMP site, but it has had to change and extend its existing approach and combine it with a number of other components and services.

For example:

·  Facebook still uses PHP, but it has built a compiler for it so that PHP can be turned into native code on its web servers, which improves performance;

·  Facebook uses Linux, but has optimized it for its own purposes (especially in terms of network throughput);

·  Facebook uses MySQL, but has optimized that as well.

There are also custom-built systems, such as Haystack, a highly scalable object store used to serve Facebook's huge volume of images, and Scribe, Facebook's logging system.

Here is the software that runs Facebook, the world's largest social networking site.

Memcached

Memcached is by now a fairly well-known piece of software. It is a distributed memory caching system. Facebook (like a large number of other sites) uses it as a caching layer between the web servers and the MySQL servers. Over the years, Facebook has done a great deal of optimization work on Memcached and the software around it, such as the network stack.

Facebook runs tens of thousands of Memcached servers, holding terabytes of cached data at any given time. It is probably the world's largest Memcached installation.
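The caching-layer role described above is often implemented as a "look-aside" cache: read from the cache first, and go to the database only on a miss. The following is a minimal sketch of that pattern, not Facebook's actual code. The class name CacheAside, the db_reads counter, and the use of a plain dictionary in place of a Memcached cluster are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Look-aside cache: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}   # hypothetical stand-in for a Memcached cluster
        self._load_from_db = load_from_db  # hypothetical stand-in for a MySQL query
        self.db_reads = 0                  # counts how often the database was hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]        # cache hit: no database work at all
        self.db_reads += 1
        value = self._load_from_db(key)    # cache miss: ask the database
        if value is not None:
            self._cache[key] = value       # remember the answer for next time
        return value

    def update(self, key: str, value: str,
               write_to_db: Callable[[str, str], None]) -> None:
        write_to_db(key, value)            # the database stays the source of truth
        self._cache.pop(key, None)         # invalidate so the next read reloads it
```

With this sketch, repeated reads of the same key reach the database only once, and a write invalidates the cached copy rather than updating it in place, which keeps the cache from serving stale data after an update.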

HipHop for PHP

PHP, being a scripting language, runs relatively slowly compared with code that runs natively on the server. HipHop translates PHP code into C++ code, which is then compiled for better performance. Because Facebook relies so heavily on PHP, HipHop lets it get much more out of its web servers.

How HipHop came about: a small team of Facebook engineers (initially three) developed it over 18 months.

Haystack

Haystack is Facebook's high-performance image storage and retrieval system. (Strictly speaking, Haystack is an object store, so it does not have to store only images.) Its workload is enormous. Facebook has more than 20 billion uploaded images, each stored in four different resolutions, so it holds more than 80 billion image files.

Haystack's job is not only to cope with a huge number of images; its performance is the real highlight. As mentioned earlier, Facebook serves about 1.2 million images per second, not counting the images served by its CDN. That is an impressive figure.

BigPipe

BigPipe is a dynamic web page serving system developed by Facebook. For best performance, Facebook uses it to serve each page in sections (called "pagelets").

For example, the chat window is retrieved independently, and so is the news feed. These pagelets can be retrieved concurrently, which improves performance. It also means that even if one part of the site is down or broken, users can still use the rest of the page.
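The idea of retrieving pagelets independently and concurrently, with one failure not taking down the whole page, can be sketched as follows. This is an illustration under stated assumptions rather than BigPipe itself: the function render_page and the pagelet functions chat, news_feed and broken_ads are hypothetical, and a thread pool stands in for whatever mechanism the real system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def render_page(pagelets: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Render every pagelet concurrently; a failing pagelet yields an empty slot."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(pagelets))) as pool:
        # Submit all pagelets up front so they run side by side.
        futures = {name: pool.submit(fn) for name, fn in pagelets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = ""  # isolate the failure to this one pagelet
    return results


# Hypothetical pagelets, named only for this example.
def chat() -> str:
    return "<div>chat</div>"


def news_feed() -> str:
    return "<div>news feed</div>"


def broken_ads() -> str:
    raise RuntimeError("backend down")
```

Calling render_page({"chat": chat, "feed": news_feed, "ads": broken_ads}) returns markup for the chat and feed pagelets and an empty string for the failed one, so the rest of the page remains usable.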

Cassandra

Cassandra is a distributed storage system with no single point of failure. It is one of the best-known members of the NoSQL movement and is now open source (it has become an Apache project). Facebook uses it for inbox search.

Besides Facebook, Cassandra is also used by many other services, such as Digg.

Scribe

Scribe is a flexible logging system that Facebook uses for many internal purposes. It was built to handle logging at Facebook's scale, and it automatically handles new log categories as soon as they appear (Facebook has hundreds of log categories).

Hadoop and Hive

Hadoop is an open source map/reduce framework that makes it easier to process massive amounts of data. Facebook uses it for data analysis (and, as noted earlier, Facebook has a huge amount of data). Hive originated at Facebook; it makes it possible to run SQL queries against Hadoop, which makes Hadoop easier for non-programmers to use. (Note 1: Hive is a Hadoop-based data warehousing tool that maps structured data files onto database tables and provides SQL-style query functionality, translating queries into MapReduce jobs to run.)
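To make the map/reduce model concrete, here is a small single-process sketch of a word count. It does not use Hadoop's API. The function names map_phase, shuffle, reduce_phase and word_count are assumptions chosen for this example, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

A SQL-style layer of the kind Hive provides would let an analyst express roughly the same computation as a grouped count over a table of words, with the map and reduce steps generated automatically.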

Thrift

Facebook uses different languages for different services: PHP on the front end, Erlang for chat, Java and C++ in other places, and so on. Thrift is an internally developed cross-language framework that ties these languages together so that they can talk to each other. This makes cross-language development much easier for Facebook.

Facebook has open-sourced Thrift, and it has since gained support for more languages.

Varnish

Varnish is an HTTP accelerator that can act as a load balancer and can also cache content, which it then serves very quickly.

Facebook uses Varnish to serve photos and profile pictures, handling requests on the order of billions every day. Like much of the other software Facebook uses, Varnish is open source.

Other things that help Facebook run smoothly

The software described above makes up part of Facebook's system, but running a system this large is a complex task in itself. So here are a few more things that keep Facebook running smoothly.

Gradual releases and dark launches

Facebook has a system it calls "Gatekeeper" (the "doorman"). It can run different code for different groups of users. (In essence, it introduces different conditions into the code base.) This lets Facebook release new features gradually, run A/B tests, activate features only for Facebook employees, and more.

Gatekeeper also lets Facebook do so-called "dark launches": for example, activating the components behind a feature before the feature itself goes live. This acts as a simulated stress test under real conditions and helps identify bottlenecks and other potential problems. Dark launches are usually done in the two weeks before the official launch.
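A gating system of the kind described above can be sketched in a few lines. This is a hypothetical illustration, not Facebook's implementation: the User and Gate classes, their fields, and the hash-based bucketing are all assumptions. The sketch supports two of the uses mentioned: employee-only features and gradual percentage rollouts.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    is_employee: bool = False


@dataclass(frozen=True)
class Gate:
    name: str
    employees_only: bool = False
    rollout_percent: int = 0  # share of non-employee users who see the feature (0..100)

    def _bucket(self, user: User) -> int:
        # Hash the gate name together with the user id so each user lands in a
        # stable bucket from 0 to 99, and different gates bucket users differently.
        digest = hashlib.sha256(f"{self.name}:{user.user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def allows(self, user: User) -> bool:
        if user.is_employee:
            return True               # employees always see gated features
        if self.employees_only:
            return False
        return self._bucket(user) < self.rollout_percent
```

Because the bucket is derived from a hash rather than a random draw, a given user gets the same answer on every request, and raising rollout_percent only ever adds users to the enabled group.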

Profiling the live system

Facebook monitors its systems carefully and, interestingly, it also monitors the performance of every single PHP function in the live production environment. This profiling of the live PHP environment is done with an open source tool called XHProf.
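Per-function monitoring of this kind boils down to recording, for every call, which function ran and how long it took. The sketch below shows the idea with a Python decorator. It is an analogy only and does not use or reproduce XHProf's API: the names STATS, profiled and render_profile are hypothetical.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, Dict

# Per-function totals: number of calls and cumulative wall-clock seconds.
STATS: Dict[str, Dict[str, float]] = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def profiled(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a function so every call is counted and timed."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the call even if the function raised.
            entry = STATS[fn.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@profiled
def render_profile(user_id: int) -> str:
    # Hypothetical page-rendering function used only for this example.
    return f"profile:{user_id}"
```

After a few calls to render_profile, STATS["render_profile"] holds the call count and total time, which is the kind of data a dashboard could aggregate to spot slow functions in production.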

Gradually disabling features to improve performance

If Facebook runs into performance problems, it has a number of ways to gradually disable less important features in order to protect the performance of its core features.
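One simple way to picture gradual feature disabling is to give every feature a tier and to drop the least important tiers first as pressure rises. The sketch below is an assumption-laden illustration, not a description of Facebook's mechanism: the function enabled_features, the tier numbers, and the feature names are all hypothetical.

```python
from typing import Dict, List


def enabled_features(tiers: Dict[str, int], degradation_level: int) -> List[str]:
    """Return the features that stay on at a given degradation level.

    Tier 0 is core functionality; higher tiers are progressively less important.
    Each degradation level switches off one more tier, starting from the highest,
    and tier 0 is never switched off.
    """
    if not tiers:
        return []
    highest_tier = max(tiers.values())
    max_allowed_tier = max(0, highest_tier - degradation_level)
    return sorted(name for name, tier in tiers.items() if tier <= max_allowed_tier)


# Hypothetical feature tiers, for illustration only.
FEATURE_TIERS = {"news_feed": 0, "chat": 1, "friend_suggestions": 2}
```

At level 0 everything is on, at level 1 the suggestions are dropped, and at any level of 2 or above only the core news feed remains.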

Things we have not mentioned

We have not gone into the hardware side here, but hardware is obviously an important factor in Facebook reaching such an unprecedented scale. For example, like other large websites, Facebook uses a CDN to serve static content. Facebook also has a large data center in Oregon, in the western United States, where it can add servers as needed.

And of course, besides what has already been mentioned, there is a lot of other software we have not covered. We hope, however, that we have highlighted some of the most distinctive pieces.

Facebook's "romance" with open source

It may be too much to say that Facebook is in love with open source, but at the very least it clearly has a fondness for it.

Facebook not only uses (and contributes to) open source software such as Linux, Memcached, MySQL and Hadoop; it has also developed a lot of software in-house and open-sourced it.

Open source projects developed by Facebook include HipHop, Cassandra, Thrift and Scribe. Facebook has also open-sourced Tornado, a high-performance web server framework developed by the team behind FriendFeed (which Facebook acquired in August 2009).

(A list of the open source software that Facebook uses can be found on Facebook's open source page.)

Facing even bigger scaling challenges

Facebook is growing at an incredible rate. Its user base has almost doubled, and the number of active users is now close to 500 million. And no one can predict how many active users it will have by the end of this year.

Facebook has even set up a dedicated "growth team" that constantly thinks about how to get more people to join Facebook and use it.

This rapid growth means that Facebook will keep running into new performance bottlenecks. It faces challenges in page views, search, uploaded images and status messages, and in all the interactions between users and between users and Facebook itself.

That is simply the reality Facebook faces. Its engineers will keep looking for new ways to scale (and it is not just a matter of adding servers). For example, as the site has grown, its photo storage system has been completely rewritten several times.

So we will see Facebook's engineers charging at the next "hill". We believe they will live up to expectations. After all, they are climbing a mountain that most of us can only dream of, scaling a site whose users come from all over the world. When you reach that milestone, you will Duonian.

Data source: reports and blog posts by Facebook engineers.


This article is from the "Li Shilong" blog; reprinting is not permitted.
