Many traditional solutions do not work for a site as large as Facebook. The big challenge for Facebook engineers is keeping a site with nearly 500 million active users running steadily and reliably. This article introduces the software and technology they use to do it.
The challenge of Facebook's rapid growth
Before going into the details, a few figures give an intuitive sense of the enormous challenges Facebook faces. Facebook gets 570 billion page views per month (according to Google Ad Planner). There are more photos on Facebook than on all other photo sites combined (including sites such as Flickr). More than 3 billion photos are uploaded every month. Facebook's systems serve 1.2 million photos per second, not counting photos served by its CDN. More than 2.5 billion pieces of content (status updates, comments, etc.) are shared every month. Facebook has more than 30,000 servers (a figure from last year).
Facebook's rapid growth relies on software
To some extent, Facebook is still a LAMP site, but to accommodate many other components and services, Facebook has had to modify and extend that stack and change some of its existing practices.
For example, Facebook still uses PHP, but it has built a compiler that turns PHP code into native code for its web servers, improving performance. Facebook uses Linux, but has tuned it for its own needs, especially network throughput. Facebook uses MySQL, but mainly as a persistent key-value store; joins and application logic run on the web servers, because they are easier to optimize there.
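To make "MySQL as a key-value store, joins in the web tier" concrete, here is a minimal sketch. It is not Facebook's code: `KeyValueStore` is a hypothetical stand-in for a MySQL table that is only ever queried by primary key, and the `users`/`posts` data is invented for illustration. The join happens in application code, with one batched lookup per table.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStore:
    """Hypothetical stand-in for a MySQL table used purely as a key-value store."""
    rows: dict = field(default_factory=dict)

    def get_many(self, keys):
        # Only primary-key lookups, no SQL JOINs.
        return {k: self.rows[k] for k in keys if k in self.rows}


def feed_with_authors(posts: KeyValueStore, users: KeyValueStore, post_ids):
    """Join posts with their authors in the web tier instead of in the database."""
    post_rows = posts.get_many(post_ids)
    # Collect the foreign keys and fetch all authors in one batched lookup.
    author_ids = {row["author_id"] for row in post_rows.values()}
    authors = users.get_many(author_ids)
    return [
        {"text": row["text"], "author": authors[row["author_id"]]["name"]}
        for row in post_rows.values()
        if row["author_id"] in authors
    ]


users = KeyValueStore({1: {"name": "alice"}, 2: {"name": "bob"}})
posts = KeyValueStore({10: {"author_id": 1, "text": "hi"},
                       11: {"author_id": 2, "text": "hello"}})
print(feed_with_authors(posts, users, [10, 11]))
# [{'text': 'hi', 'author': 'alice'}, {'text': 'hello', 'author': 'bob'}]
```

Keeping the database queries this simple is what makes them easy to shard and cache; the price is that the web tier has to do the join work itself.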
There are also systems developed in-house, such as Haystack, a highly scalable object store that holds Facebook's enormous collection of photos, and Scribe, a logging system that can operate at Facebook's scale.
Well, let's take a look at the software used by the world's largest social networking site.
Memcached
Memcached is one of the best-known pieces of software on the internet today. It is a distributed memory caching system that Facebook (like many other sites) uses as a caching layer between its web servers and its MySQL servers, because database access is relatively slow. Over the years, Facebook has made many optimizations to memcached and the software around it, for example in the network stack.
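The usual way to put a cache in front of a database is the look-aside (cache-aside) pattern: read from the cache, fall back to the database on a miss, and fill the cache. The sketch below assumes a plain `dict` in place of a memcached cluster and a hypothetical `db_lookup` function in place of MySQL; it shows the pattern only, not Facebook's actual client or invalidation protocol.

```python
class LookAsideCache:
    """Minimal look-aside cache: try the cache first, fall back to the database."""

    def __init__(self, db_lookup):
        self.cache = {}            # hypothetical stand-in for a memcached cluster
        self.db_lookup = db_lookup
        self.db_hits = 0           # counts how often we had to go to the database

    def get(self, key):
        value = self.cache.get(key)
        if value is None:          # cache miss: read from the (slow) database
            self.db_hits += 1
            value = self.db_lookup(key)
            if value is not None:
                self.cache[key] = value  # populate the cache for later readers
        return value

    def invalidate(self, key):
        # Call after writing to the database: delete the cached copy so the
        # next read fetches fresh data instead of serving a stale value.
        self.cache.pop(key, None)


db = {"user:1": "alice"}
cache = LookAsideCache(db.get)
cache.get("user:1")
cache.get("user:1")
print(cache.db_hits)         # 1
db["user:1"] = "alice2"      # simulated database write
cache.invalidate("user:1")
print(cache.get("user:1"))   # alice2
```

Deleting rather than updating the cached value on writes is a common choice because it keeps the write path simple; the next reader repopulates the cache.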
At any given moment, Facebook has 10 TB of data cached across thousands of memcached servers. It may be the largest memcached cluster in the world.
HipHop for PHP
PHP, as a scripting language, runs slowly compared with native code. HipHop converts PHP into C++ code, which is then compiled for better performance. Because Facebook relies heavily on PHP, HipHop improves the performance of Facebook's web servers.
HipHop was developed at Facebook over 18 months by a small team of engineers (initially just three people) and is now in production use.
Haystack
Haystack is Facebook's high-performance photo storage system (strictly speaking, a generic object store, so it is not limited to photos). It has a lot of work to do: it holds more than 20 billion uploaded photos, each saved at four different resolutions, for more than 80 billion photos in total.
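Haystack's published design stores many photos back to back in large append-only files and keeps each photo's location in memory, so serving a photo takes roughly one disk read. The toy version below illustrates that idea under simplifying assumptions: an in-memory `io.BytesIO` stands in for an on-disk volume, and the `NeedleVolume` class and its methods are hypothetical, not Haystack's real interface.

```python
import io


class NeedleVolume:
    """Toy append-only volume: many photos in one large file, plus an
    in-memory index from photo id to (offset, size)."""

    def __init__(self):
        self.file = io.BytesIO()   # stands in for one large on-disk volume file
        self.index = {}            # photo_id -> (offset, size), kept in RAM

    def put(self, photo_id: int, data: bytes) -> None:
        self.file.seek(0, io.SEEK_END)   # always append; never rewrite in place
        offset = self.file.tell()
        self.file.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id: int) -> bytes:
        # The index lookup needs no disk I/O, so a read is one seek plus one read.
        offset, size = self.index[photo_id]
        self.file.seek(offset)
        return self.file.read(size)

    def delete(self, photo_id: int) -> None:
        # In this sketch, dropping the index entry makes the bytes unreachable;
        # reclaiming the space would need a separate compaction step (not shown).
        self.index.pop(photo_id, None)


vol = NeedleVolume()
vol.put(1, b"jpeg-bytes-1")
vol.put(2, b"jpeg-bytes-2")
print(vol.get(2))  # b'jpeg-bytes-2'
```

The point of the design is avoiding per-photo filesystem metadata lookups: with billions of small files, those extra disk reads would dominate.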
Haystack must not only store billions of photos but also serve them quickly. As mentioned earlier, Facebook serves about 1.2 million photos per second, not counting photos served by its CDN, which is an astonishing number.
BigPipe
BigPipe is a dynamic web page serving system developed by Facebook. To get the best performance, Facebook uses it to deliver each page in sections called "pagelets".
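Splitting a page into independent pagelets can be sketched with a thread pool: each pagelet renders on its own, finished pagelets are emitted as soon as they are ready, and a failing pagelet is replaced by a placeholder instead of breaking the page. This is not BigPipe's actual implementation (which streams pagelets from the server to JavaScript in the browser); `render_page` and the two pagelet functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_page(pagelets, max_workers=4):
    """Render independent pagelets in parallel and yield (name, html) pairs
    in completion order; a failing pagelet becomes a placeholder."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in pagelets.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                html = future.result()
            except Exception:
                # Failure isolation: only this pagelet is affected.
                html = f"<div id='{name}'>unavailable</div>"
            # A real server would flush this chunk to the browser here.
            yield name, html


def chat_pagelet():
    return "<div id='chat'>3 friends online</div>"


def news_feed_pagelet():
    raise RuntimeError("feed backend unavailable")  # simulated failure


page = dict(render_page({"chat": chat_pagelet, "news_feed": news_feed_pagelet}))
print(page["news_feed"])  # <div id='news_feed'>unavailable</div>
```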
Chat windows, news feeds, and other page sections are each transmitted as separate pagelets. Pagelets can be generated in parallel, which improves performance, and if one of them fails, the rest of the page still works normally for the user.
Cassandra
Cassandra is a distributed storage system with no single point of failure. It is a poster child of the NoSQL movement and has been open-sourced (it has even become an Apache project). Facebook uses it for Inbox search.
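Cassandra avoids a single point of failure by partitioning data with consistent hashing and replicating each row to several nodes. The sketch below shows that idea in isolation; `HashRing`, the MD5-based `token` function, the node names, and the replication factor of 3 are illustrative assumptions, not Cassandra's code or configuration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Map any string onto a large integer ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring: a key lives on the first `replicas` distinct
    nodes clockwise from its token, so losing one node does not lose the key."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def nodes_for(self, key: str):
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect(self.tokens, token(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = HashRing(nodes)
print(ring.nodes_for("inbox:42"))  # three distinct replica nodes
```

If the first replica node disappears, the key's remaining replicas become the head of its new replica list, so reads can continue without reshuffling the whole cluster.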
Besides Facebook, other sites such as Digg also use Cassandra.
Scribe
Scribe is a flexible logging system that Facebook uses extensively internally. It handles logging at Facebook's scale and automatically picks up new logging categories as they appear (Facebook has hundreds of them).
Hadoop and Hive
Hadoop is an open source MapReduce implementation that makes it possible to process massive amounts of data. Facebook uses it for data analysis (and as we all know, Facebook has a huge amount of data). Hive, which originated at Facebook, makes it possible to run SQL queries against Hadoop, which makes Hadoop much easier to use. (Note: Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables, provides SQL-style querying, and translates SQL statements into MapReduce jobs.)
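To illustrate what "translating SQL into MapReduce" means, here is a single-process sketch of how a query like `SELECT country, COUNT(*) FROM page_views GROUP BY country` decomposes into map, shuffle, and reduce steps. The `page_views` table and its `country` column are hypothetical, and real Hive query plans and Hadoop jobs are considerably more involved.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit (group key, 1) for each input row.
    yield record["country"], 1


def shuffle(pairs):
    # Group all emitted values by key (Hadoop does this between map and reduce).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # COUNT(*) per group is just the sum of the emitted ones.
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


page_views = [{"country": "US"}, {"country": "DE"}, {"country": "US"}]
print(run_job(page_views))  # {'US': 2, 'DE': 1}
```

On a cluster, the map calls run on the machines holding the data and each reducer receives one subset of the keys, which is what lets the same logic scale to very large tables.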
Both Hadoop and Hive are open source (Apache projects) and are used by other large sites, such as Yahoo and Twitter.
Thrift
Facebook uses different languages for different services: PHP for the front end, Erlang for chat, Java and C++ in various places, and perhaps other languages as well. Thrift is an internally developed cross-language framework that ties these languages together so that services written in them can communicate with each other. This makes cross-language development easy for Facebook.
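What lets Thrift connect different languages is that a struct described once in an IDL is serialized to the same bytes by the generated code for every language. The sketch below hand-encodes a simple struct in a layout modeled on Thrift's binary protocol (a 1-byte type, a 16-bit field id, the value, and a STOP byte at the end). It is a simplified illustration that supports only i32 and string fields, not the thrift library's API; `encode_struct` and `decode_struct` are hypothetical names.

```python
import struct

# Type ids as used by Thrift's binary protocol.
T_STOP, T_I32, T_STRING = 0, 8, 11


def encode_struct(fields):
    """Encode {field_id: (type_id, value)}: per field, a 1-byte type, a 2-byte
    field id, then the value; a STOP byte ends the struct. Big-endian throughout."""
    out = bytearray()
    for field_id, (type_id, value) in sorted(fields.items()):
        out += struct.pack(">bh", type_id, field_id)
        if type_id == T_I32:
            out += struct.pack(">i", value)
        elif type_id == T_STRING:
            data = value.encode("utf-8")
            out += struct.pack(">i", len(data)) + data  # length-prefixed bytes
        else:
            raise ValueError(f"unsupported type {type_id}")
    out += struct.pack(">b", T_STOP)
    return bytes(out)


def decode_struct(buf):
    fields, pos = {}, 0
    while True:
        (type_id,) = struct.unpack_from(">b", buf, pos)
        pos += 1
        if type_id == T_STOP:
            return fields
        (field_id,) = struct.unpack_from(">h", buf, pos)
        pos += 2
        if type_id == T_I32:
            (value,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        elif type_id == T_STRING:
            (size,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            value = buf[pos:pos + size].decode("utf-8")
            pos += size
        else:
            raise ValueError(f"unsupported type {type_id}")
        fields[field_id] = (type_id, value)


msg = {1: (T_I32, 42), 2: (T_STRING, "hello")}
wire = encode_struct(msg)
print(decode_struct(wire) == msg)  # True
```

Because the bytes on the wire depend only on field ids and types, a PHP client and an Erlang or C++ server can exchange the same struct as long as they share the IDL definition.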
Facebook has open-sourced Thrift, and it will support more languages.
Varnish
Varnish is an HTTP accelerator that can act as a load balancer and also cache content so that it can be served very quickly.
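The two roles mentioned here, load balancing and caching, can be illustrated together in a few lines. Varnish itself is configured in its own VCL language, not Python; the `CachingBalancer` class, the fake backends, and the injectable clock below are hypothetical and exist only to show serving fresh responses from cache and spreading misses across backends round-robin.

```python
import itertools
import time


class CachingBalancer:
    """Toy HTTP-accelerator behaviour: serve cached responses while fresh,
    otherwise forward to the next backend in round-robin order."""

    def __init__(self, backends, ttl_seconds=60.0, clock=time.monotonic):
        self._backends = itertools.cycle(backends)
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # url -> (expires_at, body)

    def get(self, url):
        now = self.clock()
        hit = self.cache.get(url)
        if hit and hit[0] > now:
            return hit[1]                  # fresh cache hit: no backend work
        backend = next(self._backends)     # simple round-robin balancing
        body = backend(url)
        self.cache[url] = (now + self.ttl, body)
        return body


calls = []


def make_backend(name):
    def fetch(url):
        calls.append(name)
        return f"{name}:{url}"
    return fetch


fake_now = [0.0]
lb = CachingBalancer([make_backend("b1"), make_backend("b2")],
                     ttl_seconds=10, clock=lambda: fake_now[0])
lb.get("/photo/1")   # miss -> b1
lb.get("/photo/1")   # hit, no backend call
lb.get("/photo/2")   # miss -> b2
fake_now[0] = 11.0   # cached /photo/1 has now expired
lb.get("/photo/1")   # miss again -> b1 (round robin wraps)
print(calls)  # ['b1', 'b2', 'b1']
```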
Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day. Like much of the other software Facebook uses, Varnish is open source.
Other factors that keep Facebook running smoothly
Above we introduced some of the software behind the Facebook site. But running a system this large is a complex task, so here are some other things that keep Facebook running smoothly.
Gradual releases and dark launches
Facebook has a system called Gatekeeper that lets it run different code for different users. This allows Facebook to roll out new features gradually, run A/B tests, and enable certain features only for Facebook employees.
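A gate of this kind can be sketched as a small predicate over the user: always on for employees, and on for a stable percentage of everyone else, which also yields consistent A/B buckets. Gatekeeper's real rules are not described here, so `Gate`, `User`, and the hash-based bucketing are hypothetical illustrations of the general technique.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    is_employee: bool = False


@dataclass
class Gate:
    """Hypothetical feature gate: on for employees, plus a stable
    percentage of all other users."""
    name: str
    rollout_percent: float = 0.0
    employees_only: bool = False

    def bucket(self, user: User) -> int:
        # Hash (gate, user) into a bucket 0..99 that never changes for this pair,
        # so a user keeps seeing the same variant across requests.
        digest = hashlib.sha256(f"{self.name}:{user.user_id}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def check(self, user: User) -> bool:
        if user.is_employee:
            return True
        if self.employees_only:
            return False
        return self.bucket(user) < self.rollout_percent


new_chat = Gate("new_chat", rollout_percent=10)
print(new_chat.check(User(1, is_employee=True)))  # True
```

Raising `rollout_percent` step by step widens the audience without reshuffling who already has the feature, because each user's bucket stays fixed.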
Gatekeeper also lets Facebook do "dark launches": activating parts of a feature before it is officially released, without users noticing, because the corresponding UI is not visible (hence the name). This serves as a real-world stress test that helps expose bottlenecks and other problems before the formal launch. A dark launch usually happens about two weeks before the official launch.
Profiling
Facebook closely monitors how its systems behave and, interestingly, also tracks the performance of every PHP function in the production environment, using the open source tool XHProf.
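XHProf works on PHP, but the idea of per-function profiling carries over to any language. Below is a rough Python analogue that records call counts and inclusive wall time for decorated functions; the `profiled` decorator, the `PROFILE` table, and `render_profile` are hypothetical, and a real profiler like XHProf hooks into the runtime and reports much more (for example, caller/callee relationships).

```python
import functools
import time
from collections import defaultdict

# function name -> [call count, total inclusive wall-clock seconds]
PROFILE = defaultdict(lambda: [0, 0.0])


def profiled(fn):
    """Record how often fn is called and how long it takes in total."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats = PROFILE[fn.__qualname__]
            stats[0] += 1
            stats[1] += time.perf_counter() - start
    return wrapper


@profiled
def render_profile(user_id):
    return sum(range(10_000))  # stand-in for real page-rendering work


for uid in range(3):
    render_profile(uid)

for name, (calls, seconds) in PROFILE.items():
    print(f"{name}: {calls} calls, {seconds * 1000:.3f} ms total")
```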
Improve performance by gradually disabling unimportant features
If Facebook runs into performance problems, one option is to gradually disable less important features to boost the performance of its core features.
Topics not covered
We have not covered hardware, although it is an important part of how Facebook operates at this scale. For example, like other large sites, Facebook uses a CDN to serve static content, and it has a huge data center in Oregon, in the western United States, where servers can be added whenever needed.
Facebook and open source
When talking about Facebook, we have to mention how much it likes open source; you could even say Facebook "loves" open source.
Facebook not only uses (and contributes to) existing open source software such as Linux, memcached, MySQL, and Hadoop, but also open-sources software it develops in-house, for example HipHop, Cassandra, Thrift, and Scribe.
Facebook has also open-sourced Tornado, a high-performance web server framework developed by the FriendFeed team (Facebook acquired FriendFeed in August 2009).
A list of the open source software Facebook uses can be found at http://facebook.com/opensource.
More challenges from rapid development
Facebook is growing at an incredible pace. Its user base is growing almost exponentially and is now close to 500 million active users; who knows what the figure will be by the end of the year, but the trend looks like roughly 100 million new users every six months.
Facebook even has a dedicated "growth team" that constantly studies how to get people to use Facebook and become part of it.
Growth this fast, in page views, photo uploads, status updates, and other interactions between users and the site, as well as among users themselves, creates all kinds of performance bottlenecks and challenges.
This is the reality Facebook has to face. Its engineers must constantly find new ways to solve the problems caused by the site's rapid growth; for example, Facebook's photo storage system has been completely rewritten several times.
We can't wait to see what Facebook's engineers come up with next, and I'm sure it will be interesting. After all, they are climbing a mountain most of us can only dream of, running a site with more users than most countries have people. Doing that takes a good deal of creativity. (Translator's note: this last paragraph was difficult to translate; thanks to Half Rain for translation help.)
Exploring the software behind Facebook, the world's largest site
Translation: Blog Park
Parts of the translation are based on: http://blog.jobbole.com/entr.php/73