Article directory
- Scalability challenges of Facebook
- Software on which Facebook expansion depends
- Other things that keep Facebook running smoothly
From: http://www.yankay.com/facebook%E8%83%8C%E5%90%8E%E7%9A%84%E8%BD%AF%E4%BB%B6? Variant = ZH-CN
At Facebook's scale, many traditional solutions either do not apply at all or have to be broken down and rebuilt. Keeping a system with 500 million users stable and reliable is not easy. This article introduces the software Facebook uses to do it.
Scalability challenges of Facebook
Before we get into the details, here are some figures that show the scale Facebook has to handle:
- Facebook serves 570 billion page views per month (according to Google Ad Planner).
- Facebook holds more photos than all other photo sites combined (including sites such as Flickr).
- More than 3 billion photos are uploaded every month.
- Facebook's systems serve 1.2 million photos per second. This does not include photos served by the CDN.
- More than 2.5 billion items (status updates, comments, etc.) are shared.
- Facebook has more than 30,000 servers (and this number is from last year!).
Software on which Facebook expansion depends
Facebook is, to some extent, still a LAMP site, but it has grown far beyond an ordinary LAMP setup: it has had to incorporate many other elements and services, and to change how it uses the existing ones.
For example:
- Facebook still uses PHP, but it has built a compiler for it so that the code can be turned into native code on the web servers, which improves performance.
- Facebook uses Linux, but has tuned it for its own needs, especially network throughput.
- Facebook uses MySQL, but mainly as a persistent key-value store. Joins and other logic are performed on the web servers, where they are easier to carry out.
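The last point above, using a relational database as a plain key-value store and doing the "join" in application code, can be sketched roughly as follows. This is only an illustration: an in-memory SQLite database stands in for MySQL, and the table layout, key names (`user:1`, `friends:1`) and helper functions are hypothetical, not Facebook's actual schema.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")


def put(key, value):
    """Store a JSON-serialisable value under a key."""
    db.execute(
        "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
        (key, json.dumps(value)),
    )


def get(key):
    """Fetch a value by key, or None if the key is absent."""
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


def friends_with_names(user_id):
    """The 'join' happens here, in application code, not in SQL."""
    friend_ids = get(f"friends:{user_id}") or []
    return [get(f"user:{fid}")["name"] for fid in friend_ids]


# Hypothetical sample data.
put("user:1", {"name": "Alice"})
put("user:2", {"name": "Bob"})
put("user:3", {"name": "Carol"})
put("friends:1", [2, 3])

print(friends_with_names(1))
```

The database only ever answers single-key lookups, so the combining logic lives in the web tier, where it can be spread over many servers.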
There are also custom-built systems, such as Haystack, a highly scalable object store used to serve Facebook's photos, and Scribe, a logging system that can cope with Facebook's massive log volume.
With that said, let's look at the software used by the world's largest social networking site.
Memcached
Memcached is by now one of the best-known pieces of software on the Internet. It is a distributed memory caching system that Facebook uses as a caching layer between the web servers and the MySQL servers (because database access is relatively slow). Over the years, Facebook has made many optimizations to Memcached and the surrounding software, such as optimizing the network stack.
At any given moment, Facebook caches 10 TB of data across thousands of Memcached servers. It is probably the world's largest Memcached installation.
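The role of a caching layer between web servers and the database can be illustrated with a minimal cache-aside sketch. The dictionaries below merely stand in for a Memcached cluster and a MySQL server, and all names (`cached_get`, `slow_db_lookup`, the `user:42` key) are assumptions made for the example.

```python
cache = {}                                   # stands in for Memcached
database = {"user:42": {"name": "Alice"}}    # stands in for MySQL
db_reads = 0                                 # counts trips to the "database"


def slow_db_lookup(key):
    """Pretend this is an expensive database query."""
    global db_reads
    db_reads += 1
    return database.get(key)


def cached_get(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]
    value = slow_db_lookup(key)
    if value is not None:
        cache[key] = value   # populate the cache for later readers
    return value


first = cached_get("user:42")    # miss: goes to the database
second = cached_get("user:42")   # hit: served from the cache
print(first, second, db_reads)
```

Only the first read reaches the database; later reads for the same key are answered from memory. A real deployment would also need expiry and invalidation when the underlying row changes, which this sketch leaves out.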
HipHop for PHP
PHP, being a scripting language, is relatively slow compared with code that runs natively. HipHop converts PHP into C++ code, which can then be compiled for better performance. Because Facebook relies heavily on PHP, this lets it get much more out of its web servers.
A small team of engineers at Facebook (only three people at the start) spent 18 months developing HipHop, and it is now available.
Haystack
Haystack is Facebook's high-performance photo storage and retrieval system (strictly speaking, it is an object store, so it is not limited to photos). It has a lot of work to do: there are more than 20 billion uploaded photos, and each one is saved in four different resolutions, so there are more than 80 billion photos in total.
It is not only about being able to handle billions of images; performance is critical as well. As mentioned earlier, Facebook serves about 1.2 million photos per second, and this number does not include photos served by the CDN. That is a staggering figure.
BigPipe
BigPipe is a dynamic web page serving system developed by Facebook. Facebook uses it to serve each page in sections (called pagelets) for optimal performance.
For example, the chat window is retrieved separately, the news feed is retrieved separately, and so on. These pagelets can be retrieved in parallel, which is where the performance gain comes from. It also means users still get a working page even if some part of the site is disabled or broken.
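The pagelet idea described above, independent sections generated side by side with a failure in one section not taking down the page, can be sketched as follows. This is not BigPipe itself (which also streams pagelets to the browser as they become ready); the pagelet names and render functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def render_chat():
    return "<div id='chat'>chat</div>"


def render_news_feed():
    return "<div id='feed'>news feed</div>"


def render_broken():
    # Simulates a pagelet whose backend is down.
    raise RuntimeError("backend unavailable")


def render_page(pagelets):
    """Render all pagelets concurrently; a failed pagelet becomes an empty slot."""
    def safe(item):
        name, render = item
        try:
            return name, render()
        except Exception:
            return name, ""   # the rest of the page still works

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(safe, pagelets.items()))


page = render_page({
    "chat": render_chat,
    "feed": render_news_feed,
    "ads": render_broken,
})
print(page)
```

Each pagelet is isolated in its own task, so the broken one only costs its own section of the page.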
Cassandra
Cassandra is a distributed storage system with no single point of failure. It is an important part of the NoSQL movement and has been open sourced (it has even become an Apache project). Facebook uses it for its search function.
Besides Facebook, other services use it as well, for example Digg. Twitter, however, recently gave up on Cassandra.
Scribe
Scribe is a flexible logging system that Facebook uses widely internally. It is built to handle logging at Facebook's scale, and it automatically handles new log categories as they appear (Facebook has hundreds of log categories).
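The notion of a logger that accepts new categories without prior configuration can be illustrated with a small sketch. This is not Scribe's API; the `CategoryLog` class and the category names are hypothetical, and the messages are simply kept in memory.

```python
from collections import defaultdict


class CategoryLog:
    """Toy category-based log store; new categories are created on first use."""

    def __init__(self):
        self._streams = defaultdict(list)

    def log(self, category, message):
        # No registration step: an unseen category gets its own stream.
        self._streams[category].append(message)

    def categories(self):
        return sorted(self._streams)

    def read(self, category):
        # .get() avoids creating an empty stream for unknown categories.
        return list(self._streams.get(category, []))


logs = CategoryLog()
logs.log("web_access", "GET /home 200")
logs.log("web_access", "GET /photo 200")
logs.log("chat_errors", "timeout talking to presence server")

print(logs.categories())
```

With hundreds of categories, not having to declare each one up front keeps the producers of log data decoupled from the logging system.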
Hadoop and Hive
Hadoop is an open-source map-reduce implementation that makes it possible to run computations over massive amounts of data. Facebook uses it for data analysis (and as we all know, Facebook has massive amounts of data). Hive originated at Facebook and makes it possible to run SQL queries against Hadoop, which makes it easier for non-programmers to use.
Both Hadoop and Hive are open source (Apache projects) and are used by many other services, such as Yahoo and Twitter.
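To make the map-reduce model mentioned above concrete, here is a tiny single-process sketch of its three steps (map, group by key, reduce). It does not use Hadoop; the log lines and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical event log: "<user> <action>" per line.
log_lines = [
    "alice like",
    "bob comment",
    "carol like",
    "alice comment",
    "bob like",
]


def map_phase(line):
    """Emit (key, value) pairs for one input record."""
    _user, action = line.split()
    yield action, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


pairs = [pair for line in log_lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)
```

In SQL terms this is roughly `SELECT action, COUNT(*) FROM events GROUP BY action` (with `events` as a hypothetical table), which is the kind of query an SQL layer over Hadoop lets an analyst write instead of hand-coding the map and reduce functions.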
Thrift
Facebook uses several different languages for its different services. PHP is used for the front end, Erlang is used for chat, Java and C++ are used in several places, and there may be other languages as well. Thrift is an internally developed cross-language framework that ties these languages together so that they can talk to each other. This makes it much easier for Facebook to keep developing across languages.
Facebook has open sourced Thrift, and support for more languages has since been added.
Varnish
Varnish is an HTTP accelerator that can act as a load balancer and also cache content, which can then be served at lightning speed.
Facebook uses Varnish to serve photos and profile pictures, handling billions of requests per day. Like much of the other software here, Varnish is open source.
Other things that keep Facebook running smoothly
The software we have mentioned so far makes up much of Facebook's system and helps it run at this scale. However, operating such a large system is a complex task, so we will also list a few other things Facebook does to keep its service running smoothly.
Gradual releases and dark launches
Facebook has a system called Gatekeeper that lets it run different code for different sets of users. This allows Facebook to release new features gradually, run A/B tests, enable certain features only for Facebook employees, and so on.
Gatekeeper also lets Facebook do "dark launches", which activate parts of a feature behind the scenes before users can actually use it (it is called a dark launch because users are not aware of it). This acts as a real-world stress test and helps expose bottlenecks and other problems before the official launch. A dark launch usually takes place two weeks before the official launch.
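The gating behaviour described above (employee-only features and gradual percentage rollouts) can be sketched with a stable hash of the user ID. This is an illustration only, not Facebook's implementation; the rule format, feature names and function names are all assumptions.

```python
import hashlib


def bucket(user_id, feature):
    """Map a (feature, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id, rules, employees):
    """Decide whether a feature is on for a user under hypothetical rules."""
    rule = rules.get(feature)
    if rule is None:
        return False                      # unknown features stay off
    if rule.get("employees_only"):
        return user_id in employees
    return bucket(user_id, feature) < rule.get("percent", 0)


# Hypothetical configuration.
employees = {1, 2}
rules = {
    "new_chat": {"employees_only": True},
    "new_feed": {"percent": 10},          # gradual release to about 10% of users
}

print(is_enabled("new_chat", 1, rules, employees))
print(is_enabled("new_chat", 500, rules, employees))
```

Because the bucket depends only on the feature and the user ID, a user who has the feature at 10% keeps it when the percentage is raised, which is what makes a gradual release feel stable from the user's side.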
Profiling the live system
Facebook carefully monitors its systems. Interestingly, it also monitors the performance of every PHP function in the live production environment. This profiling of the live PHP environment is done with the open-source tool XHProf.
Gradual feature disabling for added performance
If Facebook runs into performance problems, one option is to gradually disable less important features in order to boost the performance of Facebook's core features.
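One way to picture this kind of gradual shedding is a priority list with load thresholds. The feature names, priorities and thresholds below are entirely hypothetical and only illustrate the idea of switching off the least important features first.

```python
# (feature name, priority); a lower priority number means more important.
FEATURES = [
    ("news_feed", 0),
    ("photos", 1),
    ("chat", 2),
    ("friend_suggestions", 3),
]


def enabled_features(load, features=FEATURES):
    """Return the features to keep on for a load between 0.0 and 1.0.

    Each threshold crossed sheds one more tier of less important features.
    """
    if load < 0.7:
        max_priority = 3      # normal operation: everything on
    elif load < 0.8:
        max_priority = 2
    elif load < 0.9:
        max_priority = 1
    else:
        max_priority = 0      # emergency: core feature only
    return [name for name, priority in features if priority <= max_priority]


print(enabled_features(0.5))
print(enabled_features(0.85))
print(enabled_features(0.95))
```

The core feature is never shed, and each step gives back some capacity without taking the whole site down.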
Things we did not mention
We have not covered the hardware side in this article, but it is of course also an important part of scalability. For example, like other large websites, Facebook uses a CDN to serve static content. Facebook also has a huge data center, which helps it scale out even further.
Facebook's love of open source
Not only does Facebook use (and contribute to) open-source software such as Linux, Memcached, MySQL and Hadoop, it has also released much of its internally developed software as open source.
Facebook has also open sourced Tornado, a high-performance web server framework developed by the FriendFeed team.
You can find a list of this open-source software on Facebook's Open Source page.
Data sources:
http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/
Various presentations by Facebook engineers, as well as the always informative Facebook engineering blog.