Facebook is a social networking service and the number one photo-sharing site in the United States, with some 8.5 million photos uploaded per day. So what does Facebook's system architecture look like? This article reveals it.
Source: http://www.quora.com/What-is-Facebooks-architecture (answered by Michaël Figuière)
Based on my reading and conversations so far, I understand that today's Facebook architecture is as follows:
The web frontend is written in PHP. Facebook's HipHop [1] transforms the PHP into C++ and compiles it with g++, providing high performance for the templating and web logic layers.
The business logic exists in the form of services that use Thrift [2]. Services are implemented in PHP, C++, or Java depending on the requirements (some other languages are also used ...)
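To make the service pattern concrete, here is a minimal sketch of what an RPC framework like Thrift provides: a service interface, a serialized call on the wire, and a dispatcher on the server side. Real Thrift generates typed client/server stubs from an IDL file and uses a compact binary protocol; the `UserService` class, its method, and the JSON encoding below are all invented for illustration.

```python
import json

class UserService:
    # Hypothetical service implementation -- not Facebook's actual interface.
    def get_name(self, user_id):
        return {"1": "alice"}.get(user_id, "unknown")

def handle(service, raw_request):
    # Server side: decode the serialized call and dispatch it by method name.
    req = json.loads(raw_request)
    result = getattr(service, req["method"])(*req["args"])
    return json.dumps({"result": result})

def call(method, *args):
    # Client side: any language that can produce the wire format can call the
    # service, which is why the implementation language is free to vary.
    request = json.dumps({"method": method, "args": args})
    return json.loads(handle(UserService(), request))["result"]
```

The point of the indirection is that clients depend only on the interface and the wire protocol, not on whether the implementation behind it is PHP, C++, or Java.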
Services written in Java do not use any conventional enterprise application server; instead, Facebook's own custom application server is used. This may look like reinventing the wheel, but since these services are exposed and consumed only (or mostly) via Thrift, Tomcat is too heavyweight, even Jetty is probably too much, and their added value is meaningless to Facebook's needs.
Persistence is handled by MySQL, Memcached [3], Facebook's Cassandra [4], and Hadoop's HBase [5]. Memcached serves as a memory cache in front of MySQL. Facebook engineers admit that their use of Cassandra is being reduced, because they prefer HBase for its simpler consistency model and its MapReduce capabilities.
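Using Memcached as a cache in front of MySQL typically follows the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. A minimal sketch, with plain dicts standing in for the memcached and MySQL layers:

```python
cache = {}                                  # stands in for memcached
database = {"user:1": {"name": "alice"}}    # stands in for MySQL

def get_user(key):
    # 1. Try the cache first (a memcached GET in the real deployment).
    if key in cache:
        return cache[key], "cache"
    # 2. On a miss, query the backing store (a MySQL SELECT).
    value = database.get(key)
    # 3. Populate the cache so the next read is served from memory.
    if value is not None:
        cache[key] = value
    return value, "db"

def update_user(key, value):
    # Write to the database and invalidate the cached entry, so a stale
    # copy is never served after an update.
    database[key] = value
    cache.pop(key, None)
```

The invalidate-on-write step is what keeps the cache from drifting away from the database between updates.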
Offline processing uses Hadoop and Hive.
Log, click, and feed data are collected with Scribe [6] and aggregated in HDFS using Scribe-HDFS [7], which allows extended analysis with MapReduce.
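The analysis side of this pipeline is the classic MapReduce model: map each log line to a (key, 1) pair, shuffle by key, and reduce by summing. A minimal local sketch (the log format below is invented for illustration):

```python
from itertools import groupby

log_lines = [
    "click page=/home user=1",
    "click page=/photos user=2",
    "click page=/home user=3",
]

def map_phase(line):
    # Emit (page, 1) for every click record.
    fields = dict(f.split("=") for f in line.split()[1:])
    yield fields["page"], 1

def mapreduce(lines):
    # Shuffle: collect and sort all intermediate pairs by key, so that
    # identical keys are adjacent, as the framework would guarantee.
    pairs = sorted(p for line in lines for p in map_phase(line))
    # Reduce: sum the counts for each key.
    return {k: sum(v for _, v in g) for k, g in groupby(pairs, key=lambda p: p[0])}

counts = mapreduce(log_lines)
```

On a real cluster the map and reduce phases run in parallel across the HDFS blocks holding the aggregated logs; the logic is the same.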
BigPipe [8] is their custom technology for speeding up page display by pipelining the rendering.
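The core BigPipe idea is that the server does not wait for the whole page: it flushes a skeleton immediately and then streams each independent "pagelet" as soon as its data is ready, so the browser renders incrementally. A toy sketch of that flow (pagelet names, payloads, and the `inject` helper are invented):

```python
def render_pagelets():
    # The skeleton goes out first, before any expensive work runs.
    yield "<html><body><div id=skeleton></div>"
    pagelets = {
        "navbar": lambda: "<div id=navbar>nav</div>",
        "feed": lambda: "<div id=feed>stories</div>",
    }
    for name, produce in pagelets.items():
        # Each pagelet is flushed independently as its content is computed;
        # a client-side script slots it into the skeleton.
        yield f"<script>inject('{name}', '{produce()}')</script>"
    yield "</body></html>"

# Each yielded chunk corresponds to one flush of the HTTP response.
chunks = list(render_pagelets())
```

The win is overlap: the browser downloads static resources and renders early pagelets while the server is still computing the slow ones.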
Varnish Cache [9] is used as an HTTP proxy, chosen for its high speed and efficiency [10].
The storage of the billions of photos uploaded by users is handled by Haystack, an ad-hoc storage solution developed by Facebook that applies low-level optimizations and append-only writes [11].
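Haystack's central trick is to append photos ("needles") to one large store file and keep an in-memory index mapping photo id to (offset, size), so a read costs a single seek instead of several filesystem metadata lookups. A toy sketch of that layout (the real on-disk format carries headers, checksums, and delete flags):

```python
import io

class HaystackStore:
    def __init__(self):
        self.volume = io.BytesIO()   # stands in for the large append-only file
        self.index = {}              # photo_id -> (offset, size)

    def put(self, photo_id, data):
        # Append-only write: never overwrite, just record where the bytes live.
        offset = self.volume.seek(0, io.SEEK_END)
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # One seek + one read, guided by the in-memory index.
        offset, size = self.index[photo_id]
        self.volume.seek(offset)
        return self.volume.read(size)
```

Because the index lives in memory, serving a photo never touches the disk for metadata, which is exactly the cost Haystack was built to avoid.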
Facebook Messages uses its own architecture, notably built on infrastructure sharded across dynamic clusters. Business logic and persistence are encapsulated in so-called 'Cells'. Each Cell handles a subset of users, and new Cells can be added as load grows [12]. Persistence is achieved with HBase [13].
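A sketch of the Cell idea: every request for a user must be routed to the one Cell that owns that user's business logic and data. The hash-based assignment below is an assumption for illustration; in practice a directory service can also assign users to Cells explicitly, which makes rebalancing onto new Cells easier.

```python
import hashlib

def cell_for_user(user_id, num_cells):
    # Deterministically map a user to a cell: hash the id and take it
    # modulo the number of cells, so the same user always lands on the
    # same cell. (Hypothetical scheme, not Facebook's actual directory.)
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_cells
```

The encapsulation matters: because a Cell bundles both logic and its HBase-backed persistence for one user subset, capacity is added by standing up whole new Cells rather than by growing one shared tier.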
The Facebook Messages search engine is built on an inverted index stored in HBase [14].
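An inverted index simply maps each term to the set of messages containing it, so a search is a lookup rather than a scan. A toy version, with a plain dict standing in for the HBase table that holds the postings:

```python
from collections import defaultdict

# term -> set of message ids containing that term
index = defaultdict(set)
messages = {
    1: "lunch at noon",
    2: "noon meeting moved",
    3: "lunch cancelled",
}

# Indexing: tokenize each message and record it under every term it contains.
for msg_id, text in messages.items():
    for term in text.split():
        index[term].add(msg_id)

def search(term):
    # A single lookup returns the postings list for the term.
    return sorted(index.get(term, set()))
```

In the real system each term's postings list would live in an HBase row keyed by the term, so lookups scale with the size of the result rather than the size of the mailbox.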
As far as I know, the implementation details of Facebook's main search engine are unknown.
Typeahead search uses custom storage and retrieval logic [15].
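Reference [15] describes the logic but not the data structures; one plausible building block for typeahead is a prefix trie in which each node caches the top completions for its prefix, so a lookup costs O(len(prefix)). This is an illustrative sketch, not Facebook's actual implementation:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.completions = []   # precomputed results for this prefix

def insert(root, name):
    node = root
    for ch in name:
        node = node.children.setdefault(ch, TrieNode())
        if name not in node.completions:
            # A real system would rank these (e.g. by social affinity)
            # and cap the list; here we just accumulate.
            node.completions.append(name)

def lookup(root, prefix):
    # Walk down one node per typed character; the answer is precomputed.
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return node.completions

root = TrieNode()
for name in ["alice", "alan", "bob"]:
    insert(root, name)
```

Precomputing completions at every node trades memory for the per-keystroke latency that typeahead demands.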
Chat is based on an epoll server, developed in Erlang and accessed via Thrift [16].
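The epoll model lets one thread multiplex many mostly-idle chat connections instead of dedicating a thread per connection. A minimal sketch of that event loop using Python's `selectors` module (which picks epoll where available); the echo behavior and socket-pair demo are only for illustration, since the real server routes messages between users:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def serve_once(sock):
    # Handle one ready connection; a chat server would forward the
    # message to the recipient's connection instead of echoing.
    data = sock.recv(1024)
    if data:
        sock.sendall(b"echo:" + data)

def run(sel, iterations):
    for _ in range(iterations):
        # Block until any registered connection is readable, then
        # service exactly the ready ones -- the essence of epoll.
        for key, _ in sel.select(timeout=1):
            serve_once(key.fileobj)

# Demo: a connected socket pair stands in for one client connection.
client, server_side = socket.socketpair()
sel.register(server_side, selectors.EVENT_READ)
client.sendall(b"hi")
run(sel, 1)
reply = client.recv(1024)
```

Because idle connections cost almost nothing in this model, a single process can hold open a very large number of long-lived chat sessions.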
Here is some information about the resources allocated to the components above, though some figures are unknown:
Facebook is estimated to have more than 60,000 servers [17]. Their newest data center, in Prineville, Oregon, is based on entirely custom-designed hardware [18], recently unveiled as the Open Compute Project [19].
19 TB of data is held in Memcached.
Their Hadoop and Hive cluster consists of 3,000 servers, each with 8 cores, 32 GB of memory, and 12 TB of disk; in total, 24,000 CPU cores, 96 TB of memory, and 36 PB of disk [20].
100 billion hits per day, 50 billion photos, 3 trillion objects cached, and 130 TB of logs per day (July 2010 data) [21].
References
[1] HipHop for PHP: http://developers.facebook.com/blog/post/358
[2] Thrift: http://thrift.apache.org/
[3] Memcached: http://memcached.org/
[4] Cassandra: http://cassandra.apache.org/
[5] HBase: http://hbase.apache.org/
[6] Scribe: https://github.com/facebook/scribe
[7] Scribe-HDFS: http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
[8] BigPipe: http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
[9] Varnish Cache: http://www.varnish-cache.org/
[10] Facebook goes for Varnish: http://www.varnish-software.com/customers/facebook
[11] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php?note_id=76191543919
[12] Scaling the Messages application back end: http://www.facebook.com/note.php?note_id=10150148835363920
[13] The underlying technology of Messages: https://www.facebook.com/note.php?note_id=454991608919
[14] The underlying technology of Messages tech talk: http://www.facebook.com/video/video.php?v=690851516105
[15] Facebook's Typeahead search architecture: http://www.facebook.com/video/video.php?v=432864835468
[16] Facebook Chat: http://www.facebook.com/note.php?note_id=14218138919
[17] Who has the most Web servers?: http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/
[18] Building efficient data centers with the Open Compute Project: http://www.facebook.com/note.php?note_id=10150144039563920
[19] Open Compute Project: http://opencompute.org/
[20] Facebook's architecture presentation at Devoxx 2010: http://www.devoxx.com
[21] Scaling Facebook to millions of users and beyond: http://www.facebook.com/note.php?note_id=409881258919
Finally, this article comes from 51CTO: http://developer.51cto.com/art/201104/257508.htm