I happened to be following these topics recently. In community discussions it is often said that Facebook has cut back its use of Cassandra, but without hard evidence. Searching online, I found an article that largely matches my own understanding, and I am reposting it here for everyone's (and my own) future reference.
Reposted from: http://hi.baidu.com/jgs2009/blog/item/76652b4406cc1129879473a7.html
Based on my reading and conversations so far, my understanding of Facebook's current architecture is as follows:
The Web front end is written in PHP. Facebook's HipHop [1] transforms the PHP into C++ and compiles it with g++, providing high performance for the templating and Web logic layers.
The business logic is exposed as services that use Thrift [2]. Depending on requirements, services are implemented in PHP, C++ or Java (and some other languages are used as well ...).
- Services written in Java do not use any enterprise-class application server; they run on Facebook's own custom application server. This may look like reinventing the wheel, but these services are mostly exposed only over Thrift, so Tomcat is too heavyweight, even Jetty would be too much, and their added value is meaningless to Facebook.
- Persistence is handled by MySQL, Memcached [3], Facebook's Cassandra [4] and Hadoop's HBase [5]. Memcached is used as a memory cache in front of MySQL (a cache-aside sketch follows this list). Facebook engineers acknowledge that their use of Cassandra is decreasing: they prefer HBase for its simpler consistency model and its MapReduce capabilities.
- Offline processing uses Hadoop and Hive.
- Log, click and feed data are collected with Scribe [6] and aggregated in HDFS via Scribe-HDFS [7], which allows extended analysis with MapReduce (a MapReduce-style sketch follows this list).
- BigPipe [8] is their custom technology for speeding up page rendering.
- Varnish Cache [9] is used as an HTTP proxy; they chose it for its speed and efficiency [10].
- The storage of the billions of photos uploaded by users is handled by Haystack, an ad-hoc storage scheme developed by Facebook that relies mainly on low-level optimizations and "append only" writes [11] (a toy append-only store is sketched after this list).
- Facebook Messages uses its own architecture, built on a sharded, dynamically managed cluster infrastructure. Business logic and persistence are encapsulated in so-called 'Cells'. Each Cell handles a subset of the users, and new Cells can be added as traffic grows [12] (a toy cell-mapping sketch follows this list). Persistence is handled by HBase [13].
- The Messages search engine is built on an inverted index stored in HBase (an illustrative sketch follows this list).
- The implementation details of Facebook's main search engine are, as far as I know, currently unknown.
- Typeahead search uses custom storage and retrieval logic [15].
- Chat is based on an epoll server written in Erlang and accessed via Thrift [16] (a generic epoll sketch follows this list).
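Since the list above says Memcached sits in front of MySQL as a cache, here is a minimal cache-aside sketch of that pattern in Python with the python-memcached client. It is only an illustration: the memcached address, key naming, TTL and the load_user_from_mysql stub are assumptions, not Facebook's code.

```python
# Minimal cache-aside sketch: read from Memcached first, fall back to MySQL,
# then populate the cache. Host, key format and TTL are illustrative only.
import json
import memcache  # pip install python-memcached

mc = memcache.Client(["127.0.0.1:11211"])
CACHE_TTL = 300  # seconds; an arbitrary choice for the example


def load_user_from_mysql(user_id):
    """Stand-in for a real MySQL query (e.g. SELECT ... WHERE id = %s)."""
    return {"id": user_id, "name": "example"}


def get_user(user_id):
    key = "user:%d" % user_id
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit
    user = load_user_from_mysql(user_id)   # cache miss: hit the database
    mc.set(key, json.dumps(user), time=CACHE_TTL)
    return user


if __name__ == "__main__":
    print(get_user(42))
```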
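For the Scribe to HDFS to MapReduce pipeline mentioned above, here is a hedged sketch of what a downstream analysis job can look like: a Hadoop Streaming style mapper and reducer in Python that count hits per URL. The tab-separated log format (timestamp, url, user) is an assumption made for the example.

```python
# Hadoop Streaming style hit counter: the mapper emits one (url, 1) pair per
# log line, the reducer sums counts for consecutive identical urls (Hadoop
# sorts by key between the two phases).
import sys
from itertools import groupby


def mapper(lines):
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            yield fields[1], 1              # url -> one hit


def reducer(pairs):
    for url, group in groupby(pairs, key=lambda kv: kv[0]):
        yield url, sum(count for _url, count in group)


if __name__ == "__main__":
    # Local stand-in for map -> sort -> reduce; on a cluster the two halves
    # would run as separate streaming tasks over data shipped into HDFS.
    mapped = sorted(mapper(sys.stdin))
    for url, total in reducer(mapped):
        sys.stdout.write("%s\t%d\n" % (url, total))
```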
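The Haystack item mentions low-level optimizations and "append only" writes. The toy store below (plain Python, no dependencies) only illustrates the append-only idea: blobs go to the end of one big file and an in-memory index maps ids to offsets. The layout is invented for illustration and is not Haystack's format.

```python
# Toy append-only photo store: blobs are only ever appended to a single file,
# and an in-memory index maps photo id -> (offset, length) for reads.
import os


class AppendOnlyStore:
    def __init__(self, path):
        self.path = path
        self.index = {}              # photo_id -> (offset, length)
        open(path, "ab").close()     # make sure the file exists

    def put(self, photo_id, data: bytes):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id) -> bytes:
        offset, length = self.index[photo_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    store = AppendOnlyStore("photos.dat")
    store.put("p1", b"\x89PNG...fake bytes")
    print(store.get("p1"))
```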
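The Messages "Cell" description boils down to mapping each user to the cell that serves them, with new cells registered as traffic grows. Below is a toy sketch of such a user-to-cell directory; the hash-modulo scheme is an assumption for illustration, not Facebook's actual directory service.

```python
# Toy user -> cell directory: each cell serves a bucket of users, and new
# cells can be registered as load grows. Purely illustrative; a real system
# would use a managed directory with rebalancing, not a bare modulo hash.
import hashlib


class CellDirectory:
    def __init__(self, cells):
        self.cells = list(cells)     # e.g. ["cell-0", "cell-1"]

    def add_cell(self, name):
        """Bring a new cell online (in reality this also triggers rebalancing)."""
        self.cells.append(name)

    def cell_for_user(self, user_id: str) -> str:
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return self.cells[int(digest, 16) % len(self.cells)]


if __name__ == "__main__":
    directory = CellDirectory(["cell-0", "cell-1"])
    print(directory.cell_for_user("alice"))
    directory.add_cell("cell-2")                 # added due to growing traffic
    print(directory.cell_for_user("alice"))      # may move with naive modulo hashing
```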
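The Messages search engine stores an inverted index in HBase. As a rough sketch of what that can look like, here is a fragment using the happybase Python client (HBase over Thrift); the table name, column family, row-key layout and naive tokenizer are assumptions, not Facebook's schema.

```python
# Sketch of an inverted index kept in HBase, accessed through happybase.
# Row key "user|term", one column per message id; an assumed layout only.
# Requires a running HBase Thrift gateway and a table created in the HBase
# shell, e.g.: create 'msg_index', 'm'
import happybase

connection = happybase.Connection("localhost")   # HBase Thrift gateway
index = connection.table("msg_index")


def _row_key(user_id, term):
    return ("%s|%s" % (user_id, term.lower())).encode("utf-8")


def index_message(user_id, message_id, body):
    """Tokenize naively and record message_id under each (user, term) row."""
    for term in set(body.lower().split()):
        index.put(_row_key(user_id, term), {b"m:" + message_id.encode(): b"1"})


def search(user_id, term):
    """Return message ids whose body contained the term for this user."""
    row = index.row(_row_key(user_id, term))
    return [col.split(b":", 1)[1].decode() for col in row]


if __name__ == "__main__":
    index_message("u1", "42", "Lunch tomorrow?")
    print(search("u1", "lunch"))
```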
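Chat is described as an epoll server written in Erlang and accessed via Thrift. The snippet below is not that server; it is just a generic illustration of the epoll readiness-notification pattern, using Python's standard selectors module (epoll-backed on Linux).

```python
# Minimal echo server showing the epoll readiness pattern via the standard
# selectors module. Generic illustration only; Facebook's chat server is
# written in Erlang and fronted by Thrift.
import selectors
import socket

sel = selectors.DefaultSelector()


def accept(server_sock):
    conn, _addr = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)


def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)        # echo back to the client
    else:
        sel.unregister(conn)      # client went away
        conn.close()


def serve(port=8007):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)
    while True:
        for key, _mask in sel.select():
            key.data(key.fileobj)  # dispatch to accept() or handle()


if __name__ == "__main__":
    serve()
```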
As for the resources behind the components above, some information and figures are available, though some are unknown:
- Facebook is estimated to run more than 60,000 servers [17].
- Their newest data center, in Prineville, Oregon, is built on entirely self-designed hardware [18], recently unveiled as the Open Compute Project [19].
- Terabytes of data are stored in Memcached [20].
- Their Hadoop and Hive cluster is made up of 3,000 servers, each with 8 cores, 32 GB of RAM and 12 TB of disk, for a total of 24,000 CPU cores, 96 TB of RAM and 36 PB of disk [20] (the arithmetic is checked in the short sketch after this list).
- 100 billion hits per day, 50 billion photos, 3 trillion cached objects, and 130 TB of logs per day (as of July 2010) [21].
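The cluster totals quoted above follow directly from the per-node figures; the short check below simply multiplies them out.

```python
# Sanity check of the quoted Hadoop/Hive cluster totals from per-node figures.
nodes = 3000
cores = nodes * 8              # 24,000 CPU cores
ram_tb = nodes * 32 / 1000     # 96 TB of RAM (using 1 TB = 1000 GB)
disk_pb = nodes * 12 / 1000    # 36 PB of disk (using 1 PB = 1000 TB)
print(cores, ram_tb, disk_pb)  # -> 24000 96.0 36.0
```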
References
[1] HipHop for PHP: http://developers.facebook.com/blog/post/358
[2] Thrift: http://thrift.apache.org/
[3] Memcached: http://memcached.org/
[4] Cassandra: http://cassandra.apache.org/
[5] HBase: http://hbase.apache.org/
[6] Scribe: https://github.com/facebook/scribe
[7] Scribe-HDFS: http://hadoopblog.blogspot.com/2009/06/hdfs-scribe-integration.html
[8] BigPipe: http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919
[9] Varnish Cache: http://www.varnish-cache.org/
[10] Facebook goes for Varnish: http://www.varnish-software.com/customers/Facebook
[11] Needle in a haystack: efficient storage of billions of photos: http://www.facebook.com/note.php?note_id=76191543919
[12] Scaling the Messages application back end: http://www.facebook.com/note.php?note_id=10150148835363920
[13] The Underlying Technology of Messages: https://www.facebook.com/note.php?note_id=454991608919
[14] The Underlying Technology of Messages Tech Talk: http://www.facebook.com/video/video.php?v=690851516105
[15] Facebook's Typeahead search architecture: http://www.facebook.com/video/video.php?v=432864835468
[16] Facebook Chat: http://www.facebook.com/note.php?note_id=14218138919
[17] Who has the most Web servers?: http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/
[18] Building efficient Data Centers with the Open Compute Project: http://www.facebook.com/note.php?note_id=10150144039563920
[19] Open Compute Project: http://opencompute.org/
[20] Facebook's architecture presentation at Devoxx 2010: http://www.devoxx.com
[21] Scaling Facebook to millions of users and beyond: http://www.facebook.com/note.php?note_id=409881258919