Based on my current readings and conversations, I understand that today's Facebook is structured as follows:
The web front end is written in PHP. Facebook's HipHop converts the PHP to C++ and compiles it with g++, providing a high-performance templating and web logic execution layer.
Business logic is exposed as services using Thrift. Depending on the need, these services are implemented in PHP, C++, or Java (some other languages are used as well...).
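To make the Thrift part concrete, here is a minimal sketch of what calling such a service from Java looks like. The `UserService` interface and its `getUser()` method are hypothetical stand-ins for whatever a real Thrift IDL would define; only the transport and protocol classes are the actual Thrift library API.

```java
// Minimal sketch of calling a Thrift-backed service from Java.
// "UserService.Client" is the stub the Thrift compiler would generate from a
// hypothetical IDL; it is not a real Facebook interface.
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftClientSketch {
    public static void main(String[] args) throws Exception {
        // Framed socket transport to a hypothetical service host/port.
        TTransport transport = new TFramedTransport(new TSocket("service-host", 9090));
        transport.open();
        try {
            TBinaryProtocol protocol = new TBinaryProtocol(transport);
            UserService.Client client = new UserService.Client(protocol); // generated stub
            System.out.println(client.getUser(42L));                      // hypothetical RPC
        } finally {
            transport.close();
        }
    }
}
```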
Services written in Java do not use any enterprise application server; they run on Facebook's own custom application server. This may look like reinventing the wheel, but since these services are exposed only (or mostly) via Thrift, Tomcat is too heavyweight and even Jetty adds overhead without bringing any value for Facebook's needs.
Persistence is handled by MySQL, Memcached, Facebook's Cassandra, and Hadoop's HBase [5]. Memcached serves as a cache in front of MySQL. Facebook engineers admit that their use of Cassandra is declining; they now prefer HBase for its simpler consistency model and its MapReduce capabilities.
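A minimal sketch of the cache-aside pattern this implies, assuming the spymemcached client library and a JDBC connection to MySQL; the `users` table, key naming, and TTL are made-up illustrations, not Facebook's actual schema.

```java
// Cache-aside sketch: check Memcached first, fall back to MySQL on a miss,
// then repopulate the cache. Table name, key scheme, and TTL are assumptions.
import java.net.InetSocketAddress;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import net.spy.memcached.MemcachedClient;

public class CacheAsideSketch {
    public static String getUserName(MemcachedClient cache, Connection db, long userId)
            throws Exception {
        String key = "user:" + userId + ":name";
        String name = (String) cache.get(key);                 // 1. try the cache
        if (name != null) {
            return name;
        }
        PreparedStatement stmt =                               // 2. miss: hit MySQL
                db.prepareStatement("SELECT name FROM users WHERE id = ?");
        stmt.setLong(1, userId);
        ResultSet rs = stmt.executeQuery();
        if (rs.next()) {
            name = rs.getString(1);
            cache.set(key, 3600, name);                        // 3. repopulate, 1h TTL
        }
        return name;
    }

    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        Connection db = DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "pass");
        System.out.println(getUserName(cache, db, 42L));
    }
}
```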
Offline processing is done with Hadoop and Hive.
Log, click, and feed data is collected with Scribe and aggregated into HDFS via Scribe-HDFS, which allows extended analysis using MapReduce.
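As a sketch of what "extended analysis using MapReduce" can look like, here is a minimal Hadoop job that counts clicks per URL over log files in HDFS. The tab-separated log format with the URL in the second field is an assumption for illustration.

```java
// Minimal Hadoop MapReduce sketch: count clicks per URL in log files on HDFS.
// The log format (tab-separated, URL in field 1) is an illustrative assumption.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ClickCount {
    public static class ClickMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length > 1) {
                ctx.write(new Text(fields[1]), ONE);           // emit (url, 1)
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text url, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) sum += c.get();
            ctx.write(url, new LongWritable(sum));             // emit (url, total clicks)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "click-count");
        job.setJarByClass(ClickCount.class);
        job.setMapperClass(ClickMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS log directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```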
BigPipe [8] is their custom technology for speeding up page rendering.
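BigPipe's internals are not detailed here, but its core idea is to send a page skeleton immediately and then stream each "pagelet" as it becomes ready. The following is only a minimal sketch of that incremental-flush idea using a plain Java servlet; the pagelet names and rendering are made up.

```java
// Sketch of the BigPipe idea: flush the page skeleton right away, then stream
// each "pagelet" as its content is ready instead of waiting for the whole page.
// Pagelet names and the fake rendering step are illustrative assumptions.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BigPipeSketchServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();

        // 1. Send the skeleton with empty placeholders immediately.
        out.println("<html><body>");
        out.println("<div id='newsfeed'></div><div id='chat'></div>");
        out.flush();

        // 2. As each pagelet finishes rendering, stream a small script that
        //    fills in its placeholder, so the browser can paint progressively.
        for (String pagelet : new String[] {"newsfeed", "chat"}) {
            String html = renderPagelet(pagelet);   // stands in for real rendering work
            out.println("<script>document.getElementById('" + pagelet
                    + "').innerHTML = '" + html + "';</script>");
            out.flush();
        }
        out.println("</body></html>");
    }

    private String renderPagelet(String name) {
        return "content of " + name;                // placeholder rendering
    }
}
```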
Varnish Cache [9] is used for HTTP proxying. They prefer it for its speed and efficiency.
The billions of photos uploaded by users are handled by Haystack, an ad-hoc storage solution developed by Facebook itself that applies low-level optimizations and append-only writes [11].
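What follows is only a minimal sketch of the append-only idea such a store relies on, not Facebook's actual implementation: blobs are appended to one large file and an in-memory index maps each photo id to an (offset, length) pair, so a read costs a single seek.

```java
// Minimal append-only blob store sketch in the spirit of Haystack: all photos
// are appended to one large file; an in-memory index maps id -> (offset, length)
// so a read needs one seek. Illustration of the technique only.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class AppendOnlyStoreSketch {
    private static final class Location {
        final long offset;
        final int length;
        Location(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final RandomAccessFile file;
    private final Map<Long, Location> index = new HashMap<>();

    public AppendOnlyStoreSketch(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    public synchronized void put(long photoId, byte[] data) throws IOException {
        long offset = file.length();
        file.seek(offset);                 // always write at the end: append-only
        file.write(data);
        index.put(photoId, new Location(offset, data.length));
    }

    public synchronized byte[] get(long photoId) throws IOException {
        Location loc = index.get(photoId);
        if (loc == null) return null;
        byte[] data = new byte[loc.length];
        file.seek(loc.offset);             // single seek per read
        file.readFully(data);
        return data;
    }
}
```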
Facebook Messages uses its own architecture, notably built on a dynamic cluster infrastructure. Business logic and persistence are encapsulated in so-called 'Cells'. Each Cell handles a portion of the users, and new Cells can be added as popularity grows. Persistence is archived using HBase.
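A minimal sketch of that Cell idea, assuming users are assigned to cells by hashing their id; the endpoint names and the hash-modulo routing are assumptions made only to keep the example short. In practice a directory service or consistent hashing would avoid remapping existing users when a new cell is added.

```java
// Cell-partitioning sketch: each cell owns a slice of the user population, and
// capacity grows by adding cells. The routing scheme (hash of user id modulo
// cell count) and the endpoint names are illustrative assumptions.
import java.util.ArrayList;
import java.util.List;

public class CellRouterSketch {
    private final List<String> cells = new ArrayList<>();

    public void addCell(String endpoint) {
        cells.add(endpoint);               // capacity grows by adding a cell
    }

    public String cellFor(long userId) {
        int i = (int) (Long.hashCode(userId) & 0x7fffffff) % cells.size();
        return cells.get(i);               // this user's messages live in this cell
    }

    public static void main(String[] args) {
        CellRouterSketch router = new CellRouterSketch();
        router.addCell("cell-01.example");
        router.addCell("cell-02.example");
        System.out.println(router.cellFor(42L));
    }
}
```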
The Facebook Messages search engine is built from an inverted index stored in HBase.
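Below is a minimal sketch of writing an inverted index into HBase with the standard Java client; the table name, column family, and row-key layout (term as the row key, one column per message id) are assumptions for illustration, not Facebook's actual schema.

```java
// Sketch of an inverted index in HBase: each term is a row key, and each
// message containing the term adds one column. Table name, column family,
// and key layout are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class InvertedIndexSketch {
    public static void index(Table table, long messageId, String body) throws Exception {
        for (String term : body.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue;
            Put put = new Put(Bytes.toBytes(term));              // row key = term
            put.addColumn(Bytes.toBytes("m"),                    // column family "m"
                    Bytes.toBytes(messageId),                    // qualifier = message id
                    Bytes.toBytes(1L));                          // value could carry positions
            table.put(put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("message_index"))) {
            index(table, 1001L, "meeting moved to Friday");
        }
    }
}
```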
The implementation details of the Facebook search engine are, as far as I know, currently unknown.
Typeahead search uses custom storage and retrieval logic.
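That custom logic is not public; the following is only a generic prefix-trie sketch of the kind of in-memory structure a typeahead needs, with made-up names.

```java
// Generic prefix-trie sketch for typeahead: insert names, then return every
// completion for a typed prefix. A textbook illustration, not Facebook's code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypeaheadSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    public void insert(String name) {
        Node n = root;
        for (char c : name.toLowerCase().toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.isWord = true;
    }

    public List<String> complete(String prefix) {
        Node n = root;
        for (char c : prefix.toLowerCase().toCharArray()) {
            n = n.children.get(c);
            if (n == null) return new ArrayList<>();   // nothing under this prefix
        }
        List<String> results = new ArrayList<>();
        collect(n, new StringBuilder(prefix.toLowerCase()), results);
        return results;
    }

    private void collect(Node n, StringBuilder path, List<String> out) {
        if (n.isWord) out.add(path.toString());
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        TypeaheadSketch t = new TypeaheadSketch();
        t.insert("Alice");
        t.insert("Alan");
        System.out.println(t.complete("al"));          // prints both completions
    }
}
```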
Chat is based on an epoll server developed in Erlang and accessed via Thrift.
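Facebook's chat servers are written in Erlang, but to keep all sketches here in one language, the same event-driven pattern is shown below with Java NIO, whose Selector is backed by epoll on Linux: a single thread multiplexes many connections, which is what makes huge numbers of mostly-idle chat connections cheap to keep open. The echo behaviour and port are placeholders.

```java
// Event-driven (epoll-style) server sketch using Java NIO: one Selector thread
// multiplexes many sockets. Mirrors the pattern only; Facebook's chat servers
// are actually implemented in Erlang.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class EventLoopSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();                    // epoll under Linux
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                                  // block until events
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {                       // new connection
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {                  // data from a client
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    if (client.read(buf) < 0) {
                        client.close();                         // client disconnected
                    } else {
                        buf.flip();
                        client.write(buf);                      // echo back (placeholder)
                    }
                }
            }
        }
    }
}
```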
Here is some information about the amount of resources allocated to the components above; some figures remain unknown:
Facebook is estimated to have more than 60,000 servers [16]. Their most recent data center, in Prineville, Oregon, is based on entirely self-designed hardware that was recently unveiled as the Open Compute Project.
300 TB of data is held in Memcached.
Their Hadoop and Hive cluster consists of 3,000 servers, each with 8 cores, 32 GB of RAM, and 12 TB of disk, for a total of 24,000 CPU cores, 96 TB of memory, and 36 PB of disk.
100 billion hits per day, 50 billion photos, 3 trillion objects cached, and 130 TB of logs per day (as of July 2010).