From all the information I have seen, and from conversations with people of many kinds, Facebook's current architecture can be summarized as follows.

The web front end is written in PHP, converted to C++ by the HipHop compiler [1], and then compiled with g++, providing a high-performance templating and web-logic execution layer. To avoid relying entirely on static compilation and its limitations, Facebook has also begun work on a HipHop interpreter [2] and a HipHop virtual machine that translates PHP code into HipHop bytecode [3].

Business logic is exposed as services using the Thrift framework [4]. These services are implemented in PHP, C++, or Java (and possibly other languages), depending on the specific requirements. The services implemented in Java do not use any of the usual enterprise application servers, but rather Facebook's own custom application server. At first this may look like reinventing the wheel, but since these services are exposed and consumed only (or mostly) via Thrift, the overhead of Tomcat, or even Jetty, was too high for too little added value.

Persistence is handled with MySQL, Memcached [5], and Hadoop's HBase [6], with Memcached serving both as a cache for MySQL and as a general-purpose cache. Offline processing is done with Hadoop and Hive.

Data such as logs, clicks, and feeds transit through Scribe [7] and are aggregated into HDFS using Scribe-HDFS [8], which allows further analysis with MapReduce. BigPipe [9] is their custom technology for speeding up page rendering with a pipelining logic. Varnish Cache [10] is used for HTTP proxying; it is favored for its performance and efficiency [11]. The hundreds of millions of photos published by users are stored by Haystack, an ad-hoc storage solution developed by Facebook that includes low-level optimizations and append-only writes [12].
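The role of Memcached described above is the classic cache-aside pattern: reads try the cache first and fall back to the database, and writes invalidate the cached entry. A minimal sketch, in which the `db` and `cache` dicts are stand-ins for real MySQL and Memcached clients (all names here are illustrative, not Facebook's actual code):

```python
# Cache-aside sketch: Memcached sits in front of MySQL, so reads try the
# cache first and fall back to the database of record on a miss.
# The dicts below are stand-ins for real Memcached/MySQL clients.

db = {"user:1": {"name": "alice"}}   # pretend MySQL table
cache = {}                           # pretend Memcached

def get_user(key):
    # 1. Try the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, read from the database of record...
    value = db.get(key)
    # 3. ...and populate the cache for subsequent reads.
    if value is not None:
        cache[key] = value
    return value

def update_user(key, value):
    # Writes go to the database; the cache entry is invalidated rather
    # than updated, so the next read repopulates it with fresh data.
    db[key] = value
    cache.pop(key, None)
```

Invalidating on write (rather than writing through) is one common way to keep a general-purpose cache like Memcached from serving stale data.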
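The append-only write mode attributed to Haystack can be illustrated with a toy store: blobs are appended to one large file, and an in-memory index maps each photo id to its offset, so a read costs a single seek rather than a per-file filesystem lookup. This is a sketch of the general technique under invented names, not Facebook's actual on-disk format:

```python
import io

# Toy append-only photo store in the spirit of Haystack: all blobs go
# into one large file, and an in-memory index maps photo_id ->
# (offset, length), so a read needs only one seek. The format and class
# names here are invented for illustration.

class AppendOnlyStore:
    def __init__(self):
        self.log = io.BytesIO()   # stand-in for one big on-disk file
        self.index = {}           # photo_id -> (offset, length)

    def put(self, photo_id, data):
        offset = self.log.seek(0, io.SEEK_END)  # always append at the end
        self.log.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, length = self.index[photo_id]
        self.log.seek(offset)
        return self.log.read(length)
```

Because writes only ever append, the store avoids in-place updates and the metadata overhead of one file per photo.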
Facebook Messages uses its own architecture, notably based on infrastructure sharding and dynamic cluster management. Business logic and persistence are encapsulated in so-called "Cells": each Cell handles a portion of the user base, and new Cells are added as the number of users grows [13]. Persistence is achieved using HBase [14]. The Messages search engine is built on an inverted index stored in HBase [15]. The implementation details of Facebook's main search engine remain unknown. Typeahead search uses custom storage and retrieval logic [16]. The Chat service is based on an Epoll server developed in Erlang and accessed via Thrift [17]. Facebook has also built an automated system that triggers the appropriate repair workflows in response to monitoring alerts, and notifies a human administrator when a failure cannot be resolved automatically [18].
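The Cell idea above is a form of sharding in which a user is pinned to one self-contained slice of logic and storage, and capacity grows by adding cells rather than resharding. A minimal sketch under assumed names (Facebook's actual cluster manager and directory service are not public):

```python
# Illustrative sketch of cell-based sharding: each user is pinned to a
# "cell" (a self-contained slice of business logic + storage), and new
# cells are added as the user base grows. A directory records each
# user's cell so existing users do not move when capacity is added.
# All names here are hypothetical.

class CellRouter:
    def __init__(self):
        self.cells = ["cell-0"]   # start with a single cell
        self.directory = {}       # user_id -> assigned cell

    def add_cell(self):
        # Bring a new cell online; existing assignments are untouched.
        self.cells.append(f"cell-{len(self.cells)}")

    def cell_for(self, user_id):
        # Assign unseen users to the newest cell; keep old users pinned.
        if user_id not in self.directory:
            self.directory[user_id] = self.cells[-1]
        return self.directory[user_id]
```

Using an explicit directory (rather than, say, hashing modulo the cell count) means adding a cell never forces existing users' data to migrate.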
As for known figures and the resources provisioned for each component: Facebook has more than 60,000 servers [18]. Its recently unveiled data center in Prineville, Oregon is built entirely on self-designed hardware [19], released as the Open Compute Project [20].
Memcached stores and serves up to 300 TB of data [21]. The Hadoop and Hive cluster consists of 3,000 servers with 8 cores, 32 GB of RAM, and 12 TB of disk each, for a total of 24,000 cores, 96 TB of RAM, and 36 PB of disk [22]. As of July 2010 they reported 100 billion hits per day, 50 billion photos, 3 trillion cached objects, and 130 TB of logs [22]. Note: Cassandra is no longer in use. Facebook's real-time analytics system records every incoming link event (such as likes and comments from users' pages); these are logged to HDFS, from which Puma pulls them and stores them in batches into HBase.
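The cluster totals quoted above follow directly from the per-server figures; a quick sanity check (using decimal units, 1 TB = 1,000 GB and 1 PB = 1,000 TB):

```python
# Verify the Hadoop/Hive cluster totals from the per-server specs.
servers = 3000
cores = servers * 8       # 24,000 cores in total
ram_gb = servers * 32     # 96,000 GB = 96 TB of RAM
disk_tb = servers * 12    # 36,000 TB = 36 PB of disk
```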
Relevant information and reference articles also include:
Facebook recently published a blog post detailing the next-generation network architecture that will be piloted in its Altoona data center. Its way of handling heavy traffic is quite novel and improves on traditional methods and protocols. See "Facebook launches next-generation network".
There is also the recent announcement of enhanced search functionality, supported by a large-scale data analysis and data management foundation. See "Facebook's big data analysis enhances search capabilities".
Also available for reference are:
Facebook's data warehouse and architecture analysis
Apache Hadoop's application at Facebook (more information can be found in Dhruba Borthakur's blog)
Scalable memory allocation with jemalloc (and the related question and answer)
Tornado (web framework)
Why Facebook's search is so fast
Malte Schwarzkopf: The Facebook stack
The evolution of Facebook's code design
Reference materials include:
[1] HipHop for PHP
[2] Making HPHPI faster
[3] The HipHop Virtual Machine
[4] Thrift
[5] Memcached
[6] HBase
[7] Scribe
[8] Scribe-hdfs
[9] Bigpipe
[10] Varnish Cache
[11] Facebook goes for Varnish
[12] Needle in a Haystack: efficient storage of billions of photos
[13] Scaling the Messages application back end
[14] The underlying technology of Messages
[15] The underlying technology of Messages Tech Talk
[16] Facebook's Typeahead search architecture
[17] Facebook Chat
[18] Who has the most Web servers?
[19] Building efficient data centers with the Open Compute Project
[20] Open Compute Project
[21] Facebook's architecture presentation at Devoxx 2010
[22] Scaling Facebook to millions of users and beyond
Original link: What is Facebook's architecture? (Translators: Vera Zebian, Chang)