2017-6-8/Large Web site architecture summary

Source: Internet
Author: User
Tags memcached server memory ruby on rails amazon cloud services

First, Wikipedia

Wikipedia is a non-profit site, so it uses free software and inexpensive servers wherever possible. As of 2012, it ran on only a few hundred servers, with a little over 10 technicians developing and maintaining the site, yet it ranked 6th in the world by traffic. Its architecture and performance optimization therefore offer a lot worth learning from.

1. Wikipedia Data volume
    • Peaks of 30,000 HTTP requests per second
    • 3 Gbit/s of traffic, roughly 375 MB/s
    • 350 PC servers

2. Wikipedia's main structure

3. Wikipedia's performance optimization strategy

1) Front end

① The website's front-end services include DNS, CDN, reverse proxy, static resource services, and so on. For Wikipedia, more than 80% of user requests can be answered by the front-end services and never reach the application servers, which greatly reduces the pressure on the application servers and the storage tier.

② The CDN contributes greatly to Wikipedia's performance. Most user queries are concentrated on a small set of hot entries, so those content pages are cached on CDN servers deployed close to users. Such requests are answered directly from the CDN, so responses are very fast and the requests never reach the Squid reverse proxy servers in the Wikipedia data center. Server load drops, and the freed resources can more quickly handle requests that the CDN has not cached.

③ The core of Wikipedia's front-end architecture is the Squid reverse proxy cluster, which consists of several dozen servers. LVS load balancing distributes requests evenly across the Squid servers. Hot entries are cached there, so a large share of requests is answered directly without being forwarded to the Apache servers, which relieves the application load. Requests that miss the Squid cache are sent through LVS to the Apache application server cluster. When an entry is updated, the application server uses an invalidation notification service to invalidate the Squid cache, and the next request goes to the application server for the updated entry.

2) Service side

The most important back-end optimization is caching: hot data is kept in the memory of a distributed cache system, which speeds up reads by the application servers and reduces the load on the storage and database servers.
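The read path described above (serve hot data from memory, go to the slower store only on a miss, and invalidate an entry when it is updated) can be sketched as follows. This is a minimal illustration, not Wikipedia's actual code: the `CacheAside` class, the in-process dictionary standing in for the distributed cache, and the `store` dictionary standing in for the database are all assumptions made for the example.

```python
from typing import Callable, Dict


class CacheAside:
    """Serve hot entries from memory; fall back to a slower loader on a miss."""

    def __init__(self, load_from_store: Callable[[str], str]) -> None:
        self._load = load_from_store       # hypothetical slow path (database, renderer)
        self._cache: Dict[str, str] = {}   # stand-in for a distributed cache
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        """Return the cached value, loading and caching it on a miss."""
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry after an edit so the next read reloads it."""
        self._cache.pop(key, None)


# Hypothetical backing store standing in for the database.
store = {"Squid": "<p>Squid is a caching proxy.</p>"}
cache = CacheAside(lambda title: store[title])

first = cache.get("Squid")     # miss: loaded from the store
second = cache.get("Squid")    # hit: served from memory
store["Squid"] = "<p>Squid is a caching and forwarding proxy.</p>"
cache.invalidate("Squid")      # the edit triggers invalidation
third = cache.get("Squid")     # miss again: reloads the updated entry
```

Storing the rendered HTML rather than raw rows means a hit needs no further processing, and explicit invalidation keeps readers from seeing a stale entry after an edit.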
Wikipedia's cache usage strategy is as follows:

① The most concentrated hot data is cached directly in the local memory of the application servers. Because this consumes application server memory and every server has to cache the same data, it is reserved for data that is small in volume but accessed very frequently.

② Cached data is stored, as far as possible, in a format the application server can use directly, such as HTML, to avoid the cost of parsing and rebuilding the data after fetching it from the cache.

③ Session objects are stored on a cache server.

④ Memcached is very cheap to use compared with the database, so a memcached connection is created whenever one is needed.

3) MySQL database optimization

① Use servers with more memory. In Wikipedia's scenario, adding memory improves MySQL performance more than adding other resources.

② Use RAID0 disk arrays to speed up disk access. RAID0 reduces the reliability of the database, but data reliability can be ensured by other means, such as MySQL master-slave replication and asynchronous data backup.

③ Set database transaction consistency to a lower level to speed up recovery after downtime.

④ If the master database goes down, the application immediately switches to the slave database and the write service is turned off, which means entry editing is disabled (constraining the business function in exchange for a simpler technical solution).

Second, Facebook

1. The amount of data on Facebook
  • Page views per month: 570,000,000,000 (570 billion)
  • It hosts more photos than all other image sites combined (including sites such as Flickr)
  • More than 3 billion photos uploaded per month
  • Handles 1.2 million photos per second, not counting photos handled by the CDN
  • More than 2.5 billion content items processed per month (status updates, comments, etc.)
  • More than 30,000 servers (2009 data)

2. Facebook's main structure

Facebook uses LAMP (Linux, Apache, MySQL, PHP) as its technology stack. To work with a large number of other components and services, Facebook has changed, extended, and modified the existing approach where necessary. For example:

    • Facebook still uses PHP, but it has built a new compiler so that native code can be loaded on its web servers, thereby improving performance;
    • Facebook uses Linux, but has optimized it for its own purposes, especially for network throughput;
    • Facebook uses MySQL, and has optimized it as well;
    • There are also custom systems, such as:
                 Haystack: highly scalable object storage that handles Facebook's huge volume of images;
                 Scribe: Facebook's logging system.

Third, Google App Engine

Google App Engine (GAE) is a PaaS service that helps users build web applications quickly without spending much time and effort on operations. Its main features are: support for web applications, persistent storage, automatic scaling and load balancing for applications, and supporting services such as email, user authentication, and caching. The GAE architecture is divided into three parts: the front end, the Datastore, and the service group.

Fourth, Amazon AWS

1. Platform and status
    • Linux, Oracle, C++, Perl, Mason, Java, JBoss, Servlets
    • More than 55 million active customer accounts
    • More than 1 million active retail partners worldwide
    • Building a single page requires calls to between 100 and 150 services
2. The main architecture of Amazon AWS

Amazon's cloud service, AWS, mainly includes the following modules:

Fifth, Twitter

1. Data volume of Twitter
    • Twitter has 1.8 million unique visitors per month, and 75% of traffic comes from sites outside Twitter.com.
    • 3 billion requests per day via the API
    • On average, 5,500 tweets per day; 37% of active users are mobile users, and about 60% of tweets come from third-party apps.
2. System architecture

Platform: Ruby on Rails, Erlang, MySQL, Mongrel, Munin, Nagios, Google Analytics, AWStats, Memcached

Sixth, Youku

1. Youku data volume
    • Average daily unique visitors (UV) reached 89 million (2010 data)
    • Average daily page views (PV) reached 1.7 billion, which made Youku the highest-ranked video site in Google's ranking (2010 data)
    • More than 1,000 servers (2007 data)
2. The main structure of Youku
