Large-scale website architecture evolution and knowledge system

Source: Internet
Author: User

Excerpt from: http://kb.cnblogs.com/page/207824/

Several articles have described how large websites such as LiveJournal and eBay evolved, and they are well worth reading. However, they tend to focus on the result of each stage rather than on why the change was needed. I have also noticed recently that many people find it hard to understand why a website needs such complex technology. That is the motivation for this article: it walks through a fairly typical path by which an ordinary website grows into a large one, and the knowledge you need to master at each stage. I hope it gives people working in the Internet industry a first overall picture. If you find mistakes in the text, please point them out so that the article can really serve its purpose.

  Architecture Evolution Step One: Physically separate the web server and the database

You start with an idea and build a website on the Internet. At this point even the host may be rented, but since this article only concerns the evolution of the architecture, assume that you already have one hosted machine with a certain amount of bandwidth. Because the site has some distinctive features, it attracts visitors, and gradually you find that the load on the system keeps rising and responses keep getting slower. What stands out most at this stage is that the database and the application affect each other: when the application has a problem the database tends to have one too, and when the database has a problem so does the application. This leads to the first stage of evolution: physically separating the application and the database onto two machines. No new technical knowledge is required, but the system returns to its earlier response speed, supports more traffic, and the database and the application no longer drag each other down.

Look at the diagram of the system after the completion of this step:

  

  This step involves these knowledge areas: this step places no particular requirements on technical knowledge.

  Architecture Evolution Step Two: Add page caching

Before long, as more and more people visit, the response speed starts to slow down again. Looking for the cause, you find that there are too many database accesses, which leads to fierce competition for database connections and therefore slow responses. You cannot simply open more connections, or the load on the database machine becomes very high. So you consider a caching mechanism to reduce both the competition for connections and the read load on the database. At this point you might choose squid or a similar mechanism to cache the relatively static pages of the system (for example, pages that are updated only every day or two); generating static pages is another option. This requires no changes to the application, and it effectively reduces the load on the web server and the competition for database connections. So you start using squid to cache the relatively static pages.

Look at the diagram of the system after the completion of this step:

  

  This step involves these knowledge areas: front-end page caching technology such as squid; to use it well you also need to understand how squid is implemented and how its cache expiration algorithms work.
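The idea behind page caching can be illustrated with a minimal sketch. This is not how squid is implemented; it only shows the principle of serving a stored copy of a page until a time-to-live expires. The names `PageCache`, `render_page`, and the 60-second TTL are assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Caches whole rendered pages for a fixed time-to-live (TTL)."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render          # the expensive path: application + database
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so that expiry can be demonstrated
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]            # still fresh: the backend is not touched
        body = self._render(url)       # miss or expired: regenerate the page
        self._store[url] = (now + self._ttl, body)
        return body


# Usage: record how often the (hypothetical) backend is really hit.
render_calls = []


def render_page(url: str) -> str:
    render_calls.append(url)
    return f"<html>content of {url}</html>"


fake_now = [0.0]
page_cache = PageCache(render_page, ttl_seconds=60, clock=lambda: fake_now[0])
page_cache.get("/about")   # miss: rendered by the backend
page_cache.get("/about")   # hit: served from the cache
fake_now[0] = 61.0
page_cache.get("/about")   # expired: rendered again
```

Three requests reach the cache but only two reach the backend, which is the reduction in database connections this step is after.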

  Architecture Evolution Step Three: Add page fragment caching

With squid caching in place, the overall speed of the system does improve and the load on the web server starts to fall. But as traffic grows, the system begins to slow down a little again. Having tasted the benefits of a cache such as squid, you start to wonder whether the relatively static parts of dynamic pages could be cached too, and so you consider a page fragment caching strategy such as ESI. So you start using ESI to cache the relatively static fragments of dynamic pages.

Look at the diagram of the system after the completion of this step:

  

  This step involves these knowledge areas: page fragment caching technology such as ESI; to use it well you also need to understand how ESI is implemented.
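As a rough illustration of fragment caching, the sketch below expands `<esi:include src="..."/>` tags in a page template and caches each fragment by its `src`. It is a simplified model, not a real ESI processor: the function `assemble`, the fragment URL, and the plain dictionary used as the cache are assumptions for this example, and expiry is left out.

```python
import re
from typing import Callable, Dict

# Matches a self-closing include tag and captures its src attribute.
ESI_INCLUDE = re.compile(r'<esi:include src="([^"]+)"\s*/>')


def assemble(template: str, fetch_fragment: Callable[[str], str],
             fragment_cache: Dict[str, str]) -> str:
    """Replace each include tag with its fragment, fetching only on a cache miss."""

    def resolve(match: "re.Match[str]") -> str:
        src = match.group(1)
        if src not in fragment_cache:
            fragment_cache[src] = fetch_fragment(src)
        return fragment_cache[src]

    return ESI_INCLUDE.sub(resolve, template)


# Usage: two dynamic pages share one relatively static fragment.
fragment_fetches = []


def fetch_fragment(src: str) -> str:
    fragment_fetches.append(src)       # stands in for an expensive backend call
    return f"[{src}]"


shared_fragments: Dict[str, str] = {}
page_for_alice = '<p>Hello alice</p><esi:include src="/fragments/top-news"/>'
page_for_bob = '<p>Hello bob</p><esi:include src="/fragments/top-news"/>'
html_alice = assemble(page_for_alice, fetch_fragment, shared_fragments)
html_bob = assemble(page_for_bob, fetch_fragment, shared_fragments)
```

The personalized part of each page is still generated per request, while the shared fragment is fetched from the backend only once.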

Architecture Evolution Step Four: Data caching

After technology such as ESI has improved the caching of the system once more, the load drops further. But again, as traffic grows, the system starts to slow down. Looking into it, you may find places where the same information is fetched repeatedly, such as user information. You begin to consider whether this data can be cached too, and so you cache it in local memory. Once the change is made the result fully meets expectations: the response speed recovers and the load on the database drops considerably.

Look at the diagram of the system after the completion of this step:

  

  This step involves these knowledge areas: caching techniques, including Map data structures, caching algorithms, and the implementation of whichever caching framework you choose.
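One common caching algorithm built on a map is least-recently-used (LRU) eviction. The sketch below is one possible in-memory data cache, assuming a hypothetical `load_user` function that stands in for the database query; the article does not say which algorithm or framework was actually used.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """In-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int, load: Callable[[Hashable], Any]) -> None:
        self._capacity = capacity
        self._load = load                       # called only on a cache miss
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key in self._items:
            self._items.move_to_end(key)        # mark as most recently used
            return self._items[key]
        value = self._load(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)     # drop the least recently used
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after the underlying row changes so stale data is not served."""
        self._items.pop(key, None)


# Usage: record which user ids actually reach the (hypothetical) database.
db_reads = []


def load_user(user_id: int) -> dict:
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}


user_cache = LRUCache(capacity=2, load=load_user)
for user_id in (1, 2, 1, 3, 1, 2):
    user_cache.get(user_id)
```

With a capacity of two, the six lookups cause four database reads; user 2 is read twice because it was evicted when user 3 arrived.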

  Architecture Evolution Step Five: Add a web server

Before long, you find that as traffic grows, the load on the web server machine climbs quite high at peak times. You start to consider adding a second web server, which also addresses availability: with a single web server, the site is unusable whenever that machine goes down. Having weighed this, you decide to add a web server, and doing so raises some typical problems:

1. How to distribute requests across the two machines. The options usually considered are Apache's own load-balancing scheme or a software load balancer such as LVS.
2. How to keep state such as user sessions in sync. The options considered include writing it to the database, writing it to shared storage, keeping it in cookies, or synchronizing session data between servers.
3. How to keep cached data in sync, such as the user data cached earlier. The usual options are cache synchronization or a distributed cache.
4. How to keep features such as file upload working. The usual option is a shared file system or shared storage.

With these problems solved, the web tier finally grows to two machines and the system is back to its earlier speed.
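The first problem, spreading requests over two machines, can be sketched with the simplest load-balancing algorithm, round robin, combined with skipping machines that are known to be down. This is an illustration only; Apache and LVS offer several algorithms and work at a different level. The class name and the backend names `web1` and `web2` are assumptions for this example.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any that are marked as down."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._down: Set[str] = set()
        self._next = 0

    def mark_down(self, backend: str) -> None:
        self._down.add(backend)

    def mark_up(self, backend: str) -> None:
        self._down.discard(backend)

    def pick(self) -> str:
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy backend available")


# Usage: traffic alternates until one machine fails, then all goes to the other.
balancer = RoundRobinBalancer(["web1", "web2"])
picks_before = [balancer.pick() for _ in range(4)]
balancer.mark_down("web1")
picks_after = [balancer.pick() for _ in range(2)]
```

The second half of the usage shows the availability benefit mentioned above: when one web server is down, requests keep flowing to the other.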

Look at the diagram of the system after the completion of this step:

  

  This step involves these knowledge areas: load balancing (including but not limited to hardware load balancing, software load balancing, load-balancing algorithms, Linux forwarding protocols, and the implementation details of the chosen technology), active/standby failover (including but not limited to ARP spoofing and Linux Heartbeat), state or cache synchronization (including but not limited to cookies, the UDP protocol, broadcasting of state information, and the implementation details of the chosen cache synchronization technology), shared file technology (including but not limited to NFS), and storage technology (including but not limited to storage devices).
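Of the session options listed, the cookie approach is the easiest to sketch: the session data travels with the request, so either web server can read it without any synchronization. The sketch below signs the cookie with an HMAC so that a client cannot alter it unnoticed. It is one possible design, not the article's; `SECRET_KEY` is a hypothetical value that every web server would have to share, and the data is signed but not encrypted, so it must not contain secrets.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"   # hypothetical; same on every web server


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def encode_session(data: dict) -> str:
    """Serialize session data into a cookie value of the form payload.signature."""
    raw = json.dumps(data, sort_keys=True).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    return f"{payload}.{_sign(payload)}"


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session data, or None if the cookie is malformed or tampered with."""
    payload, separator, signature = cookie.rpartition(".")
    if not separator:
        return None
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


# Usage: any server holding SECRET_KEY can verify a cookie issued by another.
session_cookie = encode_session({"user_id": 42})
last_char = "1" if session_cookie[-1] == "0" else "0"
tampered_cookie = session_cookie[:-1] + last_char
```

The trade-off is cookie size and the inability to revoke a session from the server side, which is why the database and shared-storage options remain on the list.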

  Architecture Evolution Step Six: Split the database

After enjoying a period of rapid traffic growth, you find the system starting to slow down again. What is it this time? Investigation shows that competition for database connections during writes and updates is very fierce, and this is slowing the system down. What can be done? The options at this point are a database cluster or splitting the database. Some databases do not support clustering very well, so splitting the data into several databases becomes the more common strategy. Splitting also means modifying the original program. After one round of changes the split is in place, the goal is reached, and the system is even faster than before.

Look at the diagram of the system after the completion of this step:

  

This step involves these knowledge areas: this step is mostly about dividing the data sensibly along business lines in order to split the database; it has no other specific technical requirements. At the same time, as the data volume grows and the number of databases increases, database design, tuning, and maintenance must be done better, so this step places high demands on skills in those areas.
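Splitting along business lines can be pictured as a lookup from table to database. The sketch below is purely illustrative: the business domains, table names, and host names are all invented for this example. It also shows one reason the program has to change, namely that tables in different databases can no longer be joined in a single SQL statement.

```python
from typing import Dict

# Hypothetical mapping from business domain to database address.
DATABASES: Dict[str, str] = {
    "user": "db-user.internal:3306",
    "order": "db-order.internal:3306",
    "product": "db-product.internal:3306",
}

# Hypothetical mapping from table to the business domain that owns it.
TABLE_OWNER: Dict[str, str] = {
    "users": "user",
    "user_profiles": "user",
    "orders": "order",
    "order_items": "order",
    "products": "product",
}


def database_for(table: str) -> str:
    """Return the address of the database that holds the given table."""
    try:
        return DATABASES[TABLE_OWNER[table]]
    except KeyError:
        raise LookupError(f"no database configured for table {table!r}") from None


def same_database(*tables: str) -> bool:
    """True if all tables live in one database, so a SQL join is still possible."""
    return len({database_for(table) for table in tables}) == 1
```

For instance, `same_database("orders", "order_items")` is true, while `same_database("users", "orders")` is false, so a query that used to join those two tables must be rewritten as two queries combined in the application.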

Architecture Evolution Step Seven: Split tables, DAL, and distributed cache

As the system keeps running, the data volume grows substantially, and you find that queries are still somewhat slow even after the database has been split. So, following the same idea as the database split, you start splitting tables. This again requires changes to the program. At this point you may find that having the application itself deal with the database and table splitting rules is rather complicated, which prompts the idea of adding a common framework that handles data access across the split databases and tables. This corresponds to the DAL in eBay's architecture. This evolution takes a relatively long time, and it is quite possible that the common framework is only started after the table split is finished. At this stage you may also find that the earlier cache synchronization scheme runs into trouble: the data volume is now too large to keep the cache locally and then synchronize it, so a distributed cache is needed. After another round of investigation and pain, the bulk of the cached data is finally moved to a distributed cache.
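The core of what such a data access layer hides from the application is the rule that maps a record to a database and a table. A minimal sketch, assuming user data is split by `user_id` modulo a fixed shard count; the counts and the naming scheme `user_db_N` / `user_N` are invented for this example, and a real DAL would also manage connections, timeouts, and exceptions.

```python
from typing import Tuple

DB_COUNT = 2         # hypothetical number of databases
TABLES_PER_DB = 4    # hypothetical number of tables in each database


def locate(user_id: int) -> Tuple[str, str]:
    """Map a user id to the (database, table) pair that stores its row."""
    shard = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(shard, TABLES_PER_DB)
    return f"user_db_{db_index}", f"user_{table_index}"


def build_query(user_id: int) -> Tuple[str, str, tuple]:
    """Return the target database, a parameterized SQL string, and its parameters.

    The table name comes from locate(), never from user input, so it is safe
    to format into the SQL text; the user id is passed as a bound parameter.
    """
    database, table = locate(user_id)
    sql = f"SELECT * FROM {table} WHERE user_id = %s"
    return database, sql, (user_id,)


# Usage: the application asks for a user and never sees the splitting rule.
target_db, target_sql, target_params = build_query(13)
```

Because the rule lives in one place, changing the number of tables later means changing the DAL and migrating data, not touching every query in the application.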

Look at the diagram of the system after the completion of this step:

  

  This step involves these knowledge areas: table splitting is again mostly a matter of business division; the techniques involved include dynamic hashing and consistent hashing. The DAL involves more complex techniques, such as database connection management (timeouts, exceptions), control of database operations (timeouts, exceptions), and encapsulation of the database and table splitting rules.
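Consistent hashing is what lets a distributed cache add or remove a node without remapping most keys. The sketch below is a minimal ring with virtual nodes; the class name `HashRing`, the node names, the replica count of 100, and the use of SHA-256 as the hash function are choices made for this example, not details from the article.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each node is placed at many virtual points."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._points: List[int] = []        # sorted ring positions
        self._owner: Dict[int, str] = {}    # ring position -> node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("the ring has no nodes")
        # First point clockwise from the key's position, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


# Usage: remove one cache node and see which keys change owner.
ring = HashRing(["cache1", "cache2", "cache3"])
sample_keys = [f"user:{i}" for i in range(1000)]
owner_before = {key: ring.node_for(key) for key in sample_keys}
ring.remove("cache3")
owner_after = {key: ring.node_for(key) for key in sample_keys}
```

Only the keys that lived on the removed node move; every other key keeps its owner. With plain `hash(key) % node_count`, by contrast, most keys would change node whenever the count changes.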

  Architecture Evolution Step Eight: Add more web servers

With the database and table splitting done, the load on the database has dropped to a fairly low level, and you settle back into the happy life of watching traffic surge every day. Then one day the site starts to feel slow again. You check the database first and its load is normal. You then check the web servers and find that Apache is blocking many requests, even though the application server handles each request fairly quickly. It seems the number of requests is simply too high, so they have to queue and responses slow down. This is manageable: by this point there is usually some money available, so you add more web servers. Adding them may bring several challenges:

1. The Apache or LVS software load balancer can no longer handle dispatching such a large volume of web traffic (number of connections, network throughput, and so on). If funding allows, the plan is to buy a hardware load balancer such as F5, NetScaler, or Athelon. If it does not, the plan is to partition the application logically by category and spread the traffic across different software load-balancing clusters.
2. Some of the existing schemes for state synchronization, file sharing, and so on may become bottlenecks and need to be improved. At this point you may even write a distributed file system tailored to the business needs of the site.

After this work you enter a seemingly perfect era of unlimited scaling: when site traffic increases, the solution is simply to keep adding web servers.

Look at the diagram of the system after the completion of this step:

  

This step involves these knowledge areas: at this point, as the number of machines keeps growing, the data volume keeps growing, and the requirements for availability keep rising, you need a deeper understanding of the technologies in use, and you need to build more customized products based on the needs of the site.
