The evolution of large-scale website system architecture

Last Update:2018-07-24 Source: Internet

Author: User


This article is for readers who want to understand how the architecture of a large website evolves step by step. It is well written, so it is reprinted here. Original: http://www.uml.org.cn/zjjs/201306263.asp

I have previously given brief introductions to the architectures of several well-known large websites: the success secrets of sites with hundreds of millions of users, the Flickr architecture, the YouTube architecture, a study of the PlentyOfFish website architecture, and study notes on the Wikipedia technical architecture. These are typical examples from which we can learn a great deal about website architecture. After reading them you may find that your original ideas were rather narrow.

Today we will talk about how a website's system architecture is typically built up. We would all like a site to have a good architecture from the very start, but, as Marx taught us, things move forward through development: a site's architecture keeps improving as the business expands and user demands grow. Below is the basic process by which a website architecture gradually evolves. After reading it, think about which stage you are at now.

Architecture Evolution Step One: Physically separate the web server and the database

At first, you have some idea and build a website on the Internet; at this point even the host may be rented. Since this article focuses only on the evolution of the architecture, assume you already have one hosted machine and a certain amount of bandwidth. Because the site has some distinctive features, it attracts visitors, and gradually you find that the load on the system keeps rising and responses keep getting slower. What stands out most is that the database and the application affect each other: when the application has problems the database tends to have problems too, and vice versa. So you enter the first stage of evolution: physically separating the application and the database onto two machines. This requires no new technical skills, yet you find that it works: the system returns to its earlier response speed, supports higher traffic, and the database and application no longer interfere with each other.

Look at the diagram of the system after the completion of this step:

Architecture Evolution Step Two: Add page caching

Before long, as more and more people visit, you find that responses are slowing down again. Looking for the cause, you discover that too many operations hit the database, causing fierce competition for database connections, so responses are slow. But you cannot open too many database connections, or the load on the database machine becomes very high. So you consider a caching mechanism to reduce both the competition for database connections and the read load on the database. At this point you might choose something like Squid to cache the relatively static pages in the system (for example, pages that are updated only every day or two); generating static pages is another option. This way, without modifying the program, you can reduce the load on the web server and the competition for database connections. So you start using Squid to cache the relatively static pages.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

Front-end page caching technology such as Squid. To use it well you also need a thorough understanding of how Squid is implemented and of its cache invalidation algorithms.
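The idea behind whole-page caching can be sketched in a few lines. The following is only an illustration of time-based expiry and is not how Squid itself is implemented; the class name `PageCache`, the `render` callback, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Hypothetical whole-page cache with a fixed time-to-live (TTL)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expiry, body)

    def get(self, url: str, render: Callable[[str], str]) -> str:
        """Return the cached page, calling render(url) only on a miss."""
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: application and database are not touched
        body = render(url)  # miss or expired: ask the application
        self._store[url] = (now + self.ttl, body)
        return body
```

While an entry is fresh, repeated requests for the same URL never reach the application, which is where the reduction in database connections comes from.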

Architecture Evolution Step Three: Add page fragment caching

With Squid added as a cache, the overall system is indeed faster and the load on the web server starts to fall, but as traffic grows the system begins to slow down a little again. Having tasted the benefits of a dynamic cache such as Squid, you start to wonder whether the relatively static parts of dynamic pages could be cached as well. So you consider a page fragment caching strategy such as ESI, and start using ESI to cache the relatively static fragments of dynamic pages.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

Page fragment caching technology such as ESI. To use it well you also need to master how ESI is implemented.
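As a rough illustration of fragment caching, the sketch below assembles a page from a template containing `<esi:include src="..."/>` tags and caches each fragment separately. It handles only that one tag form and ignores expiry; the function `assemble`, the `fetch_fragment` callback, and the plain dictionary used as a cache are assumptions for the example, not part of any ESI processor.

```python
import re
from typing import Callable, Dict

# Matches only the simplest form of the ESI include tag.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str,
             fetch_fragment: Callable[[str], str],
             fragment_cache: Dict[str, str]) -> str:
    """Replace each include tag with its fragment, fetching on a miss only."""

    def substitute(match: "re.Match[str]") -> str:
        src = match.group(1)
        if src not in fragment_cache:
            # Miss: ask the application for this fragment once.
            fragment_cache[src] = fetch_fragment(src)
        return fragment_cache[src]

    return ESI_INCLUDE.sub(substitute, template)
```

The dynamic part of the page stays in the template and is produced on every request, while shared pieces such as navigation or footers are fetched once and reused.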

Architecture Evolution Step Four: Data caching

With ESI-like techniques improving the system's caching once more, the load on the system does fall further, but, again, as traffic increases the system starts to slow down. On investigation you may find that some data is fetched repeatedly, such as user information. You then consider whether this data can be cached too, and cache it in local memory. Once the change is done, it fully meets expectations: the system's response speed recovers and the load on the database drops considerably.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

Caching techniques, including the map data structure, caching algorithms, and the implementation mechanisms of the chosen framework itself.
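To make "map data structure plus caching algorithm" concrete, here is a minimal sketch of a local in-memory cache with least-recently-used (LRU) eviction. LRU is just one possible algorithm, chosen here for illustration; the class `LRUCache` and the `load` callback (standing in for a database query) are assumptions, not something the article prescribes.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Hypothetical local data cache: an ordered map with LRU eviction."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling load(key) only on a miss."""
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = load(key)  # miss: e.g. read the user row from the database
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry after the underlying data changes."""
        self._data.pop(key, None)
```

Calling `invalidate` whenever the underlying row is updated keeps the cache from serving stale user data.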

Architecture Evolution Step Five: Add a web server

Before long, you find that as traffic grows, the load on the web server machine climbs quite high at peak times, so you start to consider adding a web server. This also addresses availability: with a single web server, the site is unusable if that machine goes down. After weighing these points you decide to add a web server. Adding one raises some problems, typically:

1, how to distribute requests across the two machines. The options usually considered are Apache's own load balancing or a software load balancer such as LVS;

2, how to keep state information, such as user sessions, synchronized. The options considered include writing it to the database, writing it to shared storage, using cookies, or a session synchronization mechanism;

3, how to keep cached data, such as the previously cached user data, synchronized. The options usually considered are a cache synchronization mechanism or a distributed cache;

4, how to keep features such as file upload working. The option usually considered is a shared file system or shared storage.

After solving these problems, you finally have two web servers, and the system is back to its earlier speed.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

Load balancing technology (including but not limited to hardware load balancing, software load balancing, load balancing algorithms, Linux forwarding protocols, and the implementation details of the chosen technology), master/standby technology (including but not limited to ARP spoofing and Linux heartbeat), state or cache synchronization technology (including but not limited to cookies, the UDP protocol, broadcasting of state information, and the implementation details of the chosen cache synchronization technology), shared file technology (including but not limited to NFS), and storage technology (including but not limited to storage devices).
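One of the simplest load balancing algorithms is round robin, sketched below together with a crude notion of taking a failed server out of rotation. This is only an illustration of the algorithm, not how Apache or LVS implement it; the class `RoundRobinBalancer` and the server names are hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hypothetical round-robin scheduler that skips servers marked down."""

    def __init__(self, servers: Iterable[str]):
        self.servers: List[str] = list(servers)
        if not self.servers:
            raise ValueError("at least one server is required")
        self.down: Set[str] = set()
        self._next = 0  # index of the next server to try

    def mark_down(self, server: str) -> None:
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server not in self.down:
                return server
        raise RuntimeError("no healthy web server available")
```

Because consecutive requests from one user can land on different machines, this scheme only works once session state is shared or kept in cookies, which is exactly the second problem listed above.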

Architecture Evolution Step Six: Split the database

After enjoying a period of happily fast-growing traffic, you find the system slowing down again. What is going on this time? Investigation shows fierce competition for database connections among write and update operations, which slows the system down. What can be done? The options at this point are a database cluster or a database-splitting strategy. Some databases do not support clustering very well, so splitting the database becomes the more common strategy. Splitting the database also means modifying the existing program. After a round of changes the split is in place, the goal is reached, and the system is even faster than before.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

This step mostly requires dividing the data sensibly along business lines in order to split the database; the technique itself has no other special requirements. At the same time, as data volume grows and the database is split, database design, tuning, and maintenance all need to be done better, so this step still places high demands on those skills.
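Splitting along business lines can be pictured as a routing table from tables to databases. The sketch below is purely illustrative: the business domains, table names, and database names are invented for the example, and a real split would follow the site's own business boundaries.

```python
from typing import Dict

# Hypothetical mapping from business domain to the database that owns it.
DATABASE_BY_DOMAIN: Dict[str, str] = {
    "user": "db-user",
    "order": "db-order",
    "product": "db-product",
}

# Hypothetical mapping from table name to business domain.
DOMAIN_BY_TABLE: Dict[str, str] = {
    "users": "user",
    "user_profiles": "user",
    "orders": "order",
    "order_items": "order",
    "products": "product",
}


def database_for_table(table: str) -> str:
    """Return the name of the database that holds the given table."""
    try:
        domain = DOMAIN_BY_TABLE[table]
    except KeyError:
        raise LookupError(f"no database assigned for table {table!r}") from None
    return DATABASE_BY_DOMAIN[domain]
```

Tables that are frequently joined should land in the same domain, since a join across two databases has to be done in the application.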

Architecture Evolution Step Seven: Table splitting, DAL, and distributed cache

As the system keeps running, the data volume starts to grow substantially, and you find that queries are still somewhat slow even after the database split, so you apply the same idea to split tables. This inevitably requires some changes to the program. You may find at this point that having the application itself know the database- and table-splitting rules is rather complicated, which prompts the idea of adding a common framework that handles data access across the split databases and tables. This corresponds to the DAL in eBay's architecture, and evolving it takes a relatively long time. It is also possible that this common framework is not started until after the table split is finished. At the same time, the earlier cache synchronization scheme may run into trouble: the data volume is now so large that keeping the cache locally and then synchronizing it is no longer practical, so a distributed cache is needed. After another round of investigation and suffering, a large part of the cached data is finally moved to the distributed cache.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

Table splitting is again mostly a matter of dividing along business lines; the techniques involved include dynamic hash algorithms, consistent hashing, and so on. The DAL involves more complex techniques, such as database connection management (timeouts, exceptions), control of database operations (timeouts, exceptions), and encapsulation of the database- and table-splitting rules.
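Consistent hashing, mentioned above, can be sketched as a sorted ring of virtual nodes. The version below is a minimal illustration and not the scheme used by any particular DAL; the class `ConsistentHashRing`, the choice of SHA-256, the default of 100 virtual nodes per shard, and the shard names are all assumptions.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class ConsistentHashRing:
    """Hypothetical consistent-hash ring mapping keys to shards."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100):
        self.replicas = replicas  # virtual nodes per real node
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring
```

The property that matters for table splitting is that adding a shard moves only the keys that now belong to the new shard, whereas a plain `hash(key) % n` rule would remap most keys when `n` changes.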

Architecture Evolution Step Eight: Add more web servers

After the database and table splitting work, the load on the database has dropped quite low, and you go back to the happy life of watching daily traffic soar. Then one day the system starts to slow down again. You check the database first and its load is normal. Then you check the web servers and find that Apache is blocking a lot of requests, while the application server handles each request fairly quickly. It seems that there are simply too many requests, so they have to queue and responses become slow. This is easy to deal with: generally you have some money by now, so you add more web servers. In the process of adding web servers, several challenges may arise:

1. Apache's software load balancing or LVS software load balancing cannot cope with scheduling such huge web traffic (number of connections, network throughput, and so on). If funds allow, the solution is to buy hardware load balancing equipment such as F5, NetScaler, or Athelon. If funds do not allow, the solution is to classify the applications logically and spread them across different software load balancing clusters;

2, some of the existing schemes for state synchronization, file sharing, and so on may become bottlenecks and need to be improved. At this point you may write a distributed file system tailored to the website's business needs, and so on.

After this work, you enter a seemingly perfect era of unlimited scaling: whenever website traffic increases, the solution is simply to add more web servers.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

At this step, with the number of machines, the data volume, and the availability requirements all growing, you need a deeper understanding of the technologies in use and need to build more customized products based on the site's needs.

Architecture Evolution Step Nine: Read-write separation and inexpensive storage solutions

Then one day you find that the perfect era is over, and the database nightmare is before your eyes once more. Too many web servers have been added, so database connections are no longer enough. The databases and tables have already been split, so you start analyzing the database load and may find that the ratio of reads to writes is very high. At this point people usually think of separating reads from writes. Of course, this is not easy to implement. In addition, you may find that storing some data in the database is wasteful or takes up too many database resources. So the architecture that evolves at this stage separates reads from writes and also builds some cheaper storage solutions, such as something like Bigtable.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

Read-write separation requires a deep grasp and understanding of database replication, standby, and similar strategies, and will also require some self-implemented technology. The inexpensive storage solution requires deep mastery and understanding of the OS's file storage, as well as deep mastery of how the implementation language handles files.
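At its core, read-write separation is a routing decision made per statement. The sketch below sends plain `SELECT` statements to replicas in rotation and everything else to the primary. It is a deliberately naive illustration: the class `ReadWriteRouter`, the server names, and the `in_transaction` flag are assumptions, and real routing also has to account for replication lag.

```python
import itertools
from typing import Iterator, Optional, Sequence


class ReadWriteRouter:
    """Hypothetical router: reads go to replicas, writes to the primary."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Rotate through replicas; None means "no replicas configured".
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(replicas) if replicas else None
        )

    def route(self, sql: str, in_transaction: bool = False) -> str:
        """Return the server that should execute the statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        is_read = verb == "SELECT" and not in_transaction
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, transactional reads, and anything unrecognized
        # go to the primary to stay on the safe side.
        return self.primary
```

Reads inside a transaction are kept on the primary here because a replica may not yet have the rows the same transaction just wrote.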

Architecture Evolution Step Ten: Enter the era of large-scale distributed applications and the dream era of inexpensive server clusters

After the long and painful process above, the perfect era finally arrives again: simply adding web servers supports higher and higher traffic. For a large site, the importance of popularity is beyond doubt, and as popularity rises, all kinds of feature requirements begin to explode. You then suddenly realize that the web application deployed on the web servers has become enormous. With several teams changing it at once, it is quite inconvenient and reusability is poor; nearly every team ends up doing more or less duplicated work, and deployment and maintenance are also troublesome. Copying and starting the huge application package on N machines takes a long time, and problems are hard to track down. Worse, a bug in one part of the application can make the whole site unavailable. There are other factors too, such as difficulty tuning (because the application deployed on each machine does everything, targeted tuning is impossible). Based on this analysis, you decide to split the system by responsibility, and so a large distributed application is born. This step usually takes a long time, because there are many challenges:

1, after splitting into a distributed system you need a high-performance, stable communication framework that supports a variety of communication and remote call modes;

2, splitting a huge application takes a long time and requires sorting out the business and controlling dependencies between systems;

3, how to operate this huge distributed application well (dependency management, health management, error tracking, tuning, monitoring and alerting, and so on).

After this step, the system architecture enters a relatively stable phase. You can also start to use large numbers of inexpensive machines to support huge volumes of traffic and data, and, drawing on this architecture and the experience gained through all these evolutionary stages, adopt various other methods to support ever-growing traffic.

Look at the diagram of the system after the completion of this step:

This step involves these knowledge systems:

This step involves a great deal of knowledge. It requires deep understanding and mastery of communication, remote calls, messaging mechanisms, and so on, with a clear understanding at every level: theory, hardware, operating system, and the implementation in the language used.
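One small piece of operating a distributed application, health management, can be sketched as a registry in which each service instance reports a periodic heartbeat and callers only see instances heard from recently. This is an illustrative assumption about how such a component might look, not a description of any particular framework; `ServiceRegistry`, the service names, and the addresses are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Hypothetical registry that tracks instance health via heartbeats."""

    def __init__(self, heartbeat_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = heartbeat_timeout
        self.clock = clock  # injectable so tests need not sleep
        # (service name, instance address) -> time of last heartbeat
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def heartbeat(self, service: str, address: str) -> None:
        """Record that an instance is alive right now."""
        self._last_seen[(service, address)] = self.clock()

    def healthy(self, service: str) -> List[str]:
        """Return addresses of instances heard from within the timeout."""
        now = self.clock()
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self.timeout
        )
```

A remote call framework would consult something like `healthy("user-service")` before choosing an instance, so that a crashed machine drops out of rotation once its heartbeats stop.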

Finally, here is an overall architecture map of a large website:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
