Repost: Evolution of Large-Scale Website Architecture and the Knowledge System Behind It

Source: Internet
Author: User
Tags: database sharding

Http://developer.51cto.com/art/200810/91460.htm

Several articles have already described how the architectures of large websites such as LiveJournal and eBay evolved, and they are well worth reading. However, they focus mostly on the result of each evolutionary step and do not explain in detail why each step was necessary. I have also noticed recently that many students find it hard to understand why a website needs such complex technology. That is why I wrote this article. It describes a typical architecture evolution process, and the knowledge you need to master along the way, as an ordinary website grows into a large one. I hope it gives some initial ideas to anyone who wants to work in the Internet industry. Please point out any mistakes so that this article can serve as a useful reference.

Architecture Evolution Step 1: Physically separate the web server and the database

At first, someone has an idea and builds a website. At this stage the host might even be rented, but since this article focuses only on architecture evolution, assume the site already runs on a hosted machine with a certain amount of bandwidth. Because the site has something distinctive about it, it attracts visitors, and gradually you notice that the system is under increasing load and responding more and more slowly. The obvious problem is that the database and the application affect each other: when the application has trouble, the database tends to suffer too, and vice versa. This brings us to the first stage of evolution: physically separating the application and the database onto two machines. No new technology is required, but the results are clearly effective. The system returns to its previous response speed, supports more traffic, and the database and the application no longer interfere with each other.

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

This step requires essentially no new technical knowledge.

Architecture Evolution Step 2: Add a page cache

The good times do not last. As more and more people visit the site, response times start to slow down again. Looking for the cause, you find that there are too many database accesses, which leads to fierce competition for database connections and slow responses. Yet you cannot simply open more connections, because that would put too much load on the database machine. So you consider a caching mechanism to reduce contention for database connections and the read load on the database. At this point you might first use something like squid to cache the relatively static pages of the system (for example, pages that are updated only every day or two); alternatively, you could generate static pages. Either way, you reduce the load on the web server and the competition for database connections without modifying the program. So we start using squid to cache relatively static pages.

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

Front-end page caching technology such as squid. To use it well, you need a thorough understanding of how squid is implemented and of its cache expiration algorithms.
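To make cache expiration concrete, here is a minimal Python sketch of a time-to-live (TTL) page cache keyed by URL. It is not how squid works internally (squid is a separate proxy with its own freshness heuristics and replacement policies), and the class name `TTLPageCache` and the `render` callback are hypothetical. It only shows the core trade-off: a page is served from memory until its TTL runs out, and the next request after that regenerates it.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Caches rendered pages by URL for a fixed time-to-live (in seconds)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: regenerate the page (this is what hits the database).
        self.misses += 1
        body = render(url)
        self._store[url] = (now + self.ttl, body)
        return body


# Usage with a fake clock: the second request is a hit, the third arrives after expiry.
fake_now = [0.0]
cache = TTLPageCache(ttl_seconds=60, clock=lambda: fake_now[0])
rendered = []


def render(url: str) -> str:
    rendered.append(url)
    return f"<html>{url}</html>"


cache.get("/news", render)
cache.get("/news", render)
fake_now[0] = 61.0
cache.get("/news", render)
```

Choosing the TTL is the real design decision: the "updated every day or two" pages mentioned above tolerate long TTLs, while pages that must reflect changes quickly need short ones or explicit purging.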

Architecture Evolution Step 3: Add page fragment caching

After squid is added, the system as a whole is indeed faster and the load on the web server drops. But as traffic keeps growing, the system starts to slow down again. Having seen the benefits of caching with squid, you begin to think about caching the relatively static parts of dynamic pages as well, and consider a page fragment caching strategy such as ESI (Edge Side Includes). ESI turns out to be a good fit, so we start using it to cache the relatively static parts of dynamic pages.

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

Page fragment caching technology such as ESI. To use it well, you also need to master how ESI is implemented.
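ESI lets a page template contain tags such as `<esi:include src="..."/>` that an ESI-capable cache fills in, with separate caching rules per fragment. The Python toy below imitates that assembly step to show why it helps: a site header cached for an hour is fetched once, while a per-user cart fragment with no TTL is fetched on every request. It is not a real ESI processor (it only understands the simple include form), and `FragmentCache`, the fragment paths, and the TTL table are hypothetical.

```python
import re
import time
from typing import Callable, Dict, Tuple

# Matches the simple form <esi:include src="/fragments/header"/>.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class FragmentCache:
    def __init__(self, ttls: Dict[str, float], fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self.ttls = ttls    # fragment src -> TTL in seconds; missing or 0 means "never cache"
        self.fetch = fetch  # produces a fragment's HTML (e.g. by calling the app server)
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def _fragment(self, src: str) -> str:
        now = self.clock()
        entry = self._store.get(src)
        if entry is not None and entry[0] > now:
            return entry[1]
        body = self.fetch(src)
        ttl = self.ttls.get(src, 0)
        if ttl > 0:
            self._store[src] = (now + ttl, body)
        return body

    def assemble(self, template: str) -> str:
        # Replace every include tag with its (possibly cached) fragment.
        return ESI_INCLUDE.sub(lambda m: self._fragment(m.group(1)), template)


fetched = []


def fetch(src: str) -> str:
    fetched.append(src)
    if src == "/header":
        return "<h1>Site</h1>"
    return f"<p>cart v{len(fetched)}</p>"


fc = FragmentCache({"/header": 3600}, fetch, clock=lambda: 0.0)
template = '<esi:include src="/header"/><esi:include src="/cart"/>'
first = fc.assemble(template)
second = fc.assemble(template)
```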

Architecture Evolution Step 4: Data caching

ESI and similar techniques improve caching further, and the load on the system drops again. But as traffic continues to grow, the system slows down once more. After some investigation, you may find places where the system fetches the same data over and over, such as user information. You then consider caching this data, and cache it in local memory. After this change, the system's response speed recovers and the load on the database is reduced.

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

Caching technology, including map data structures, cache algorithms (for example, eviction policies), and the implementation mechanisms of the framework you use.
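A local data cache is usually a map with a bounded size and an eviction policy. The Python sketch below uses least-recently-used (LRU) eviction on top of `collections.OrderedDict`; the article does not name a specific algorithm, so LRU is an assumption here, and `LRUCache` and `load_user` are hypothetical names. Unlike `functools.lru_cache`, it supports invalidating a single key when the underlying row changes.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """A bounded in-process cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_load(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = load(key)  # cache miss: go to the database
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call this when the underlying data changes, or readers will see stale values.
        self._data.pop(key, None)


# Usage: a dict stands in for the user table.
users_db = {1: "alice", 2: "bob", 3: "carol"}
queries = []


def load_user(user_id):
    queries.append(user_id)
    return users_db[user_id]


users = LRUCache(capacity=2)
users.get_or_load(1, load_user)
users.get_or_load(2, load_user)
users.get_or_load(1, load_user)  # hit; 1 becomes most recently used
users.get_or_load(3, load_user)  # evicts 2
```

A cache like this lives inside a single process, which is exactly what becomes a problem once a second web server is added.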

Architecture Evolution Step 5: Add a web server

The good times do not last. As traffic grows, the load on the web server machine climbs to a fairly high level during peak periods. We begin to consider adding a web server, which also solves the availability problem: with a single web server, the site goes down whenever that server does. After weighing this, we decide to add a second web server. Doing so raises several problems. Typical ones include:
1. How to distribute requests between the two machines. The usual options are the load-balancing module that comes with Apache or a software load balancer such as LVS;
2. How to keep state information, such as user sessions, synchronized. Options include storing it in the database, writing it to shared storage, keeping it in cookies, or synchronizing session information between servers;
3. How to keep cached data, such as the previously cached user data, consistent. This usually involves cache synchronization or a distributed cache;
4. How to keep features such as file uploads working. The usual approach is a shared file system or shared storage.
After solving these problems, we finally have two web servers, and the system is back to its previous speed.
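Of the session options listed above, keeping state in a cookie is the one that needs no shared server-side store: any web server can read the session, provided it can detect tampering by the client. The Python sketch below signs a JSON session with HMAC-SHA256 from the standard library. The secret and function names are hypothetical; the cookie is signed but not encrypted, so it must not hold secrets, and a production version would also embed an expiry time.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret; every web server behind the load balancer must share it.
SECRET = b"change-me-shared-by-all-web-servers"


def _sign(payload: str, secret: bytes) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def encode_session(data: dict, secret: bytes = SECRET) -> str:
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode()).decode()
    return f"{payload}.{_sign(payload, secret)}"


def decode_session(cookie: str, secret: bytes = SECRET) -> Optional[dict]:
    payload, sep, sig = cookie.rpartition(".")
    if not sep:
        return None  # malformed cookie
    # compare_digest avoids leaking the signature through timing differences.
    if not hmac.compare_digest(sig.encode(), _sign(payload, secret).encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


# Usage: server A issues the cookie, server B validates it with the same secret.
cookie = encode_session({"user_id": 42, "role": "member"})
tampered = cookie.replace(cookie[0], "A" if cookie[0] != "A" else "B", 1)
```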

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

1. Load balancing technology (including but not limited to hardware load balancers, software load balancers, load-balancing algorithms, Linux packet-forwarding modes, and the implementation details of the chosen technology);
2. Active/standby failover technology (including but not limited to ARP spoofing and Linux Heartbeat);
3. State information or cache synchronization technology (including but not limited to cookies, the UDP protocol, state information broadcasting, and the implementation details of the chosen cache synchronization technology);
4. Shared file technology (including but not limited to NFS);
5. Storage technology (including but not limited to storage devices).
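Two of the simplest load-balancing algorithms are round-robin and least-connections; LVS, for example, offers schedulers based on both ideas. The Python sketch below shows only the selection logic a balancer applies to each new request, using hypothetical server names. Real balancers add weights, health checks, and the packet-forwarding work listed above.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    """Hands requests to servers in a fixed rotation."""

    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Hands each request to the server with the fewest active connections."""

    def __init__(self, servers: List[str]):
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def acquire(self) -> str:
        # min() returns the first minimum, so ties go to the earliest-listed server.
        server = min(self.active, key=self.active.__getitem__)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


rr = RoundRobin(["web1", "web2"])
rr_picks = [rr.pick() for _ in range(3)]

lc = LeastConnections(["web1", "web2"])
a = lc.acquire()  # both idle -> web1
b = lc.acquire()  # web1 busy -> web2
lc.release(a)     # web1's request finishes
c = lc.acquire()  # web1 is least loaded again
```

Round-robin is enough when requests cost roughly the same; least-connections adapts better when some requests are slow, at the price of tracking per-server state.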

Architecture Evolution Step 6: Database sharding

After enjoying rapid traffic growth for a while, you find that the system is slowing down again. What is it this time? Write and update operations are competing fiercely for database connections, and that is what slows the system down. What now? The options are database clustering and database sharding. Some databases do not support clustering very well, so database sharding becomes the common strategy. Sharding means modifying the original program; after one round of changes the goal is reached, and the system is even faster than before.

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

This step requires dividing the business sensibly in order to shard the database; it places no other specific technical demands.

However, as the data volume grows and the database is sharded, database design, tuning, and maintenance all need to improve, so much more is demanded of these skills.
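Sharding by business usually means deciding which tables move to which database, then finding the code paths that assumed everything lived in one database. The Python sketch below expresses such a split as a lookup table and fails fast on operations that would now span databases, since a join or local transaction across them no longer works. The table and database names are hypothetical.

```python
from typing import Dict

# Hypothetical business split: each table now lives in exactly one database.
TABLE_TO_DB: Dict[str, str] = {
    "users": "user_db",
    "user_profiles": "user_db",
    "orders": "order_db",
    "order_items": "order_db",
    "products": "product_db",
}


def database_for(table: str) -> str:
    try:
        return TABLE_TO_DB[table]
    except KeyError:
        raise KeyError(f"no database configured for table {table!r}") from None


def check_same_database(*tables: str) -> str:
    """Joins and local transactions must stay inside one database; fail fast otherwise."""
    dbs = {database_for(t) for t in tables}
    if len(dbs) != 1:
        raise ValueError(f"tables {tables} span databases {sorted(dbs)}")
    return dbs.pop()
```

Running a check like this over the queries the application issues is one way to find the code that has to change before the split.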

Architecture Evolution Step 7: Table sharding, DAL, and distributed cache

As the system keeps running, the data volume grows significantly, and you find that queries are still slow even after database sharding. So, following the same idea as database sharding, you start sharding tables. This inevitably requires further changes to the program. You may also find that the application now has to know the database/table sharding rules, which is still complicated, so you consider adding a common framework that handles data access across sharded databases and tables. In this architecture that framework is the DAL (data access layer). Its evolution takes a long time, and work on the framework may even start only after table sharding is finished. At the same stage, the earlier cache synchronization scheme runs into trouble: the data volume is now too large to keep the cache locally and synchronize it between servers, so a distributed cache is needed. After much investigation and pain, most of the cached data is finally moved to a distributed cache.

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

Table sharding is also a form of business partitioning. Technically, it involves dynamic hashing and consistent hashing algorithms.
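Consistent hashing is what keeps re-sharding affordable. With plain `hash(key) % N`, going from 4 to 5 shards relocates roughly 80% of keys; on a hash ring, adding a node only moves the keys that land on its arcs, and every moved key moves to the new node. The Python sketch below uses MD5 only as a well-distributed (not secure) hash and 100 virtual nodes per physical node to even out the load; the node names and vnode count are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as an unsigned integer: well spread, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[int] = []        # sorted virtual-node positions
        self._owner: Dict[int, str] = {}  # position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._ring.remove(pos)  # O(n) per removal; acceptable for a sketch
            del self._owner[pos]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = ConsistentHashRing(["db0", "db1", "db2", "db3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("db4")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```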

The DAL involves many complex techniques, such as database connection management (timeouts and exceptions), database operation control (timeouts and exceptions), and encapsulation of the database/table sharding rules.
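The part of a DAL that the application most wants hidden is the sharding rule. The Python sketch below assumes a hypothetical rule (orders split by `user_id` modulo the total number of physical tables, spread over two databases) and uses in-memory SQLite databases as stand-ins for real shards. It covers only rule encapsulation and routing, not the connection management and timeout/exception control listed above. Table names come from the rule, never from user input, which is why interpolating them into SQL is acceptable here while the values stay parameterized.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ShardRule:
    """Hypothetical rule: user_id modulo (db_count * tables_per_db) picks a physical table."""
    db_count: int
    tables_per_db: int
    logical_table: str = "orders"

    def _name(self, slot: int) -> Tuple[str, str]:
        return f"order_db_{slot // self.tables_per_db}", f"{self.logical_table}_{slot:04d}"

    def locate(self, user_id: int) -> Tuple[str, str]:
        return self._name(user_id % (self.db_count * self.tables_per_db))

    def physical_tables(self) -> List[Tuple[str, str]]:
        return [self._name(s) for s in range(self.db_count * self.tables_per_db)]


class OrderDAL:
    """Callers use user_id only; the DAL resolves database and table."""

    def __init__(self, rule: ShardRule, connections: Dict[str, sqlite3.Connection]):
        self.rule = rule
        self.connections = connections

    def insert(self, user_id: int, amount: int) -> None:
        db, table = self.rule.locate(user_id)
        conn = self.connections[db]
        with conn:  # commits on success, rolls back on exception
            conn.execute(f"INSERT INTO {table} (user_id, amount) VALUES (?, ?)", (user_id, amount))

    def find_by_user(self, user_id: int) -> List[Tuple[int, int]]:
        db, table = self.rule.locate(user_id)
        cur = self.connections[db].execute(
            f"SELECT user_id, amount FROM {table} WHERE user_id = ?", (user_id,))
        return cur.fetchall()


# Usage: two in-memory SQLite databases stand in for real shards.
rule = ShardRule(db_count=2, tables_per_db=2)
conns = {f"order_db_{i}": sqlite3.connect(":memory:") for i in range(rule.db_count)}
for db, table in rule.physical_tables():
    conns[db].execute(f"CREATE TABLE {table} (user_id INTEGER, amount INTEGER)")
dal = OrderDAL(rule, conns)
dal.insert(5, 100)
dal.insert(6, 250)
```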

Architecture Evolution Step 8: Add more web servers

After database and table sharding, the load on the database drops to a fairly low level, and you go back to enjoying traffic that surges day after day. Then one day the system suddenly starts slowing down again. First you check the database, and its load is normal. Then you check the web servers and find that Apache is blocking a lot of requests, even though the application server handles each individual request fairly quickly. It seems there are simply too many requests, so they queue up and responses are slow. This is easy to fix: by now there is usually some money, so you add more web servers. Adding them may bring several challenges:
1. Apache's software load balancing or LVS may not be able to schedule such a huge volume of web traffic (request connections, network traffic, and so on). If funds permit, the solution is to buy hardware load balancers such as F5, NetScaler, or Athelon. If not, the solution is to divide the applications logically and distribute them across separate software load-balancing clusters;
2. Some of the existing state synchronization and file sharing schemes may hit bottlenecks and need improvement. At this point, you may develop a distributed file system tailored to the site's business needs.
After this work is done, you enter an era of seemingly perfect, unlimited scaling: when traffic grows, you just keep adding web servers.
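When funds do not allow hardware load balancers, dividing applications logically means deciding which front-end cluster serves which part of the site, for example by URL prefix or subdomain. The Python sketch below shows a longest-prefix routing rule; the prefixes and cluster names are hypothetical, and in practice such a rule would typically live in DNS or a front-end router rather than in application code.

```python
from typing import Dict, List

# Hypothetical logical split of the site; each entry is a separate soft-load cluster.
CLUSTERS: Dict[str, List[str]] = {
    "/static/": ["static-lb-1"],
    "/search/": ["search-lb-1", "search-lb-2"],
    "/": ["web-lb-1", "web-lb-2"],
}


def cluster_for(path: str, clusters: Dict[str, List[str]] = CLUSTERS) -> List[str]:
    # Longest matching prefix wins, so "/" acts as the catch-all.
    best = max((p for p in clusters if path.startswith(p)), key=len, default=None)
    if best is None:
        raise LookupError(f"no cluster configured for {path!r}")
    return clusters[best]
```

Each cluster can then be sized and tuned for its own traffic, so no single software load balancer has to carry the whole site.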

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

At this point the number of machines keeps growing, the data volume keeps growing, and availability requirements keep rising. We need a deeper understanding of the technologies we use, and we need to build more customized products based on the site's needs.

Architecture Evolution Step 9: Read/write splitting and low-cost storage

Then one day you find that this perfect era is ending and the database nightmare is back. Because so many web servers have been added, there are still not enough database connections, even though the databases and tables have already been sharded. Analyzing the database load, you may find that reads far outnumber writes, which usually suggests read/write splitting. This is not easy to implement, of course. In addition, some data is stored in the database even though that wastes it or takes up too many database resources. So the architecture that may emerge at this stage combines read/write splitting with cheaper storage solutions such as BigTable.

Take a look at the system diagram after this step is completed:

 

 

Knowledge involved in this step:

Read/write splitting requires an in-depth understanding of database replication, standby, and related strategies, and some of it may have to be implemented in-house.
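A read/write-splitting layer sends statements that change data to the master and spreads reads across replicas. The Python sketch below routes on the SQL verb with round-robin over hypothetical replica names. Classifying by prefix is a simplification: statements inside a transaction and locking reads such as `SELECT ... FOR UPDATE` must go to the master, and because replication is usually asynchronous, a read that must see the user's own just-written data should also go to the master.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Routes writes to the master and round-robins plain reads over replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        stmt = sql.lstrip().upper()
        needs_master = (
            in_transaction                        # keep a transaction on one server
            or stmt.startswith(self.WRITE_VERBS)  # statements that change data
            or "FOR UPDATE" in stmt               # locking reads must run on the master
            or self._replicas is None             # no replicas configured
        )
        return self.master if needs_master else next(self._replicas)


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
```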

A low-cost storage solution requires a deep understanding of how the operating system stores files, and of how file I/O is implemented in the language you use.

Architecture Evolution Step 10: Entering the era of large-scale distributed applications and the dream of cheap server clusters

After the long and painful process above, the perfect era finally returns: adding web servers keeps up with rising traffic. For a large website, popularity is undoubtedly important, and as popularity grows, demand for new features surges. At this point you suddenly realize that the web application deployed on the web servers has become huge. Having multiple teams modify it is very inconvenient, and reusability is poor: nearly every team duplicates some of the others' work. Deployment and maintenance are also troublesome, because copying a large application package to N machines and starting it takes a long time, and problems are hard to track down. Worse, a bug in one part of the application can make the entire site unavailable. There are other issues too, such as poor tuning: every machine runs an application that does everything, so no targeted optimization is possible. Based on this analysis, you make up your mind to split the system by responsibility, and a large distributed application is born. This step usually takes a long time because it brings many challenges:
1. After the split, a high-performance, stable communication framework is needed that supports various communication and remote call methods;
2. Splitting a large application takes a long time and requires organizing the business and controlling dependencies between systems;
3. The large distributed application must be operated and maintained (O&M): dependency management, runtime status management, error tracking, tuning, monitoring, and alarms.
After this step, the architecture of such systems enters a relatively stable stage, and large numbers of cheap machines can support huge volumes of traffic and data. With this architecture and the experience gained through so many evolutionary steps, we can use various other methods to support ever-growing traffic.
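One concrete piece of dependency control is knowing which services call which, so that you can start, deploy, or shut them down in a safe order and reject accidental cycles. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the service names and dependency map are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set

# Hypothetical result of splitting the monolith: service -> services it calls.
DEPENDENCIES: Dict[str, Set[str]] = {
    "web-frontend": {"user-service", "order-service"},
    "order-service": {"user-service", "inventory-service"},
    "user-service": set(),
    "inventory-service": set(),
}


def startup_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order services so that each starts after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def find_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle if there is one, else None."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        return err.args[1]  # graphlib stores the offending cycle here
    return None


order = startup_order(DEPENDENCIES)
```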

Take a look at the system diagram after this step is completed:

 

Knowledge involved in this step:

This step involves a great deal of knowledge and requires an in-depth understanding of communication, remote calls, messaging mechanisms, and so on, at the level of theory, hardware, operating system, and language.

O&M involves many areas of knowledge. In most cases you need to master distributed parallel computing, reporting, monitoring technology, and rule policies.
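Monitoring and alarm rules are often simple thresholds over a recent window of measurements. The Python sketch below fires an alarm when the error rate over the last N calls to a service exceeds a threshold, with a minimum sample count so that a single early failure does not page anyone. The window size, threshold, and class name are hypothetical tuning choices.

```python
from collections import deque


class ErrorRateAlarm:
    """Signals an alarm when the recent error rate exceeds a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self._samples = deque(maxlen=window)  # 1 = error, 0 = success; oldest drop off
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, ok: bool) -> bool:
        """Record one call's outcome; return True if the alarm should fire."""
        self._samples.append(0 if ok else 1)
        if len(self._samples) < self.min_samples:
            return False
        return sum(self._samples) / len(self._samples) > self.threshold


alarm = ErrorRateAlarm(window=10, threshold=0.2, min_samples=5)
results = [alarm.record(ok) for ok in [True] * 5 + [False, False]]
```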

Described this way, it does not sound very hard. The classic evolution of a website architecture is roughly as above, although the exact steps may differ from site to site, and websites offering different services have different technical requirements. This article explains the evolution mainly from the architecture point of view. Many other technologies are not mentioned here, such as database clusters, data mining, and search, and real evolution also relies on hardware upgrades, network improvements, OS upgrades, and CDN mirroring to support larger traffic, so real-world development will differ in many ways. Beyond all of this, a large website must also handle security, O&M, operations, services, and storage. Building a large website is really not easy. I wrote this article to offer more insight into how large website architectures evolve.
