Build large-scale website architectures step by step

Source: Internet
Author: User
Previously, I briefly introduced the architectures of several well-known large websites: the five milestones of MySpace, and the architectures of Flickr, YouTube, PlentyOfFish, and Wikipedia. These are all very typical cases, and we can learn a lot about website architecture from them. After reading them, you may find that your earlier ideas were rather narrow.

Today, let's talk about how a website builds up its system architecture step by step. We would like a website to have a good architecture from the start, but Marx taught us that things are constantly evolving. A website's architecture is likewise improved continuously as the business and user needs grow. Below is the basic process by which a website architecture gradually develops. As you read it, think about which stage you are at.

Architecture Evolution Step 1: Physically separate the webserver and the database

At the beginning, someone has an idea and puts a website on the Internet. At this stage even the host may be rented. Because this article focuses only on how the architecture evolves, we assume the site already runs on its own hosted machine with a certain amount of bandwidth. The website has something distinctive and attracts visitors, so you gradually notice that the system is under increasing load and responding more slowly. It is now clear that the database and the application affect each other. When the application has problems, the database tends to suffer too, and when the database has problems, the application suffers as well. So we enter the first step of evolution: physically separating the application and the database onto two machines. This introduces no new technical requirements, yet it is clearly effective. The system returns to its earlier response speed and supports more traffic, and the database and the application no longer drag each other down.

Take a look at the system diagram after this step is completed:

Architecture Evolution Step 2: Add a page cache

Before long, as more and more people visit the site, you find that responses start to slow down again. Looking for the cause, you find that there are too many database accesses. This leads to fierce competition for database connections and slower responses. However, you cannot simply open many more database connections, or the load on the database machine becomes too high. So we consider using a cache to reduce both the competition for database connections and the read load on the database. At this point you may first choose a mechanism such as Squid to cache the relatively static pages of the system (for example, pages that are updated only once every day or two). You could also generate static pages instead. Either way, you reduce the load on the webserver and the competition for database connections without modifying the program. So we start using Squid to cache the relatively static pages.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

Front-end page caching technology, such as Squid. To use it well, you must thoroughly understand how Squid is implemented and how its cache invalidation algorithms work.
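The Python below is a minimal sketch of what a front-end page cache does conceptually. It keeps a rendered copy of each URL and serves it until a time-to-live expires. It is not how Squid is implemented. The `render` callable stands in for a hypothetical webserver, and the TTL stands in for whatever expiry policy the cache is configured with.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Toy TTL page cache standing in for a front-end cache such as Squid."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render          # hypothetical backend: URL -> HTML
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expiry, html)
        self.backend_calls = 0

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: the webserver is not touched
        html = self._render(url)       # miss or expired: go to the backend
        self.backend_calls += 1
        self._store[url] = (now + self._ttl, html)
        return html
```

For a page that changes once a day, a TTL of 86400 seconds means the webserver and database see roughly one request per day for that URL, however many visitors there are.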

Architecture Evolution Step 3: Add a page fragment cache

After Squid is added for caching, the overall speed of the system does improve and the load on the webserver drops. However, as traffic keeps increasing, the system starts to slow down again. Having seen the benefits of caching with Squid, we begin to think about how to cache the relatively static parts of dynamic pages as well. We therefore consider a page fragment caching strategy such as ESI (Edge Side Includes). It works, so we start using ESI to cache the relatively static parts of dynamic pages.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

Page fragment caching technology, such as ESI. To use it well, you also need to master how ESI is implemented.
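ESI marks the separately cacheable parts of a page with tags such as `<esi:include src="..."/>`. The cache layer resolves these tags when it assembles the page. The sketch below supports only that one tag, matched with a regex. It assumes a hypothetical dict of fragment fetchers and gives each fragment its own TTL. A rarely changing header can then be cached for an hour while a per-user block is fetched fresh every time. Real ESI processors support far more of the ESI language than this.

```python
import re
import time
from typing import Callable, Dict, Tuple

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class FragmentAssembler:
    def __init__(self, fetchers: Dict[str, Callable[[], str]],
                 ttls: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self._fetchers = fetchers      # hypothetical: fragment src -> fetch function
        self._ttls = ttls              # per-fragment TTL; missing or 0 means never cache
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def _fragment(self, src: str) -> str:
        now = self._clock()
        hit = self._cache.get(src)
        if hit is not None and hit[0] > now:
            return hit[1]              # cached static fragment
        body = self._fetchers[src]()   # dynamic or expired fragment
        ttl = self._ttls.get(src, 0.0)
        if ttl > 0:
            self._cache[src] = (now + ttl, body)
        return body

    def assemble(self, template: str) -> str:
        # Replace every <esi:include src="..."/> with the fragment body.
        return ESI_INCLUDE.sub(lambda m: self._fragment(m.group(1)), template)
```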

Architecture Evolution Step 4: Data caching

ESI and similar techniques improve caching again, and the load on the system drops further. However, as traffic grows, the system keeps slowing down. Investigation shows that some parts of the system fetch the same data over and over, such as user information. So we consider caching this data and store it in local memory. After this change, the system's response speed recovers and the load on the database drops.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

Caching technology, including map data structures, cache algorithms, and the implementation mechanisms of the chosen framework itself.
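The text does not name a specific cache algorithm. LRU (least recently used) eviction is one common choice for a bounded in-process map. The minimal sketch below assumes a hypothetical `loader` function that reads a user row from the database. It uses the cache-aside pattern: check the map first, and load from the database only on a miss.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """In-process LRU map: evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get_or_load(self, key: K, loader: Callable[[K], V]) -> V:
        if key in self._data:
            self._data.move_to_end(key)       # mark as most recently used
            return self._data[key]
        value = loader(key)                   # cache miss: hit the database
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)    # drop least recently used
        return value

    def invalidate(self, key: K) -> None:
        self._data.pop(key, None)             # call after the user row changes

    def __len__(self) -> int:
        return len(self._data)
```

Because this map lives inside one process, it only works while there is a single webserver. Adding more servers raises exactly the cache synchronization problem described in the next step.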

Architecture Evolution Step 5: Add a webserver

The good times do not last long. As traffic increases, the load on the webserver machine climbs to a fairly high level during peak periods. At this point we begin to consider adding a webserver. This also addresses availability: with a single webserver, the whole site goes down whenever that machine does. After weighing this, we decide to add a second webserver. Doing so raises several problems, for example:
1. How should requests be distributed between the two machines? Here we usually consider the load-balancing solution built into Apache or a software load balancer such as LVS;
2. How should state information, such as user sessions, be kept in sync? Here we consider writing the state to the database or to shared storage, using cookies, or synchronizing session information between servers;
3. How should cached data, such as the previously cached user data, be kept in sync? This usually involves cache synchronization or a distributed cache;
4. How can features such as file uploads keep working normally? The usual answer is a shared file system or shared storage;
After solving these problems, we finally have two webservers, and the system recovers its earlier speed.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

Load-balancing technology (including but not limited to hardware load balancers, software load balancers, load-balancing algorithms, Linux packet-forwarding mechanisms, and the implementation details of the chosen technology); master/slave failover technology (including but not limited to ARP spoofing and Linux Heartbeat); state and cache synchronization technology (including but not limited to cookies, the UDP protocol, state broadcasting, and the implementation details of the chosen cache synchronization technology); shared file technology (including but not limited to NFS); and storage technology (including but not limited to storage devices).
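Question 1 in this step comes down to the load balancer's scheduling algorithm, and round robin is the simplest one. The minimal sketch below is not how LVS or Apache implement it. It does round-robin selection over hypothetical backend addresses and skips any server marked as down. Skipping dead servers is also where the availability gain from a second webserver comes from.

```python
import itertools
import threading
from typing import List, Set


class RoundRobinBalancer:
    def __init__(self, backends: List[str]):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._down: Set[str] = set()          # e.g. set by a health checker
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def mark_down(self, backend: str) -> None:
        with self._lock:
            self._down.add(backend)

    def mark_up(self, backend: str) -> None:
        with self._lock:
            self._down.discard(backend)

    def pick(self) -> str:
        with self._lock:
            n = len(self._backends)
            for _ in range(n):                # try each backend at most once
                candidate = self._backends[next(self._counter) % n]
                if candidate not in self._down:
                    return candidate
        raise RuntimeError("all backends are down")
```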

Architecture Evolution Step 6: Database sharding

After enjoying rapid traffic growth for a while, we find the system slowing down again. What is it this time? We find heavy contention for the database connections used by write and update operations, and this slows the system down. What now? The options include database clustering and database sharding, which means splitting the data across multiple databases. Some databases do not support clustering very well, so sharding becomes the common strategy. Sharding requires modifying the existing program. After one round of changes to implement it, the goal is achieved and the system is even faster than before.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

This step requires dividing the business sensibly in order to split the databases; there are no other specific technical requirements.

However, as the data volume grows and the databases are split, database design, tuning, and maintenance must all improve, so much higher demands are placed on these skills.
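Splitting databases by business usually means each business domain gets its own database, and the program must know which database owns which table. The minimal sketch below assumes hypothetical table names, domains, and connection strings, and expresses that mapping as a router. One consequence of this split is that SQL joins across domains are no longer possible, which is part of why the program has to be modified.

```python
from typing import Dict


class BusinessRouter:
    """Routes each table to the database that owns its business domain."""

    def __init__(self, table_to_domain: Dict[str, str],
                 domain_to_dsn: Dict[str, str]):
        missing = set(table_to_domain.values()) - set(domain_to_dsn)
        if missing:
            raise ValueError(f"no database configured for domains: {sorted(missing)}")
        self._table_to_domain = dict(table_to_domain)
        self._domain_to_dsn = dict(domain_to_dsn)

    def dsn_for(self, table: str) -> str:
        try:
            domain = self._table_to_domain[table]
        except KeyError:
            raise KeyError(f"table {table!r} is not assigned to any domain") from None
        return self._domain_to_dsn[domain]


# Hypothetical layout: account data and trade data live on separate machines.
router = BusinessRouter(
    {"users": "account", "orders": "trade", "order_items": "trade"},
    {"account": "mysql://db-account/app", "trade": "mysql://db-trade/app"},
)
```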

Architecture Evolution Step 7: Table sharding, DAL, and distributed cache

As the system keeps running, the amount of data starts to grow significantly. Queries remain slow even after database sharding, so we apply the same idea to tables and start sharding them. This inevitably requires some changes to the program. You may also find that the application itself has to know the database and table sharding rules, which is still complicated. It is therefore worth adding a general framework that handles data access across sharded databases and tables. In eBay's architecture this is the DAL (data access layer), and building one takes considerable time. Such a general framework may also only be started after table sharding is done. At the same time, the earlier cache synchronization scheme may start to fail at this stage. The data volume is too large to keep the cache locally, so a distributed cache is needed. After a round of investigation and considerable pain, a large amount of cached data is finally moved into the distributed cache.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

Table sharding is also a matter of dividing the business. Technically, it involves dynamic hashing and consistent hashing algorithms;
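Consistent hashing maps each key, such as a user ID, to a node: a table or a distributed cache server. Adding or removing a node then only moves the keys adjacent to it on the hash ring, instead of reshuffling almost everything as `hash(key) % N` would. The minimal sketch below makes two assumptions of its own: it uses MD5 as the ring hash and 100 virtual nodes per physical node.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # 64-bit position on the ring, taken from an MD5 digest.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._ring: List[int] = []          # sorted virtual-node positions
        self._owner: Dict[int, str] = {}    # position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue                    # vanishingly rare collision: keep first owner
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def remove(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._ring.remove(pos)      # O(n), acceptable for a sketch

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

When a fourth node joins a three-node ring, only about a quarter of the keys move, and every moved key moves to the new node.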

The DAL involves many complex techniques, such as database connection management (timeouts and exceptions), control of database operations (timeouts and exceptions), and encapsulation of database and table sharding rules;
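The point of encapsulating sharding rules in a DAL is that callers ask for data by business key and never see physical database or table names. The minimal sketch below invents its own rule: the database is `user_id % db_count`, and the table is `(user_id // db_count) % tables_per_db`. It does not reproduce eBay's DAL. It returns the target database and a parameterized query rather than executing them, so it runs without a database.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShardRule:
    """Hypothetical modulo rule: users are spread over db_count databases,
    each holding tables_per_db physical copies of the logical table."""
    logical_table: str
    db_count: int
    tables_per_db: int

    def locate(self, user_id: int) -> Tuple[str, str]:
        if user_id < 0:
            raise ValueError("user_id must be non-negative")
        db_index = user_id % self.db_count
        table_index = (user_id // self.db_count) % self.tables_per_db
        return f"db_{db_index}", f"{self.logical_table}_{table_index}"


class DataAccessLayer:
    """Callers ask for rows by user_id; only the DAL knows the sharding rule."""

    def __init__(self, rule: ShardRule):
        self._rule = rule

    def select_by_user(self, user_id: int) -> Tuple[str, str, Tuple[int]]:
        db, table = self._rule.locate(user_id)
        # The table name comes from the rule, never from user input;
        # the user_id value is passed separately as a bound parameter.
        sql = f"SELECT * FROM {table} WHERE user_id = %s"
        return db, sql, (user_id,)
```

If the rule changes later, for example moving to consistent hashing, only `ShardRule` changes and the callers stay the same.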

Architecture Evolution Step 8: Add more webservers

After database and table sharding, the load on the database drops to a fairly low level, and we start enjoying a happy life of daily traffic surges. Then one day access to the system slows down again. We first check the database, and its load is normal. Next we check the webservers and find that Apache is blocking many requests, while the application server handles each individual request fairly quickly. It seems there are simply too many requests, so they wait in queues and responses slow down. This is easy to fix. By this time there is usually some money, so we add more webservers. Adding them may bring several challenges:

1. Software load balancing with Apache or LVS can no longer schedule such huge web traffic (number of connections, network throughput, and so on). If funding permits, the solution is to buy hardware load-balancing devices such as F5, NetScaler, or Athelon. If not, the solution is to split the applications logically and distribute them across different software load-balancing clusters;

2. Some of the existing state synchronization and file sharing solutions may hit bottlenecks and need improvement. At this point, you may need to develop a distributed file system that fits the website's business needs;

After this work, we enter an era of seemingly perfect, unlimited scaling: when traffic increases, the solution is simply to keep adding webservers.
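Challenge 1 in this step suggests an alternative when hardware load balancers are out of budget. You can classify applications logically and give each class its own software load-balancing cluster. One simple classification is by URL path prefix. The minimal sketch below assumes hypothetical prefixes and cluster names and picks the longest matching prefix. In practice this split is often done with separate subdomains or a front-tier proxy rather than in application code.

```python
from typing import Dict


class ClusterRouter:
    """Sends each request path to the soft-load cluster that serves its application."""

    def __init__(self, prefix_to_cluster: Dict[str, str], default: str):
        # Longest prefixes first, so "/shop/cart" wins over "/shop".
        self._routes = sorted(prefix_to_cluster.items(),
                              key=lambda kv: len(kv[0]), reverse=True)
        self._default = default

    def cluster_for(self, path: str) -> str:
        for prefix, cluster in self._routes:
            # Match whole path segments only, so "/shopping" does not match "/shop".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return cluster
        return self._default
```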

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

At this point, the number of machines and the amount of data keep growing, and availability requirements keep rising. We therefore need a deeper understanding of the technologies we use, and we need to build more customized products based on the website's own needs.

Architecture Evolution Step 9: Read/write splitting and low-cost storage

Then one day we find that this perfect era is ending and the database's nightmare is about to return. So many webservers have been added that database connections are running short. The databases have already been split and the tables sharded. Yet an analysis of the database load may show a very high ratio of reads to writes. In this case we usually think of read/write splitting, although it is not easy to implement. You may also find that some data does not really belong in the database and wastes space there, or consumes too many database resources. So the architecture that may take shape at this stage combines read/write splitting with cheaper, self-built storage solutions, in the style of Bigtable.

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

Read/write splitting requires an in-depth understanding of database replication, standby, and related strategies, and some of the technology must be implemented in-house;
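The text does not say how statements get routed. The minimal sketch below classifies statements by their leading keyword, which is a simplification. Writes go to a hypothetical primary, and reads rotate across replicas. Replicas can lag behind the primary, so statements inside a transaction and locking reads (`SELECT ... FOR UPDATE`) stay on the primary. Production middleware also has to handle cases such as reading your own recent writes.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Sends writes to the primary and spreads plain reads across replicas."""

    _READ_PREFIXES = ("select", "show")

    def __init__(self, primary: str, replicas: List[str]):
        self._primary = primary
        self._replicas = list(replicas)
        self._next_replica = itertools.cycle(self._replicas) if self._replicas else None

    def target_for(self, sql: str, in_transaction: bool = False) -> str:
        stmt = sql.strip().lower()
        is_read = stmt.startswith(self._READ_PREFIXES) and "for update" not in stmt
        # Writes, locking reads, and anything inside a transaction must see
        # the primary's latest data; replicas may be behind.
        if not is_read or in_transaction or self._next_replica is None:
            return self._primary
        return next(self._next_replica)
```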

A low-cost storage solution requires a deep understanding of OS file storage and a mastery of how the chosen programming language implements file handling.

Architecture Evolution Step 10: Entering the era of large-scale distributed applications and the dream of cheap server clusters

After the long and painful process above, we finally enter a perfect era again: adding webservers keeps up with ever-growing traffic. For a large website, popularity is undoubtedly important, and as popularity rises, feature requests surge too. At this point we suddenly find that the web application deployed on the webservers has become very large, which causes several problems:

- Changes are inconvenient once multiple teams work on it.
- Reusability is poor, since every team ends up doing some duplicate work.
- Deployment and maintenance are troublesome, because copying a large application package to N machines and starting it takes a long time.
- Problems are hard to track down.
- Worse still, a bug in one application can make the entire site unavailable.
- Tuning is poor, because the application on each machine has to do everything, so no targeted optimization is possible.

Based on this analysis, we make up our minds to split the system by responsibility, and a large distributed application is born. This step usually takes a long time, because it brings many challenges:

1. Once the system is split into a distributed architecture, it needs a high-performance, stable communication framework that supports a variety of communication and remote-call methods;
2. Splitting a large application takes a long time and requires organizing the business and controlling dependencies between systems;
3. The large distributed application must be operated and maintained (O&M), which covers dependency management, runtime status management, error tracking, tuning, monitoring, and alerting.
After this step, the architecture of such systems enters a relatively stable stage, and large numbers of cheap machines can support huge volumes of traffic and data. With this architecture and the experience gained from so many rounds of evolution, we can use various other methods to support ever-increasing traffic.
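Challenge 2 in this step mentions controlling dependencies between the split-out systems. One concrete check is to keep the service dependency graph acyclic. Services that call each other in a cycle are hard to deploy, upgrade, or reason about independently. The minimal sketch below assumes a hypothetical map of services to the services they call. It uses Python's standard `graphlib` module (Python 3.9+) to derive a start-up order and to reject cycles.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def deployment_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return services ordered so each one comes after everything it calls.

    Raises ValueError if the services depend on each other in a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] is the list of nodes forming the detected cycle.
        raise ValueError(f"dependency cycle: {' -> '.join(exc.args[1])}") from exc
```

Running such a check in the build pipeline catches a new cyclic dependency when it is introduced, rather than during a production deployment.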

Take a look at the system diagram after this step is completed:

This step involves the following areas of knowledge:

This step involves a great deal of knowledge. It requires an in-depth understanding of communication, remote calls, messaging mechanisms, and so on. That understanding must be clear at every level: theory, hardware, the operating system, and the implementation of the language being used.

Finally, the structural diagram of a large website is attached:
