Build large-scale website architectures step by step

Source: Internet
Author: User
Tags: database sharding

Original article: http://www.itivy.com/ivy/archive/2011/4/28/634395931511515337.html

Previously, I briefly introduced the architectures of several well-known large websites: the five milestones of MySpace, and the architectures of Flickr, YouTube, PlentyOfFish, and Wikipedia. These are all very typical cases, and we can learn a great deal about website architecture from them. After reading them, you may find that your original ideas were rather narrow.

Today, let's talk about how a website builds up its system architecture step by step. Although we would all like a website to have a good architecture from the start, as Marx told us, things are constantly evolving: a website's architecture is continually improved as the business and user demands grow. The following is the typical process by which a website architecture gradually develops; after reading it, think about which stage you are at.

Architecture Evolution Step 1: physically separate webserver and database

In the beginning, driven by some idea, a website is put up on the Internet. At this stage even the host may be rented. Since this article only concerns the evolution of the architecture, let's assume we have one hosted machine and a certain amount of bandwidth. Because the website has some appeal, it attracts visitors, and you gradually find that the system is under increasing pressure and responding more and more slowly. The obvious cause is that the database and the application affect each other: when the application has a problem the database is also prone to problems, and when the database has problems the application suffers too. So you enter the first stage of evolution: physically separating the application from the database, turning one machine into two. There are no new technical requirements at this point, but you find that it works: the system recovers its previous response speed, supports higher traffic, and the database and application no longer drag each other down.

Take a look at the system diagram after this step is completed:
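The point of this step is that no application logic changes; only the database location does. A minimal sketch in Python (all host names and the URL helper are hypothetical, not from the original article):

```python
# Hypothetical application settings before and after the split.
BEFORE = {
    "web_host": "server-1",      # web server and database share one machine
    "db_host": "localhost",
    "db_port": 3306,
}

AFTER = {
    "web_host": "server-1",      # the web server keeps its machine
    "db_host": "db-1.internal",  # the database moves to a dedicated machine
    "db_port": 3306,
}

def db_url(cfg):
    """Build a connection URL from the settings dict."""
    return f"mysql://{cfg['db_host']}:{cfg['db_port']}/app"

print(db_url(BEFORE))  # mysql://localhost:3306/app
print(db_url(AFTER))   # mysql://db-1.internal:3306/app
```

The application code that uses `db_url` is untouched, which is why this step has no new technical requirements.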

Architecture Evolution Step 2: add a page cache

The good times don't last long. As more and more people visit, you find the response speed starting to slow down again. Looking for the cause, you find that too many operations are hitting the database, leading to fierce competition for database connections and hence slow responses. The number of connections cannot simply be raised, or the load on the database machine becomes too high. So you consider a caching mechanism to reduce both the competition for database connections and the read pressure on the database. The first choice may be something like squid, used to cache the relatively static pages in the system (for example, pages that update only once every day or two); a static-page generation scheme would also work. Either way, without modifying the program, you reduce the pressure on the web server and the competition for database connections. OK, so you begin to use squid to cache the relatively static pages.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

Front-end page caching technology, such as squid. To use it well, you need a thorough understanding of squid's implementation and of cache invalidation algorithms.
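The essence of what a front-end cache does for a "page that updates once a day or two" can be sketched as a TTL cache. This is an illustrative Python sketch, not how squid is implemented; the `render` backend stand-in is hypothetical:

```python
import time

class PageCache:
    """Minimal TTL page cache: serve a stored copy while it is fresh,
    and only hit the backend when the copy has expired."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (expires_at, body)

    def get(self, url, render):
        """Return the cached body if still fresh; otherwise render and cache."""
        entry = self.store.get(url)
        now = time.time()
        if entry is not None and entry[0] > now:
            return entry[1]                       # hit: backend untouched
        body = render(url)                        # miss: hit the backend
        self.store[url] = (now + self.ttl, body)
        return body

backend_calls = []

def render(url):
    backend_calls.append(url)    # stands in for an expensive DB-backed render
    return f"<html>{url}</html>"

cache = PageCache(ttl_seconds=2 * 86400)  # pages that change every day or two
page1 = cache.get("/about", render)
page2 = cache.get("/about", render)       # second request served from cache
```

The second request never reaches the backend, which is exactly the database-connection relief this step is after. Cache invalidation (what to do when the page changes before the TTL expires) is the hard part the text warns about.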

Architecture Evolution Step 3: add a page fragment cache

After squid is added for caching, the overall system speed does improve and the pressure on the web server decreases. However, as traffic grows, the system starts to slow down again. Having tasted the benefits of caching with squid, you start thinking about caching the relatively static parts of the dynamic pages as well. So you consider a page fragment caching policy such as ESI, and begin using ESI to cache the relatively static parts of dynamic pages.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

Page fragment caching technology, such as ESI; to use it well, you also need to master how ESI is implemented;
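The idea behind fragment caching (which ESI expresses with include markup) can be sketched in a few lines of Python. This is a conceptual illustration under assumed names (`cached_fragment`, `render_page`), not ESI itself:

```python
fragment_cache = {}
fragment_renders = []

def cached_fragment(key, render):
    """Return a relatively static fragment, rendering it at most once."""
    if key not in fragment_cache:
        fragment_cache[key] = render()
    return fragment_cache[key]

def render_header():
    fragment_renders.append("header")   # stands in for expensive work
    return "<div id='nav'>...</div>"

def render_page(user):
    # The header rarely changes, so it is cached; the greeting is
    # per-user and is rendered fresh on every request.
    header = cached_fragment("header", render_header)
    return f"{header}<p>hello {user}</p>"

page_a = render_page("alice")
page_b = render_page("bob")   # the header comes from the fragment cache
```

The page as a whole stays dynamic, but the expensive static part is computed once, which is why this step helps even though step 2's whole-page cache could not apply to these pages.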

Architecture Evolution Step 4: data caching

After improving the system's caching with ESI and similar techniques, the pressure is indeed further reduced. However, as traffic grows, the system slows down yet again. After some searching, you may find places in the system that repeatedly fetch the same data, such as user information. So you consider caching this data too, and cache it in local memory. After the change, the system's response speed is restored and the pressure on the database is reduced once more.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

Caching technology, including map data structures, cache algorithms, and the implementation mechanisms of the chosen caching framework.
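A local in-memory data cache for the user-information example is just a map in front of the database. A minimal sketch (the `load_user_from_db` stand-in is hypothetical):

```python
db_reads = []

def load_user_from_db(user_id):
    db_reads.append(user_id)    # stands in for a real database query
    return {"id": user_id, "name": f"user-{user_id}"}

user_cache = {}                 # local in-memory map, one per web server

def get_user(user_id):
    """Return user info, hitting the database at most once per user."""
    if user_id not in user_cache:
        user_cache[user_id] = load_user_from_db(user_id)
    return user_cache[user_id]

first = get_user(1)
second = get_user(1)            # served from the local cache
third = get_user(2)
```

A real implementation also needs an eviction policy (LRU, TTL) so the map doesn't grow without bound; the fact that this cache is local to each web server becomes a problem in step 5.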

Architecture Evolution Step 5: add webservers

The good times don't last long. As traffic grows, the web server's load rises to quite a high level during peak periods. You begin to consider adding a web server, which also solves an availability problem: with a single web server, the site goes down whenever it does. In adding that second web server you may run into several typical problems:

1. How to distribute traffic across the two machines? Here you usually consider the load balancing that comes with Apache, or a software load balancing solution such as LVS.

2. How to keep state information such as user sessions synchronized? Here you consider mechanisms such as writing sessions to the database or to shared storage, cookie-based sessions, or session replication.

3. How to keep cached data, such as the user data cached earlier, synchronized? This usually involves cache synchronization or a distributed cache.

4. How to keep features such as file upload working? Here you usually consider a shared file system or dedicated storage.

After solving these problems, you finally have two web servers, and the system recovers its previous speed.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

Load balancing technology (including but not limited to hardware load balancing, software load balancing, load algorithms, Linux forwarding protocols, and the implementation details of the chosen technology);

master/slave failover technology (including but not limited to ARP spoofing and Linux heartbeat);

state-information and cache synchronization technology (including but not limited to cookies, the UDP protocol, state broadcasting, and the implementation details of the chosen cache synchronization technology);

shared file technology (including but not limited to NFS) and storage technology (including but not limited to storage devices).
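Two of the load algorithms mentioned above can be sketched in a few lines. These are illustrative policies, not Apache's or LVS's actual implementation; server names and the client IP are made up:

```python
import itertools

class RoundRobin:
    """Simplest scheduling policy a software load balancer might use:
    hand out backends in rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)
    def pick(self):
        return next(self._cycle)

def by_source_ip(servers, client_ip):
    """Source-IP hashing: the same client always lands on the same
    server, a crude way to sidestep session synchronization entirely."""
    return servers[hash(client_ip) % len(servers)]

lb = RoundRobin(["web-1", "web-2"])
picks = [lb.pick() for _ in range(4)]   # alternates between the two servers
```

Round-robin spreads load evenly but forces the session problem in item 2 above; source-IP hashing avoids it at the cost of uneven load and broken stickiness when a server is added or removed.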

Architecture Evolution Step 6: database sharding

After enjoying high traffic growth for a while, you find the system slowing down again. What is it this time? It turns out that competition for database connections is fierce for write and update operations, and this slows the whole system. What to do? The available options are database clusters and database sharding. Cluster support in some databases is not very good, so sharding becomes the common strategy. Sharding means modifying the original program; after a round of modification the goal is reached, and the system is even faster than before.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

This step requires dividing the business sensibly so that the data can be split across databases; there are no other special requirements on technical details;

However, as the data volume grows and sharding proceeds, database design, optimization, and maintenance must all improve, so higher demands are placed on these skills.
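Sharding by business division, as described here, amounts to a routing table from business area to database. A minimal sketch of the program change it forces (the routing table and connection strings are hypothetical):

```python
# Vertical (business) sharding: each business area gets its own database.
SHARDS = {
    "users":  "mysql://db-users.internal/app",
    "orders": "mysql://db-orders.internal/app",
    "items":  "mysql://db-items.internal/app",
}

def db_for(business_area):
    """Every data access in the modified program must first pick the
    right database for its business area."""
    return SHARDS[business_area]

target = db_for("orders")   # order queries now go to their own database
```

Writes for users, orders, and items no longer compete for connections on one machine, which is exactly the contention this step set out to remove. The price is that cross-area joins and transactions must now be handled in application code.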

Architecture Evolution Step 7: table sharding, DAL, and distributed cache

As the system keeps running, the data volume starts to grow significantly, and you find that queries are still slow even after database sharding. So table sharding begins, following the same idea as database sharding. This, of course, requires some modification of the program. At this point you may notice that the application has to know the sharding rules for databases and tables, which is still complicated, so you wonder whether a general framework could handle data access across shards; in this architecture that corresponds to a DAL (data access layer), which takes a fairly long time to evolve. It is also possible that this general framework is only started after table sharding is complete. In the same stage, you may find problems with the earlier cache synchronization scheme: the data volume is now too large for the cache to live in local memory and be kept in sync, so a distributed cache is needed. After a round of investigation and agonizing, a large amount of cached data is finally moved onto a distributed cache.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

Table sharding is again mostly a matter of business division; technically it involves dynamic hashing and consistent hashing algorithms;

A DAL involves many complex technologies, such as database connection management (timeouts and exceptions), database operation control (timeouts and exceptions), and the encapsulation of database/table sharding rules;
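The consistent hashing mentioned above can be sketched as a hash ring. This is a bare-bones illustration (real DAL frameworks and distributed caches add virtual nodes and replication); the shard names are made up:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring for shard/table lookup."""

    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        # A stable hash; Python's built-in hash() is randomized per process.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise to the first node at or after the key's hash."""
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

shards = ["user_0", "user_1", "user_2", "user_3"]
ring = HashRing(shards)
shard = ring.node_for("user:42")   # always the same shard for this key
```

Unlike a plain `hash(key) % n` scheme, adding or removing a shard only remaps the keys adjacent to it on the ring, which matters when resharding a table that already holds a lot of data.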

Architecture Evolution Step 8: add more webservers

After database and table sharding, the pressure on the databases drops to quite a low level, and you start enjoying the happy life of watching traffic surge day by day. Suddenly, one day, the system starts to slow down again. You check the database first: its load is normal. Then you check the web servers and find that Apache is blocking many requests, while the application server handles each request fairly quickly. It seems the sheer number of requests is forcing them to queue, slowing the response. This is easy to deal with, and by now there is generally some money available, so you add more web servers. In this process you may run into a few challenges:

1. Apache's soft load balancing or LVS can no longer handle the scheduling of the huge web traffic (number of request connections, network traffic, and so on). If the budget permits, the solution is to buy hardware load balancers such as F5 or NetScaler; if not, the solution is to classify the applications logically and distribute them across different soft-load clusters.

2. Some of the earlier solutions for state synchronization and file sharing may hit bottlenecks and need to be reworked; you may end up writing a distributed file system that fits the website's own needs.

After this, you enter an era of seemingly unlimited scaling: whenever traffic grows, the solution is simply to add more web servers.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

At this point, as the number of machines keeps growing, the data volume keeps growing, and the availability requirements get ever higher, you need a deeper understanding of the technologies in use, and you need to build more customized solutions based on the website's own needs.

Architecture Evolution Step 9: data read/write splitting and low-cost storage solutions

Suddenly, one day, you find that this perfect era is coming to an end and the database nightmare is back. Because so many web servers have been added, database connections are again insufficient, even though the databases and tables are already sharded. Analyzing the database load, you may find that the read/write ratio is very high, which naturally suggests a read/write splitting scheme. Of course, this scheme is not easy to implement. You may also notice that keeping some data in the relational database is a waste, or occupies too many database resources. So the architecture that emerges at this stage combines read/write splitting with cheaper storage solutions, such as Bigtable-style systems.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

Read/write splitting requires an in-depth understanding of database replication, standby strategies, and so on, along with the routing you implement yourself;

A low-cost storage solution requires a deep understanding of OS-level file storage and of how the implementation language handles files.
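The self-implemented routing that read/write splitting needs can be sketched as follows. This is an illustrative router, not a complete implementation (connection names are made up, and it classifies statements naively):

```python
import itertools

class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, replicas):
        self.master = master
        self._reads = itertools.cycle(replicas)

    def route(self, sql):
        verb = sql.strip().split()[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master       # all writes go to the master
        return next(self._reads)     # SELECTs rotate over the replicas

router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
write_target = router.route("UPDATE users SET name = 'a' WHERE id = 1")
read_target = router.route("SELECT * FROM users WHERE id = 1")
```

A real router must also handle replication lag (a read right after a write may need to go to the master) and transactions, which is part of why the text says this scheme is not easy to implement.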

Architecture Evolution Step 10: entering the era of large-scale distributed applications and cheap server clusters

After the long and painful process above, the perfect era finally arrives again: ever more web servers support ever more traffic. For a large website, popularity is beyond doubt, and with it come surging feature requests. At this point you suddenly notice that the web application deployed on the web servers has grown very large; with multiple teams modifying it, it is inconvenient to work on and its reusability is poor. Basically every team ends up doing some duplicated work. Deployment and maintenance are also troublesome, because copying and starting a huge application package on N machines takes a lot of time, and problems are hard to diagnose. Worse, a bug in one part of the application can take the whole site down. There are other problems too, such as poor optimization: the application deployed on each machine has to do everything, so no targeted optimization is possible. Based on this analysis, you make up your mind to split the system by responsibility, and a large distributed application is born. This step usually takes a considerable amount of time, and along the way brings many challenges:

1. after the split, a high-performance, stable communication framework must be provided, supporting different communication and remote-call mechanisms;

2. splitting a huge application takes a long time and requires organizing the business and controlling the dependencies between systems;

3. operating and maintaining this large distributed application is hard (dependency management, running-state management, error tracking, tuning, monitoring, and alerting).

After this step, the architecture of such a system enters a relatively stable stage, and large numbers of cheap machines can support huge volumes of traffic and data. With this architecture and the experience gained through so many evolutions, various other methods can then be used to support ever-growing traffic.

Take a look at the system diagram after this step is completed:

This step involves these knowledge systems:

This step involves a great deal of knowledge, requiring an in-depth understanding of communication, remote calls, messaging mechanisms, and so on, both in theory and in terms of hardware, operating systems, and languages.
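The split-by-responsibility idea can be sketched as services that register behind names, so callers no longer know or care where each piece runs. This is an in-process toy (all service names and handlers are hypothetical); a real communication framework adds networking, serialization, timeouts, and discovery:

```python
class ServiceRegistry:
    """Toy stand-in for the communication framework a split-up site
    needs: services register by name, callers invoke them by name."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        self._services[name] = handler

    def call(self, name, *args):
        if name not in self._services:
            raise LookupError(f"no such service: {name}")
        return self._services[name](*args)

registry = ServiceRegistry()
# Each team owns and deploys its own service independently.
registry.register("user.get", lambda uid: {"id": uid, "name": f"user-{uid}"})
registry.register("order.list", lambda uid: [])

user = registry.call("user.get", 7)
```

Because each service is deployed on its own, a bug in `order.list` no longer takes down user lookups, and each service can be optimized for its own workload, addressing the problems listed above.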

Attached is the architecture diagram of a large website:
