Experiences with Large-Scale Internet Website Architecture: Splitting

Source: Internet
Author: User
Tags: database, load balancing

We know that scalability is very important for a large website. To achieve good scalability both vertically and horizontally, the architecture must be designed around the principle of splitting. I want to explain how to split from multiple angles:

The first is horizontal splitting:
1. Split a large website into multiple small websites: when a website has multiple functions, we can consider splitting it into several small modules. Each module can be a website of its own, so we can flexibly deploy these websites on different servers.
2. Static/dynamic separation: it is best to split static files and dynamic files into two websites. Static and dynamic sites put different pressure on a server: the former tends to be I/O-intensive and the latter CPU-intensive, so hardware can be chosen accordingly, and the cache policies for static and dynamic content also differ. Typical applications generally have independent file or image servers. In addition, serving static content from different domain names improves the browser's parallel loading.
3. Split by function: for example, a module responsible for uploads consumes a lot of time per request; if it is mixed in with other applications, even a little traffic can paralyze the server, so such special modules should be separated out. Secure and insecure content should also be separated, with future SSL certificate purchases taken into account.
4. We don't have to build everything on our own servers. Search and reporting can rely on third-party services, such as Google's search and report services. What we build ourselves is not necessarily better than what others offer, and relying on them saves server bandwidth.
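The splitting-by-function idea above can be made concrete as a routing table that sends each kind of request to its own server pool. This is a minimal sketch; the host names and backend addresses are illustrative assumptions, not part of the original article.

```python
# Sketch: dispatching requests to separate "small websites" by function.
# Hosts and addresses below are hypothetical examples.

ROUTES = {
    "static.example.com": ["10.0.1.10", "10.0.1.11"],  # I/O-tuned static/file servers
    "upload.example.com": ["10.0.2.10"],               # isolated upload module
    "www.example.com":    ["10.0.3.10", "10.0.3.11"],  # CPU-tuned dynamic servers
}

def pick_backend(host: str, request_id: int) -> str:
    """Choose a backend for the given Host header (round-robin by request id)."""
    pool = ROUTES.get(host, ROUTES["www.example.com"])
    return pool[request_id % len(pool)]
```

Because each function has its own pool, a slow upload request can only exhaust the upload pool; the dynamic and static sites keep serving.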

The second is vertical splitting:
1. Files are a form of storage just like databases, and their I/O traffic may be even larger than the database's, so this is also a vertical access layer: uploaded files and images must be separated from the web servers. And of course, keeping the database and the website off the same server is the most basic step.
2. For dynamic programs that involve database access, we can use an intermediate layer (the so-called application or logic layer), deployed on an independent server, to access the database. The biggest benefits are caching and flexibility. A cache's memory usage is large, so it should be separated from the website process; and data-access policies can be changed in one place, so that even if the database is later distributed, a redeployment of this layer is all that is needed, which is very flexible. The middle layer can also act as a bridge between networks: for a China Netcom user, going through a dual-line middle layer may be faster than accessing a China Telecom server directly.
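The middle tier described above can be sketched as a small data-access class that caches reads and hides where the database lives. This is a minimal illustration under assumed names (`DataAccessLayer`, a `db` object with a `get` method); it is not the article's actual implementation.

```python
import time

class DataAccessLayer:
    """Hypothetical middle tier: caches reads and hides the database location."""

    def __init__(self, db, ttl_seconds=60):
        self.db = db               # any backend exposing get(key)
        self.ttl = ttl_seconds
        self._cache = {}           # key -> (value, expiry timestamp)

    def get(self, key):
        hit = self._cache.get(key)
        if hit and hit[1] > time.time():
            return hit[0]          # served from cache, no database round trip
        value = self.db.get(key)   # only this layer knows where the DB lives
        self._cache[key] = (value, time.time() + self.ttl)
        return value
```

If the database is later sharded or moved, only the body of `get` changes; every website process keeps calling the same interface.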

Some people say: I don't need any of this, I can just do load balancing. True. But with splitting, the same 10 machines will certainly withstand more traffic than 10 undifferentiated machines, and the hardware requirements need not be very high, because we know exactly which hardware each role needs. We strive to keep every server neither idle nor too busy, and to combine, adjust, and expand them reasonably so that the system is highly scalable. The prerequisite for adjusting according to traffic is that splitting has been considered beforehand; the benefits of splitting are flexibility, scalability, isolation, and security.

For servers, there are several metrics to observe over the long term; any of them may become a bottleneck:
1. CPU: parsing dynamic pages requires a lot of CPU. Whether the CPU becomes a bottleneck depends on whether some function occupies threads for a long time; if so, split that function out. If each request is processed quickly but the access volume is high, add servers. CPU time is precious; do not let it sit idle waiting.
2. Memory: keep the cache independent of the IIS process. A web server generally never has enough memory; memory is far faster than disk and should be used wisely.
3. Disk I/O: use the performance monitor to find which files generate particularly heavy I/O. Once found, move them to an independent set of file servers, or adopt a CDN directly. Disks are slow: applications that read heavily should rely on caching, and applications that write heavily can rely on queues to smooth out burst concurrency.
4. Network: network communication is slow, slower even than disk. If distributed caching or distributed computing is used, the network communication time between physical servers must be considered; of course, under high traffic this can still raise the system's capacity by a level. Static content can be offloaded to a CDN. When planning servers, we also need to consider China's telecom network interconnection situation and firewalls.
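The queue-based smoothing mentioned for write-heavy disk I/O can be sketched with Python's standard `queue` and `threading` modules: producers dump a burst into an in-memory queue, and a single consumer drains it at a steady pace. The in-process queue here is a stand-in for whatever queueing system an actual deployment would use.

```python
import queue
import threading

write_queue = queue.Queue()
written = []  # stand-in for the disk/database that receives the writes

def writer_worker():
    """Single consumer: drains bursts serially instead of hitting storage concurrently."""
    while True:
        item = write_queue.get()
        if item is None:          # sentinel: shut down
            break
        written.append(item)      # the actual (slow) write would happen here
        write_queue.task_done()

t = threading.Thread(target=writer_worker)
t.start()
for i in range(100):              # a burst of 100 "writes" arrives at once
    write_queue.put(i)
write_queue.put(None)
t.join()
```

The burst is absorbed by memory (fast) while the storage sees only one write at a time, which is exactly the concurrency-flattening effect the article describes.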

For SQL Server database servers:
In fact, it is still horizontal splitting and vertical splitting. Picture a two-dimensional table: a horizontal split cuts across the rows, and a vertical split cuts along the columns:
1. Vertical splitting means putting different applications into different databases or instances, or breaking a wide table with many fields into smaller tables.
2. Horizontal splitting means that even an application with little load, such as user registration, may have a user table that grows very large, and large tables can be split up. You can use table partitioning to store data in different files and then deploy them on independent physical servers to increase I/O throughput and improve read/write performance. Alternatively, you can archive old data on a regular schedule. Another advantage of table partitioning is faster queries, because index pages form multiple layers, just as a folder should not hold too many files directly but rather several layers of subfolders.
3. You can also use database mirroring, replication (publish/subscribe), and transaction log shipping to separate reads and writes onto different mirrored physical databases. Generally this is sufficient; if not, you can use hardware to achieve database load balancing. Of course, for BI we may also have a data warehouse.
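The horizontal split of a large user table can be illustrated with a simple hash-based sharding scheme: rows are routed to one of N stores by their key. This is a toy sketch with in-memory dicts standing in for per-server tables; the shard count and helper names are assumptions for illustration.

```python
# Sketch: horizontal split of a large user table across N physical shards.
# The dicts below stand in for tables on separate database servers.

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id: int) -> dict:
    """Route a row to a shard by its key (here: simple modulo hashing)."""
    return shards[user_id % NUM_SHARDS]

def insert_user(user_id: int, name: str) -> None:
    shard_for(user_id)[user_id] = name

def get_user(user_id: int):
    return shard_for(user_id).get(user_id)
```

Each shard holds roughly 1/N of the rows, so both the table size and the I/O load on any single server shrink accordingly; the trade-off is that cross-shard queries need extra work.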

After this is taken into account in the architecture, when traffic grows, we can adjust or load-balance the web servers or application servers on this basis. Most of the time we iterate: find the problem, find the bottleneck, solve the problem.

The typical architecture is as follows:

Dynamic web servers get better CPUs; static web servers and file servers get better disks.
Application servers and cache servers get larger memory; the database server, of course, wants better memory and CPU alike.
