System architecture of large high-concurrency high-load Web sites

Source: Internet
Author: User

A small website, such as a personal site, can be built from the simplest static HTML pages, with a few pictures for decoration and all pages placed in one directory. Such a site makes very simple demands on system architecture and performance. As Internet services have grown richer, however, website technology has developed over the years into many finely divided specialties. Large sites in particular draw on a very wide range of technologies, with demanding requirements for everything from hardware to software, programming languages, databases, web servers, and firewalls, far beyond what the original simple static HTML site needed.

Large websites, such as portals, face heavy user traffic and highly concurrent requests. The basic solutions concentrate on a few areas: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. But these measures alone cannot fundamentally solve the high-load and high-concurrency problems that large websites face.

To some extent, the solutions above simply mean more investment, and they run into bottlenecks and do not scale well. Below are some approaches to consider from the perspective of low cost, high performance, and high scalability.

1. Static HTML

As we all know, the most efficient and least expensive page to serve is a purely static HTML page, so we try to implement the pages on our site as static pages; the simplest method is in fact the most effective. But for sites with a lot of frequently updated content, we cannot produce every page by hand, so the common answer is a content management system (CMS). The news channels of the portals we visit, and often their other channels as well, are managed and published through such a system. A CMS can take the simplest form of information entry and automatically generate static pages, and it also provides channel management, permission management, automatic content crawling, and other functions. For a large website, an efficient and manageable CMS is essential.

Beyond portals and information-publishing sites, community sites with heavy interaction also benefit from making as much content static as possible. Rendering posts and articles to static pages as they are created, and re-rendering them when they are updated, is a widely used strategy.

Static HTML is also used as a caching strategy. For data that the system queries frequently but updates rarely, consider rendering it to static HTML. Common forum settings are an example: mainstream forum software lets administrators manage this information in the back end and stores it in the database. The front-end program reads it constantly, yet it changes very rarely, so it can be regenerated as static content whenever it is updated in the back end, avoiding a large number of database requests.
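The regenerate-on-update idea can be sketched roughly as follows. This is a minimal illustration, not how any particular CMS or forum package works: the template, the `render_post` and `publish_post` helpers, and the `post-<id>.html` naming scheme are all assumptions made for the example.

```python
import html
from pathlib import Path

# Hypothetical page template; a real CMS would use its own templating system.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
<div class="post">{body}</div>
</body>
</html>
"""


def render_post(title: str, body: str) -> str:
    """Render a post to a complete HTML document, escaping user text."""
    return PAGE_TEMPLATE.format(title=html.escape(title), body=html.escape(body))


def publish_post(output_dir: Path, post_id: int, title: str, body: str) -> Path:
    """Write (or overwrite) the static page for one post and return its path.

    Intended to be called whenever a post is created or edited, so that
    ordinary page views are served from the file instead of the database.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"post-{post_id}.html"
    # Write to a temporary file first, then rename it over the target,
    # so a reader never sees a half-written page.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(render_post(title, body), encoding="utf-8")
    tmp.replace(target)
    return target
```

Calling `publish_post` again with the same `post_id` replaces the old page, which is the "re-static on update" step; the web server only ever serves the finished files.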

2. Image server separation

For a web server, whether Apache, IIS, or another container, images consume the most resources, especially I/O, so we separate images from pages. Essentially every large website adopts this strategy, using a dedicated image server or even several. This architecture reduces the load on the servers that handle page requests and ensures that image problems cannot bring the system down. The application servers and image servers can also be configured and optimized differently. For example, an Apache image server can be configured to support as few content types and load as few modules (LoadModule) as possible, for lower resource consumption and higher execution efficiency.
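On the application side, separating images mostly means that pages reference image URLs on other hosts. A minimal sketch of that step is shown here; the hostnames and the `image_url` helper are hypothetical, and the CRC32-based choice of host is just one simple way to spread paths over several image servers.

```python
import zlib

# Hypothetical image hosts; substitute the real image server hostnames.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"]


def image_url(path: str, hosts=IMAGE_HOSTS) -> str:
    """Return an absolute URL for an image path on a dedicated image host.

    The same path always maps to the same host, so browser and proxy
    caches keep working across page views.
    """
    normalized = "/" + path.lstrip("/")
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(normalized.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{normalized}"
```

A page template would then emit `image_url("photos/logo.png")` instead of a relative path, so image requests never reach the application server.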

3. Database cluster and database/table hashing

Large websites run complex applications that must use databases. Under heavy traffic the database quickly becomes the bottleneck, and when a single database can no longer serve the application, we need a database cluster or database and table hashing.

For database clusters, many databases have their own solutions. Oracle, Sybase, and others offer good ones, and the commonly used MySQL provides a similar scheme with master/slave replication. Whichever database you use, follow its corresponding solution.

The database clusters mentioned above are constrained by the database product in terms of architecture, cost, and scalability, so we also need to improve the architecture from the application side, where database and table hashing is the most common and effective solution. We split the database by business, application, or functional module, so that different modules map to different databases or tables. Then, within a page or function, we hash the data into smaller databases or tables according to some policy. In a forum, for example, users, settings, posts, and other information go into separate databases, and then posts and users are hashed across databases and tables by board and ID. The mapping can be kept in a configuration file, so the system can be expanded at any time by adding another low-cost database.
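As a rough illustration of hashing users across databases and tables by ID, the sketch below maps a numeric user ID to a (database, table) pair. The `ShardConfig` type, the database names, and the `users_NN` table naming are assumptions made for the example, and simple modulo hashing is only one possible policy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShardConfig:
    """Hypothetical shard layout, as it might be loaded from a config file."""
    databases: Tuple[str, ...]  # logical database names
    tables_per_db: int          # number of user tables inside each database


def locate_user(user_id: int, cfg: ShardConfig) -> Tuple[str, str]:
    """Map a user ID to the (database, table) that stores that user."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    total_slots = len(cfg.databases) * cfg.tables_per_db
    slot = user_id % total_slots
    database = cfg.databases[slot // cfg.tables_per_db]
    table = f"users_{slot % cfg.tables_per_db:02d}"
    return database, table


# Example layout: two databases with four user tables each.
FORUM_USERS = ShardConfig(databases=("forum_users_0", "forum_users_1"), tables_per_db=4)
```

With this layout, `locate_user(5, FORUM_USERS)` falls into slot 5, which is table `users_01` in the second database. Note that with plain modulo hashing, changing the number of databases changes the slot of most existing IDs, so adding capacity usually involves migrating data or using a lookup table or consistent hashing instead.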

4. Cache

Anyone who has worked with technology has come across the word cache, and caches are used in many places. Caching is also very important in website architecture and web development. Here we describe the two most basic kinds of cache first; advanced and distributed caches are left for later.

Architecture-level caching. People familiar with Apache know that it provides its own cache module (mod_cache); alternatively, Squid can be deployed in front of it as a caching proxy. Both can effectively improve response times.

Application-level caching. Memcached, commonly run on Linux, provides a general cache interface that can be used in web development. A Java application, for example, can use Memcached to cache data and share it between processes, and some large communities use such an architecture. In addition, each web development language has its own caching modules and methods.
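The application-level pattern is usually "look in the cache first, fall back to the database on a miss". The sketch below shows that pattern with a small in-process cache with expiry; the `TTLCache` class is hypothetical and stands in for a real cache service, and it is not the API of any particular caching library.

```python
import time


class TTLCache:
    """A tiny in-process cache whose entries expire after a fixed time."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}      # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value for key, calling loader(key) on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]   # fresh hit: no database access needed
        value = loader(key)   # miss or expired: go to the slow source
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, for example right after the underlying row is updated."""
        self._store.pop(key, None)
```

Here `loader` would be the function that actually queries the database; calling `invalidate` from the update path keeps readers from seeing stale data for longer than necessary.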

5. Mirror

Large websites often use mirroring to improve performance and data safety. Mirroring evens out the access speeds of users on different networks; for example, the gap between ChinaNet and EduNet has led many websites to set up mirror sites inside the education network, with data updated on a schedule or in real time. The details of mirroring technology are not covered in depth here; there are many mature professional architectures and products to choose from, as well as inexpensive software approaches such as rsync on Linux.
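One small piece of a mirroring setup is deciding which mirror a visitor should be sent to based on the network they come from. A minimal sketch of that decision follows; the address ranges are documentation-only placeholders (not real ChinaNet or EduNet ranges), and the hostnames and `pick_mirror` helper are hypothetical.

```python
import ipaddress

# Placeholder ranges taken from the reserved documentation blocks;
# a real deployment would load the actual network ranges of each carrier.
MIRRORS = [
    (ipaddress.ip_network("192.0.2.0/24"), "edu-mirror.example.com"),
    (ipaddress.ip_network("198.51.100.0/24"), "telecom-mirror.example.com"),
]
DEFAULT_MIRROR = "www.example.com"


def pick_mirror(client_ip: str) -> str:
    """Return the mirror host that serves the network containing client_ip."""
    address = ipaddress.ip_address(client_ip)
    for network, host in MIRRORS:
        if address in network:
            return host
    # Unknown network: fall back to the main site.
    return DEFAULT_MIRROR
```

The result could drive an HTTP redirect or a DNS answer; keeping the mirror contents in sync (for example with rsync) is a separate job.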

6. Load Balancing

Load balancing is the ultimate solution for large websites dealing with heavy load and large numbers of concurrent requests.

Load balancing technology has been developed for many years, and there are many professional service providers and products to choose from. Two approaches are described below.

Hardware layer-4 switching

Layer-4 switching uses the header information of layer-3 and layer-4 packets to identify traffic flows by application session and assigns all traffic in a session to an appropriate application server for processing. A layer-4 switch acts like a virtual IP address (VIP) that points to the physical servers. The traffic it forwards can use a variety of protocols, such as HTTP, FTP, NFS, and Telnet. Distributing this traffic across the physical servers requires fairly complex load balancing algorithms. In the IP world, the service type is determined by the destination TCP or UDP port, while a session in layer-4 switching is identified by the source and destination IP addresses together with the TCP or UDP ports.
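To make the idea concrete, the sketch below chooses a server pool by protocol and destination port, then pins each flow to one server by hashing the source and destination addresses and ports. It is a simplification under stated assumptions: real layer-4 switches typically keep a connection table and offer several scheduling algorithms, and the `SERVICES` table, the addresses, and the `route` function are all hypothetical.

```python
import hashlib

# Hypothetical service table: (protocol, destination port) -> real servers.
SERVICES = {
    ("tcp", 80): ["10.0.0.11", "10.0.0.12", "10.0.0.13"],  # HTTP pool
    ("tcp", 21): ["10.0.0.21"],                            # FTP pool
}


def route(protocol, src_ip, src_port, dst_ip, dst_port):
    """Return the real server for a flow, or None if no service matches.

    The destination port selects the service; the full set of addresses
    and ports identifies the flow, so every packet of one flow reaches
    the same server.
    """
    pool = SERVICES.get((protocol, dst_port))
    if not pool:
        return None
    flow_key = f"{protocol}:{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode("ascii")
    digest = hashlib.sha256(flow_key).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]
```

Two packets with the same addresses and ports always get the same answer from `route`, while different clients spread over the pool. A hash-only scheme like this remaps flows whenever the pool changes, which is one reason real devices track connections explicitly.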

In hardware layer-4 switching there are several well-known products to choose from, such as Alteon and F5. These products are expensive but worth the money, offering excellent performance and very flexible management. Yahoo China, for example, at one point handled nearly 2,000 servers with just three or four Alteon units.

Software layer-4 switching

Once you understand how hardware layer-4 switching works, it is easy to see how software layer-4 switching based on the OSI model came about. It follows the same principle, with somewhat lower performance, but it still handles a fair amount of load comfortably. Some argue that the software approach is actually more flexible, since its capability depends entirely on how well you know its configuration.

For software layer-4 switching we can use LVS, which is commonly used on Linux. LVS (Linux Virtual Server) supports real-time failover based on a heartbeat link (Heartbeat), which improves system robustness, and it offers flexible virtual IP (VIP) configuration and management that can meet a variety of application requirements. This is essential for distributed systems.

A typical load balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. Many large websites, including search engines, use this approach, which is low in cost, high in performance, and highly scalable: nodes can easily be added to or removed from the architecture at any time.
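The "add or remove nodes at any time" property can be illustrated with a small round-robin pool that also skips nodes marked as down. This is only a sketch of the scheduling idea: the `RoundRobinPool` class and node names are hypothetical, and it does not reflect how LVS or Squid are actually configured.

```python
class RoundRobinPool:
    """Round-robin scheduler over a changeable set of cache nodes."""

    def __init__(self, nodes=()):
        self._nodes = list(nodes)
        self._down = set()   # nodes currently failing health checks
        self._next = 0       # index of the next node to try

    def add(self, node):
        if node not in self._nodes:
            self._nodes.append(node)

    def remove(self, node):
        if node in self._nodes:
            self._nodes.remove(node)
        self._down.discard(node)

    def mark_down(self, node):
        self._down.add(node)

    def mark_up(self, node):
        self._down.discard(node)

    def pick(self):
        """Return the next healthy node, or None if none is available."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next % len(self._nodes)]
            self._next = (self._next + 1) % len(self._nodes)
            if node not in self._down:
                return node
        return None
```

A health checker would call `mark_down` and `mark_up`, while operators call `add` and `remove` as the cluster grows or shrinks; requests simply keep calling `pick`.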

Large websites may use every method described above at the same time. This introduction is fairly brief, and a real implementation involves many details that must be learned along the way; sometimes a single small Squid or Apache parameter can have a very large impact on system performance.

 
