Performance Optimization and System Architecture of High-Performance Websites

A small website, such as a personal site, can be built with plain static HTML pages plus a few images for decoration, with all pages stored in a single directory. Such a site places very simple demands on system architecture and performance. As Internet services have grown richer, website technology has split over the years into many finely specialized areas. Large sites in particular draw on a very wide range of technology, from hardware to software, programming languages, databases, web servers, and firewalls, and each of these areas has demanding requirements. This is far beyond what the original simple static HTML site needed.

Large websites, such as portals, face huge numbers of visitors and highly concurrent requests. The basic solutions usually focus on a few areas: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. These measures alone, however, cannot fundamentally solve the high-load and high-concurrency problems that large websites face.

The solutions above also imply, to some extent, a larger investment; they eventually hit bottlenecks and do not scale well. Below I describe my experience from the perspective of low cost, high performance, and high scalability.

1. Static HTML
As we all know, pure static HTML pages are the most efficient and least expensive to serve, so we try to deliver as many of our site's pages as possible as static pages; the simplest method is in fact the most effective. For sites with a lot of frequently updated content, however, we cannot produce all of these pages by hand, which is where the content management system (CMS) comes in. The news channels of the portals we visit every day, and often their other channels too, are managed and published through such a system. At its simplest, a CMS takes entered content and automatically generates static pages; it also provides channel management, permission management, automatic content capture, and other functions. For a large website, an efficient and manageable CMS is essential.
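The core step a CMS automates, turning an article record into a static file that the web server can serve without touching the database, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the template, the output directory, and the `publish_article` function are hypothetical, and a real CMS would also handle channels, permissions, and updating the list pages that link to the article.

```python
import html
import os
import tempfile
from pathlib import Path

# Hypothetical minimal template; a real CMS would use its own template engine.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body><h1>{title}</h1>{body}</body>
</html>
"""


def publish_article(output_dir: str, article_id: int, title: str, body_html: str) -> Path:
    """Render one article to a static .html file that the web server serves directly."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The title is escaped; body_html is assumed to be trusted HTML from the editor.
    page = PAGE_TEMPLATE.format(title=html.escape(title), body=body_html)
    target = out_dir / f"{article_id}.html"
    # Write to a temp file in the same directory, then rename it over the target,
    # so visitors never see a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(page)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600 files; let the web server read it
    os.replace(tmp_path, target)
    return target
```

For example, `publish_article("/var/www/html/news", 1001, "Title", "<p>Body</p>")` would write `/var/www/html/news/1001.html`. Renaming with `os.replace` means readers see either the old page or the new one, never a partial file.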

Beyond portals and other publishing sites, making pages static as far as possible is also a necessary performance measure for highly interactive community sites. Converting community posts and articles to static pages in real time, and regenerating them whenever they are updated, is a widely used strategy; MOP's Hodgepodge board and the NetEase community, for example, work this way. Many blogs are now served as static pages as well. The blog software I use, WordPress, does not generate static pages, so www.toplee.com would certainly not hold up under high load.

Generating static HTML is also a form of caching. For content that the system reads from the database frequently but that changes rarely, consider turning it into static HTML. A forum's public settings are a good example: mainstream forums let administrators manage them in the back end and store them in the database. The front-end programs read this information constantly, but it changes very rarely, so it can be written out as static content whenever it is updated in the back end, avoiding a large number of database requests.

A compromise is also possible: the front end is implemented dynamically, and under some policy its pages are periodically converted into static copies, which are then served on a schedule. This allows a great deal of operational flexibility. The billiards site I developed, www.8zone.cn, uses this method: by setting intervals at which dynamic content is cached as static HTML, most of the load is shifted onto static pages. The approach suits the architecture of small and medium-sized websites. Site address: http://www.8zone.cn. By the way, if you like billiards, please support this free site :)
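The timed approach can be sketched as a small cache that reruns the dynamic page logic only when its stored copy is older than a set interval. This is an illustrative sketch, not the code behind 8zone.cn: `TimedStaticCache` and the `render` callable are hypothetical, and for brevity the rendered HTML is kept in memory, whereas a production setup would more likely write it to disk for the web server to serve directly.

```python
import time
from typing import Callable, Dict, Tuple


class TimedStaticCache:
    """Serve a pre-rendered page and re-render it only after a fixed interval.

    `render` is a hypothetical callable that runs the dynamic page logic
    (database queries, templating) and returns the finished HTML.
    """

    def __init__(self, render: Callable[[str], str], interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render
        self.interval = interval_seconds
        self.clock = clock
        self._pages: Dict[str, Tuple[float, str]] = {}  # url -> (rendered_at, html)

    def get(self, url: str) -> str:
        now = self.clock()
        cached = self._pages.get(url)
        if cached is not None and now - cached[0] < self.interval:
            return cached[1]          # still fresh: no dynamic work at all
        page = self.render(url)       # stale or missing: regenerate once
        self._pages[url] = (now, page)
        return page
```

With a 60-second interval, a busy page triggers, in this single-threaded sketch, at most one dynamic render per URL per minute; the interval is the knob that trades freshness for load. The injectable `clock` makes the behavior easy to test.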

2. Image server separation
For a web server, whether it is Apache, IIS, or another container, images consume the most resources, so we should separate images from pages. Essentially every large site adopts this strategy, with a dedicated image server or even many of them. This architecture reduces the load on the servers that handle page requests and ensures that image-related problems cannot bring the whole system down.

The application servers and image servers can then be tuned differently. For example, Apache on the image servers can be configured to support as few content types and to load as few modules (LoadModule) as possible, which lowers system overhead and improves execution efficiency.

My billiards site 8zone.cn also separates its image servers, although for now the separation exists only in the architecture, not physically, since I have no money for more servers :). You can see that image links on the site's home page use URLs like img.9tmd.com or img1.9tmd.com.
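Keeping image URLs on their own hostnames is what makes a later physical split cheap. A small helper that maps each image path to one of several image hosts might look like the following; the host list reuses the names mentioned above, while the `image_url` function and the CRC32-based mapping are assumptions for illustration.

```python
import zlib

# Image hostnames from the article. Today they may all resolve to the same
# machine; later they can be pointed at separate servers through DNS alone.
IMAGE_HOSTS = ["img.9tmd.com", "img1.9tmd.com"]


def image_url(path: str, hosts=IMAGE_HOSTS) -> str:
    """Return an absolute image URL, always mapping the same path to the same host."""
    path = path.lstrip("/")
    # crc32 is stable across processes (unlike Python's built-in hash() for str),
    # so an image keeps the same URL and stays cacheable in browsers and proxies.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}/{path}"
```

Page templates would call `image_url("avatars/7.jpg")` instead of hard-coding a host; moving images to new hardware then only requires syncing the files and updating DNS.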

In addition, for serving static pages, images, JavaScript, and similar files, consider using lighttpd instead of Apache; it is lighter-weight and handles such requests more efficiently.

3. Database clustering and database/table hashing
Large websites run complex applications that inevitably depend on databases. Under heavy traffic the database quickly becomes the bottleneck, and a single database soon cannot keep up with the application, so we need database clustering or database/table hashing.

For database clustering, many databases ship their own solutions: Oracle, Sybase, and others have good ones, and the master/slave replication of the widely used MySQL is a similar scheme. Whichever database you use, implement the clustering solution designed for it.
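As a rough illustration of how an application uses MySQL master/slave replication, the sketch below routes write statements to the master and spreads reads across the slaves in round-robin order. `ReadWriteRouter` and the DSN strings are hypothetical, and classifying statements by their first keyword is deliberately naive.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Pick a database for each SQL statement in a master/slave setup.

    Writes must go to the master, because slaves only replay the master's
    changes; reads are spread over the slaves. The DSN strings stand in for
    real connection objects.
    """

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, slaves: List[str]):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master
        return next(self._slaves)  # round-robin over the slaves
```

Two caveats apply to any such scheme: replication is asynchronous, so a read sent to a slave right after a write may not see it yet, and statements such as `SELECT ... FOR UPDATE` must still go to the master.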

The clustering approach above is constrained by the database product in terms of architecture, cost, and scalability, so we also need to improve the architecture from the application side, and database/table hashing is the most common and effective solution there. We split the databases by business and functional module, so that different modules map to different databases or tables, and then, according to some policy, hash the data of a given page or function into smaller databases or tables; the user table, for example, can be hashed by user ID. This improves system performance at low cost and scales well. Sohu's forum uses this kind of architecture: it separates the forum's users, settings, posts, and other data into different databases, then hashes posts and users into databases and tables by board and by ID. With a simple change in a configuration file, the system can then add another low-cost database at any time to boost performance.
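Hashing by user ID can be as simple as a modulo over a fixed number of table slots, with the slot counts kept in configuration. The sketch below is an assumption-based illustration, not Sohu's actual scheme: `ShardConfig`, `locate_user`, and the database and table naming pattern are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShardConfig:
    """Hypothetical sharding settings, the kind of values kept in a config file."""
    db_count: int = 4          # physical databases for the user module
    tables_per_db: int = 8     # user tables inside each database
    db_prefix: str = "user_db"
    table_prefix: str = "user"


def locate_user(user_id: int, cfg: ShardConfig = ShardConfig()) -> Tuple[str, str]:
    """Map a user ID to (database, table) with simple modulo hashing."""
    slot = user_id % (cfg.db_count * cfg.tables_per_db)  # global table slot
    db_index, _ = divmod(slot, cfg.tables_per_db)         # which database holds it
    # Table names use the global slot so they stay unique if tables are moved.
    return f"{cfg.db_prefix}_{db_index}", f"{cfg.table_prefix}_{slot:03d}"
```

Because the total slot count is the product of the two settings, changing either one remaps most users. A common refinement is to fix a generous number of logical tables up front and later change only which database hosts each table, so adding a database means moving whole tables rather than rehashing every row.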

4. Cache
The word "cache" is familiar to anyone in technology, and caches are used in many places. Caching is also very important in website architecture and web development. Here I first describe the two most basic kinds of cache; advanced and distributed caching will be covered later.

Architecture-level caching: anyone familiar with Apache knows that it provides its own caching through the mod_proxy module, and you can also put Squid in front of it for caching. Both effectively improve Apache's response times.

Caching in web application development: on Linux, memcached is a common caching solution, and many web programming languages, including PHP, Perl, C, and Java, provide interfaces for accessing it. Data, objects, and other content can be cached in real time or by scheduled (cron) jobs, and the strategy is very flexible. Several large communities use this architecture.
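The usual way to put memcached in front of a database is the cache-aside pattern: read from the cache, and on a miss load from the database and store the result with an expiry time. To keep the sketch self-contained, `LocalCache` below is an in-process stand-in with a get/set-with-expiry interface similar to common memcached clients; using a real client means adapting to that client's exact method signatures. The `cached` helper and the key names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LocalCache:
    """In-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if expires_at and self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: int = 0) -> None:
        expires_at = self._clock() + ttl if ttl else 0.0  # 0 means never expires
        self._data[key] = (expires_at, value)


def cached(cache, key: str, ttl: int, load: Callable[[], Any]) -> Any:
    """Cache-aside: try the cache first; on a miss call `load` (e.g. a DB query)."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl)
    return value
```

The same three steps apply in PHP, Perl, or Java; a cron job could also call `set` ahead of time to refresh hot keys before they expire. This sketch treats `None` as a miss, so a loader that returns `None` is never cached.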

In addition, each web development language has its own caching modules and methods. PHP has PEAR's Cache module, the eAccelerator acceleration and caching module, the well-known APC, and XCache (a PHP cache module developed in China). Java has even more options. I am less familiar with .NET, but I am sure it has them too.

5. Mirroring
Large websites often use mirroring to improve performance and data safety. Mirroring addresses differences in access speed across network providers and regions; the gap between ChinaNet and the China Education and Research Network (CERNET), for example, has led many websites to build mirror sites inside the education network, with data updated on a schedule or in real time. I will not go deep into mirroring technology here; there are many professional, ready-made solutions and products to choose from, as well as inexpensive software approaches such as the rsync tool on Linux.

6. Load Balancing
Load balancing is the ultimate solution for large websites facing high load and large numbers of concurrent requests.

Load balancing technology has been developing for many years, and many professional vendors and products are available. I have personally worked with several solutions, two of which are described below for your reference. The more basic DNS round-robin approach and the more specialized CDN architecture are not covered here.

6.1 Hardware layer-4 switching
Layer-4 switching uses the header information of layer-3 and layer-4 packets to identify traffic flows by application session, and distributes the traffic of each whole session to an appropriate application server for processing. A layer-4 switch acts like a virtual IP address that points to the physical servers. The traffic it carries can use a variety of protocols, such as HTTP, FTP, NFS, and Telnet. Distributing this traffic across the physical servers requires sophisticated load-balancing algorithms. In the IP world, the type of service is determined by the endpoint's TCP or UDP port, and in layer-4 switching a session is identified by the source and destination IP addresses together with the TCP or UDP ports.
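The session-based distribution described above can be modeled in a few lines: the balancer sees only layer-3/4 header fields, picks a real server for the first packet of a flow, and records the choice so that the rest of that TCP or UDP session goes to the same server. This is a toy model for illustration, not how Alteon, F5, or LVS are implemented; `Flow` and `Layer4Balancer` are hypothetical names, and hashing the flow fields stands in for the more sophisticated algorithms real products use.

```python
import zlib
from typing import Dict, List, NamedTuple


class Flow(NamedTuple):
    """Layer-3/4 header fields a layer-4 switch can see (no HTTP payload)."""
    src_ip: str
    src_port: int
    dst_ip: str    # the virtual IP (VIP) that clients connect to
    dst_port: int
    proto: str     # "tcp" or "udp"


class Layer4Balancer:
    """Toy model of a layer-4 switch: one VIP in front of several real servers."""

    def __init__(self, real_servers: List[str]):
        self.real_servers = list(real_servers)
        self.connections: Dict[Flow, str] = {}  # session table: flow -> server

    def forward(self, flow: Flow) -> str:
        server = self.connections.get(flow)
        if server is None:
            # First packet of this session: pick a server by hashing the flow fields.
            key = "|".join(map(str, flow)).encode()
            server = self.real_servers[zlib.crc32(key) % len(self.real_servers)]
            self.connections[flow] = server
        return server  # later packets of the same session hit the same server
```

Hashing only `src_ip` instead of the whole flow would keep all of a client's connections on one server, which is one way to get the per-user stickiness discussed in the comments below.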

For hardware layer-4 switching there are several well-known products to choose from, such as Alteon and F5. They are expensive but worth the money, offering excellent performance and very flexible management. In its early days, Yahoo China served nearly 2,000 servers with just three or four Alteon switches.

6.2 Software layer-4 switching
Once the principle of the hardware layer-4 switch is understood, software layer-4 switching based on the OSI model follows naturally. It works on the same principle with somewhat lower performance, but it comfortably handles a fair amount of load. Some people argue that a software implementation is actually more flexible, and that how much it can handle depends entirely on how well you know its configuration.

For software layer-4 switching we can use LVS (Linux Virtual Server), which is common on Linux. Combined with heartbeat-based failover, it reacts to failures in real time and improves system robustness, and its flexible virtual IP (VIP) configuration and management can meet a wide range of application requirements, which is essential for distributed systems.
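The failover idea, taking a real server out of rotation once its heartbeat stops, can be illustrated with the sketch below. It is a simplified user-space model assumed for illustration only; LVS itself schedules connections in the kernel and is configured with tools such as `ipvsadm`, and `HeartbeatPool` is a hypothetical name.

```python
import time
from typing import Callable, Dict, List


class HeartbeatPool:
    """Round-robin over real servers, skipping any whose heartbeat has gone silent."""

    def __init__(self, servers: List[str], timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.servers = list(servers)
        self.timeout = timeout
        self.clock = clock
        now = clock()
        self.last_seen: Dict[str, float] = {s: now for s in self.servers}
        self._next = 0

    def heartbeat(self, server: str) -> None:
        """Record that a real server just reported in."""
        self.last_seen[server] = self.clock()

    def alive(self) -> List[str]:
        now = self.clock()
        return [s for s in self.servers if now - self.last_seen[s] < self.timeout]

    def pick(self) -> str:
        candidates = self.alive()  # recomputed on every pick
        if not candidates:
            raise RuntimeError("no real server is alive")
        server = candidates[self._next % len(candidates)]
        self._next += 1
        return server
```

A server that resumes sending heartbeats rejoins the rotation automatically, since the list of live servers is recomputed on every pick.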

A typical load-balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. Many large websites, including search engines, use this approach; it is low-cost, high-performance, and highly scalable, and nodes can easily be added or removed at any time. I plan to write a separate post discussing this architecture in detail.

Summary:
A large website may use all of the methods described above at the same time. What I have introduced here is fairly brief; implementing it involves many details that take familiarity and experience, and sometimes a single small Squid or Apache parameter has a very large impact on system performance. I hope we can discuss these topics together.

Please keep the source when reposting: June Lin Michael's blog (http://www.toplee.com/blog/?p=71)

Responses to "Talk about the system architecture of large high-concurrency high-load Web sites"
1
Pi1ot says:

April 29th, 2006 at 6:00pm
Asynchronous queuing for communication between modules or between processes is also very important; it balances responsiveness under light load against overall system pressure. Database load can be offloaded onto the file system through file caches, and file-system I/O load can in turn be offloaded through memory caches. This works very well.
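Asynchronous queuing as suggested here can be sketched with Python's standard `queue` and `threading` modules: the request handler only enqueues the slow write and returns, while a background worker performs it. The names `start_writer` and `handle_post` are hypothetical, and between separate processes or machines a dedicated message queue would take the place of the in-process `queue.Queue`.

```python
import queue
import threading
from typing import Callable


def start_writer(q: "queue.Queue", sink: Callable[[dict], None]) -> threading.Thread:
    """Start a background consumer that drains the queue and performs the slow write."""
    def run() -> None:
        while True:
            item = q.get()
            if item is None:      # sentinel value: shut the worker down
                q.task_done()
                break
            sink(item)            # e.g. the database INSERT
            q.task_done()
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def handle_post(q: "queue.Queue", post: dict) -> str:
    """Request handler: enqueue the write and answer immediately."""
    q.put(post)
    return "accepted"
```

Under light load items are written almost immediately; under heavy load the queue absorbs bursts so the database sees a steadier stream of writes.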

3
Guest says:

May 1st, 2006 at 8:13 am
Total nonsense!
"For a web server, whether it is Apache, IIS, or another container, images consume the most resources": you must be thinking of images generated dynamically in memory. Whatever the file, the container just reads it and writes it to the response; what kind of file it is makes no difference.

The key point is that static and dynamic pages should be handled with different policies. Static files should be cached as much as possible, because the output is the same no matter how many times they are requested; if a user's pages reference the same file 20 times, there is no need to request it 20 times, and the cache should be used instead. Dynamic pages produce different output on every request (otherwise they should be static), so they should not be cached.

So even on the same server, static and dynamic resources can be optimized differently. A dedicated image server exists for the convenience of resource management; it has nothing to do with the performance you are talking about.

4
Michael says:

May 2nd, 2006 at 1:15 am
I suspect the friend above has never dealt with caching dynamic content. In the case of Inktomi search results, we used dynamic caching throughout: for identical keywords and query conditions, such a cache is very important. For dynamic content, setting appropriate header parameters in the program makes it easy to manage the caching policy, such as the expiration time.

As for the performance impact of images: generally, the images on the pages we visit account for more traffic than the HTML code. With the same network bandwidth, images take longer to transfer, and because each transfer spends significant overhead on the connection, images extend how long the HTTP connection between client and server stays open. For Apache this certainly lowers concurrent performance. If everything you return is static, you can set KeepAlive Off in httpd.conf, which shortens connection handling time, but with many images this causes more connections to be established, which also costs performance.

Moreover, the theory we are discussing mostly concerns large clusters. In such an environment, separating images effectively improves the architecture, which in turn improves performance; you need to understand why we talk about architecture at all. Architecture may serve security, resource allocation, or more scientific development and management, but ultimately it is all about performance.

You can also easily find the descriptions of the MIME type and Content-Length sections in RFC 1945, the HTTP/1.0 specification, which make the performance impact of images easy to understand.

The friend above is being petty. Please don't hide behind "guest" when talking to me; a man shouldn't be afraid to give his name. Besides, even if something is wrong, that is no reason to call it nonsense. We are here to exchange ideas and learn; I am not an expert, at most an ordinary programmer.
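Michael's point about using header parameters to control how dynamic content such as search results is cached can be illustrated with a helper that builds the relevant response headers. This is a sketch under assumptions: `cache_headers` is a hypothetical name, and the 300-second lifetime used in the test is arbitrary. `Expires` is the HTTP/1.0 header defined in RFC 1945; `Cache-Control: max-age` is its HTTP/1.1 counterpart and takes precedence when both are present.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional


def cache_headers(max_age: int, now: Optional[float] = None) -> Dict[str, str]:
    """Headers that let browsers and proxies (e.g. Squid) reuse a dynamic response."""
    if now is None:
        now = time.time()
    return {
        # HTTP/1.1: lifetime in seconds, relative to when the response is received.
        "Cache-Control": f"public, max-age={max_age}",
        # HTTP/1.0: absolute expiry time as an HTTP date in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

A search front end could attach these headers to results for popular queries so that a Squid layer or the browser reuses them until they expire.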

5
Ken Kwei says:

June 3rd, 2006 at 3:42 pm
Hello Michael, I have read this article several times and have a question about the following passage:

"For highly interactive community sites, making pages static as far as possible is also a necessary means of improving performance. Converting community posts and articles to static pages in real time, and regenerating them whenever they are updated, is a widely used strategy; MOP's Hodgepodge board and the NetEase community, for example, work this way."

For large sites, the databases and web servers are generally distributed and deployed across several regions, and a user in a given region is mapped to one node. If community posts are made static in real time and regenerated on every update, how are the nodes synchronized immediately? How is this done on the database side? If users do not see their post, they will assume it failed and post it again, creating duplicates. So how do you lock a user to one node, and how is this solved? Thank you.

6
Michael says:

June 3rd, 2006 at 3:57 pm
Locking a user to one node is generally done with layer-4 switching; for fairly small applications it can also be done in program code. Large applications typically manage user connections through a layer-4 switch, either software such as LVS or hardware, and a policy can be set so that a user's connection stays on one node for its entire lifetime.

There are many strategies for static generation and synchronization; the general approach is centralized or distributed storage. Here, the static pages are produced on centralized storage, and a front-end proxy group provides caching and shares the load.


For a typical medium-sized website with a great deal of interaction, say about a million page views per day, how should the load be handled sensibly?

If there is a great deal of interaction, consider a clustered in-memory cache: put the frequently changing data that needs to be synchronized into the memory cache and read it from there. The specific design has to be analyzed case by case.

11
Donald says:

June 27th, 2006 at 5:39 pm
Excuse me: if a website is still in its technical development phase, which of these optimizations should be implemented first and which later?
Or, in terms of cost (technical, human, and financial), which should be implemented first to get the greatest benefit?

12
Michael says:

June 27th, 2006 at 9:16 pm
Start with server and code performance optimization, including tuning the web server and database server configuration, static HTML generation, and other easy wins. Get the most out of these first, then consider investing more in the architecture, such as clustering and load balancing. Those are better considered after the site has grown and accumulated some experience.

16
Echonow says:

September 1st, 2006 at 2:28 pm
Thumbs up first: this is a very good article, but truly mastering what is in it will probably still take time and practice.

First, a question about the image server:

"My billiards site 9tmd.com also separates its image servers, although for now the separation exists only in the architecture, not physically, since I have no money for more servers :). You can see that image links on the site's home page use URLs like img.9tmd.com or img1.9tmd.com."

So img.9tmd.com here is just a virtual host, that is, the same Apache service? Does that really improve performance in a meaningful way, or is it just groundwork to make physical separation easier later?

17
Michael says:

September 1st, 2006 at 3:05 pm
You are quite right. Since there is only one server, true physical separation is not possible, so a virtual host is used for now. The point is flexibility in the design and site architecture: when a new server arrives, I only need to mirror or synchronize the images to it and point the DNS for img.9tmd.com at the new server, and the separation happens naturally. If the architecture and code did not support this from the start, such a separation would be much more painful later :)

18
Echonow says:

September 7th, 2006 at 4:59 pm
Thanks for the reply. The main practical problem now is how to upload material directly to the image server, instead of uploading it to the web server first every time and then syncing it to the image server.

19
Michael says:

September 7th, 2006 at 11:25 pm
Using Samba or NFS is a relatively simple approach. Then put a Squid cache in front to reduce the access load, which improves disk performance and extends disk life.

20
Echonow says:

September 8th, 2006 at 9:42 am
Thank you for your patient guidance. I'll study it first; using a shared storage area is indeed a good idea!

21
Michael says:

September 8th, 2006 at 11:16 am
You're welcome; let's keep in touch.
