I worked on the CERNET dial-up access platform, then on front-end platform development for the high-load Yahoo3721 search engine, and later on the architecture upgrade of Mop's large-scale community site. Along the way I also worked with and developed modules for a number of large and medium-sized websites. As a result I have accumulated some experience with high-load, high-concurrency solutions for large websites, which I would like to discuss with you here.
A small site, such as a personal site, can be built from the simplest static HTML pages, with a few pictures for decoration and all the pages stored in one directory. Such a site makes very modest demands on system architecture and performance. As internet services have grown richer, website technology has developed over the years into many fine-grained specialties. Large websites in particular involve a very wide range of technology: hardware, software, programming languages, databases, web servers, firewalls and other areas all face high demands that a simple static HTML site never had to meet.
Take a large website such as a portal. Faced with heavy user traffic and highly concurrent requests, the basic solutions concentrate on a few areas: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. But these measures alone cannot fundamentally solve the high-load and high-concurrency problems that large websites face.
The solutions above also imply greater investment, and they have bottlenecks and do not scale well. Below I share some of my experience from the perspective of low cost, high performance and high scalability.
1, Static HTML
In fact, we all know that a pure static HTML page is the most efficient and the least resource-intensive, so we try to serve the pages on our site as static pages wherever possible. The simplest method is actually the most effective. But for sites with a lot of frequently updated content, we cannot produce every page by hand, so we have the familiar content management system (CMS). The news channels of the portals we visit, and even their other channels, are managed and published through such a system. At its simplest, a CMS takes entered information and automatically generates static pages; it can also provide channel management, permission management, automatic crawling and other functions. For a large website, an efficient and manageable CMS is essential.
Beyond portals and information publishing sites, making pages as static as possible is also a necessary way to improve performance for highly interactive community sites. Rendering posts and articles to static pages in real time, and re-rendering them when they are updated, is a widely used strategy. Mop's Hodgepodge uses it, as do the NetEase community and others.
Static HTML generation is also a form of caching. For parts of a system that query the database frequently but whose content changes rarely, consider implementing them as static HTML. A forum's public settings are an example: mainstream forum software lets you manage this information in the back end and stores it in the database. The front-end code reads it constantly, yet it is updated very rarely, so this content can be regenerated as static files whenever it is updated in the back end, which avoids a large number of database requests.
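The "regenerate the static page on update" idea can be sketched roughly as follows. This is only an illustration under assumptions of my own: the function names, the file naming scheme (post_<id>.html) and the page template are hypothetical and are not taken from any particular CMS or forum.

```python
import html
import os
import tempfile
from pathlib import Path


def render_post(title: str, body: str) -> str:
    """Render one post as a complete static HTML page (hypothetical template)."""
    return (
        "<html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1><p>{b}</p></body></html>"
    ).format(t=html.escape(title), b=html.escape(body))


def publish_post(out_dir: Path, post_id: int, title: str, body: str) -> Path:
    """Write (or rewrite) the static page for a post.

    The back end would call this whenever a post is created or edited, so
    that read requests are served from a plain file instead of the database.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"post_{post_id}.html"
    # Write to a temporary file first, then rename it over the target, so
    # the web server never serves a half-written page.
    fd, tmp_name = tempfile.mkstemp(dir=str(out_dir), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(render_post(title, body))
    os.replace(tmp_name, target)
    return target
```

Calling publish_post again with the same post_id simply overwrites the page, which is the "static again when there is an update" step described above.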
2, Image server separation
As you know, for a web server, whether Apache, IIS or another container, images consume the most resources. So we need to separate images from pages, a strategy that basically every large site adopts: they have a dedicated image server, or even many image servers. This architecture reduces the load on the servers that handle page requests and ensures that the system will not crash because of image problems. The application servers and image servers can also be tuned separately. For example, when configuring Apache, support as few content types and load as few modules (LoadModule) as possible, which keeps resource consumption low and execution efficiency high.
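On the page side, separation usually just means that image URLs point at a different host than the pages. A minimal sketch, assuming two image servers with the made-up host names img1.example.com and img2.example.com, and a hypothetical helper image_url:

```python
import zlib

# Hypothetical image server host names, for illustration only.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com"]


def image_url(path: str, hosts=IMAGE_HOSTS) -> str:
    """Map an image path to a URL on one of the dedicated image servers.

    A stable checksum of the path chooses the host, so the same image is
    always requested from the same server.
    """
    path = "/" + path.lstrip("/")
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{path}"
```

Page templates would then emit image_url("photos/cat.jpg") instead of a path on the page server, so image requests never reach the application servers.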
3, Database clustering and database/table hashing
Large websites run complex applications, and these applications must use a database. Under heavy traffic the database quickly becomes the bottleneck, and a single database soon cannot keep up with the application, so we need a database cluster or database/table hashing.
For database clustering, many databases have their own solutions. Oracle, Sybase and others offer good ones, and the master/slave replication provided by the commonly used MySQL is a similar solution. Whichever database you use, follow its corresponding solution.
Because the database clustering mentioned above is constrained in architecture, cost and scalability by the type of database used, we need to consider improving the system architecture from the application side, and database/table hashing is the most common and most effective approach. We separate databases by business, application or functional module, so that different modules map to different databases or tables. Then, for a given page or function, we hash into smaller databases or tables according to some strategy; the user table, for example, can be hashed by user ID. This improves system performance at low cost and scales well. Sohu's forum uses such an architecture: forum users, settings, posts and other information are split into separate databases, and posts and users are then hashed into databases and tables by board and ID. In the end, a simple change to a configuration file lets the system add a low-cost database at any time to boost performance.
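Hashing a user table by user ID can be sketched as below. The database names, the table naming scheme (user_00, user_01, ...) and the ShardConfig structure are assumptions made for this example; they do not describe how Sohu or any other site actually lays out its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardConfig:
    """Hypothetical configuration: which databases exist, tables in each."""
    databases: tuple
    tables_per_db: int


def locate_user(user_id: int, cfg: ShardConfig) -> tuple:
    """Return the (database, table) pair that holds the row for user_id."""
    total_slots = len(cfg.databases) * cfg.tables_per_db
    slot = user_id % total_slots
    database = cfg.databases[slot // cfg.tables_per_db]
    table = f"user_{slot % cfg.tables_per_db:02d}"
    return database, table


cfg = ShardConfig(databases=("db0", "db1"), tables_per_db=4)
print(locate_user(5, cfg))  # ('db1', 'user_01')
```

Note that with plain modulo hashing, changing the number of databases moves most existing rows to a different slot, so a real deployment would need a migration step or a lookup layer to add capacity smoothly.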
4, Caching
The word cache is familiar to everyone who works in technology, and caching is used in many places. Caching is also very important in website architecture and web development. Here I describe the two most basic kinds of caching first; advanced and distributed caching will be described later.
Caching at the architecture level: anyone familiar with Apache knows that it provides its own cache modules, and Squid can also be added in front of it for caching. Both approaches can effectively improve Apache's response capability.
Caching in application development: the memory cache available on Linux is a commonly used caching interface in web development. In Java development, for example, you can call a memory cache to cache and share some data, and some large communities use such an architecture. In addition, every web development language has its own caching modules and methods. PHP has PEAR's Cache module; I am less familiar with Java and .NET, but I am sure they have their own as well.
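As an illustration of application-level caching, here is a minimal in-process cache with a time-to-live. It is a sketch of the general idea only, not the interface of any of the caching modules named above; the class name TTLCache and its methods are my own.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so it can be tested
        self._store = {}             # key -> (expiry_time, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)          # e.g. the expensive database query
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop one entry, for example right after the data is updated."""
        self._store.pop(key, None)
```

A page handler would wrap its database query in get_or_load, and the back end would call invalidate when the underlying data changes.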
5, Mirroring
Mirroring is a technique large websites often use to improve performance and data security. It can even out differences in access speed between users on different networks and in different regions. For example, the gap between ChinaNet and EduNet has prompted many sites to build mirror sites inside the education network, with data updated on a schedule or in real time. I will not go deep into the details of mirroring here; there are many professional, off-the-shelf architectures and products to choose from. There are also inexpensive software approaches, such as rsync and similar tools on Linux.
6, Load balancing
Load balancing is the ultimate solution for large websites dealing with high-load traffic and large numbers of concurrent requests.
Load balancing technology has been developing for many years, and there are many professional vendors and products to choose from. I have personally worked with several solutions, two of which are worth using as reference architectures.
Hardware layer-4 switching
Layer-4 switching uses the header information of layer-3 and layer-4 packets to identify traffic flows by application session, and hands the whole session's traffic to a suitable application server for processing. A layer-4 switch acts like a virtual IP that points to physical servers. The traffic it carries follows a variety of protocols, such as HTTP, FTP, NFS, Telnet and others, and these services need load balancing algorithms of some complexity across the physical servers. In the IP world, the service type is determined by the destination TCP or UDP port, and an application session in layer-4 switching is identified by the source and destination IP addresses together with the TCP and UDP ports.
In the hardware layer-4 switching market there are some well-known products to choose from, such as Alteon and F5. These products are expensive but worth the money, offering very good performance and very flexible management. Yahoo China initially handled nearly 2,000 servers with just three or four Alteon units.
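The dispatch rule described above can be modelled in a few lines. This is a toy model under my own assumptions, not how any hardware switch is implemented: the pool table, the IP addresses and the function pick_backend are all hypothetical.

```python
import zlib

# Hypothetical virtual services: (protocol, destination port) -> real servers.
POOLS = {
    ("tcp", 80): ["10.0.0.11", "10.0.0.12", "10.0.0.13"],
    ("tcp", 21): ["10.0.0.21"],
}


def pick_backend(proto, src_ip, src_port, dst_port, pools=POOLS):
    """Choose a real server for one flow.

    The destination port selects the service; the source address and port
    identify the session, so every packet of that session hashes to the
    same real server.
    """
    servers = pools.get((proto, dst_port))
    if not servers:
        return None  # no virtual service listens on this port
    flow = f"{proto}:{src_ip}:{src_port}:{dst_port}".encode("ascii")
    return servers[zlib.crc32(flow) % len(servers)]
```

Real products offer richer scheduling (for example by current connection count), but the lookup keyed on addresses and ports is the part that makes it "layer 4".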
Software layer-4 switching
Once you understand the principle of hardware layer-4 switching, software layer-4 switching based on the OSI model follows naturally. The principle is the same, although the performance is somewhat lower. It can still handle a fair amount of load with ease, and some people argue that the software approach is actually more flexible; how much it can handle depends entirely on how well you know its configuration.
For software layer-4 switching we can use LVS, which is widely used on Linux. LVS stands for Linux Virtual Server. It provides real-time failover based on a heartbeat link, which improves the robustness of the system, and it offers flexible virtual IP (VIP) configuration and management that can meet a variety of application requirements, which is essential for distributed systems.
A typical load balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. Many large websites, including search engines, have adopted this idea. The architecture is low-cost, high-performance and highly scalable, and nodes can easily be added or removed at any time. I plan to write this architecture up in detail and discuss it with you.
For large websites, all of the methods mentioned above may be in use at the same time. My introduction here is fairly superficial; many implementation details can only be mastered through familiarity and experience. Sometimes a single small Squid or Apache parameter has a very large effect on system performance. I hope we can discuss these topics together, and that this article helps start that discussion.
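To show what a software balancer does when a node fails, here is a simple round-robin scheduler that skips servers marked as down. It is a conceptual sketch only: it is not LVS code or configuration, and the class RoundRobinPool and its methods are hypothetical. In a real setup a health check or heartbeat would drive mark_down and mark_up.

```python
class RoundRobinPool:
    """Round-robin scheduling over real servers, skipping failed ones."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._alive = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        """Record that a health check failed for this server."""
        self._alive.discard(server)

    def mark_up(self, server):
        """Record that a known server has recovered."""
        if server in self._servers:
            self._alive.add(server)

    def pick(self):
        """Return the next live server, or None if every server is down."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._alive:
                return server
        return None
```

Adding or removing a node only changes the server list, which is the flexibility the text attributes to the software approach.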