High concurrency problems in Web applications

Source: Internet
Author: User
Tags: hash, html, page, nginx, server

Large web sites such as portals face huge numbers of user visits and highly concurrent requests. The basic solutions concentrate on a few areas: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. But beyond these, there is no single way to solve the high-load, high-concurrency problem a large site faces; these measures also mean more investment, they have bottlenecks of their own, and they do not scale well. Below, drawing on ordinary project experience and ideas from a few blogs, I try to lay out some approaches to high concurrency.

0. First, focus on the database

Yes, the database comes first; it is the first SPOF (single point of failure) most applications run into. Especially for Web 2.0 applications, database responsiveness is the first thing to address.
It may start out as a single host; once the data grows past about one million rows, database performance drops sharply. The common optimization is master-slave (M-S) synchronous replication, putting queries and write operations on separate servers. I recommend the M-M-slaves layout: two masters and multiple slaves. Note that although there are two masters, only one is active at a time, and you can switch over when needed. The point of having two masters is to ensure the master never becomes a single point of failure for the system.
The slaves can be load balanced further, for example combined with LVS, to spread SELECT operations evenly across the different slaves.
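To make the read/write split concrete, here is a minimal application-level sketch in Python. The host names, credentials, and the use of the PyMySQL driver are assumptions for illustration; in practice LVS or a connection pool would usually sit in front of the slaves instead of picking one in code.

import random
import pymysql

# Placeholder connection settings; only one master is active at a time.
MASTER = {"host": "db-master-1", "user": "app", "password": "secret", "database": "site"}
SLAVES = [
    {"host": "db-slave-1", "user": "app", "password": "secret", "database": "site"},
    {"host": "db-slave-2", "user": "app", "password": "secret", "database": "site"},
]

def run_write(sql, args=None):
    # INSERT/UPDATE/DELETE always go to the active master.
    conn = pymysql.connect(**MASTER)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
        conn.commit()
    finally:
        conn.close()

def run_read(sql, args=None):
    # SELECTs are spread across the slaves.
    conn = pymysql.connect(**random.choice(SLAVES))
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    finally:
        conn.close()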
The architecture above can cope with a certain amount of load, but as users keep growing and your user table passes ten million rows, the master becomes the SPOF again. You cannot add slaves without limit, or the cost of replication synchronization shoots up. My approach is table partitioning, done at the business level. The simplest case is user data: split it by some key, such as the user ID, across different database clusters.

A global database is used for metadata queries. The drawback is that every query costs one extra lookup: for example, to look up the user nightsailer, you first go to the global database group to find the cluster ID that nightsailer belongs to, and then go to that cluster to fetch nightsailer's actual data.
Each cluster can itself be laid out as M-M or M-M-slaves. This is an extensible structure: as the load grows, you simply add new MySQL clusters.
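Roughly, the application-side routing looks like the following Python sketch. The table names, the cluster map, and the connection settings are hypothetical; the point is only the two-step lookup (global metadata first, owning cluster second).

import pymysql

GLOBAL_META = {"host": "db-global", "user": "app", "password": "secret", "database": "meta"}
CLUSTERS = {
    1: {"host": "cluster1-master", "user": "app", "password": "secret", "database": "users"},
    2: {"host": "cluster2-master", "user": "app", "password": "secret", "database": "users"},
}

def find_user(username):
    # Step 1: ask the global database which cluster holds this user.
    meta = pymysql.connect(**GLOBAL_META)
    try:
        with meta.cursor() as cur:
            cur.execute("SELECT cluster_id FROM user_cluster_map WHERE username = %s", (username,))
            row = cur.fetchone()
    finally:
        meta.close()
    if row is None:
        return None
    # Step 2: fetch the actual user data from the owning cluster.
    shard = pymysql.connect(**CLUSTERS[row[0]])
    try:
        with shard.cursor() as cur:
            cur.execute("SELECT * FROM users WHERE username = %s", (username,))
            return cur.fetchone()
    finally:
        shard.close()

# find_user("nightsailer") hits the global DB once, then the owning cluster once.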

1. HTML staticization

In fact, we all know that the most efficient, least expensive page is a pure static HTML page, so we should try to serve the pages on our site as static pages wherever possible; the simplest method often turns out to be the most effective one. But for sites with a lot of content that is updated frequently, we cannot do all of this by hand, so we use a content management system (CMS). The news channels of the portals we visit every day, and even their other channels, are managed and published through such a system. At its simplest, a CMS automatically generates static pages from the information entered, and it usually also offers channel management, permission management, automatic content capture, and other features. For a large web site, an efficient, manageable CMS is essential.
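The heart of staticization is just rendering a database record into an HTML file once, at publish time, so the web server can serve it afterwards without touching the database. A minimal sketch, with a hypothetical article record, template, and output directory:

import os

TEMPLATE = """<html>
<head><title>{title}</title></head>
<body><h1>{title}</h1><div>{body}</div></body>
</html>"""

OUTPUT_DIR = "/var/www/html/news"   # served directly by the web server

def publish_article(article):
    # article is a dict such as {"id": 42, "title": "...", "body": "..."}
    html = TEMPLATE.format(title=article["title"], body=article["body"])
    path = os.path.join(OUTPUT_DIR, "%d.html" % article["id"])
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path   # e.g. /var/www/html/news/42.html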

Besides portals and information-publishing sites, static pages are also a necessary means of improving performance for community-type sites with heavy interaction. Rendering posts and articles to static pages in real time, and re-rendering them when they are updated, is a widely used strategy; sites such as Mop and the NetEase community work this way.

HTML staticization is also a caching strategy in its own right. For content that the system queries from the database very often but updates rarely, consider staticizing it, for example a forum's public settings. Mainstream forums let administrators manage this information in the back end and store it in the database; the front end reads it on almost every request even though it rarely changes. You can regenerate the static version whenever the back end updates it, and thereby avoid a large number of database requests.
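One way to implement the "regenerate when the back end updates" idea is a save handler that rewrites a static include file whenever an administrator changes the settings. The schema, file path, and helper below are hypothetical:

SETTINGS_FILE = "/var/www/html/include/forum_settings.html"

def save_settings(db_conn, settings):
    # 1. Persist the settings to the database as usual (MySQL-style REPLACE; schema assumed).
    with db_conn.cursor() as cur:
        for key, value in settings.items():
            cur.execute("REPLACE INTO forum_settings (name, value) VALUES (%s, %s)", (key, value))
    db_conn.commit()
    # 2. Regenerate the static fragment once; pages include this file instead of querying the DB.
    with open(SETTINGS_FILE, "w", encoding="utf-8") as f:
        for key, value in settings.items():
            f.write('<meta name="%s" content="%s">\n' % (key, value))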

2. Image server separation

As you know, for a web server, whether Apache, IIS, or any other container, images are what consume the most resources, so we have to separate images from pages. This is a strategy basically every large site adopts: they have a dedicated image server, or even many image servers. This architecture reduces the load on the servers that handle page requests and ensures the system does not crash because of image problems. The application servers and the image servers can also be tuned differently; for example, on the image server Apache can be configured to support as few content types as possible and load as few modules as possible, which keeps resource consumption down and execution efficiency up.
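On the application side, the only change needed is that page templates generate image URLs pointing at the image hosts rather than the application servers. A tiny illustration (the host names are placeholders):

import zlib

IMAGE_HOSTS = ["img1.example.com", "img2.example.com"]

def image_url(path):
    # Map each image deterministically (crc32 of the path) to one of the image hosts,
    # so the same image always comes from the same server and stays cacheable.
    host = IMAGE_HOSTS[zlib.crc32(path.encode("utf-8")) % len(IMAGE_HOSTS)]
    return "http://%s/%s" % (host, path.lstrip("/"))

# image_url("avatars/42.jpg") might return "http://img2.example.com/avatars/42.jpg"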

3. Database clusters and database/table hashing

Large web sites have complex applications, and those applications must use databases. Faced with massive access, the database bottleneck shows up quickly, and a single database soon cannot keep up with the application, so we need database clusters or database/table hashing.

For database clusters, many databases come with their own solutions; Oracle, Sybase, and others have good ones, and the master/slave replication provided by the popular MySQL is a similar scheme. Whatever DB you use, refer to its corresponding solution.

The database clusters mentioned above are constrained by the type of DB used, in architecture, cost, and extensibility, so we also need to improve the system architecture from the application's point of view; database/table hashing is the most common and effective approach. We separate the databases along business and functional-module lines in the application, so that different modules map to different databases or tables, and then hash a given page or feature onto smaller databases or tables according to some policy, for example hashing the user table by user ID. This improves system performance at low cost and scales well. The Sohu forum uses such a framework: users, settings, posts, and other data are split into separate databases, and posts and users are then hashed into databases and tables by board and by ID. In the end, a simple change to a configuration file lets the system add a cheap database at any time to add capacity.
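Here is a sketch of the hashing itself in Python. The shard counts and the naming scheme are assumptions; in a setup like the one described above they would live in the configuration file, so capacity can be changed there:

DB_COUNT = 4        # user_db_0 .. user_db_3
TABLES_PER_DB = 16  # posts_0 .. posts_15 in each database

def locate_user(user_id):
    # Hash the user ID to a database, then to a table within that database.
    db_index = user_id % DB_COUNT
    table_index = (user_id // DB_COUNT) % TABLES_PER_DB
    return "user_db_%d" % db_index, "posts_%d" % table_index

# locate_user(123457) -> ("user_db_1", "posts_0"): the database and table holding this
# user's posts. Growing DB_COUNT adds capacity, at the price of migrating existing rows.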

4. Cache

Everyone who works with technology has come across the word cache, and caches are used in many places. Caching is also very important in web site architecture and web development. Here I first describe the two most basic kinds of cache; advanced and distributed caches are described later.
Architecture-level caching: anyone familiar with Apache knows that Apache ships its own cache module, and you can also put Squid in front of it; both can effectively improve Apache's response time.
Application-level caching: the in-memory cache service commonly run on Linux (memcached) provides a common caching interface that can be used in web development. In Java, for example, you can call it to cache data and share it between processes; some large communities use such a framework. Beyond that, every web development language has its own cache modules and methods: PHP has the PEAR Cache module, Java has many, and as for .NET I am not very familiar with it, but I am sure it has them too.
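As an illustration, the usual cache-aside pattern with memcached looks like the sketch below. It assumes the python-memcached client package and a local memcached daemon on port 11211; load_user_from_db stands in for the real, expensive query:

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def load_user_from_db(user_id):
    # Placeholder for the real database query.
    return {"id": user_id, "name": "user-%d" % user_id}

def get_user(user_id):
    key = "user:%d" % user_id
    user = mc.get(key)
    if user is None:
        user = load_user_from_db(user_id)   # cache miss: hit the database once
        mc.set(key, user, time=300)         # keep the result for five minutes
    return user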

5. Nginx traffic splitting

Nginx is written in an event-driven style, so it has very good performance and is also a very efficient reverse proxy and load balancer. Its performance matches lighttpd's, it does not have lighttpd's memory leak problem, and lighttpd's mod_proxy has had issues and has not been updated for a long time.

Igor now publishes the source code under a BSD-like license. Nginx is known for its stability, rich module library, flexible configuration, and low system resource consumption. The industry generally regards it as a lightweight replacement for Apache 2.2 plus mod_proxy_balancer, not only because it responds to static pages very quickly, but also because its number of modules is only about two-thirds of Apache's. Its support for proxying and the rewrite module is very thorough, it also supports FastCGI, SSL, and virtual hosts, and it works well as the front-end HTTP layer for a Mongrel cluster.

Nginx was developed specifically with performance in mind; performance is its most important design goal, and the implementation is extremely focused on efficiency. It supports the kernel's event polling models (epoll and the like), stands up under high load, and is reported to support up to 50,000 concurrent connections.

Nginx is also very stable. Other HTTP servers, when hit by an access spike or by someone maliciously opening slow connections, can easily exhaust physical memory, stop responding, and can only be restarted. Apache, for example, slows down noticeably once it reaches more than 200 processes. Nginx uses phased resource allocation, which keeps its CPU and memory usage very low; officially, Nginx keeps 10,000 inactive connections in only about 2.5 MB of memory, so DoS-style attacks are basically useless against it. In terms of stability, Nginx beats lighttpd.

Nginx supports hot deployment. It starts up very easily and can run essentially 24x7, even for months without a restart, and you can upgrade the software version without interrupting service.

Nginx uses a master/worker model, which takes advantage of SMP and reduces the latency a worker process suffers when blocking on disk I/O. When using select()/poll() calls, it can also limit the number of connections per process.

Nginx's code quality is very high, the code is very standardized, the techniques are mature, and modules are easy to extend. The powerful upstream and filter chain designs deserve special mention. upstream lays a good foundation for writing modules that communicate with other servers, such as the reverse proxy. The coolest part of the filter chain is that a filter does not have to wait for the previous filter to finish: it can take the previous filter's output as its own input, much like a Unix pipeline. This means, for example, that a module can start compressing the response coming from the back-end server and stream the compressed data to the client before the whole back-end response has been received.

Nginx also uses some of the newer features provided by the operating system, such as sendfile (Linux 2.2+), accept filters (FreeBSD 4.1+), and TCP_DEFER_ACCEPT (Linux 2.4+), which greatly improve performance.

Of course, Nginx is still fairly young and has some problems. It was created by a Russian developer, and although the documentation was relatively thin a few years ago, it is now fairly comprehensive; most of the material is in English, Chinese material is growing, and there are dedicated books as well. Some common commands:

Check whether the configuration file is correct:
cd /usr/local/nginx/sbin
sudo ./nginx -t

Reload the configuration file:
./nginx -s reload

Restart the Nginx server:
cd /etc/init.d
sudo ./nginx restart

6. Mirror

Mirroring is often used by large web sites to improve performance and data safety. Mirroring can smooth out the differences in access speed between users on different network providers and in different regions; for example, the gap between ChinaNet and the education network (CERNET) has prompted many sites to build mirror sites inside the education network, with data updated on a schedule or in real time. I will not go deep into the details of mirroring here: there are many professional, ready-made solution architectures and products to choose from, and there are also inexpensive software approaches, such as rsync and similar tools on Linux.


7. Load balancing (I do not fully understand this yet; included for reference)

Load balancing will be the ultimate solution for large web sites to address high-load access and a large number of concurrent requests.

Load balancing technology has been developing for many years, and there are many professional service providers and products to choose from. I have personally worked with a few solutions; two of the architectures can serve as references.


1) Hardware layer-4 switching


Layer-4 switching uses the header information in layer-3 and layer-4 packets to identify business flows by application port range and distribute the traffic of an entire range to the appropriate application server for processing. A layer-4 switch behaves like a virtual IP pointing at physical servers. It can forward traffic for a variety of protocols, such as HTTP, FTP, NFS, Telnet, and others. These operations are based on physical servers and require complex load-balancing algorithms. In the IP world, the service type is determined by the TCP or UDP port at the endpoint, and the application range in layer-4 switching is determined by the source and destination IP addresses together with the TCP and UDP ports.


Among hardware layer-4 switching products there are some well-known choices, such as Alteon and F5. These products are expensive but good value for money: they offer excellent performance and very flexible management. In its early days, Yahoo China served nearly 2,000 servers with just three or four Alteons.


2) Software layer-4 switching


Once you understand how hardware layer-4 switching works, software layer-4 switching based on the OSI model follows naturally. The solution implements the same principle with somewhat lower performance, but it can still comfortably handle a fair amount of load. Some say the software implementation is actually more flexible; how much it can handle depends entirely on how well you know how to configure it.


For software layer-4 switching we can use the common LVS on Linux. LVS is the Linux Virtual Server; it provides a real-time disaster-recovery solution based on a heartbeat line, which improves the robustness of the system, and its flexible virtual IP (VIP) configuration and management can meet a variety of application requirements, which is essential for a distributed system.


A typical load-balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. This approach is used by many large web sites, including search engines: it is cheap, performs well, and scales well, and it is easy to add or remove nodes from the architecture at any time. I plan to set aside a separate article to discuss this structure in detail with you.


Reference article: http://blog.csdn.net/zxl333/article/details/8454319
