Flash sale ("seckill") and rush-purchase system architecture: e-commerce website architecture

Source: Internet
Author: User

First, the challenges of large-scale concurrency

In my previous work, I built a flash-sale feature that had to withstand 50,000 requests per second ("5w/s", where "w" is short for wan, the Chinese unit of ten thousand). Along the way, the whole web system ran into many problems and challenges.

Without targeted optimization, a web system easily falls into an abnormal state under this kind of load. Let's look at the ideas and methods of optimization.

1. Designing the request interface sensibly

A flash-sale or rush-purchase page usually consists of two parts: static HTML content, and the backend web request interface that handles the flash sale itself.

Static HTML and similar content is usually served through a CDN, so it generally puts little pressure on the system; the core bottleneck is the backend request interface.

This backend interface must support highly concurrent requests and, just as importantly, must be as "fast" as possible, returning results to the user in the shortest possible time.

To be this fast, the interface's backend storage should preferably work at memory level.

Having the interface hit MySQL directly is not appropriate; if the business logic is complex enough to need database storage, asynchronous writes are recommended.
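To make the "memory first, write asynchronously" idea concrete, here is a minimal sketch under stated assumptions: the stock counter lives in process memory (in production it would more likely sit in Redis, for example decremented with DECR), and `persist` is a hypothetical callback standing in for the real MySQL insert. The request path only touches memory and a queue; a background thread drains the queue and does the slow write.

```python
import queue
import threading

class FlashSaleStock:
    """Hypothetical sketch: stock is decremented in memory on the request path,
    while order records are persisted by a background writer thread."""

    def __init__(self, stock, persist):
        self._stock = stock
        self._lock = threading.Lock()      # makes check-and-decrement atomic
        self._pending = queue.Queue()      # orders waiting to be written
        self._persist = persist            # assumed: e.g. an INSERT into MySQL
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def try_buy(self, user_id):
        with self._lock:
            if self._stock <= 0:
                return False               # sold out: answer immediately
            self._stock -= 1
        self._pending.put(user_id)         # enqueue instead of writing synchronously
        return True

    def _drain(self):
        while True:
            user_id = self._pending.get()
            if user_id is None:            # sentinel: stop the writer
                break
            self._persist(user_id)         # slow storage write, off the request path

    def close(self):
        """Flush remaining orders and stop the writer."""
        self._pending.put(None)
        self._writer.join()
```

With a stock of 3 and five buyers, the first three calls to `try_buy` return `True` at memory speed, and their records reach storage afterwards. The trade-off is that a crash between enqueue and persist can lose records, so a real system would use a durable queue.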

Of course, some flash sales and rush purchases use "delayed feedback": the user does not learn the result right away, and only after some time can they see on the page whether the purchase succeeded.

However, this kind of "lazy" approach gives a poor user experience, and users easily suspect it of being a "black-box operation".

2. The high-concurrency challenge: you must be "fast"

We usually measure a web system's throughput in QPS (queries per second, i.e. requests processed each second). When dealing with tens of thousands of requests per second, this metric is critical.

For example, assume the average response time of a business request is 100 ms, and the system has 20 Apache web servers, each with MaxClients set to 500 (the maximum number of simultaneous connections Apache will serve).

So the theoretical peak QPS of our web system (an idealized calculation) is:

20*500/0.1 = 100000 (100,000 QPS)

Hey, our system looks very strong: it handles 100,000 requests in one second, so a 50,000-per-second flash sale seems to be a mere "paper tiger".

Of course, reality is not so ideal. Under high concurrency the machines run at high load, and the average response time rises sharply.

On the web server side, the more connection processes Apache opens, the more context switches the CPU must handle; this extra CPU overhead directly increases the average response time.

So for the MaxClients value above, more is not always better; it should be chosen according to CPU, memory, and other hardware factors.

You can benchmark with Apache's ab (ApacheBench) tool to find a suitable value.

Next, for storage we choose Redis, a memory-level store; under high concurrency, storage response time is critical.

Network bandwidth is also a factor, but these request packets are usually small and rarely become the bottleneck. Load balancing is also rarely the system's bottleneck, so it is not discussed here.

So here is the problem: suppose that at 50,000 requests per second the average response time of our system rises from 100 ms to 250 ms (in practice, even more):

20*500/0.25 = 40000 (40,000 QPS)

As a result, our system is left with 40,000 QPS to face 50,000 requests per second, a shortfall of 10,000.
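Both idealized calculations above use the same formula: peak QPS = servers * MaxClients / average response time. A small sketch reproducing them (the function name is ours, not from the article); response time is taken in milliseconds so the arithmetic stays exact:

```python
def peak_qps(servers, max_clients, avg_response_ms):
    """Idealized peak throughput: every connection slot on every server is busy,
    and each request holds its slot for avg_response_ms milliseconds."""
    slots = servers * max_clients                 # concurrent requests in flight
    return slots * 1000 / avg_response_ms         # requests completed per second

normal = peak_qps(20, 500, 100)    # 100 ms per request
loaded = peak_qps(20, 500, 250)    # 250 ms per request under load
shortfall = 50000 - loaded         # demand the system cannot absorb each second
```

Here `normal` is 100,000, `loaded` is 40,000, and `shortfall` is 10,000, matching the figures in the text. Note that response time sits in the denominator: a 2.5x slowdown cuts capacity by the same factor.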

And this is where the real nightmare begins.

Think of a highway junction: 5 vehicles arrive per second and 5 vehicles pass per second, so the junction runs normally. Suddenly the junction can only let 4 vehicles through per second while the traffic flow stays the same; the result is bound to be a huge jam (like 5 lanes suddenly narrowing to 4).

Similarly, within one second all 20*500 available connection processes are fully loaded, yet 10,000 new requests still arrive with no connection process to serve them, and the system can be expected to fall into an abnormal state.
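The jam accumulates rather than staying at a constant 10,000. A toy sketch, assuming steady arrival and service rates and that no request is dropped (both simplifications), shows the backlog growing every second:

```python
def backlog_over_time(arrival_qps, capacity_qps, seconds):
    """Requests still waiting at the end of each second, assuming constant
    arrival and service rates and no dropped requests."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # New arrivals join the queue; the system clears at most capacity_qps.
        backlog = max(0, backlog + arrival_qps - capacity_qps)
        history.append(backlog)
    return history
```

With 50,000 arrivals against 40,000 capacity, the backlog after five seconds is 50,000 requests. Reality is worse than this model, because waiting requests hold connections and push response time (and therefore the capacity loss) even further.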

A similar situation can occur even in normal, non-high-concurrency business scenarios: one business request interface develops a problem and responds very slowly, which drags out the response time of the whole web request; the web server's available connections gradually fill up, and other, normal business requests find no connection process available.

Worse still is a characteristic of user behavior: the less available the system is, the more often users click. This vicious circle eventually leads to an "avalanche" (one web machine goes down, its traffic spreads to the other working machines, which then go down too, and the cycle repeats), bringing down the entire web system.
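The avalanche mechanism can be illustrated with a toy cascade model. This is a sketch under loud assumptions: load is spread evenly over the machines still up, each machine has a fixed capacity (the numbers in the test are hypothetical), and `retry_growth` is an invented knob mimicking users clicking more when the site stalls.

```python
def simulate_avalanche(total_load, capacities, retry_growth=1.0):
    """Toy cascade: load is split evenly over live machines; any machine whose
    share exceeds its capacity goes down and its share moves to the survivors.
    After each round with failures, load is multiplied by retry_growth."""
    alive = list(range(len(capacities)))
    rounds = []                                   # (machines up, load per machine, failures)
    while alive:
        share = total_load / len(alive)
        failed = {i for i in alive if share > capacities[i]}
        rounds.append((len(alive), share, len(failed)))
        if not failed:
            break                                 # stable: every survivor copes
        alive = [i for i in alive if i not in failed]
        total_load *= retry_growth                # frustrated users retry
    return alive, rounds
```

For example, 20 machines at 40,000 QPS total, with 19 able to take 2,000 each and one weaker machine rated 1,900: the weak machine fails first, its share pushes each survivor to about 2,105, and all 19 fail in the next round. Even with `retry_growth` left at 1.0, one small weakness is enough to take down the whole cluster.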

3. Restart and overload protection

If the system suffers an "avalanche", hastily restarting the services will not solve the problem; the most common result is that they go down again right after starting. In this situation it is best to reject traffic at the entry layer first and then restart.

If services such as Redis/Memcache have also gone down, pay attention to "warming up" the cache when restarting, which may take a long time.
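A cache warm-up can be as simple as loading the known hot keys in batches before the service accepts traffic again. A minimal sketch, assuming the hot-key list is known in advance and that `load_from_db` is a hypothetical batched loader; the plain dict stands in for a Redis/Memcache client:

```python
def warm_up(cache, load_from_db, hot_keys, batch_size=100):
    """Pre-load hot keys into the cache in batches; returns how many were loaded.
    Batching keeps each round trip to the database small."""
    loaded = 0
    for start in range(0, len(hot_keys), batch_size):
        batch = hot_keys[start:start + batch_size]
        for key, value in load_from_db(batch).items():
            cache[key] = value
            loaded += 1
    return loaded
```

Only once this returns would the entry layer start letting requests through; otherwise the first wave of cache misses lands on the database and can trigger the same collapse again.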

In flash-sale and rush-purchase scenarios, traffic often exceeds what our system was prepared for, or even what we imagined. Overload protection is therefore necessary: if the system is detected to be fully loaded, rejecting requests is itself a protective measure.

Setting a filter on the front end is the simplest approach, but it is widely resented by users. A better option is to put overload protection at the CGI entry layer, so that excess client requests get a fast, direct response.
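Entry-layer overload protection boils down to "admit up to N requests in flight, answer the rest immediately". A minimal sketch, assuming an in-process limit (`max_in_flight` is a hypothetical tuning value) and using HTTP 503 as the fast rejection:

```python
import threading

class OverloadGuard:
    """Admit at most max_in_flight concurrent requests; reject the rest at once."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, work):
        # Non-blocking acquire: when no slot is free, fail fast instead of queueing,
        # so rejected requests cost almost nothing and never hold a connection.
        if not self._slots.acquire(blocking=False):
            return 503, "busy, please retry later"
        try:
            return 200, work()
        finally:
            self._slots.release()          # free the slot even if work() raises
```

The key design choice is rejecting rather than waiting: a blocked request would still occupy a connection, which is exactly the resource the avalanche exhausts.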

Second, cheating techniques: attack and defense

The "massive" volume of requests that flash sales and rush purchases receive is in fact heavily padded. To "grab" the merchandise, many users rely on "ticket-brushing tools" and other assistive tools that send as many requests to the server as possible.

Some advanced users go further and write powerful automated request scripts. The rationale is simple: the larger their share of all the requests taking part in the flash sale, the higher their chance of success.

These are all "cheating techniques", but where there is attack there is defense; this is a war without gunsmoke.

1. One account sending multiple requests at once

Some users use browser plug-ins or other tools to send hundreds of requests, or even more, from their own account the moment the flash sale starts. Such users undermine the fairness of the flash sale.

In systems that lack proper data-safety handling, such requests can cause another kind of damage as well: some checks can be bypassed.

Take a simple claiming logic: first check whether the user already has a participation record; if not, the claim succeeds; finally, write the participation record. The logic is very simple, but under high concurrency it hides a serious vulnerability.

Multiple concurrent requests are distributed by a load balancer to several web servers on the internal network, and each server queries the storage. In the time window before one request finishes writing its participation record, all the other requests read "no participation record".

This is where the logical check risks being bypassed.
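The check-then-write race can be reproduced in a few lines. In this sketch the shared set stands in for the participation-record store, and a `threading.Barrier` is an artificial device that holds every request between its check and its write, making the race window deterministic instead of timing-dependent:

```python
import threading

participation = set()   # stand-in for the shared participation-record store

def naive_claim(user, barrier, results):
    already = user in participation      # step 1: check for a record
    barrier.wait()                       # all requests finish step 1 before any writes
    if not already:
        results.append("success")        # step 2: the claim succeeds
        participation.add(user)          # step 3: record written, but too late
    else:
        results.append("rejected")
```

Running five such requests for the same user concurrently yields five successes instead of one: every request passed the check before any of them wrote the record.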

Response plan:

At the program's entry point, accept only one request per account and filter out the others. This not only solves the problem of one account sending N requests, but also keeps the subsequent logic safe.

For the implementation, you can write a flag into an in-memory cache service such as Redis, arranged so that only one request can write it successfully (for example, combined with the optimistic-locking behavior of WATCH); only the request whose write succeeds may continue.
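The flag-bit idea relies on an atomic "set only if absent" operation. In real Redis that is `SET key value NX` (or the older SETNX), which succeeds for exactly one caller; WATCH/MULTI/EXEC is the optimistic-locking alternative the text mentions. The sketch below simulates the NX semantics in-process with a lock so it runs without a Redis server; the class and the key format `seckill:flag:<user>` are hypothetical.

```python
import threading

class InMemoryFlagStore:
    """Stand-in for Redis SET key value NX: set a key only if it is absent,
    atomically, and tell the caller whether it won."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set_if_absent(self, key, value):
        with self._lock:                 # check and write happen as one step
            if key in self._data:
                return False
            self._data[key] = value
            return True

flags = InMemoryFlagStore()

def guarded_claim(user, request_id, barrier, results):
    barrier.wait()                       # release all requests at the same moment
    if flags.set_if_absent(f"seckill:flag:{user}", request_id):
        results.append("success")        # only the flag winner continues
    else:
        results.append("rejected")       # everyone else is filtered at the entry
```

Unlike the naive version, the check and the write can no longer be interleaved, so five simultaneous requests from one account produce exactly one success.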

Alternatively, implement a service yourself that puts requests from the same account into a queue and processes them one at a time.
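The queue-per-account alternative can be sketched with one FIFO queue and one worker thread per account, so requests from the same account never run concurrently while different accounts proceed in parallel. This is an in-process sketch under our own assumptions (class and handler names are hypothetical); a real service would shard accounts across processes.

```python
import queue
import threading
from collections import defaultdict

class PerAccountSerializer:
    """Each account gets its own FIFO queue and a single worker thread."""

    def __init__(self, handler):
        self._handler = handler          # called as handler(account, request_id)
        self._queues = {}
        self._workers = []
        self._lock = threading.Lock()    # guards lazy creation of queues/workers

    def _queue_for(self, account):
        with self._lock:
            q = self._queues.get(account)
            if q is None:
                q = queue.Queue()
                worker = threading.Thread(target=self._run, args=(account, q), daemon=True)
                self._queues[account] = q
                self._workers.append(worker)
                worker.start()
            return q

    def _run(self, account, q):
        while True:
            request_id = q.get()
            if request_id is None:       # sentinel: stop this account's worker
                return
            self._handler(account, request_id)

    def submit(self, account, request_id):
        self._queue_for(account).put(request_id)

    def close(self):
        """Let each worker finish its queue, then stop it."""
        with self._lock:
            queues, workers = list(self._queues.values()), list(self._workers)
        for q in queues:
            q.put(None)
        for worker in workers:
            worker.join()

participated = set()
outcomes = defaultdict(list)

def claim(account, request_id):
    # A plain check-then-write is safe here only because one worker owns each account.
    if account in participated:
        outcomes[account].append("rejected")
    else:
        participated.add(account)
        outcomes[account].append("success")
```

Even if 100 requests for one account arrive from many threads at once, the handler sees them one after another, so the simple check-then-write logic from the previous example becomes correct without any extra locking.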

2. Multiple accounts sending multiple requests at once

At many companies, account registration had almost no restrictions in its early days, so it was easy to register many accounts. This led to specialized "studios" that write automatic registration scripts and accumulate huge numbers of "zombie accounts", tens or even hundreds of thousands of them, used for all kinds of "brushing" behavior (this is where Weibo's "zombie followers" come from).

For example, if Weibo runs a repost-to-enter lottery and we repost it from tens of thousands of "zombie accounts", we greatly increase our chance of winning.

The same reasoning applies when such accounts are used in flash sales and rush purchases, for example in rush purchases on Apple's official iPhone website, or by train-ticket scalpers.

Response plan:

This scenario can be handled by monitoring the request frequency of each client IP: if an IP is found sending requests at a high frequency, you can pop up a CAPTCHA for it or block its requests outright:

The core goal of a CAPTCHA is to identify real users. That is why the CAPTCHAs some sites pop up look like "dancing ghosts" and are sometimes impossible for us to read.

They do this to make the CAPTCHA image hard to recognize, because powerful "automatic scripts" can recognize the characters in an image and then fill in the CAPTCHA automatically.

Some very innovative CAPTCHAs work even better, for example asking you a simple question or having you complete a simple action (such as the Baidu Tieba CAPTCHA).

Blocking an IP outright is somewhat crude, because in some network environments real users share exactly the same egress IP, so there may be "collateral damage". Still, the approach is simple and efficient, and applied with the actual scenario in mind it can work well.
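The per-IP frequency check described in this response plan can be sketched as a sliding-window counter with two thresholds: moderate excess triggers a CAPTCHA, heavy excess triggers a block. The thresholds and window below are hypothetical defaults, and the injectable `clock` exists only to make the sketch testable; in a multi-server deployment the counters would live in shared storage such as Redis rather than in process memory.

```python
import time
from collections import defaultdict, deque

class IpFrequencyGuard:
    """Sliding-window request counter per client IP (hypothetical thresholds):
    up to captcha_limit -> allow, up to block_limit -> captcha, above -> block."""

    def __init__(self, window_s=1.0, captcha_limit=10, block_limit=50, clock=time.monotonic):
        self.window_s = window_s
        self.captcha_limit = captcha_limit
        self.block_limit = block_limit
        self._clock = clock
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def check(self, ip):
        now = self._clock()
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()                # forget requests that left the window
        hits.append(now)
        count = len(hits)
        if count > self.block_limit:
            return "block"
        if count > self.captcha_limit:
            return "captcha"
        return "allow"
```

Using a CAPTCHA as the intermediate step softens the "collateral damage" problem: real users behind a shared egress IP can still get through by solving it, while scripts hitting the block threshold are cut off.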

3. Multiple accounts sending requests from different IPs

As the saying goes, every new defense breeds a cleverer attack; attack and defense never rest. When these "studios" discover that you limit request frequency per IP, they come up with a "new attack plan" for this scenario: constantly changing IPs.

Some readers may wonder where these random IPs come from. Some organizations hold a pool of independent IPs, build a random proxy-IP service on top of it, and rent it to these "studios" for a fee.

Others are darker: they use Trojans to compromise ordinary users' computers. The Trojan does not disrupt the computer's normal operation; it does only one thing, forwarding IP packets, which turns the ordinary user's computer into a proxy exit IP.

