Flash-sale ("seckill") activity on an e-commerce website
A flash-sale (seckill) activity is usually defined as: within a limited time window (typically somewhere between a few minutes and a few hours), the organizer puts a specified, limited quantity of goods on sale at a steep discount.
This kind of flash-sale activity generally runs into the following problems:
First, at certain moments the QPS exceeds what the system can handle (note: QPS means queries per second);
Second, an unreasonable architecture causes other modules that have nothing to do with the flash sale to become unusually slow;
Third, a small number of users grab slots repeatedly;
Fourth, the number of slots grabbed exceeds the inventory;
Fifth, after the servers go down, slow recovery drives a large number of users to competitors' websites;
Sixth, bot traffic occupies the site and makes real users' visits slow.
Solutions are all thought up by people; coming up with one is only a matter of time.
Solution background: the LNMP technology stack
(the six problems, one by one)
First problem:
1. Set up the Nginx rate-limiting configuration:
The idea is this: suppose that at the appointed time the flash-sale QPS reaches a peak value Peak1. In the concurrency tests for the flash sale we should first measure the typical range of this value and then take its minimum (PeakMin). We can then use Nginx's ngx_http_limit_req_module and ngx_http_limit_conn_module modules to apply limits. The Nginx configuration is as follows:
(PS: the ngx_http_limit_conn_module module limits the number of concurrent connections. So how do we limit the number of requests? That is done with the ngx_http_limit_req_module module, which can limit the request-processing rate per defined key; in particular, it can limit the request rate from a single IP address. The limiting works like a leaky bucket: a fixed number of requests is processed per second and any excess requests are delayed, which also helps defend against application-layer DDoS attacks.)
http {
    # The geo and map blocks handle the rate-limit whitelist: the map block maps
    # the list to $limit; IPs listed in geo are mapped to an empty value,
    # all other clients to their binary address.
    geo $whiteiplist {
        default        1;
        127.0.0.1      0;
        121.199.16.249 0;
    }

    map $whiteiplist $limit {
        1 $binary_remote_addr;
        0 "";
    }

    # The limit_conn_zone and limit_req_zone directives ignore keys with an
    # empty value, so the whitelisted IPs above are not limited.

    # limit_conn_zone defines the number of concurrent connections per IP.
    # Set up a 10m shared zone (perip) to hold the state of each key,
    # using $limit as the key to limit connections per source IP.
    limit_conn_zone $limit zone=perip:10m;

    # limit_req_zone defines the number of requests per second per IP.
    # Set up a 10m shared zone (reqps) to hold the state of each key; the state
    # here is the current number of excess requests. An empty $limit key is not
    # rate-limited; every other IP is limited to 5 requests per second.
    limit_req_zone $limit zone=reqps:10m rate=5r/s;

    server {
        listen      80;
        server_name www.yoururl.com;

        # Only throttle requests to the PHP seckill page.
        location ~ [^/]miaosha\.php(/|$) {
            # Corresponds to the limit_conn_zone block above: limit each IP to
            # 5 concurrent requests to this PHP page.
            limit_conn perip 5;

            # Corresponds to the limit_req_zone block above: limit each IP's
            # requests per second to the rate defined above (5 r/s), no delay.
            limit_req zone=reqps nodelay;
        }
    }
}
The Nginx configuration above is really just a per-IP limit; it does help, but not dramatically.
2. Filter invalid requests:
The front end generates a signature string, for example with Crypto.js. It takes the current UNIX timestamp (time), a randomly generated string (nonce), and a token: the value of a cookie field named token (with an expiration time) that the server returns to the browser only after the user completes a CAPTCHA. These are combined through an obfuscation algorithm, and a custom signing algorithm then produces the signature string (signature) on the front end. When the purchase form is submitted it carries these four fields. When the request reaches Nginx, we use Nginx's Lua module to run a Lua script that verifies the signature, requires the token above to expire within 30 seconds, and requires the client's time parameter to differ from the server's UNIX timestamp by no more than 5 seconds; any request that fails these checks is blocked directly at the Nginx/Lua level. This is where OpenResty comes in; interested readers can dig into it further.
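Below is a minimal access-phase sketch of such a check, assuming the three request fields arrive as query-string parameters and that HMAC-SHA1 over "time|nonce|token" stands in for the article's unspecified custom signing algorithm; the secret and field layout are illustrative only.

-- access_by_lua_block sketch: verify the seckill request signature at the Nginx layer
local secret = "server-side-secret"            -- assumed shared signing secret

local args  = ngx.req.get_uri_args()
local time  = tonumber(args.time)              -- client UNIX timestamp
local nonce = args.nonce                       -- client random string
local sign  = args.signature                   -- client-computed signature
local token = ngx.var.cookie_token             -- cookie issued after the CAPTCHA

-- reject requests missing any of the four fields
if not (time and nonce and sign and token) then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
end

-- client clock must be within 5 seconds of the server clock
if math.abs(ngx.time() - time) > 5 then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
end

-- recompute the signature the same way the front end is assumed to build it
local expected = ngx.encode_base64(
    ngx.hmac_sha1(secret, table.concat({ time, nonce, token }, "|")))
if expected ~= sign then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
end
-- the 30-second token lifetime would additionally be checked against whatever store issued the token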
3. Probabilistically discard overload requests:
We already obtained the peak parameter PeakMin from the earlier concurrency test, and we should keep the number of valid flash-sale requests well within it. First we need the current total number of Nginx connections, CurrentConnectionCount, and the connection count that corresponds to a QPS of PeakMin, call it PeakMinConnectionCount. Using Nginx's Lua module we read the current value: when the load reaches 0.8 * PeakMinConnectionCount, we randomly drop 90% of the requests above that level, return a "flash sale failed" hint, and write a record of the user's participation in the flash sale to the memcached cache; when the load reaches PeakMinConnectionCount we drop 100% of the requests, and the front end turns the 5xx status code into a friendly "flash sale failed" message for the user. What I want to stress here is that the user experience must still feel normal throughout.
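A minimal Lua sketch of this shedding logic, assuming PeakMinConnectionCount has already been measured (the constant below is illustrative) and that Nginx was built with the stub_status module so that $connections_active is available:

-- access_by_lua_block sketch: probabilistically shed load near the measured ceiling
local peak_min_connection_count = 8000                       -- assumed value from load testing
local current = tonumber(ngx.var.connections_active) or 0    -- needs the stub_status module

if current >= peak_min_connection_count then
    -- at 100% of the measured ceiling, drop every seckill request
    return ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)
elseif current >= 0.8 * peak_min_connection_count then
    -- between 80% and 100%, drop roughly 90% of requests at random
    if math.random() < 0.9 then
        return ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)
    end
end
-- the front end maps the 5xx status to a friendly "flash sale failed" message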
Second problem:
Functional module design of the system:
A mature e-commerce system is generally divided into a number of relatively independent modules, such as a product center, member center, order center, logistics center, configuration center and search center. The database tables behind these large modules are usually loosely coupled, so each large module can in turn be split into a number of child function modules. This keeps the modules of the whole e-commerce system from affecting each other as much as possible. The distributed server architecture behind this involves many architectural practices that will not be detailed here.
Third problem:
Atomic check on a cache key:
The problem of the same user repeatedly grabbing a slot is relatively simple. Most likely the user (a bot program) submits two or more concurrent requests within a very short time (say 0.01 seconds). We record the user's successful flash-sale purchase under a key named productId + activityId + userId, writing it with memcached's add, which is atomic: if add fails, the key has already been added once, so we return immediately.
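A sketch of that "add once" guard, written here with the lua-resty-memcached client so it can run at the OpenResty layer (in a plain LNMP setup the equivalent Memcached::add call would live in the PHP handler); the memcached address, parameter names and expiry are assumptions:

-- add-once guard: only the first concurrent request for a (product, activity, user) wins
local memcached = require "resty.memcached"

local memc, err = memcached:new()
if not memc then
    ngx.log(ngx.ERR, "failed to create memcached object: ", err)
    return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
end
memc:set_timeout(1000)  -- 1 s
local ok, err = memc:connect("127.0.0.1", 11211)      -- assumed memcached address
if not ok then
    ngx.log(ngx.ERR, "failed to connect to memcached: ", err)
    return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
end

local args = ngx.req.get_uri_args()
if not (args.productid and args.activityid and args.userid) then
    return ngx.exit(ngx.HTTP_BAD_REQUEST)
end

-- one key per (productId, activityId, userId); add() succeeds only for the first writer
local key = table.concat({ args.productid, args.activityid, args.userid }, ":")
local added, err = memc:add(key, "1", 600)            -- assumed 10-minute guard window
if not added then
    -- the key already exists, i.e. this user has already grabbed a slot
    ngx.say("you have already taken part in this flash sale")
    return ngx.exit(ngx.HTTP_OK)
end
memc:set_keepalive(10000, 100)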
Fourth problem:
Optimistic lock:
Memcached has a very good CAS (check-and-set) mechanism. Put simply: I grab a slot first, and when it is time to actually save the data I check whether the CAS value is still the same as when I first read it. If it is not, the operation returns a "flash sale failed" message; otherwise one unit of valid flash-sale inventory is deducted, and this continues until the inventory key for the flash sale reaches 0.
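A minimal sketch of that optimistic lock over the remaining-stock counter, again using lua-resty-memcached (the key name, retry count and the assumption that a connected memc handle is already available are all illustrative):

-- optimistic lock on the stock counter: gets() returns a CAS token and cas()
-- only writes if nobody has changed the value in between
local function take_one_slot(memc, activity_id)
    local stock_key = "miaosha:stock:" .. activity_id   -- assumed key layout
    for _ = 1, 3 do                                      -- a few bounded retries
        local stock, _, cas_token = memc:gets(stock_key)
        stock = tonumber(stock)
        if not stock or stock <= 0 then
            return false                                 -- inventory exhausted: flash sale failed
        end
        if memc:cas(stock_key, tostring(stock - 1), cas_token) then
            return true                                  -- one slot successfully claimed
        end
        -- CAS conflict: another request won the race, loop and read again
    end
    return false
end

On a false return the caller would roll back (or mark as failed) the add-once record from the previous step and show the user the "flash sale failed" prompt.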
Fifth problem:
1. Hot and cold standby and multiple backups:
Whether it is the application servers, cache servers, database servers or message-queue servers, all of them should have multiple backups of their own. The database servers and cache servers in particular directly affect the system at the data level. If conditions allow, you should also invest in geo-redundant deployments, multiple data centers and similar infrastructure.
2. Automated operation and maintenance:
Use bulk management and configuration tools and technologies such as Ansible and Docker. There is a great deal of knowledge in this area; I have not mastered this part of the practice yet and still need to keep honing it.
Original: https://zhuanlan.zhihu.com/p/25466488?group_id=819684972054077440