Original: http://weibo.com/p/1001603862417250608209
This article analyzes three cases, covering the problems and solutions for concurrent access to shared resources, cache access for compute-intensive tasks, and peak traffic on a single hotspot resource.
Q1: In a ticket-booking system with only one train ticket left, suppose 10,000 people open the 12306 website at the same time to book it. How do we handle the concurrency problem?
A1: Let's first look at concurrent access at the database level, where the main solutions are optimistic locking and pessimistic locking.
Optimistic locking
Assumes that concurrency conflicts will not occur, and only checks for violations of data integrity when the operation is committed.
Optimistic locking uses an auto-incrementing field as the record's version number (a timestamp also works). On update, the version number is checked: for example, the record is read with version=4 and the update is submitted with version=5; the database compares this against its stored version plus 1 (4+1=5). If they are equal, the update is applied; if not, another program has already updated the record, and an error is returned.
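A minimal sketch of this version-number check for the single-ticket scenario, using Python's sqlite3 as a stand-in for the real database; the tickets table and its column names are made up for illustration.

```python
import sqlite3

# Stand-in database: one ticket left, current version number 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO tickets VALUES (1, 1, 4)")
conn.commit()

def book_ticket_optimistic(conn, ticket_id):
    # 1. Read the row together with its current version number.
    stock, version = conn.execute(
        "SELECT stock, version FROM tickets WHERE id = ?", (ticket_id,)).fetchone()
    if stock <= 0:
        return False  # sold out
    # 2. Apply the update only if the version has not changed since the read.
    cur = conn.execute(
        "UPDATE tickets SET stock = stock - 1, version = version + 1 "
        "WHERE id = ? AND version = ?", (ticket_id, version))
    conn.commit()
    # rowcount == 0 means another transaction updated the record first.
    return cur.rowcount == 1

print(book_ticket_optimistic(conn, 1))  # True; a caller whose version check fails gets False
```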
Pessimistic locking
Assumes that concurrency conflicts will occur, and blocks every operation that could violate data integrity.
It generally relies on the database's own locking mechanism, such as the row-level locks of the MySQL InnoDB engine.
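A minimal sketch of the pessimistic approach for the same booking scenario, assuming MySQL/InnoDB and the PyMySQL driver; the connection parameters and the tickets table are illustrative.

```python
import pymysql  # assumed MySQL driver; any DB-API client works the same way

conn = pymysql.connect(host="127.0.0.1", user="app", password="secret",
                       database="booking", autocommit=False)

def book_ticket_pessimistic(conn, ticket_id):
    with conn.cursor() as cur:
        # SELECT ... FOR UPDATE takes an InnoDB row-level lock; other transactions
        # trying to lock the same row block until this transaction ends.
        cur.execute("SELECT stock FROM tickets WHERE id = %s FOR UPDATE", (ticket_id,))
        (stock,) = cur.fetchone()
        if stock <= 0:
            conn.rollback()
            return False  # sold out
        cur.execute("UPDATE tickets SET stock = stock - 1 WHERE id = %s", (ticket_id,))
    conn.commit()  # committing releases the row lock
    return True
```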
Conclusion: in a real production environment, if concurrency is low and dirty reads are not acceptable (the original value is 5; transactions a and b run concurrently, b updates the value to 2 but has not yet committed, and a must still read 5), pessimistic locking can be used. When concurrent access is heavy, pessimistic locks have serious performance problems, so optimistic locking is the better choice.
Next, let's look at memcached's CAS mechanism.
CAS (Compare-and-Swap) is an atomic operation.
How memcached's CAS mechanism solves the problem:
1. It implements a check-and-set atomic operation;
2. Usage: first use the gets command to fetch a key's value together with the value's version number, then perform the computation that produces the new value, and finally resubmit the key-value pair with the cas command, attaching the version number obtained earlier;
3. If the server finds that the version number in the cas operation is not the latest, it concludes that the key's value has been modified by someone else, and the cas operation fails. With this mechanism, the programmer can implement atomic increment and decrement operations;
You can see that memcached's CAS mechanism is very similar to the database's optimistic locking.
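A minimal sketch of an atomic counter increment built on gets/cas, following the three steps above. It assumes the pymemcache client library and a local memcached instance; the key name counter is arbitrary.

```python
from pymemcache.client.base import Client  # assumed memcached client library

client = Client(("127.0.0.1", 11211))      # assumes a local memcached instance
client.set("counter", "0")

def atomic_increment(key):
    while True:
        # 1. gets returns the value together with its CAS version token.
        value, cas_token = client.gets(key)
        new_value = str(int(value) + 1)
        # 2. cas stores the new value only if the token is still current;
        #    otherwise another client changed the key in the meantime, so retry.
        if client.cas(key, new_value, cas_token):
            return new_value

print(atomic_increment("counter"))  # "1", even with concurrent callers
```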
Q2: Suppose images in the system are stored in TFS (Taobao File System) and the interface provides a thumbnail service: it first looks for the thumbnail in the cache; if it is missing, it loads the original image from TFS, calls the thumbnail service, and once the thumbnail computation is complete, writes the result back to the cache service.
Problem: when an image is shared with 1,000,000 people and 10,000 concurrent requests arrive at the same time, the thumbnail computation takes relatively long (say 1 s). Within that 1 s, every request misses the cache and then asks for the thumbnail to be computed, resulting in repeated thumbnail computation and wasted resources.
A2: A time-consuming service like thumbnail generation is well suited to caching, but for the same image only one thumbnail should be computed in principle. While the thumbnail is not yet ready, an extra marker can be set for the image to indicate that it is being processed. When a concurrent request sees this processing marker, it can either wait for the computation to finish and then read the result directly from the cache (the recommended approach), or return an error immediately and let the client retry.
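A minimal in-process sketch of the "processing marker plus wait" idea, using Python threading; the cache dict, the compute_thumbnail function, and the 1 s sleep stand in for the real cache service and thumbnail service.

```python
import threading
import time

cache = {}                        # stands in for the cache service
state_lock = threading.Lock()
in_progress = {}                  # image_id -> Event marking "thumbnail being computed"

def compute_thumbnail(image_id):
    time.sleep(1)                 # stands in for the ~1 s thumbnail computation
    return f"thumbnail-of-{image_id}"

def get_thumbnail(image_id):
    with state_lock:
        if image_id in cache:                 # cache hit
            return cache[image_id]
        done = in_progress.get(image_id)
        if done is None:                      # first request: set the processing marker
            done = threading.Event()
            in_progress[image_id] = done
            is_owner = True
        else:
            is_owner = False                  # another request is already computing
    if is_owner:
        thumb = compute_thumbnail(image_id)   # only one request does the work
        with state_lock:
            cache[image_id] = thumb
            del in_progress[image_id]
        done.set()                            # wake up the waiting requests
        return thumb
    done.wait()                               # wait for the computation to finish,
    return cache[image_id]                    # then read directly from the cache
```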
In this scenario, thumbnails can also be pre-computed: if no thumbnail request has arrived within 1 minute of the image being uploaded, compute the thumbnail in the background and store it in the cache. Another option is to compute the thumbnail at upload time, but this increases the time it takes to upload an image.
Q3: Peak traffic on a single hotspot. In concurrent systems, besides the overall requirement for high concurrency, a very high number of concurrent requests against a single hotspot resource is also common. For example, 10,000 people each share one image; thumbnail requests for 9,999 of those images stay within 10 QPS, while the remaining image is a hot news picture whose requests peak at around 100,000 QPS. The capacity problems the system will run into include: 1) the front-end machines behind the interface do not have enough capacity; 2) a single instance of the cache resource hits a bottleneck.
A3: For the performance bottlenecks that single-point peak traffic may cause, the solutions are as follows.
1) Insufficient interface-layer capacity: this one is relatively simple. As long as the interface layer is designed to be stateless, it can be solved by rapid horizontal scaling once capacity reaches the warning threshold.
2) A single instance of the cache resource hits a performance bottleneck: with a distributed cache, when you need to break through the access bottleneck of a single key (the bottleneck may be CPU pressure, a saturated single-machine network link, or insufficient disk IO throughput), one approach is a redundant multi-copy (x3) design for that key in the distributed cache, which triples the system throughput for the key at triple the cost. The other approach, aimed at extremely hot data, is to enable a local cache on the front-end machines in addition to the distributed cache, relying on the large number of front-end machines to absorb requests for extreme hotspots.
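A minimal sketch of the two mitigations, key-copy (x3) redundancy plus a front-end local cache; the dict-based remote_cache, the key-suffix scheme, and the replication factor are illustrative assumptions (in a real distributed cache the three key names would hash to different instances).

```python
import random
from functools import lru_cache

REPLICAS = 3                 # x3 redundancy for the hot key
remote_cache = {}            # stands in for the distributed cache client

def set_hot(key, value):
    # Write the same value under key#0 .. key#2; in a real distributed cache
    # these names hash to different instances, spreading the single-key load.
    for i in range(REPLICAS):
        remote_cache[f"{key}#{i}"] = value

def get_hot(key):
    # Each read picks one replica at random.
    return remote_cache.get(f"{key}#{random.randrange(REPLICAS)}")

@lru_cache(maxsize=1024)
def get_with_local_cache(key):
    # Local cache on the front-end machine absorbs extremely hot requests;
    # a real implementation would add a short TTL to bound staleness.
    return get_hot(key)

set_hot("news-image-thumbnail", "hot thumbnail bytes")
print(get_with_local_cache("news-image-thumbnail"))
```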
[Reprint] Frequently asked questions in high-concurrency systems