Implementation of an efficient lock-free memory queue

Disruptor is an efficient lock-free memory queue open-sourced by LMAX. Over the past two days I have read the relevant design documents and blog posts, and I will try to summarize them below.

Part 1. Introduction

When talking about concurrent program design, there are several concepts that cannot be avoided.

1. Locks: locks are the easiest way to implement concurrency control, but they are also the most expensive. A kernel-level lock requires the operating system to perform a context switch: the thread waiting for the lock is suspended until the lock is released, and during the switch the instructions and data previously cached by the CPU are invalidated, causing a large performance loss. User-mode locks avoid these problems, but they only help when there is no real contention. A counting experiment comparing no synchronization, a lock, CAS, and a volatile variable illustrates the difference in cost.

2. CAS: the meaning of CAS needs little explanation here. Unlike a lock, CAS does not require a context switch, but the processor still has to lock its instruction pipeline to guarantee atomicity and insert memory barriers to make the result visible to other threads, so it is not free either.

3. Memory barriers: we all know that modern CPUs execute instructions out of order, so the program order may differ from the actual execution order. This is not a problem for a single thread, but in a multi-threaded environment such reordering can greatly affect execution results. Memory barriers provide a means to control execution order. For more information, see http://en.wikipedia.org/wiki/Memory_barrier.

4. Cache lines: a cache line is simply the smallest unit the CPU caches; data within the same unit is loaded into the cache together. Using cache lines well can greatly reduce read/write latency; using them badly leads to false sharing, where the same cache line is repeatedly invalidated and reloaded by different cores.
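To make the false-sharing point concrete, here is an illustrative sketch (not Disruptor source code; the class and field names are hypothetical) of the classic padding trick: surrounding a hot counter with unused longs so that it tends to occupy a 64-byte cache line by itself. Note that the JVM is free to reorder fields, so real implementations rely on inheritance tricks or the `@Contended` annotation; this is only a sketch of the idea.

```java
// Sketch: padding a frequently written counter so it does not share a
// cache line with other hot fields (assumes a 64-byte cache line).
public class PaddedCounter {
    // 7 longs of padding on each side of `value` push neighboring
    // fields of other objects off the same cache line.
    protected long p1, p2, p3, p4, p5, p6, p7;
    protected volatile long value = 0L;
    protected long p9, p10, p11, p12, p13, p14, p15;

    public long get() { return value; }

    public void increment() { value++; }
}
```
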

Okay. Now let's talk about the issues to consider when designing a concurrent memory queue. The first is the data structure: should it be a fixed-length array or a variable-length linked list? The second is concurrency control: should it use locks or CAS operations? A single coarse-grained lock, or separate control of the queue's head, tail, and size? And even if they are controlled separately, can they be kept from falling into the same cache line?

Let's also look back at how queues are used. Our processing usually forms a pipeline or a graph; queues link these stages, express the dependencies between them, and act as buffers at the same time. But using queues is not free: enqueuing and dequeuing take real time, which matters in scenarios with extremely high performance requirements. If the same dependencies could be expressed without placing a queue between every pair of stages, so much the better!
Part 2. The Disruptor

Now let's introduce the Disruptor itself. With all the preparation above, I want to get straight to the point, so let's analyze the basic problems of a queue one by one.

1. How to store elements in the queue?

The Disruptor's central data structure is a ring queue based on a fixed-length array, as shown in Figure 1.

Space can be pre-allocated when the ring is created; inserting a new element only copies the new data into memory that is already allocated, and array access is very friendly to the CPU cache. We know that a ring queue normally needs a modulo operation to map a sequence number to a slot, and modulo is not efficient on most processors. If the array size is set to a power of two, the actual index can instead be obtained with the bitwise operation index & (size - 1).
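The power-of-two masking trick can be sketched in a few lines (the helper name `slotFor` is mine, not the Disruptor's):

```java
// Sketch: mapping a monotonically increasing sequence number to an
// array slot with a bitwise AND instead of a modulo operation.
// This is only valid when the ring size is a power of two.
public class RingIndex {
    public static int slotFor(long sequence, int size) {
        // size must be a power of two, e.g. 8, 1024, 65536
        return (int) (sequence & (size - 1));
    }
}
```

For a ring of size 8, sequence 18 maps to slot 2 and sequence 19 maps to slot 3, exactly as a modulo would give, but with a single cheap AND instruction.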

The Disruptor exposes only one variable, the sequence of the element at the tail of the queue: the cursor. This avoids the coordination between two variables such as head and tail. Producers and consumers access the Disruptor through a producer barrier and a consumer barrier. What are these two barriers? Read on.


Figure 1. The ring buffer; the current tail element is at position 18.

2. How do producers insert elements into the queue?

A producer inserts an element in two steps. The first step is to claim an empty slot; each slot is occupied by exactly one producer, and the producer that claims a slot copies the new element's data into it. The second step is to publish; only after publishing can the new element be seen by consumers. With a single producer, the claim step needs no synchronization. With multiple producers, a variable claimSequence records the claim position, and claiming must be synchronized through CAS. In the example of Figure 2, if two producers both want to claim slot 19, both execute CAS(&claimSequence, 18, 19) at the same time; the one that succeeds gets the slot, and the other must claim the next available slot. In the Disruptor, the order of successful publication is strictly the same as the order of claiming. In the implementation, publishing actually modifies the value of the cursor with an operation equivalent to CAS(&cursor, mySlot - 1, mySlot), from which it can be seen that publication must succeed strictly in order: slot, slot + 1, slot + 2, and so on. In addition, to prevent producers from producing so fast that they wrap around the ring and overwrite data consumers have not yet read, each producer must track consumer progress by reading each consumer's current consumption position. For example, if the ring holds 8 elements and two consumers are consuming elements 13 and 14 respectively, the new elements produced cannot go beyond element 20. The process of inserting an element is shown below:
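The two-step claim-and-publish protocol described above can be sketched with plain `AtomicLong` CAS operations. This is a simplified, single-purpose sketch, not the Disruptor's actual implementation (which adds consumer tracking, padding, and smarter claim strategies), and the class and field names are mine:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the multi-producer insert protocol: CAS to claim a slot,
// then CAS on the cursor to publish slots strictly in claim order.
public class MultiProducerSketch {
    final AtomicLong claimSequence = new AtomicLong(-1); // last claimed slot
    final AtomicLong cursor = new AtomicLong(-1);        // last published slot
    final Object[] entries;
    final int mask;

    public MultiProducerSketch(int sizePowerOfTwo) {
        entries = new Object[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Step 1: claim the next free slot. The CAS guarantees each slot is
    // handed to exactly one producer; a loser retries for the next slot.
    public long claim() {
        long current, next;
        do {
            current = claimSequence.get();
            next = current + 1;
        } while (!claimSequence.compareAndSet(current, next));
        return next;
    }

    // Step 2: publish. CAS(cursor, mySlot - 1, mySlot) only succeeds once
    // every earlier slot has been published, so publication order equals
    // claim order. (Wrap-around protection against overwriting unread
    // consumer data is omitted here for brevity.)
    public void publish(long mySlot, Object value) {
        entries[(int) (mySlot & mask)] = value;
        while (!cursor.compareAndSet(mySlot - 1, mySlot)) {
            // spin until the previous slot has been published
        }
    }
}
```
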


Figure 2. The current tail of the ring buffer is at position 18, and producers apply for a slot.


Figure 3. A producer's claim succeeds and it obtains slot 19 exclusively, so it can write its element there. Element 19 is not yet visible to consumers.


Figure 4. After the producer has written its data into slot 19, it changes the cursor to 19 to complete the publication. Consumers can then consume element 19.

3. How do consumers know that new elements are in?

A consumer must wait for a new element to arrive before it can continue, that is, for the cursor to become greater than its current consumption position. There are multiple waiting strategies to choose from, such as sleeping or busy spinning; when using the Disruptor, you can select a wait strategy to fit the scenario.
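A busy-spin strategy, the simplest of these, can be sketched as follows (an illustrative helper, not the Disruptor's `WaitStrategy` interface):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a busy-spin wait strategy: the consumer spins until the
// producer cursor reaches the sequence it wants to consume.
public class BusySpinWait {
    public static long waitFor(long sequence, AtomicLong cursor) {
        long available;
        while ((available = cursor.get()) < sequence) {
            // busy spin; a sleeping strategy would instead call
            // LockSupport.parkNanos(...) or Thread.onSpinWait() here
        }
        // may be ahead of `sequence`, which is what enables batching
        return available;
    }
}
```
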

4. Batching

If a consumer finds that the cursor has advanced more than one position past its last consumption position, it can choose to consume that whole segment in a batch rather than one element at a time. This improves throughput while also making the system's latency smoother.
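Batch consumption amounts to draining every slot between the last consumed sequence and the published cursor in one pass. A minimal sketch (my own helper, assuming a power-of-two ring):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: drain all elements between the consumer's last consumed
// sequence (exclusive) and the published cursor (inclusive) in a batch.
public class BatchConsumerSketch {
    public static List<Object> consumeBatch(Object[] ring, long lastConsumed, long cursor) {
        int mask = ring.length - 1; // ring length must be a power of two
        List<Object> batch = new ArrayList<>();
        for (long seq = lastConsumed + 1; seq <= cursor; seq++) {
            batch.add(ring[(int) (seq & mask)]);
        }
        return batch;
    }
}
```

For example, a consumer last at sequence 16 that observes the cursor at 19 picks up sequences 17, 18, and 19 in a single call.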

5. Dependency Graph

As mentioned above, in traditional systems queues are used to represent dependencies between processing stages, and each additional dependency requires an additional queue. In the Disruptor, because producers and consumers are considered and controlled separately, all dependencies can be expressed through a single core ring queue, which can greatly increase throughput and reduce latency. Of course, achieving this requires careful design. The following simple example illustrates how the Disruptor can represent dependencies.

/**
 * Scenario: data produced by producer P1 must be processed by
 * consumers EP1 and EP2, and then passed on to consumer EP3.
 *
 *           -----
 *     ----->| EP1 |------
 *     |     -----       |
 *     |                 v
 *   ----              -----
 *  | P1 |             | EP3 |
 *   ----              -----
 *     |                 ^
 *     |     -----       |
 *     ----->| EP2 |------
 *           -----
 */

Queue-based solutions

============
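The core idea behind expressing such a dependency without extra queues can be sketched with sequences alone: EP3 does not read from a separate queue fed by EP1 and EP2; instead, all four components share one ring buffer, and EP3 simply never advances past the slower of EP1 and EP2. This is an illustrative sketch of that gating rule, not the Disruptor's `SequenceBarrier` API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: EP3 depends on EP1 and EP2. Over a shared ring buffer, the
// dependency is just "EP3 may consume up to min(EP1's sequence,
// EP2's sequence)" -- no intermediate queue is needed.
public class DependencySketch {
    public static long availableFor(AtomicLong ep1Sequence, AtomicLong ep2Sequence) {
        return Math.min(ep1Sequence.get(), ep2Sequence.get());
    }
}
```

If EP1 has processed up to sequence 14 and EP2 only up to 13, EP3 may safely consume up to sequence 13.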
