Building High-Performance Services (III): Java High-Performance Buffer Design vs. Disruptor vs. LinkedBlockingQueue (reprint)

Source: Internet
Author: User
Tags: lock, queue, message queue

Original address: http://maoyidao.iteye.com/blog/1663193

A service deployed on only 4 servers writes more than 1 million rows per second to the database, producing more than 1 GB of data per minute, while CPU usage on each server (8 cores, 12 GB RAM) stays below 100% with a load around 5. How is this done? This article describes the architecture, the core of which is an efficient buffer design. Our requirements for the buffer are:

1. The buffer should be as simple as possible.

2. Avoid locking between the producer and consumer threads as much as possible.

3. Avoid generating large amounts of garbage (GC pressure).

Buffering and Performance Bottlenecks

The silver bullet for improving disk write I/O is, without question, batched sequential writes. Whether in popular distributed file systems and databases such as HBase, GFS, and HDFS, or in Kafka, which uses disk files as a persistent message queue, the strategy is the same: cache data in memory, then write it out in batches. The performance core of this strategy is the in-memory buffer design. This is a classic producer-consumer scenario, and the requirements on the buffer when writes and reads happen concurrently are: (1) stop writing when full; (2) stop reading when empty; (3) lose no data; (4) read no duplicate data. The most direct and common choice is the LinkedBlockingQueue that ships with the JDK. LinkedBlockingQueue is a locked queue: both writes and reads take a lock, which fully satisfies the four requirements above. But when your program is running, check which thread consumes the most CPU — it is often the thread contending on the LinkedBlockingQueue lock, and that lock is the performance bottleneck of many programs that need high throughput.
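As a baseline, here is a minimal sketch of the locked-queue approach described above (the class name, capacity, and message count are illustrative, not from the original service). `put()` blocks when the queue is full and `take()` blocks when it is empty, so all four requirements are met — at the cost of a lock acquisition on every single operation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LockedQueueDemo {
    // bounded, locked buffer: full -> put() blocks, empty -> take() blocks
    static final BlockingQueue<byte[]> BUFFER = new LinkedBlockingQueue<byte[]>(1024);

    public static int produceAndDrain(int n) throws InterruptedException {
        for (int i = 0; i < n; i++) {
            BUFFER.put(new byte[]{(byte) i});   // takes the queue lock on every write
        }
        int drained = 0;
        while (!BUFFER.isEmpty()) {
            BUFFER.take();                      // takes the queue lock on every read
            drained++;
        }
        return drained;                         // nothing lost, nothing duplicated
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(produceAndDrain(100)); // prints 100
    }
}
```

Under high throughput, many producer and consumer threads contend on that one lock, which is exactly the bottleneck the article goes on to attack.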

Disruptor

Suffering from the performance of a locked queue? Disruptor is one option. What is Disruptor? Here is how LMAX, the company that open-sourced it, describes it:

We spent a lot of effort trying to build higher-performance queues, but it turned out that a queue, as a fundamental data structure, has its limits: it conflates the design concerns of producers, consumers, and their data storage. Disruptor is the result of building a data structure that cleanly separates these concerns.

OK, so Disruptor is built to solve exactly the problem in our scenario, and it is not a queue. Then what is it, and why is it efficient? I will not go into much detail — there is plenty of material online — but in brief:

1. Disruptor uses a RingBuffer in place of a queue, and producer/consumer pointers in place of locks.

2. The producer and consumer pointers use CPU-supported atomic integer increments, which require no lock and are fast. The Java implementation relies on the Unsafe class.
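The lock-free pointer idea in point 2 can be sketched with the JDK's own atomic primitives (this is an illustration of the mechanism, not Disruptor's actual code): each producer claims the next slot with one atomic increment — a CPU compare-and-swap or fetch-and-add under the hood, the same kind of primitive `sun.misc.Unsafe` exposes — so unique sequence numbers are handed out without any lock:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CursorDemo {
    // the "cursor": highest sequence claimed so far; -1 means nothing claimed yet
    static final AtomicLong CURSOR = new AtomicLong(-1);

    // atomically claim the next slot in the ring -- no lock, just a CAS/fetch-add
    public static long claimNext() {
        return CURSOR.incrementAndGet();
    }

    public static void main(String[] args) {
        long first = claimNext();   // 0
        long second = claimNext();  // 1 -- two producers can never get the same slot
        System.out.println(first + " " + second);
    }
}
```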

To use Disruptor, first build a RingBuffer and specify its size. Note that once the data in the RingBuffer exceeds this size, old data is overwritten. That can be a risk, but Disruptor provides a way to check whether the RingBuffer is nearly full, which lets you work around the problem. And according to Maoyidao's test results, the chance of filling it up is small, because Disruptor really is efficient — unless your consumer thread is too slow.

Then use a separate thread to process the data in the RingBuffer:

Java code

```java
RingBuffer<ValueEvent> ringBuffer = new RingBuffer<ValueEvent>(ValueEvent.EVENT_FACTORY,
        new SingleThreadedClaimStrategy(RING_SIZE),
        new SleepingWaitStrategy());
SequenceBarrier barrier = ringBuffer.newBarrier();
BatchEventProcessor<ValueEvent> eventProcessor =
        new BatchEventProcessor<ValueEvent>(ringBuffer, barrier, handler);
ringBuffer.setGatingSequences(eventProcessor.getSequence());
// single consumer thread
new Thread(eventProcessor).start();
```

ValueEvent is typically a custom class that wraps your own data:

Java code

```java
public class ValueEvent {
    private byte[] packet;

    public byte[] getValue() {
        return packet;
    }

    public void setValue(final byte[] packet) {
        this.packet = packet;
    }

    public final static EventFactory<ValueEvent> EVENT_FACTORY = new EventFactory<ValueEvent>() {
        public ValueEvent newInstance() {
            return new ValueEvent();
        }
    };
}

The producer adds data to the buffer via RingBuffer.publish, which raises an event to notify the consumer that new data has arrived. Note how we work around the data-overwrite problem:

Java code

```java
// if remaining capacity is less than 10%, don't use the RingBuffer for now
// (check before claiming a sequence: a claimed sequence must always be published)
if (ringBuffer.remainingCapacity() < RING_SIZE * 0.1) {
    log.warn("disruptor: RingBuffer available capacity is less than 10%");
    // do something else with the data
} else {
    // publishers claim events in sequence
    long sequence = ringBuffer.next();
    ValueEvent event = ringBuffer.get(sequence);
    event.setValue(packet); // this could be more complex with multiple fields
    // make the event available to the EventProcessor
    ringBuffer.publish(sequence);
}
```

The consumer side is implemented in an EventHandler:

Java code

```java
final EventHandler<ValueEvent> handler = new EventHandler<ValueEvent>() {
    public void onEvent(final ValueEvent event, final long sequence, final boolean endOfBatch)
            throws Exception {
        byte[] packet = event.getValue();
        // do something with the packet
    }
};
```

Very good, done! Run a stress test with the code above, and the result is much faster than a locked queue (Disruptor publishes benchmark data on its official site, so I will not provide comparison numbers). OK, deploy it to the production environment... and... CPU usage soars!?

Disruptor's pitfall

Continuing from above: Disruptor stress-tested well, but in production CPU usage hit 650% and load was nearly 300! Analyzing the Disruptor source, the cause of the high CPU is the RingBuffer waiting strategy. The strategy used in the official Disruptor example is SleepingWaitStrategy, which, when no new data is being written to the RingBuffer, checks the RingBuffer cursor roughly every 1 ns. Every 1 ns! That is practically a busy loop, hence the high CPU. Change it to check every 100 ms, and CPU usage immediately drops to 7.8%.

Why does the official Disruptor example use SleepingWaitStrategy despite this risk? Because this strategy uses no locks at all: when throughput is high there is always data in the RingBuffer, and the polling strategy maximizes its performance advantage. But that is clearly the ideal case — Internet applications have obvious peaks and troughs and cannot always run at full load. So it is better to use BlockingWaitStrategy, a lock-and-notify mechanism:

Java code

```java
RingBuffer<ValueEvent> ringBuffer = new RingBuffer<ValueEvent>(ValueEvent.EVENT_FACTORY,
        new SingleThreadedClaimStrategy(RING_SIZE),
        new BlockingWaitStrategy());
```

With this, writes are still lock-free and only reads take a lock. Compared with a fully locked queue, locking is cut by more than half, and performance improves noticeably.
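The reason a lock-and-notify strategy stops burning CPU can be shown with a small sketch (an illustration of the idea, not Disruptor's actual implementation; all names here are invented for the example): while the cursor has not advanced, the consumer parks on a `Condition` instead of spinning, and the producer signals it after publishing — zero CPU while idle, with the lock only on the waiting path:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BlockingWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private long cursor = -1; // highest published sequence

    // producer side: advance the cursor and wake any parked consumer
    public void publish(long sequence) {
        lock.lock();
        try {
            cursor = sequence;
            notEmpty.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // consumer side: park until the requested sequence is available
    public long waitFor(long sequence) throws InterruptedException {
        lock.lock();
        try {
            while (cursor < sequence) {
                notEmpty.await();   // sleeps instead of busy-polling every 1 ns
            }
            return cursor;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final BlockingWaitSketch s = new BlockingWaitSketch();
        new Thread(new Runnable() {
            public void run() { s.publish(0); }
        }).start();
        System.out.println(s.waitFor(0)); // returns once sequence 0 is published
    }
}
```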

Is there a better way?

Disruptor is a good choice for implementing a buffer. But its essence is an efficient mechanism for exchanging data between threads — a good general-purpose choice. For our specific scenario of asynchronously batching data into the database, is there something even better? The answer is: yes, we have! I eventually designed a very simple buffer, for the following reasons:

1. Disruptor is very good, but it introduces a dependency after all, and there is a learning cost for new team members.

2. Disruptor does not really solve the problem of excessive GC.

So what is a better buffer? That depends on the scenario.

The first question is: I need a buffer, but why do you want a cross-thread buffer? If I use the same thread to read, and then use this thread to write, this buffer is entirely thread local buffer, the lock itself is meaningless. At the same time the asynchronous database landed no strict order requirements, so I am multithreaded synchronous read and write, do not need to set the buffer to maintain order, so a two-dimensional byte[][] array built into the thread can solve all problems!

Java code

```java
public class ThreadLocalBoundedMQ {
    private long lastFlushTime = 0L;
    private byte[][] msgs = new byte[Constants.BATCH_INS_COUNT][];
    private int offset = 0;

    public byte[][] getMsgs() {
        return msgs;
    }

    public void addMsg(byte[] msg) {
        msgs[offset++] = msg;
    }

    public int size() {
        return offset;
    }

    public void clear() {
        offset = 0;
        lastFlushTime = System.currentTimeMillis();
    }

    public boolean needFlush() {
        return (System.currentTimeMillis() - lastFlushTime > Constants.MAX_BUFFER_TIME)
                && offset > 0;
    }
}
```

The actual test and production results are good (see the first paragraph of this article)!
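To make the usage pattern concrete, here is a hypothetical, self-contained sketch of how such a thread-local buffer is driven: the same worker thread both fills the buffer and flushes it as a batch, so no lock is ever taken. The batch size, counters, and class name below are illustrative assumptions, not from the original service, and the flush here just counts batches where the real service would issue one batched database insert:

```java
public class ThreadLocalBufferDemo {
    static final int BATCH_SIZE = 4; // assumed stand-in for Constants.BATCH_INS_COUNT

    private final byte[][] msgs = new byte[BATCH_SIZE][];
    private int offset = 0;
    private int flushedBatches = 0;

    void addMsg(byte[] msg) {
        msgs[offset++] = msg;
        if (offset == BATCH_SIZE) {
            flush();                 // full batch -> flush in the same thread
        }
    }

    void flush() {
        // real service: one batched DB insert of msgs[0..offset), then reset
        flushedBatches++;
        offset = 0;
    }

    public static int run(int messages) {
        ThreadLocalBufferDemo buf = new ThreadLocalBufferDemo();
        for (int i = 0; i < messages; i++) {
            buf.addMsg(new byte[]{(byte) i});
        }
        return buf.flushedBatches;
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // 10 messages at batch size 4 -> 2 full flushes
    }
}
```

A time-based `needFlush()` check, as in the class above, would additionally flush a partially filled batch once it has waited too long.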

Summary

Meeting performance and business requirements with the most streamlined code possible is the ideal approach. Depending on the scenario there are many options, but do not be dazzled by shiny new technologies and turn your own service into a guinea pig. The most suitable, simplest solution is the best one.

This article is Maoyidao's original work. When reprinting, please include the original link:

http://maoyidao.iteye.com/blog/1663193

The first two articles in this series are also recommended:

Building High-Performance Services (I): ConcurrentSkipListMap and Linked Lists for a High-Performance Java Memcached

http://maoyidao.iteye.com/blog/1559420

Building High-Performance Services (II): 3 Implementations of Java High-Concurrency Locks

http://maoyidao.iteye.com/blog/1563523
