Recently our company's online business was migrated to a Storm cluster, and after the launch we saw unusually high CPU consumption even during off-peak periods. While solving the problem I dug into Storm's internal implementation and fixed a serious, long-standing bug in Storm 0.9-0.10; the fix has since been merged into newer Storm releases. This article describes the problem scenario, the analysis, the solution, and some personal takeaways.

Background
First, a brief introduction to Storm; readers already familiar with it can skip this section.
Storm is a big-data processing framework open-sourced by Twitter that focuses on streaming data. Storm transforms data streams by building a topology. Unlike a Hadoop job, a topology keeps processing data continuously unless it is killed by the cluster.
Below is a simple diagram of a Storm topology.
As the diagram shows, a topology is a directed graph formed by chaining different components in series or in parallel. Data tuples flow between components as streams. There are two types of components: spouts, which act as data sources, and bolts, which consume and process tuples. A minimal wiring sketch follows.
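To make this concrete, here is a minimal topology sketch against the Storm 0.9.x API (backtype.storm.*), the line of versions discussed in this article. The NumberSpout and PrintBolt classes are invented for illustration and are not part of Storm or of the topology described here.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
import java.util.Map;

public class MiniTopology {
    // Spout: a data source that emits one tuple per second.
    public static class NumberSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private long n = 0;
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }
        public void nextTuple() {
            Utils.sleep(1000);
            collector.emit(new Values(n++));
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("n"));
        }
    }

    // Bolt: a processing unit that consumes tuples from the spout.
    public static class PrintBolt extends BaseRichBolt {
        private OutputCollector collector;
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }
        public void execute(Tuple input) {
            System.out.println(input.getLong(0));
            collector.ack(input);
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("numbers", new NumberSpout(), 1);   // spout component
        builder.setBolt("printer", new PrintBolt(), 2)       // bolt component, parallelism 2
               .shuffleGrouping("numbers");                  // tuples flow spout -> bolt
        new LocalCluster().submitTopology("mini", new Config(), builder.createTopology());
    }
}
```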
Until now the industry has mostly used Storm for offline processing or for business with relaxed real-time requirements. As Storm has evolved, its reliability and real-time performance have kept improving, and it is now capable of running online business. So we tried migrating some online business with real-time requirements in the hundred-millisecond range onto the Storm cluster.
- Using top to inspect CPU consumption, system calls accounted for about 70%. We then used Wtool to analyze Storm's worker process and located the threads consuming the most CPU:

    java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000640a248f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:87)
        at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:54)
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:97)
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
        at backtype.storm.daemon.executor$fn__3441$fn__3453$fn__3500.invoke(executor.clj:748)
        at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:745)
We can see that these threads are blocked waiting on a lock condition, and the call originates from disruptor$consume_batch_when_available.
DisruptorQueue is Storm's wrapper around its internal message queue, so let's look at the message-passing mechanism inside Storm.
(Image source: Understanding the Internal Message Buffers of Storm)
Storm's unit of work is called a worker (in fact a JVM process). Communication between different workers goes through Netty (older Storm versions used ZeroMQ).
Each worker contains a set of executors. Storm assigns executors to each component in the topology. In the actual data-processing flow, data travels between executors as messages, and each executor loops, invoking the processing logic of the component bound to it to handle the messages it receives.
Message transport between executors uses queues as the message pipeline. Storm assigns two queues and two processing threads to each executor:
- Worker thread: reads from the receive queue, processes the message, and writes to the send queue if new messages are produced
- Send thread: reads from the send queue and sends the messages to other executors
When the executor's send thread sends a message, it checks whether the target executor is in the same worker. If so, it writes the message directly into the target executor's receive queue; if not, it writes the message into the worker's transfer queue, and the message is then sent over the network. The routing decision is sketched below.
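As an illustration only (this is not Storm source code), the send thread's routing decision can be modeled roughly like this; the Message class and the task-id based lookup are simplifications.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;

class SendThreadSketch {
    static class Message {
        final int targetTaskId;
        final Object tuple;
        Message(int targetTaskId, Object tuple) { this.targetTaskId = targetTaskId; this.tuple = tuple; }
    }

    /** Route one outgoing message the way the executor send thread is described to do. */
    static void route(Message msg,
                      Map<Integer, BlockingQueue<Message>> localReceiveQueues,
                      BlockingQueue<Message> workerTransferQueue) throws InterruptedException {
        BlockingQueue<Message> local = localReceiveQueues.get(msg.targetTaskId);
        if (local != null) {
            // Target executor lives in the same worker: hand the message over in-process.
            local.put(msg);
        } else {
            // Target executor is in another worker: enqueue on the worker's transfer
            // queue, from which it is sent over the network (Netty).
            workerTransferQueue.put(msg);
        }
    }
}
```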
The executor worker/send threads read their queues with the code below, where consume-batch-when-available reads the messages in the queue and processes them.
(async-loop (fn [] ... (disruptor/consume-batch-when-available receive-queue event-handler) ... ))
Let's take a look at what consume-batch-when-available does.
(defn consume-batch-when-available [^DisruptorQueue queue handler] (.consumeBatchWhenAvailable queue handler))
As mentioned earlier, Storm uses queues as the message pipeline. As a streaming big-data framework, Storm is sensitive to message-transport performance, so it uses the Disruptor queue, an efficient in-memory queue, as its message queue.
The Disruptor queue is a lock-free in-memory queue open-sourced by LMAX. Its internal structure is as follows.
(Image source: Disruptor queue introduction)
The Disruptor queue manages the queue through a Sequencer, which internally uses a RingBuffer to store messages. A message's position in the RingBuffer is identified by a sequence. The production and consumption flow works as follows; a simplified model is sketched after the list.
- The Sequencer uses a cursor to track the write position.
- Each consumer maintains its own consumption position and registers it with the Sequencer.
- Consumers interact with the Sequencer through a SequenceBarrier. Each time a consumer wants to consume, the SequenceBarrier compares its consumption position with the cursor to determine whether a message is available: if not, it waits according to the configured wait strategy; if so, it reads the message and advances the consumption position.
- Before writing, the producer checks the consumption positions of all consumers; once a slot is free, it writes the message and advances the cursor.
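To illustrate the production/consumption flow above, here is a deliberately simplified single-producer, single-consumer model; it is not the LMAX implementation, which adds claim strategies, cache-line padding, and pluggable wait strategies.

```java
import java.util.concurrent.atomic.AtomicLong;

class MiniRingBuffer {
    private final Object[] ring;
    private final int mask;
    private final AtomicLong cursor = new AtomicLong(-1);      // last published position
    private final AtomicLong consumerSeq = new AtomicLong(-1); // last consumed position

    MiniRingBuffer(int sizePowerOfTwo) {
        ring = new Object[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Producer: wait until a slot is free, write the message, then advance the cursor.
    void publish(Object msg) throws InterruptedException {
        long next = cursor.get() + 1;
        while (next - consumerSeq.get() > ring.length) {
            Thread.sleep(1); // ring is full, wait for the consumer to catch up
        }
        ring[(int) (next & mask)] = msg;
        cursor.set(next); // publishing the cursor makes the message visible
    }

    // Consumer: wait until the cursor reaches our next sequence, then read it.
    Object take() {
        long next = consumerSeq.get() + 1;
        while (cursor.get() < next) {
            Thread.yield(); // stand-in for a real wait strategy
        }
        Object msg = ring[(int) (next & mask)];
        consumerSeq.set(next);
        return msg;
    }
}
```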
The DisruptorQueue.consumeBatchWhenAvailable implementation looks like this:

    final long nextSequence = _consumer.get() + 1;
    final long availableSequence = _barrier.waitFor(nextSequence, 10, TimeUnit.MILLISECONDS);
    if (availableSequence >= nextSequence) {
        consumeBatchToCursor(availableSequence, handler);
    }
Continuing into the _barrier.waitFor method:

    public long waitFor(final long sequence, final long timeout, final TimeUnit units)
            throws AlertException, InterruptedException {
        checkAlert();
        return waitStrategy.waitFor(sequence, cursorSequence, dependentSequences, this, timeout, units);
    }
The Disruptor queue offers several wait strategies for consumers; a sketch of the two extremes follows the list.
- BlockingWaitStrategy: blocking wait; low CPU consumption, but incurs thread switches and higher latency
- BusySpinWaitStrategy: busy-spin wait; high CPU consumption, but no thread switches and the lowest latency
- YieldingWaitStrategy: spins first, then calls Thread.yield() to give the CPU up to other threads; balances CPU usage and latency
- SleepingWaitStrategy: spins first, then calls Thread.yield(), and finally LockSupport.parkNanos(1L); balances CPU usage and latency
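For intuition, here are deliberately simplified sketches of the two extremes, blocking versus busy-spin; they show the idea only and are not the LMAX source.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

class WaitStrategySketch {
    // Blocking: park on a condition until signalled or timed out (low CPU, thread switches).
    static long blockingWait(long sequence, AtomicLong cursor, Lock lock, Condition notEmpty,
                             long timeout, TimeUnit unit) throws InterruptedException {
        long available;
        while ((available = cursor.get()) < sequence) {
            lock.lock();
            try {
                notEmpty.await(timeout, unit); // woken by the producer, or by the timeout
            } finally {
                lock.unlock();
            }
        }
        return available;
    }

    // Busy-spin: keep polling the cursor (lowest latency, but one core stays at 100%).
    static long busySpinWait(long sequence, AtomicLong cursor) {
        long available;
        while ((available = cursor.get()) < sequence) {
            // spin
        }
        return available;
    }
}
```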
Storm's default wait strategy is BlockingWaitStrategy, whose waitFor method is implemented as follows:

    if ((availableSequence = cursor.get()) < sequence) {
        lock.lock();
        try {
            ++numWaiters;
            while ((availableSequence = cursor.get()) < sequence) {
                barrier.checkAlert();
                if (!processorNotifyCondition.await(timeout, sourceUnit)) {
                    break;
                }
            }
        } finally {
            --numWaiters;
            lock.unlock();
        }
    }
BlockingWaitStrategy internally blocks the consumer on a lock condition; when the await times out, the consumer thread wakes up by itself and continues checking for available messages. The implementation here has a bug: when processorNotifyCondition.await times out it should loop and query again, but the code actually breaks out of the loop and immediately returns the current cursor.
Looking back at DisruptorQueue.consumeBatchWhenAvailable, Storm sets the timeout to 10 ms. We can infer that when there are no messages, or only a few, the executor blocks on its consumer queue; because the timeout is so short, the worker threads time out frequently (each of an executor's two threads wakes up roughly 100 times per second even when idle). Combined with the BlockingWaitStrategy bug, consumeBatchWhenAvailable is called very frequently, which drives CPU usage up.
We tried changing the 10 ms to 100 ms, recompiled Storm and redeployed the cluster, then ran Storm's demo topology with the bolt parallelism set to 1000 and the spout modified to emit one message every 10 s. In tests, CPU usage dropped significantly.
We then changed 100 ms to 1 s, and CPU consumption dropped essentially to zero.
However, even with this change, tests did not show any delay in message processing. Looking further into the BlockingWaitStrategy code, we found that the Disruptor queue's producer wakes up the waiting consumers after writing a message:

    if (0 != numWaiters) {
        lock.lock();
        try {
            processorNotifyCondition.signalAll();
        } finally {
            lock.unlock();
        }
    }
Seen this way, Storm's 10 ms timeout is strange: it does not reduce message latency, yet it increases the system load. Keeping this question in mind while reading the surrounding code, we found a comment where the DisruptorQueue object is constructed:

    ;; :block strategy requires using a timeout on waitFor (implemented in DisruptorQueue),
    ;; as sometimes the consumer stays blocked even when there's an item on the queue.
    (defnk disruptor-queue
      [^String queue-name buffer-size :claim-strategy :multi-threaded :wait-strategy :block]
      (DisruptorQueue. queue-name
                       ((CLAIM-STRATEGY claim-strategy) buffer-size)
                       (mk-wait-strategy wait-strategy)))
The Disruptor version Storm uses is 2.10.1. Checking the Disruptor changelog, we found that this version's BlockingWaitStrategy has a potential race condition that can leave a waiting consumer unwoken even though a message has been written.
2.10.2 Released (21-Aug-2012)
- Bug fix, potential race condition in BlockingWaitStrategy.
- Bug fix, set initial SequenceGroup value to -1 (Issue #27).
- Deprecate timeout methods that will be removed in version 3.
So Storm uses a short timeout: when the race occurs, a consumer that missed the wake-up will quickly re-check for available messages once the wait times out, which avoids message delays.
This means that if we simply change the timeout to 1000 ms, the worst case when the race occurs is that a message is delayed by 1000 ms. Balancing performance against latency, we added a configuration item to Storm's configuration file for this timeout, so users can choose between guaranteed low latency and low CPU usage; a sketch follows.
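A minimal sketch of what reading such a timeout option might look like. The key name topology.disruptor.wait.timeout.millis and the 1000 ms fallback are hypothetical here; the option actually added to Storm may be named and wired differently.

```java
import java.util.HashMap;
import java.util.Map;

public class WaitTimeoutConfig {
    // Hypothetical configuration key, for illustration only.
    static final String WAIT_TIMEOUT_KEY = "topology.disruptor.wait.timeout.millis";

    // Read the timeout from the topology/cluster config, falling back to 1000 ms.
    static long waitTimeoutMillis(Map<String, Object> stormConf) {
        Object v = stormConf.get(WAIT_TIMEOUT_KEY);
        return (v == null) ? 1000L : ((Number) v).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(WAIT_TIMEOUT_KEY, 10L); // favour low latency over low CPU usage
        System.out.println(waitTimeoutMillis(conf)); // -> 10
    }
}
```

The value read this way would then be passed to _barrier.waitFor in place of the hard-coded 10 ms shown earlier.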
We consulted the author of the Disruptor queue about the potential race in BlockingWaitStrategy and learned that version 2.10.4 has fixed it (Race condition in 2.10.1 release).
We upgraded Storm's dependency to this release. However, our concurrency tests against Disruptor 2.10.1 could not reproduce the race, so we could not confirm that 2.10.4 fixes it completely. As a precaution we kept the earlier timeout configuration item while upgrading the dependency, and set the default timeout to 1000 ms. Testing shows that CPU usage stays normal when the cluster is idle, and no message delays appear under load.