Before reading this post, it is recommended to first read "How Storm batch transactions work".
Why batching (Batch)?
Processing tuples one at a time adds a lot of overhead, such as writing to the database and outputting results too frequently.
Transactional processing of single tuples is inefficient, so batch processing was introduced into Storm.
Tuples are processed one batch at a time, and a transaction ensures that the batch is processed successfully; if processing fails, Storm resends the failed batch and guarantees that each batch is processed exactly once.
There are three types of transactional spout:
1. ITransactionalSpout<T>, with BaseTransactionalSpout<T>: ordinary transactional spout
2. IPartitionedTransactionalSpout<T>, with BasePartitionedTransactionalSpout<T>: partitioned transactional spout
3. IOpaquePartitionedTransactionalSpout<T>, with BaseOpaquePartitionedTransactionalSpout<T>: opaque partitioned transactional spout
There are two types of transactional bolt:
1. IBatchBolt<T>, with BaseBatchBolt<T>: ordinary batch processing
2. BaseTransactionalBolt: transactional bolt
Implementing the ICommitter interface marks an IBatchBolt or BaseTransactionalBolt as a committer; such batch bolts are wrapped internally by CoordinatedBolt. The wiring sketch below shows how a transactional spout, batch bolts, and a committer fit together.
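To see how these pieces fit together, here is a minimal wiring sketch modeled on the classic global-count example from the Storm documentation. It assumes the pre-1.0 backtype.storm package layout; MemoryTransactionalSpout is Storm's built-in in-memory test spout, and BatchCount and UpdateGlobalCount are the batch bolts sketched in the BOLT section later in this post.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.MemoryTransactionalSpout;
import backtype.storm.transactional.TransactionalTopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class GlobalCountTopology {
    public static void main(String[] args) throws Exception {
        // In-memory partitioned data source; in production this would be Kafka, RocketMQ, etc.
        Map<Integer, List<List<Object>>> data = new HashMap<Integer, List<List<Object>>>();
        List<List<Object>> partition0 = new ArrayList<List<Object>>();
        partition0.add(new Values("apple"));
        partition0.add(new Values("banana"));
        data.put(0, partition0);

        // Emit up to 3 tuples per partition per batch.
        MemoryTransactionalSpout spout =
                new MemoryTransactionalSpout(data, new Fields("word"), 3);

        // "global-count" names the transaction state kept in ZooKeeper.
        TransactionalTopologyBuilder builder =
                new TransactionalTopologyBuilder("global-count", "spout", spout, 2);
        builder.setBolt("partial-count", new BatchCount(), 3).noneGrouping("spout");
        // UpdateGlobalCount implements ICommitter, so its finishBatch() runs in txid order.
        builder.setBolt("sum", new UpdateGlobalCount()).globalGrouping("partial-count");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("global-count-topology", new Config(), builder.buildTopology());
    }
}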
Spout
1. Spout: ordinary transactional spout, ITransactionalSpout
Interface ITransactionalSpout.Coordinator<X>
Method Summary
void close()
X initializeTransaction(java.math.BigInteger txid, X prevMetadata): creates the metadata for a new transaction; when isReady() returns true, this metadata is emitted as a transaction tuple on the batch emit stream
boolean isReady(): 1. when it returns true, a transaction is opened and enters the processing phase; a transactional tuple is emitted on the batch emit stream, and the Emitter subscribes to the Coordinator's batch emit stream with a broadcast (all) grouping
Interface ITransactionalSpout.Emitter<X>
Method Summary
void cleanupBefore(java.math.BigInteger txid): cleans up the information kept for earlier transactions
void close()
void emitBatch(TransactionAttempt tx, X coordinatorMeta, BatchOutputCollector collector): 2. the Emitter receives the transaction tuple and emits the batch, sending the batch's tuples one by one
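As a concrete illustration of the Coordinator/Emitter split, here is a minimal sketch of an ITransactionalSpout that reads a shared in-memory log by offset. The RangeMeta metadata class, the field names, and the batch size are hypothetical, and the package names assume pre-1.0 backtype.storm.

import java.io.Serializable;
import java.math.BigInteger;
import java.util.Map;

import backtype.storm.coordination.BatchOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.transactional.ITransactionalSpout;
import backtype.storm.transactional.TransactionAttempt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Hypothetical batch metadata: which slice of the source log one batch covers.
class RangeMeta implements Serializable {
    long start;  // first offset of this batch
    int count;   // number of tuples in this batch
}

public class SketchTransactionalSpout implements ITransactionalSpout<RangeMeta> {
    private static final int BATCH_SIZE = 10;

    // The Coordinator runs as a single task and only decides what each batch covers.
    static class SketchCoordinator implements ITransactionalSpout.Coordinator<RangeMeta> {
        public RangeMeta initializeTransaction(BigInteger txid, RangeMeta prevMetadata) {
            RangeMeta meta = new RangeMeta();
            meta.start = (prevMetadata == null) ? 0 : prevMetadata.start + prevMetadata.count;
            meta.count = BATCH_SIZE;
            return meta; // sent to the Emitters as the transaction tuple's metadata
        }
        public boolean isReady() { return true; } // a real spout would check source availability
        public void close() {}
    }

    // Emitters run in parallel and emit the actual tuples described by the metadata.
    static class SketchEmitter implements ITransactionalSpout.Emitter<RangeMeta> {
        public void emitBatch(TransactionAttempt tx, RangeMeta meta, BatchOutputCollector collector) {
            for (long offset = meta.start; offset < meta.start + meta.count; offset++) {
                // The TransactionAttempt is emitted as the first field of every tuple.
                collector.emit(new Values(tx, "record-" + offset));
            }
        }
        public void cleanupBefore(BigInteger txid) {} // drop state kept for older transactions
        public void close() {}
    }

    public ITransactionalSpout.Coordinator<RangeMeta> getCoordinator(Map conf, TopologyContext context) {
        return new SketchCoordinator();
    }
    public ITransactionalSpout.Emitter<RangeMeta> getEmitter(Map conf, TopologyContext context) {
        return new SketchEmitter();
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("txid", "record"));
    }
    public Map<String, Object> getComponentConfiguration() { return null; }
}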
2. Spout: partitioned transactional spout, IPartitionedTransactionalSpout<T>
The partitioned transactional spout is the most widely used transactional spout, because the mainstream message queues all support partitioning. Partitioning increases the throughput of the MQ (each partition acts as a data source / send point); mainstream MQs such as Kafka and RocketMQ work this way.
Interface IPartitionedTransactionalSpout.Coordinator
Method Summary
void close()
boolean isReady(): when it returns true, a transaction is opened and enters the processing phase; a transactional tuple is emitted on the batch emit stream, and the Emitter subscribes to the Coordinator's batch emit stream with a broadcast (all) grouping
int numPartitions(): returns the number of partitions. When a new data source partition is added and a transaction is replayed, tuples from the new partition are not emitted, because the transaction already knows how many partitions it covers.
Interface IPartitionedTransactionalSpout.Emitter<X>
Method Summary
void close()
void emitPartitionBatch(TransactionAttempt tx, BatchOutputCollector collector, int partition, X partitionMeta): if the batch fails in a downstream bolt, emitPartitionBatch is responsible for re-emitting exactly the same batch
X emitPartitionBatchNew(TransactionAttempt tx, BatchOutputCollector collector, int partition, X lastPartitionMeta): emits a new batch and returns its metadata
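Under the assumption of a Kafka-like source where each partition is read by offset, the Emitter side might look like the sketch below. OffsetMeta and the fetch helper are hypothetical; the point is that emitPartitionBatchNew decides the range of a fresh batch and returns its metadata, while emitPartitionBatch replays exactly the range recorded in that metadata.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import backtype.storm.coordination.BatchOutputCollector;
import backtype.storm.transactional.TransactionAttempt;
import backtype.storm.transactional.partitioned.IPartitionedTransactionalSpout;
import backtype.storm.tuple.Values;

// Hypothetical per-partition metadata that Storm stores for each (txid, partition).
class OffsetMeta implements Serializable {
    long startOffset;  // first offset covered by this batch
    int count;         // number of messages in this batch
}

class SketchPartitionedEmitter implements IPartitionedTransactionalSpout.Emitter<OffsetMeta> {
    private static final int BATCH_SIZE = 100;

    // First emission of a batch for this (txid, partition): choose its range and return it.
    public OffsetMeta emitPartitionBatchNew(TransactionAttempt tx, BatchOutputCollector collector,
                                            int partition, OffsetMeta lastPartitionMeta) {
        OffsetMeta meta = new OffsetMeta();
        meta.startOffset = (lastPartitionMeta == null)
                ? 0 : lastPartitionMeta.startOffset + lastPartitionMeta.count;
        meta.count = BATCH_SIZE;
        emitPartitionBatch(tx, collector, partition, meta); // emit the tuples themselves
        return meta; // stored by Storm; a replay reuses it unchanged
    }

    // Replay: re-emit exactly the range recorded in the stored metadata.
    public void emitPartitionBatch(TransactionAttempt tx, BatchOutputCollector collector,
                                   int partition, OffsetMeta meta) {
        for (String msg : fetch(partition, meta.startOffset, meta.count)) {
            collector.emit(new Values(tx, msg)); // the TransactionAttempt is the first field
        }
    }

    public void close() {}

    // Placeholder for a real Kafka/RocketMQ read.
    private List<String> fetch(int partition, long start, int count) {
        return new ArrayList<String>();
    }
}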
3. Spout: opaque partitioned transactional spout, IOpaquePartitionedTransactionalSpout
Interface IOpaquePartitionedTransactionalSpout.Coordinator
Method Summary
void close()
boolean isReady(): same as above
IOpaquePartitionedTransactionalSpout does not distinguish between emitting a new batch and re-emitting an old one; both go through emitPartitionBatch. The X returned by emitPartitionBatch is the metadata for the next batch (it is passed back in as the 4th parameter), and only the metadata of the last successfully processed batch is updated to ZooKeeper; if a batch fails and is re-emitted, the X that emitPartitionBatch reads is still the old one. Therefore the custom metadata X does not need to record both the start position of the current batch and the start position of the next batch; it only needs to record the start position of the next batch, for example:
public class BatchMeta {
    public long nextOffset; // start offset of the next batch
}
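Reusing the BatchMeta class above, an Emitter for an opaque partitioned spout could look roughly like this sketch (the fetch helper and batch size are hypothetical; package names assume pre-1.0 backtype.storm). Note that there is only one emitPartitionBatch method, used for both new batches and replays, and it returns the metadata for the next batch.

import java.util.ArrayList;
import java.util.List;

import backtype.storm.coordination.BatchOutputCollector;
import backtype.storm.transactional.TransactionAttempt;
import backtype.storm.transactional.partitioned.IOpaquePartitionedTransactionalSpout;
import backtype.storm.tuple.Values;

class SketchOpaqueEmitter implements IOpaquePartitionedTransactionalSpout.Emitter<BatchMeta> {
    private static final int BATCH_SIZE = 100;

    // Called for new batches and for replays alike; lastPartitionMeta is whatever was
    // last committed to ZooKeeper for this partition (null before the first batch).
    public BatchMeta emitPartitionBatch(TransactionAttempt tx, BatchOutputCollector collector,
                                        int partition, BatchMeta lastPartitionMeta) {
        long start = (lastPartitionMeta == null) ? 0 : lastPartitionMeta.nextOffset;
        // May return fewer messages (or none) if the partition is currently unreadable.
        List<String> messages = fetch(partition, start, BATCH_SIZE);
        for (String msg : messages) {
            collector.emit(new Values(tx, msg));
        }
        BatchMeta next = new BatchMeta();
        next.nextOffset = start + messages.size(); // only the next start position is recorded
        return next;
    }

    public int numPartitions() { return 16; } // e.g. the number of MQ partitions
    public void close() {}

    // Placeholder for a real Kafka/RocketMQ read.
    private List<String> fetch(int partition, long start, int count) {
        return new ArrayList<String>();
    }
}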
Both IPartitionedTransactionalSpout and IOpaquePartitionedTransactionalSpout
package tuples into batches for processing, guarantee that every tuple is fully processed, and support message replay. To support transactions they assign each batch a unique transaction ID (txid). Txids increase sequentially, and batches are processed in strict order: the batch with txid=1 must be fully processed before the batch with txid=2 can be processed.
The differences between them and how to use them:
Each tuple of an IPartitionedTransactionalSpout is bound to a fixed batch. No matter how many times a tuple is re-sent, it always belongs to the same batch with the same transaction ID, and a tuple never appears in two different batches. No matter how many times a batch is re-sent, it keeps one and the same transaction ID. This means that however many times a batch is re-sent, its contents are exactly the same.
IPartitionedTransactionalSpout does have a problem, although it is rare: suppose a batch fails while being consumed in a bolt and needs to be replayed by the spout, and at that moment the message middleware also happens to fail, for example one partition becomes unreadable. Because the spout must guarantee that the replayed batch contains exactly the same tuples, it can only wait for the message middleware to recover; it is stuck and cannot send any more messages to the bolts until the middleware is back. IOpaquePartitionedTransactionalSpout solves this problem.
To solve this problem, IOpaquePartitionedTransactionalSpout does not guarantee that a re-sent batch contains exactly the same tuples. This means a tuple may first appear in the batch with txid=2 and later appear in the batch with txid=5. This only happens when a batch fails, needs to be replayed, and the message middleware happens to fail at the same time. In that case IOpaquePartitionedTransactionalSpout does not wait for the middleware to recover; it reads the partitions that are still available. For example, the txid=2 batch fails during consumption and must be replayed, and exactly then 1 of the message middleware's 16 partitions (partition=3) becomes unreadable because of a fault. IOpaquePartitionedTransactionalSpout then reads the other 15 partitions and completes the replay of the txid=2 batch; the replayed batch actually contains fewer tuples than before. Suppose the middleware recovers around txid=5: the tuples that previously came from partition=3 in txid=2 are then re-sent and end up in the txid=5 batch.
When using IOpaquePartitionedTransactionalSpout, because the mapping between tuples and txids can change, storing only a txid alongside the business computation result is no longer enough to guarantee transactional semantics. The solution is a bit more involved: besides the business result itself, two more things must be saved: the result of the previous batch and the transaction ID of the current batch.
Take the simpler example of computing a global count, and suppose the current statistics are:
{
    value = 4,
    prevValue = 1,
    txid = 2
}
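Here is a minimal sketch of the commit logic behind this structure (the load/save helpers and class names are hypothetical): on each commit, compare the stored txid with the txid of the batch being committed; if they match, the batch is a replay and the new value is recomputed from prevValue, otherwise the window rolls forward.

// Sketch of opaque-transactional state update for a global count.
class Stats implements java.io.Serializable {
    long value;      // count as of the last committed batch
    long prevValue;  // count as of the batch before that
    long txid;       // transaction id that produced `value`
}

class OpaqueCountCommitter {
    void commit(long currTxid, long batchCount) {
        Stats stats = load(); // read current state from the external store
        if (stats.txid == currTxid) {
            // This txid was already applied once, but with an opaque spout the replayed
            // batch may contain different tuples, so recompute from prevValue.
            stats.value = stats.prevValue + batchCount;
        } else {
            // First time this txid is committed: roll the window forward.
            stats.prevValue = stats.value;
            stats.value += batchCount;
            stats.txid = currTxid;
        }
        save(stats);
    }

    private Stats load() { return new Stats(); }  // placeholder for a database read
    private void save(Stats stats) {}             // placeholder for a database write
}

With the numbers above, if the txid=2 batch is replayed with a partial count of 5, value becomes prevValue + 5 = 6 rather than being added on top of the possibly stale 4.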
BOLT
I. Interface IBatchBolt<T>: ordinary batch processing
Method Summary
void execute(Tuple tuple): called once for each tuple in the batch
void finishBatch(): called after the whole batch has been processed
void prepare(java.util.Map conf, TopologyContext context, BatchOutputCollector collector, T id): called before the batch is processed; the last parameter is the batch id
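As an example of an ordinary batch bolt, here is the per-batch counter used in the wiring sketch earlier in this post, modeled on the BatchCount example from the Storm documentation (package names assume pre-1.0 backtype.storm).

import java.util.Map;

import backtype.storm.coordination.BatchOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBatchBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class BatchCount extends BaseBatchBolt<Object> {
    private Object id;                      // the batch id passed as prepare's 4th argument
    private BatchOutputCollector collector;
    private int count = 0;

    public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
        this.collector = collector;
        this.id = id;
    }

    public void execute(Tuple tuple) {
        count++;                            // called once per tuple of the batch
    }

    public void finishBatch() {
        collector.emit(new Values(id, count)); // called once, after the whole batch is processed
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "count"));
    }
}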
II. Class BaseTransactionalBolt: transactional batch processing
The only difference between a transactional batch bolt and an ordinary batch bolt is the fourth parameter of prepare: in IBatchBolt the last parameter of prepare is of type Object (the batch id), while in a transactional bolt it is a TransactionAttempt, as in the sketch below.
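Here is a sketch of the corresponding transactional committer bolt, UpdateGlobalCount, referenced in the wiring sketch earlier. Note that prepare now receives a TransactionAttempt, and that implementing ICommitter makes finishBatch() run during the commit phase in txid order; the external store access is only indicated in comments.

import java.util.Map;

import backtype.storm.coordination.BatchOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseTransactionalBolt;
import backtype.storm.transactional.ICommitter;
import backtype.storm.transactional.TransactionAttempt;
import backtype.storm.tuple.Tuple;

// Implementing ICommitter makes finishBatch() run in the commit phase, in txid order.
public class UpdateGlobalCount extends BaseTransactionalBolt implements ICommitter {
    private TransactionAttempt attempt;  // prepare's 4th argument is now a TransactionAttempt
    private int sum = 0;

    public void prepare(Map conf, TopologyContext context,
                        BatchOutputCollector collector, TransactionAttempt attempt) {
        this.attempt = attempt;
    }

    public void execute(Tuple tuple) {
        sum += tuple.getInteger(1);      // accumulate the partial counts from upstream
    }

    public void finishBatch() {
        long txid = attempt.getTransactionId().longValue();
        // Hypothetical external store: only apply the update if this txid has not
        // been committed yet, which keeps the global count exactly-once, e.g.:
        // if (txid != storedTxid) { storedValue += sum; storedTxid = txid; save(...); }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt writes to an external store and emits nothing.
    }
}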
A detailed description of the Storm batch transaction API