Storm Common Patterns: Batch Processing

Source: Internet
Author: User

When Storm processes stream data in real time, a common pattern is to process tuples in groups of a certain size instead of handling each tuple individually as it arrives. This can be motivated by performance or by specific business requirements.

For example, consider querying or updating a database. If each tuple generates its own SQL statement and database operation, then at high data volumes this is much less efficient than batch processing and reduces system throughput.
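To put a rough number on the difference, the sketch below counts database round trips. It relies on an idealized assumption: each unbatched statement costs one round trip and each executeBatch() call costs one. The class name RoundTrips is made up for illustration. Whether a JDBC driver really sends a batch in a single round trip depends on the driver and its settings.

```java
// Hypothetical helper: counts database round trips under the idealized assumption of
// one round trip per unbatched statement and one per executeBatch() call.
final class RoundTrips {
    private RoundTrips() { }

    // One SQL statement (and one round trip) per tuple.
    static long unbatched(long tuples) {
        return tuples;
    }

    // One round trip per batch; a partial final batch still costs a full trip.
    static long batched(long tuples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        return (tuples + batchSize - 1) / batchSize;  // ceiling division
    }
}
```

For 10,000 tuples, RoundTrips.unbatched(10000) gives 10,000 round trips, while RoundTrips.batched(10000, 500) gives 20. With 10,001 tuples the partial last batch adds one more trip, for 21.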

Of course, if you want to use Storm's reliable message processing mechanism, you should keep references to these tuples in an in-memory container and ack them only after the batch has been processed.
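A Storm-free sketch of this hold-then-ack idea follows. Acker is a hypothetical stand-in for the ack/fail methods of Storm's OutputCollector, and AckingBuffer is an illustrative name. The point is only the ordering: nothing is acked until the batch operation has finished, and if it throws, every item in the batch is failed instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the ack/fail half of Storm's OutputCollector.
interface Acker<T> {
    void ack(T item);
    void fail(T item);
}

// Keeps references to received items and acks them only after the whole batch is processed.
final class AckingBuffer<T> {
    private final List<T> pending = new ArrayList<>();
    private final Acker<T> acker;

    AckingBuffer(Acker<T> acker) {
        this.acker = acker;
    }

    // Store the reference; do NOT ack yet.
    void add(T item) {
        pending.add(item);
    }

    int size() {
        return pending.size();
    }

    // Runs the batch operation; acks every item on success, fails every item on error.
    void flush(Consumer<List<T>> batchOperation) {
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        try {
            batchOperation.accept(batch);
        } catch (RuntimeException e) {
            for (T item : batch) {
                acker.fail(item);
            }
            return;
        }
        for (T item : batch) {
            acker.ack(item);
        }
    }
}
```

In a real bolt, fail() marks the tuple tree as failed so a reliable spout can replay it. Acking before the database write would instead mark those tuples as done, and they would be lost if the write then failed.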

A simple code example is given below:

Suppose we already have a DbManager class for database operations that provides at least two methods:

(1) getConnection(): returns a java.sql.Connection object;

(2) getSQL(Tuple tuple): generates the SQL statement for the given tuple.
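The article does not show DbManager itself. A minimal hypothetical version might look like the following sketch. Several details are assumptions: the JDBC URL, the credentials, the readings table, and the Reading class, which stands in for a Storm Tuple so the sketch compiles without Storm. getConnection() wraps SQLException in an unchecked exception so that it can be called without a try block, as the bolt's code calls it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical stand-in for a Storm Tuple: just the two fields this sketch needs.
final class Reading {
    final String sensorId;
    final long value;

    Reading(String sensorId, long value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

// Hypothetical DbManager with the two methods the article describes.
final class DbManager {
    // Assumed connection settings; replace with your own database.
    private static final String URL = "jdbc:mysql://localhost:3306/demo";
    private static final String USER = "demo";
    private static final String PASSWORD = "demo";

    private DbManager() { }

    // (1) Opens and returns a java.sql.Connection.
    static Connection getConnection() {
        try {
            return DriverManager.getConnection(URL, USER, PASSWORD);
        } catch (SQLException e) {
            throw new IllegalStateException("Cannot open database connection", e);
        }
    }

    // (2) Builds one INSERT statement for one record, for use with Statement.addBatch(String).
    // Doubling single quotes is only a minimal guard against broken SQL.
    static String getSQL(Reading r) {
        return "INSERT INTO readings (sensor_id, value) VALUES ('"
                + r.sensorId.replace("'", "''") + "', " + r.value + ")";
    }
}
```

For example, DbManager.getSQL(new Reading("s1", 42)) returns INSERT INTO readings (sensor_id, value) VALUES ('s1', 42). With untrusted input, a PreparedStatement with bound parameters and its no-argument addBatch() is the safer choice.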

To buffer a certain number of tuples in the bolt, the bolt's constructor takes an int parameter n and assigns it to the member variable int count, meaning that every n tuples are processed as one batch.
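The count-based trigger can be isolated in a few lines. CountBatcher is an illustrative name, not part of Storm. Its constructor takes n and keeps it as count, and offer() hands back a full batch exactly when the n-th item arrives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative helper: collects items and returns a complete batch every n items.
final class CountBatcher<T> {
    private final int count;                       // batch size, passed in through the constructor
    private final List<T> buffer = new ArrayList<>();

    CountBatcher(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("batch size must be positive: " + n);
        }
        count = n;
    }

    // Returns the completed batch when the n-th item arrives, otherwise an empty list.
    List<T> offer(T item) {
        buffer.add(item);
        if (buffer.size() < count) {
            return Collections.emptyList();
        }
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }

    // Number of items waiting for the next batch.
    int buffered() {
        return buffer.size();
    }
}
```

With n = 3, the first two calls to offer() return empty lists, the third returns all three items, and the fourth item waits in the buffer for the next batch.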

To hold the tuples in memory, the bolt stores them in a ConcurrentLinkedQueue from java.util.concurrent and triggers batch processing whenever count tuples have accumulated.
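Draining the queue has one detail the batch loop must respect. ConcurrentLinkedQueue.poll() returns null once the queue is empty, so a batch triggered by the timer can be smaller than count. The following is a sketch of a safe drain; QueueDrain is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative helper for taking one batch off the head of a queue.
final class QueueDrain {
    private QueueDrain() { }

    // Removes up to max elements from the head of the queue.
    // poll() returns null when the queue is empty, so the loop stops early
    // instead of passing a null element on to the caller.
    static <T> List<T> drain(Queue<T> queue, int max) {
        List<T> batch = new ArrayList<>();
        T item;
        while (batch.size() < max && (item = queue.poll()) != null) {
            batch.add(item);
        }
        return batch;
    }
}
```

Two side notes. First, the ConcurrentLinkedQueue Javadoc warns that size() is not a constant-time operation. Second, Storm normally calls a given bolt instance's execute() from a single executor thread, so a plain ArrayDeque would likely work as well. The concurrent queue matters mainly if another thread also touches the buffer.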

In addition, tuples could wait in the queue for a long time when data arrives slowly (for example, count tuples do not accumulate for a long time) or when count is set too large. The bolt therefore also checks the time since the last batch, so that buffered tuples are processed after at most about 1 second. Because this check runs in execute(), it only fires when a new tuple arrives.
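The combined count-or-time condition can be factored into a small policy object. FlushPolicy is an illustrative name. The current time is passed in as a parameter, rather than read from System.currentTimeMillis() inside the class, only so that the logic can be checked with fixed timestamps.

```java
// Illustrative policy: flush every `count` tuples, or once `maxWaitMillis` has
// passed since the last flush (the bolt uses maxWaitMillis = 1000).
final class FlushPolicy {
    private final int count;
    private final long maxWaitMillis;
    private long lastFlush;

    FlushPolicy(int count, long maxWaitMillis, long startMillis) {
        this.count = count;
        this.maxWaitMillis = maxWaitMillis;
        this.lastFlush = startMillis;
    }

    // Mirrors the bolt's condition, plus a queued > 0 guard so an empty queue never triggers an empty batch.
    boolean shouldFlush(int queued, long nowMillis) {
        return queued >= count || (queued > 0 && nowMillis >= lastFlush + maxWaitMillis);
    }

    // Call after each batch, with the same clock that is passed to shouldFlush().
    void markFlushed(long nowMillis) {
        lastFlush = nowMillis;
    }
}
```

Because the bolt evaluates this condition only inside execute(), a quiet stream can leave a partial batch waiting. If that matters, one option is Storm's tick tuples. To enable them, return Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS from getComponentConfiguration(). Storm then delivers a periodic system tuple to execute(), which can run the same time check.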

The following is the complete code for the bolt (for reference only):

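import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Storm versions before 1.0 use the backtype.storm packages; from 1.0 on they are org.apache.storm.
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Tuple;

public class BatchingBolt implements IRichBolt {
    private static final long serialVersionUID = 1L;
    private OutputCollector collector;
    private Queue<Tuple> tupleQueue = new ConcurrentLinkedQueue<Tuple>();
    private int count;                  // number of tuples processed per batch
    private long lastTime;              // timestamp of the last batch
    private transient Connection conn;  // a Connection is not serializable, so it is opened in prepare()

    public BatchingBolt(int n) {
        count = n;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        conn = DbManager.getConnection();       // get the database connection via DbManager
        lastTime = System.currentTimeMillis();
    }

    @Override
    public void execute(Tuple tuple) {
        tupleQueue.add(tuple);
        long currentTime = System.currentTimeMillis();
        // Submit a batch every count tuples, or once 1 second has passed since the last batch
        if (tupleQueue.size() >= count || currentTime >= lastTime + 1000) {
            List<Tuple> batch = new ArrayList<Tuple>();
            try (Statement stmt = conn.createStatement()) {
                conn.setAutoCommit(false);
                Tuple tup;
                // poll() returns null when the queue is empty, so a time-triggered batch may hold fewer than count tuples
                while (batch.size() < count && (tup = tupleQueue.poll()) != null) {
                    batch.add(tup);
                    stmt.addBatch(DbManager.getSQL(tup));  // generate the SQL statement and add it to the batch
                }
                stmt.executeBatch();  // bulk-submit the SQL
                conn.commit();
                for (Tuple t : batch) {
                    collector.ack(t);  // ack only after the commit succeeded
                }
                System.out.println("Batch insert data into database, total records: " + batch.size());
            } catch (SQLException e) {
                System.err.println("Batch failed, failing " + batch.size() + " tuples: " + e.getMessage());
                try {
                    conn.rollback();
                } catch (SQLException ignored) {
                }
                for (Tuple t : batch) {
                    collector.fail(t);  // a reliable spout can replay failed tuples
                }
            } finally {
                try {
                    conn.setAutoCommit(true);
                } catch (SQLException ignored) {
                }
            }
            lastTime = currentTime;
        }
    }

    @Override
    public void cleanup() {
        try {
            if (conn != null) {
                conn.close();
            }
        } catch (SQLException ignored) {
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing downstream
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}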
