Storm Common Patterns: Batch Processing

Source: Internet
Author: User

When Storm processes streaming data in real time, a common scenario is to process a certain number of tuples as a batch instead of processing each tuple as soon as it arrives. The reason may be performance or a specific business requirement.

For example, when querying or updating a database, executing one SQL statement per tuple becomes very inefficient at high data volumes and reduces system throughput. Submitting the statements in batches avoids this.

Of course, if you want to use Storm's reliable processing mechanism, you need to keep references to these tuples in an in-memory container until the batch has been processed, and only then ack them.
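The idea of holding tuple references until the batch completes can be sketched in plain Java, without Storm on the classpath. This is only an illustration: `Acker` and `PendingBatch` are hypothetical names, with `Acker` standing in for the `ack` and `fail` methods of Storm's `OutputCollector`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the ack/fail half of Storm's OutputCollector.
interface Acker<T> {
    void ack(T item);
    void fail(T item);
}

// Holds references to pending items and acknowledges them only after
// the whole batch has been processed.
class PendingBatch<T> {
    private final List<T> pending = new ArrayList<>();
    private final Acker<T> acker;

    PendingBatch(Acker<T> acker) {
        this.acker = acker;
    }

    // Keep a reference to the item; nothing is acked yet.
    void hold(T item) {
        pending.add(item);
    }

    int size() {
        return pending.size();
    }

    // Runs the batch action on everything held so far.
    // On success every item is acked; on a runtime error every item is failed.
    boolean flush(Consumer<List<T>> batchAction) {
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        try {
            batchAction.accept(batch);
        } catch (RuntimeException e) {
            for (T item : batch) {
                acker.fail(item);
            }
            return false;
        }
        for (T item : batch) {
            acker.ack(item);
        }
        return true;
    }
}
```

In a real bolt, failing the tuples lets Storm replay them from the spout, which is why the references must not be dropped before the batch has finished.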

The following is a simple example.

Suppose we already have a DBManager database access class that provides at least two methods:

(1) getConnection(): returns a java.sql.Connection object;

(2) getSQL(tuple): generates a database statement from a tuple.
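The article does not show DBManager itself. A minimal sketch of what such a class might look like is given below, purely as an assumption: the class name `DBManagerSketch`, the JDBC URL, the `word_counts` table, and its columns are all hypothetical, and a plain `List<Object>` stands in for the values of a Storm tuple (what `tuple.getValues()` would return) so that the sketch compiles without Storm.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch of the DBManager class assumed by the article.
class DBManagerSketch {
    // Hypothetical JDBC URL; replace with the real database location.
    private static final String JDBC_URL = "jdbc:mysql://localhost:3306/example";

    // Returns a java.sql.Connection; requires a matching JDBC driver at runtime.
    static Connection getConnection() throws SQLException {
        return DriverManager.getConnection(JDBC_URL);
    }

    // Builds an INSERT statement from tuple values: (word, count).
    // The table and column names are assumptions for illustration.
    static String getSQL(List<Object> values) {
        // Double any single quotes so the value cannot break out of the literal.
        String word = String.valueOf(values.get(0)).replace("'", "''");
        long count = ((Number) values.get(1)).longValue();
        return "INSERT INTO word_counts (word, cnt) VALUES ('" + word + "', " + count + ")";
    }
}
```

Building SQL by string concatenation keeps the sketch close to the getSQL(tuple) interface described above. In production code, a PreparedStatement with addBatch() is usually the safer choice.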

To cache a fixed number of tuples in the bolt, the constructor takes an int n parameter and assigns it to the bolt's int count member variable, so that a batch is processed every n tuples.

To hold the tuples in memory, a ConcurrentLinkedQueue from java.util.concurrent is used. Each time count tuples have been collected, batch processing is triggered.

In addition, the data volume may be small (so that count tuples are not collected for a long time) or count may be set too large. To handle this, a time check is added to the bolt so that cached tuples are also flushed once at least 1 second has passed since the last batch. Note that the check runs only when a new tuple arrives.
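The "flush every count tuples, or after 1 second" rule can be isolated from Storm and JDBC as a small buffer class. The sketch below is an illustration under assumptions: `CountOrTimeBuffer` is a hypothetical name, and the clock is passed in as a `LongSupplier` only so that the timing logic can be exercised without waiting. A bolt would pass `System::currentTimeMillis`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.LongSupplier;

// Buffers items and releases them as a batch when either the size
// threshold or the time threshold is reached.
class CountOrTimeBuffer<T> {
    private final Queue<T> queue = new ConcurrentLinkedQueue<>();
    private final int batchSize;
    private final long maxWaitMillis;
    private final LongSupplier clock;
    private long lastFlush;

    CountOrTimeBuffer(int batchSize, long maxWaitMillis, LongSupplier clock) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
        this.clock = clock;
        this.lastFlush = clock.getAsLong();
    }

    // Adds an item. Returns the drained batch if a flush is due,
    // otherwise an empty list.
    List<T> add(T item) {
        queue.add(item);
        long now = clock.getAsLong();
        boolean enoughItems = queue.size() >= batchSize;
        boolean waitedTooLong = now - lastFlush >= maxWaitMillis;
        if (!enoughItems && !waitedTooLong) {
            return Collections.emptyList();
        }
        // Drain whatever is present; on a time-based flush this may be
        // fewer than batchSize items, so never assume a full batch.
        List<T> batch = new ArrayList<>();
        T next;
        while ((next = queue.poll()) != null) {
            batch.add(next);
        }
        lastFlush = now;
        return batch;
    }
}
```

Draining until the queue is empty, rather than polling exactly count times, matters for the time-based case: the queue may hold fewer than count items, and poll() returns null once it is empty.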

The complete bolt code follows (for reference only):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Storm releases before 1.0 use the backtype.storm packages;
    // from 1.0 on they are org.apache.storm.
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Tuple;

    public class BatchingBolt implements IRichBolt {
        private static final long serialVersionUID = 1L;

        private final int count;                    // number of tuples per batch
        private transient OutputCollector collector;
        private transient Queue<Tuple> tupleQueue;  // tuples waiting for the next batch
        private transient Connection conn;
        private transient long lastTime;            // timestamp of the last batch

        public BatchingBolt(int n) {
            count = n;
        }

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // Created here rather than in the constructor: the bolt object is
            // serialized when the topology is submitted, and a Connection is
            // not serializable.
            this.tupleQueue = new ConcurrentLinkedQueue<Tuple>();
            this.conn = DBManager.getConnection();  // get database connection through DBManager
            this.lastTime = System.currentTimeMillis();
        }

        @Override
        public void execute(Tuple tuple) {
            tupleQueue.add(tuple);
            long currentTime = System.currentTimeMillis();
            // Submit a batch every count tuples, or once 1 second has passed.
            if (tupleQueue.size() < count && currentTime < lastTime + 1000) {
                return;
            }
            // Drain the queue; on a time-based flush it may hold fewer than count tuples.
            List<Tuple> batch = new ArrayList<Tuple>();
            Tuple tup;
            while ((tup = tupleQueue.poll()) != null) {
                batch.add(tup);
            }
            Statement stmt = null;
            try {
                conn.setAutoCommit(false);
                stmt = conn.createStatement();
                for (Tuple t : batch) {
                    stmt.addBatch(DBManager.getSQL(t));  // generate and add SQL
                }
                stmt.executeBatch();                     // submit the SQL statements as a batch
                conn.commit();
                for (Tuple t : batch) {
                    collector.ack(t);                    // ack only after the commit succeeded
                }
                System.out.println("Batch insert data into database, total records: " + batch.size());
            } catch (SQLException e) {
                try { conn.rollback(); } catch (SQLException ignored) { }
                for (Tuple t : batch) {
                    collector.fail(t);                   // let Storm replay the failed tuples
                }
            } finally {
                try {
                    if (stmt != null) stmt.close();
                    conn.setAutoCommit(true);
                } catch (SQLException ignored) { }
            }
            lastTime = currentTime;
        }

        @Override
        public void cleanup() {
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }
