Kudu Series: Java API usage and efficiency testing

Source: Internet
Author: User
Tags: exception handling, uuid

Kudu + Impala is a good fit for data analysis, but inserting rows directly into a Kudu table with Impala INSERT ... VALUES statements is inefficient: in testing, inserts reached only about 80 rows per second. The reason is straightforward. Kudu itself writes very efficiently, but Impala is not optimized for this pattern, and the overhead of executing each Impala statement is large, so frequent small-batch writes perform very poorly. Kudu officially recommends using the Java API or the Python API for writing data. The test cases below use the Java API and also show the general usage of the Kudu API.
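
To see why a fixed per-statement overhead hurts small writes so much, a rough cost model helps: if every statement pays a fixed overhead plus a small per-row cost, throughput grows quickly with the number of rows per statement. The sketch below is only an illustration. The class name `OverheadModel` and both cost values (12 ms per statement, 0.05 ms per row) are assumptions chosen for the example, not measurements from this test.

```java
final class OverheadModel {
    /**
     * Rows per second when every statement pays a fixed overhead plus a per-row cost.
     * All inputs are illustrative assumptions, not measured values.
     */
    static double rowsPerSecond(int rowsPerStatement, double overheadMs, double perRowMs) {
        double statementMs = overheadMs + rowsPerStatement * perRowMs;
        return rowsPerStatement / (statementMs / 1000.0);
    }

    public static void main(String[] args) {
        double overheadMs = 12.0; // assumed fixed cost per statement
        double perRowMs = 0.05;   // assumed cost per row
        for (int batch : new int[] {1, 10, 100, 500}) {
            System.out.printf("batch=%d -> %.0f rows/second%n",
                    batch, rowsPerSecond(batch, overheadMs, perRowMs));
        }
    }
}
```

With these assumed costs, single-row statements are dominated by the fixed overhead, while larger batches spread it over many rows.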

=========================
Prepare the test table
=========================

-- Kudu table
CREATE TABLE kudu_testdb.tmp_test_perf (
  id   STRING ENCODING PLAIN_ENCODING COMPRESSION SNAPPY,
  name STRING ENCODING DICT_ENCODING COMPRESSION SNAPPY,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 6
STORED AS KUDU
TBLPROPERTIES (
  'kudu.table_name' = 'Testdb.tmp_test_perf',
  'kudu.master_addresses' = '10.0.0.100:7051,10.0.0.101:7051,10.0.0.101:7051',
  'kudu.num_tablet_replicas' = '1'
);

=========================
Writing the Java test program
=========================

package kudu_perf_test;

import java.sql.Timestamp;
import java.util.UUID;

import org.apache.kudu.client.*;

public class Test {

    private final static int OPERATION_BATCH = 500;

    // Test case that supports all three flush modes
    public static void insertTestGeneric(KuduSession session, KuduTable table,
            SessionConfiguration.FlushMode mode, int recordCount) throws Exception {
        // SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND
        // SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC
        // SessionConfiguration.FlushMode.MANUAL_FLUSH
        session.setFlushMode(mode);
        if (SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC != mode) {
            session.setMutationBufferSpace(OPERATION_BATCH);
        }
        int uncommit = 0;
        for (int i = 0; i < recordCount; i++) {
            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            UUID uuid = UUID.randomUUID();
            row.addString("id", uuid.toString());
            row.addString("name", mode.name());
            session.apply(insert);
            // In manual mode, flush before the buffer fills up;
            // here we flush once more than half of the buffer is in use.
            if (SessionConfiguration.FlushMode.MANUAL_FLUSH == mode) {
                uncommit = uncommit + 1;
                if (uncommit > OPERATION_BATCH / 2) {
                    session.flush();
                    uncommit = 0;
                }
            }
        }

        // In manual mode, make sure the final flush is performed
        if (SessionConfiguration.FlushMode.MANUAL_FLUSH == mode && uncommit > 0) {
            session.flush();
        }

        // In background mode, make sure the final flush is performed,
        // and throw an exception if any error was recorded
        if (SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND == mode) {
            session.flush();
            RowErrorsAndOverflowStatus error = session.getPendingErrors();
            if (error.isOverflowed() || error.getRowErrors().length > 0) {
                if (error.isOverflowed()) {
                    throw new Exception("Kudu overflow exception occurred.");
                }
                StringBuilder errorMessage = new StringBuilder();
                if (error.getRowErrors().length > 0) {
                    for (RowError errorObj : error.getRowErrors()) {
                        errorMessage.append(errorObj.toString());
                        errorMessage.append(";");
                    }
                }
                throw new Exception(errorMessage.toString());
            }
        }
    }

    // Test case for MANUAL_FLUSH only
    public static void insertTestManual(KuduSession session, KuduTable table,
            int recordCount) throws Exception {
        SessionConfiguration.FlushMode mode = SessionConfiguration.FlushMode.MANUAL_FLUSH;
        session.setFlushMode(mode);
        session.setMutationBufferSpace(OPERATION_BATCH);
        int uncommit = 0;
        for (int i = 0; i < recordCount; i++) {
            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            UUID uuid = UUID.randomUUID();
            row.addString("id", uuid.toString());
            row.addString("name", mode.name());
            session.apply(insert);
            // In manual mode, flush before the buffer fills up;
            // here we flush once more than half of the buffer is in use.
            uncommit = uncommit + 1;
            if (uncommit > OPERATION_BATCH / 2) {
                session.flush();
                uncommit = 0;
            }
        }

        // In manual mode, make sure the final flush is performed
        if (uncommit > 0) {
            session.flush();
        }
    }

    // Test case for AUTO_FLUSH_SYNC only
    public static void insertTestInAutoSync(KuduSession session, KuduTable table,
            int recordCount) throws Exception {
        SessionConfiguration.FlushMode mode = SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC;
        session.setFlushMode(mode);
        for (int i = 0; i < recordCount; i++) {
            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            UUID uuid = UUID.randomUUID();
            row.addString("id", uuid.toString());
            row.addString("name", mode.name());
            // In AUTO_FLUSH_SYNC mode, apply() completes the Kudu write immediately
            session.apply(insert);
        }
    }

    public static void test() throws KuduException {
        KuduClient client = new KuduClient.KuduClientBuilder(
                "10.0.0.100:7051,10.0.0.101:7051,10.0.0.101:7051").build();
        KuduSession session = client.newSession();
        KuduTable table = client.openTable("Testdb.tmp_test_perf");

        SessionConfiguration.FlushMode mode;
        Timestamp d1 = null;
        Timestamp d2 = null;
        long millis;
        long seconds;
        // The listing as published uses 0 here; set this to the number of rows to insert.
        int recordCount = 0;

        try {
            mode = SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND;
            d1 = new Timestamp(System.currentTimeMillis());
            insertTestGeneric(session, table, mode, recordCount);
            d2 = new Timestamp(System.currentTimeMillis());
            millis = d2.getTime() - d1.getTime();
            // Total elapsed seconds (the published "% 60" would wrap around after one minute)
            seconds = millis / 1000;
            System.out.println(mode.name() + " elapsed seconds: " + seconds);

            mode = SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC;
            d1 = new Timestamp(System.currentTimeMillis());
            insertTestInAutoSync(session, table, recordCount);
            d2 = new Timestamp(System.currentTimeMillis());
            millis = d2.getTime() - d1.getTime();
            seconds = millis / 1000;
            System.out.println(mode.name() + " elapsed seconds: " + seconds);

            mode = SessionConfiguration.FlushMode.MANUAL_FLUSH;
            d1 = new Timestamp(System.currentTimeMillis());
            insertTestManual(session, table, recordCount);
            d2 = new Timestamp(System.currentTimeMillis());
            millis = d2.getTime() - d1.getTime();
            seconds = millis / 1000;
            System.out.println(mode.name() + " elapsed seconds: " + seconds);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } finally {
            if (!session.isClosed()) {
                session.close();
            }
            // Release the client's connections as well
            client.close();
        }
    }

    public static void main(String[] args) {
        try {
            test();
        } catch (KuduException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("Done");
    }
}
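
When comparing flush modes, a whole-seconds figure is hard to compare across runs of different sizes; rows per second is the number the results are reported in. The helper below is a minimal sketch of one way to time a workload and convert it to throughput. The class `ThroughputTimer` and its methods are hypothetical names introduced for illustration and are not part of the Kudu client; the workload here is a placeholder loop rather than a real insert test.

```java
import java.util.concurrent.TimeUnit;

final class ThroughputTimer {
    /** Runs the workload once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable workload) {
        long start = System.nanoTime();
        workload.run();
        return System.nanoTime() - start;
    }

    /** Converts a row count and an elapsed time into rows per second. */
    static double rowsPerSecond(long rows, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return rows * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        final long rows = 100_000;
        // Placeholder workload; a real run would call one of the insert test methods here.
        long elapsed = timeNanos(() -> {
            long sum = 0;
            for (long i = 0; i < rows; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        });
        long safeElapsed = Math.max(elapsed, 1); // guard against a zero reading
        System.out.printf("%d rows in %d ms, %.0f rows/second%n",
                rows, TimeUnit.NANOSECONDS.toMillis(safeElapsed), rowsPerSecond(rows, safeElapsed));
    }
}
```

`System.nanoTime()` is used because it is monotonic, so the measurement is not affected by wall-clock adjustments during the run.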

=========================
Performance Test Results
=========================
MANUAL_FLUSH mode: 8000 rows/second
AUTO_FLUSH_BACKGROUND mode: 8000 rows/second
AUTO_FLUSH_SYNC mode: (figure not given) rows/second
Impala SQL INSERT statement: 80 rows/second
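
To put these rates in perspective, the arithmetic below estimates how long a bulk load would take using the 8000 rows/second API figure and the roughly 80 rows/second Impala figure reported in this article. The load size of 1,000,000 rows and the class name `LoadTimeEstimate` are assumptions made for the example.

```java
final class LoadTimeEstimate {
    /** Seconds needed to load the given number of rows at a steady rate, rounded up. */
    static long secondsToLoad(long rows, long rowsPerSecond) {
        if (rowsPerSecond <= 0) {
            throw new IllegalArgumentException("rowsPerSecond must be positive");
        }
        return (rows + rowsPerSecond - 1) / rowsPerSecond;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;  // assumed load size, for illustration only
        long apiRate = 8000L;    // MANUAL_FLUSH / AUTO_FLUSH_BACKGROUND figure from the article
        long impalaRate = 80L;   // Impala INSERT ... VALUES figure from the article

        long apiSeconds = secondsToLoad(rows, apiRate);
        long impalaSeconds = secondsToLoad(rows, impalaRate);

        System.out.println("Java API: " + apiSeconds + " s");                  // prints "Java API: 125 s"
        System.out.println("Impala:   " + impalaSeconds + " s");               // prints "Impala:   12500 s"
        System.out.println("Ratio:    " + (impalaSeconds / apiSeconds) + "x"); // prints "Ratio:    100x"
    }
}
```

At these rates the same load takes about two minutes through the Java API and about three and a half hours through single-row Impala inserts.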

=========================
Kudu API Usage Summary
=========================
1. Prefer MANUAL_FLUSH. It gives the best performance, and if a write to Kudu fails, the flush() function throws an exception, so the error-handling logic is very clear.
2. AUTO_FLUSH_SYNC is also a good choice in situations where performance requirements are low.
3. Use AUTO_FLUSH_BACKGROUND only in demo scenarios, where the code can stay simple and fast because exception handling is ignored. It is not recommended in production: rows may be written out of order, and once exception handling is added the code becomes cumbersome.
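
The manual-flush pattern recommended in point 1 can be tried without a Kudu cluster. The sketch below uses a hypothetical `RowSink` interface and an in-memory `RecordingSink` (neither is part of the Kudu client) as stand-ins for a session in manual-flush mode. It shows the two parts of the pattern that matter: flushing before the buffer fills, and a final flush for the remainder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a buffered writer such as a session in manual-flush mode. */
interface RowSink {
    void apply(String row); // buffer one row
    void flush();           // send everything buffered so far
}

/** In-memory sink that records how many rows each flush carried. */
final class RecordingSink implements RowSink {
    private int buffered = 0;
    final List<Integer> flushSizes = new ArrayList<>();

    @Override
    public void apply(String row) {
        buffered++;
    }

    @Override
    public void flush() {
        flushSizes.add(buffered);
        buffered = 0;
    }
}

final class ManualFlushSketch {
    /** Writes recordCount rows, flushing once more than half of bufferSize rows are pending. */
    static void write(RowSink sink, int recordCount, int bufferSize) {
        int pending = 0;
        for (int i = 0; i < recordCount; i++) {
            sink.apply("row-" + i);
            pending++;
            if (pending > bufferSize / 2) {
                sink.flush();
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.flush(); // final flush for the remainder
        }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        write(sink, 1000, 500);
        System.out.println(sink.flushSizes); // prints [251, 251, 251, 247]
    }
}
```

Flushing at half the buffer size leaves headroom, so an apply call never meets a full buffer, and the final flush covers the 247 rows that never reach the threshold.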
