Java: Extreme Optimization of Reading Level-1 Market Data DBF Files (3)


We recently built a project for market data access and distribution. It needs extremely low latency, which is very important for a securities system. The market data source is configurable: Level-1, Level-2, or another third-party source. Although Level-1 data is not as fast as Level-2, it is one of the sources the system supports, so we still need to optimize it so that the end-to-end latency, from reading the file to the user receiving the data over a socket, is as low as possible. This article describes how we pushed the reading of Level-1 market data DBF files to its limit. The approach should also be a useful reference for reading other DBF files.

Level-1 market data is produced by the market data station, which rewrites a DBF file (show2003.dbf for Shanghai, sjshq.dbf for Shenzhen) every few seconds, replacing the old quotes with new ones. Our goal is to read the file into memory as soon as possible after it has been updated, convert each row into an object, and convert each column to the corresponding data type.

We adopted six optimizations in total.

The first four were described in parts (1) and (2) of this series:

Optimization one: Use a RAM disk (RamDisk)
Optimization two: Use JNotify, with notifications instead of polling

Optimization three: Use NIO to read the file
Optimization four: Reduce memory reallocation and GC when reading the file

Optimization five: Field read optimization

Many fields in the market data DBF file are prices with two or three decimal places. After reading them from the DBF file, we store them as long or int rather than float or double; for example, 1.23 is stored as 1230. We do this because float and double can lose precision.
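The precision problem can be seen in a few lines. This is only an illustrative sketch: the class name ScaledPriceDemo and the helper toThousandths are made up for this example and are not part of the project's code.

```java
import java.math.BigDecimal;

class ScaledPriceDemo {
    // Hypothetical helper: converts decimal text such as "1.23" to thousandths (1230)
    // using exact decimal arithmetic. Throws ArithmeticException if the text has
    // more than three decimal places.
    static long toThousandths(String price) {
        return new BigDecimal(price).movePointRight(3).longValueExact();
    }

    public static void main(String[] args) {
        // 0.1 and 0.2 have no exact binary representation, so the sum is not 0.3.
        System.out.println(0.1 + 0.2 == 0.3); // false

        // The same sum carried out on scaled integers is exact.
        long sum = toThousandths("0.1") + toThousandths("0.2");
        System.out.println(sum == toThousandths("0.3")); // true
    }
}
```

Storing 1.23 as 1230 keeps all later arithmetic and comparisons in integer space, at the cost of having to remember the scale factor when the value is displayed.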

Without optimization, the read steps are:

1. Read the bytes at the field's offset in the byte[] and store them in a String.

2. Trim the String.

3. Convert the String to a float.

4. Multiply the float by 1000 and cast it to long.
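For reference, the four unoptimized steps might look roughly like the sketch below. The class and method names are hypothetical, and the field is addressed by a raw offset and length instead of the reader's field metadata so that the example runs on its own.

```java
import java.nio.charset.StandardCharsets;

class NaiveFieldParser {
    // Unoptimized parse: allocate a String, trim it, parse a float, then scale.
    static long parseTimes1000(byte[] src, int offset, int length) {
        String raw = new String(src, offset, length, StandardCharsets.US_ASCII); // step 1: copy bytes into a String
        String trimmed = raw.trim();             // step 2: usually allocates a second String
        float value = Float.parseFloat(trimmed); // step 3: general-purpose text-to-float conversion
        return (long) (value * 1000);            // step 4: scale and cast to long
    }

    public static void main(String[] args) {
        byte[] field = "   12.50".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseTimes1000(field, 0, field.length)); // 12500
    }
}
```

Besides the extra allocations, the cast in step 4 truncates, so a value that float cannot represent exactly may come out one unit low.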

Needless to say, this process is inefficient: the first two steps alone involve two string copies and two object creations, and the third step is not efficient either. I optimized this by adding a get_long_efficiently_and_multiply_1000 method to DBFReader.java, which merges the four steps into one and produces the result in a single scan.

    // Parses the numeric field at `index` in a single pass and returns its value times 1000.
    public long get_long_efficiently_and_multiply_1000(byte[] src, final int index) {
        long multiplicand = 3; // decimal digits that may still be consumed
        long result = 0;
        Field field = getFields()[index];
        boolean in_decimal_part = false;
        boolean negative = false;
        int offset = field.getOffset();
        int length = field.getLength();
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            byte ch = src[i];
            if (ch >= 48 && ch <= 57) { // if it's a number
                result *= 10;
                result += ch - 48;
                if (in_decimal_part)
                    multiplicand--;
                if (multiplicand == 0)
                    break;
                continue;
            }
            if (ch == 32) // if it's a space
                continue;
            if (ch == 46) { // if it's a decimal point
                in_decimal_part = true;
                continue;
            }
            if (ch == '-') { // if it's a minus sign
                negative = true;
                continue; // keep scanning; the digits follow the sign
            }
            throw new NumberFormatException();
        }
        // Pad with zeros for any decimal digits that were not present.
        if (multiplicand == 3)
            result *= 1000;
        else if (multiplicand == 2)
            result *= 100;
        else if (multiplicand == 1)
            result *= 10;
        if (negative) {
            result = 0 - result;
        }
        return result;
    }

The algorithm above reads the field, converts it to a number, and multiplies it by 1000, all in the same pass. The code also keeps the number of executed steps as low as possible.
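To try the single-scan idea without the reader class, here is a standalone variant under stated assumptions: it takes the offset and length directly instead of looking up field metadata, and the names SingleScanParser, parseTimes1000 and parse are invented for this sketch. It follows the same rules as the method above (spaces are skipped, digits beyond the third decimal place are truncated).

```java
import java.nio.charset.StandardCharsets;

class SingleScanParser {
    // Parses ASCII decimal text in src[offset, offset + length) and returns value * 1000.
    static long parseTimes1000(byte[] src, int offset, int length) {
        int remaining = 3; // decimal digits that may still be consumed
        long result = 0;
        boolean inDecimalPart = false;
        boolean negative = false;
        for (int i = offset, end = offset + length; i < end; i++) {
            byte ch = src[i];
            if (ch >= '0' && ch <= '9') {
                result = result * 10 + (ch - '0');
                if (inDecimalPart && --remaining == 0) break; // three decimals read, stop
            } else if (ch == '.') {
                inDecimalPart = true;
            } else if (ch == '-') {
                negative = true;
            } else if (ch != ' ') {
                throw new NumberFormatException();
            }
        }
        for (; remaining > 0; remaining--) result *= 10; // pad missing decimal digits
        return negative ? -result : result;
    }

    // Convenience wrapper for experimenting with string literals.
    static long parse(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return parseTimes1000(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        System.out.println(parse("  1.23"));  // 1230
        System.out.println(parse("12"));      // 12000
        System.out.println(parse("-0.5"));    // -500
        System.out.println(parse("3.14159")); // 3141
    }
}
```

No String or boxed object is created on the parsing path, which is where the saving over the four-step version comes from.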

We also optimized the reading of integer fields by adding a get_long_efficiently method:

    // Parses the integer field at `index` in a single pass.
    public long get_long_efficiently(byte[] src, final int index) {
        long result = 0;
        boolean negative = false;
        Field field = getFields()[index];
        int end = field.getOffset() + field.getLength();
        for (int i = field.getOffset(); i < end; i++) {
            byte ch = src[i];
            if (ch >= 48 && ch <= 57) { // if it's a number
                result = result * 10 + (ch - 48);
                continue;
            }
            if (ch == 32) // if it's a space
                continue;
            if (ch == '-') { // if it's a minus sign
                negative = true;
                continue; // keep scanning; the digits follow the sign
            }
            throw new NumberFormatException();
        }
        if (negative) {
            result = 0 - result;
        }
        return result;
    }

Neither algorithm is complex, but both are critical. A DBF file contains about 5,000 rows, each with 20 to 30 float or int fields, so more than 100,000 fields have to be read. In our tests, this step cut the read time from 50ms-70ms to 15ms-20ms. The devil is in the details: this is the optimization that brought the largest speedup.

(The code for optimization five is in the improved DBFReader, which was already made available for download in the previous article. Download link: DBFReader library)

Optimization six: Parallel processing with a thread pool

With more than 5,000 rows to read and convert into objects, multithreading is the most natural optimization.

The usual approach is to divide the work into equal blocks and give each thread one block. For example, with 5 threads, the 5,000 rows are split into 5 blocks of 1,000 rows, one per thread. This looks fair, but it is not: we run on a time-sharing operating system, so the moment each thread starts working and the CPU time slices it receives differ, and the workload of each block is not exactly the same either. An equal split only looks even; it is likely that some threads finish early while others are still working.
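For comparison, the equal split described above comes down to computing one fixed range per worker up front. The sketch below is hypothetical (StaticPartition and range are names chosen for illustration) and only shows the index arithmetic.

```java
class StaticPartition {
    // Returns {start, end} (end exclusive) of the block owned by worker w out of `workers`.
    static int[] range(int totalRows, int workers, int w) {
        int base = totalRows / workers;
        int extra = totalRows % workers; // the first `extra` workers take one more row
        int start = w * base + Math.min(w, extra);
        int end = start + base + (w < extra ? 1 : 0);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // 5,000 rows over 5 workers: each worker owns exactly 1,000 rows.
        for (int w = 0; w < 5; w++) {
            int[] r = range(5000, 5, w);
            System.out.println("worker " + w + ": rows " + r[0] + " to " + (r[1] - 1));
        }
    }
}
```

Because every range is fixed before any work starts, a worker that is descheduled for a few milliseconds keeps its full 1,000 rows, and the total time is set by the slowest worker.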

Here is the way I prefer to assign the work: each thread claims a small chunk of the 5,000 rows, say 16 rows, and claims another 16 when it has finished. Fast threads thus do more work and slow threads less, until all rows have been processed. How do the threads coordinate? Does assigning work require a lock? No lock is needed: we can use CAS (AtomicInteger, which is implemented with CAS). There is not much more to explain here, so look at the code:

    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicInteger;

    class ReaderTask implements Runnable {
        Collector collector;
        List<byte[]> recordList;
        CountDownLatch countDownLatch;
        AtomicInteger cursor;
        DBFReader reader;

        public ReaderTask(Collector collector, DBFReader dbfReader, List<byte[]> recordList,
                          AtomicInteger cursor, CountDownLatch countDownLatch) {
            this.collector = collector;
            this.reader = dbfReader;
            this.recordList = recordList;
            this.cursor = cursor;
            this.countDownLatch = countDownLatch;
        }

        @Override
        public void run() {
            try {
                int length = recordList.size();
                do {
                    final int step = 16; // Assign 16 rows per assignment to the thread.
                    int endIndex = cursor.addAndGet(step);
                    int startIndex = endIndex - step;
                    for (int i = startIndex; i < endIndex && i < length; i++) {
                        byte[] row = recordList.get(i);
                        MarketRealtimeData shData = new MarketRealtimeData();
                        shData.setMarketType(Constants.MARKET_SH_STOCK);
                        shData.setIdNum(reader.get_string_efficiently(row, 0));
                        shData.setPrefix(reader.get_string_efficiently(row, 1));
                        shData.setPreClosePrice(reader.get_long_efficiently_and_multiply_1000(row, 2));
                        shData.setOpenPrice(reader.get_long_efficiently_and_multiply_1000(row, 3));
                        shData.setTurnover(reader.get_long_efficiently_and_multiply_1000(row, 4));
                        shData.setHighPrice(reader.get_long_efficiently_and_multiply_1000(row, 5));
                        shData.setLowPrice(reader.get_long_efficiently_and_multiply_1000(row, 6));
                        shData.setMatchPrice(reader.get_long_efficiently_and_multiply_1000(row, 7));
                        // read all fields; several lines omitted below
                        // ... ...
                        // ... ...
                        if (collector != null) {
                            collector.collect(shData);
                        }
                    }
                } while (cursor.get() < length);
            } finally {
                if (countDownLatch != null)
                    countDownLatch.countDown();
            }
        }
    }

    private void readHangqingFile(String path, String name) throws Exception {
        // long t1 = System.nanoTime();
        DBFReader dbfreader_sh = null;
        try {
            dbfreader_sh = new DBFReader(new File(path + File.separator + name));
            List<byte[]> list_sh = dbfreader_sh.recordsWithOutDel_efficiently(cacheManager);
            AtomicInteger cursor = new AtomicInteger(0); // atomic variable for assigning tasks between threads
            CountDownLatch countDownLatch = new CountDownLatch(WORK_THREAD_COUNT);
            for (int i = 0; i < WORK_THREAD_COUNT - 1; i++) { // hand tasks to the threads of the thread pool
                ReaderTask task = new ReaderTask(collector, dbfreader_sh, list_sh, cursor, countDownLatch);
                globalExecutor.execute(task);
            }
            // The current thread itself also acts as a worker thread.
            new ReaderTask(collector, dbfreader_sh, list_sh, cursor, countDownLatch).run();
            countDownLatch.await();
            // long t2 = System.nanoTime();
            // System.out.println("Speed time on Read and object:" + (t2 - t1));
        } finally {
            if (dbfreader_sh != null)
                dbfreader_sh.close();
        }
    }
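The chunk-claiming scheme can be tried in isolation with the self-contained sketch below. It is not the project's code: ChunkClaimDemo, run and the visit counter are stand-ins invented for this example, and incrementing a counter replaces the real row parsing. The point it illustrates is that an atomic cursor hands out disjoint 16-row chunks with no lock, so every row is processed exactly once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class ChunkClaimDemo {
    static final int STEP = 16; // rows claimed per request

    // Processes every index in [0, rows) with `workers` threads and
    // returns how many times each row was visited.
    static AtomicIntegerArray run(int rows, int workers) {
        AtomicIntegerArray visits = new AtomicIntegerArray(rows);
        AtomicInteger cursor = new AtomicInteger(0);
        CountDownLatch done = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Runnable task = () -> {
            try {
                while (true) {
                    int start = cursor.getAndAdd(STEP); // claim the next chunk atomically
                    if (start >= rows) break;           // nothing left to claim
                    int end = Math.min(start + STEP, rows);
                    for (int i = start; i < end; i++) {
                        visits.incrementAndGet(i);      // stand-in for parsing row i
                    }
                }
            } finally {
                done.countDown();
            }
        };
        try {
            for (int w = 0; w < workers; w++) pool.execute(task);
            done.await(); // wait until every worker has run out of chunks
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        return visits;
    }

    public static void main(String[] args) {
        AtomicIntegerArray visits = run(5000, 4);
        boolean exactlyOnce = true;
        for (int i = 0; i < visits.length(); i++) {
            if (visits.get(i) != 1) exactlyOnce = false;
        }
        System.out.println("every row processed exactly once: " + exactlyOnce);
    }
}
```

The chunk size is a trade-off: a smaller step balances the load more evenly but touches the shared cursor more often, while a larger step does the opposite.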

Tests show that with 4 threads processing in parallel, the processing time drops from 15ms-20ms to 4ms-7ms.

With all of the optimizations described in this series, the total read time drops from more than 300ms to 5ms-10ms. This is the total time from the moment the file is updated to the moment it has been fully read, which includes creating more than 5,000 objects and converting more than 100,000 fields.

There are still many details we could improve if we went further. Tests show that the latency is not stable enough, probably because of GC. We could create fewer objects to reduce the performance cost and the GC load, and control when GC runs so that it happens during idle time.

Original article by Binhua Liu. When reprinting, please cite the original address: http://www.cnblogs.com/Binhua-Liu/p/5616761.html
