Java Read Level-1 Quotes dbf File Optimization (3)


We recently built a system for market data access and distribution that requires extremely low latency, which is critical for a securities system. The line source is configurable: Level-1, Level-2, or another third-party source. Although Level-1 quotes are not as fast as Level-2, the system still supports them as a line source, so we need to optimize that path so that the end-to-end latency, from reading the file to delivering the quote to the user over a socket, is as low as possible. This article describes our solution for reading the Level-1 dbf file; the techniques should also be useful as a reference for reading other dbf files.

Level-1 quotes arrive as a small dbf file (show2003.dbf for Shanghai, sjshq.dbf for Shenzhen) that is rewritten every few seconds, the new snapshot replacing the old one. Our goal is to read the file into memory in the shortest possible time after each update, convert every row into an object, and convert every column to the corresponding data type.

We have adopted six optimization methods in total.

The earlier articles, "Java Read Level-1 Quotes dbf File Optimization (1)" and "Java Read Level-1 Quotes dbf File Optimization (2)", introduced four optimizations:

Optimization 1: Use a memory disk (RamDisk)
Optimization 2: Use JNotify change notifications instead of polling
Optimization 3: Use NIO to read the file (a combined sketch of optimizations 2 and 3 follows this list)
Optimization 4: Reduce memory allocation and GC when reading the file
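
Since the details are in parts (1) and (2), here is only a minimal sketch of ours showing how optimizations 2 and 3 might fit together: JNotify fires when the dbf file in the RamDisk directory changes, and NIO pulls the whole file into a buffer in one bulk read. The directory path and the parser hand-off are placeholders; JNotify's listener API is used as documented by the library.

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import net.contentobjects.jnotify.JNotify;
import net.contentobjects.jnotify.JNotifyListener;

public class QuoteFileWatcher {

    public static void main(String[] args) throws Exception {
        String quoteDir = "/mnt/ramdisk/quotes"; // hypothetical RamDisk mount point (optimization 1)
        int mask = JNotify.FILE_CREATED | JNotify.FILE_MODIFIED | JNotify.FILE_RENAMED;
        // Optimization 2: react to file system events instead of polling.
        JNotify.addWatch(quoteDir, mask, false, new JNotifyListener() {
            public void fileCreated(int wd, String root, String name)  { readFile(root + "/" + name); }
            public void fileModified(int wd, String root, String name) { readFile(root + "/" + name); }
            public void fileRenamed(int wd, String root, String oldName, String newName) { readFile(root + "/" + newName); }
            public void fileDeleted(int wd, String root, String name)  { }
        });
        Thread.sleep(Long.MAX_VALUE); // keep the watcher alive
    }

    // Optimization 3: one NIO bulk transfer of the whole file into a buffer.
    static void readFile(String path) {
        try (FileChannel channel = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) { } // fill the buffer
            buffer.flip();
            // hand the buffer to the dbf parser here; per optimization 4 the
            // buffer would be reused across updates rather than reallocated
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}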

Optimization 5: Field Read Optimization

Many fields in the quote dbf file are price fields with two or three decimal places. After reading them from the dbf, we store them as Long or Int rather than Float or Double, because Float and Double lose precision: 1.23, for example, is converted to 1230 and stored.
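
The precision concern is easy to demonstrate; a quick illustration of ours (the 4.35 example is not from the article):

import java.math.BigDecimal;

public class PrecisionDemo {
    public static void main(String[] args) {
        // 1.23 has no exact binary representation:
        System.out.println(new BigDecimal(1.23));
        // prints 1.229999999999999982236431605997495353221893310546875

        // and naive scaling can silently truncate to the wrong long:
        System.out.println((long) (4.35 * 100)); // prints 434, not 435
    }
}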

Without optimization, reading one such field takes the following steps (sketched in code after the list):

1. Copy the bytes at the field's offset out of the byte[] into a String.

2. Trim the String.

3. Parse the String into a Float.

4. Multiply the Float by 1000 and convert it to a Long.
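
For reference, the unoptimized path looks roughly like this (a sketch of ours, since the original code is not shown; offset and length would come from the dbf field definition):

static long readPriceSlow(byte[] src, int offset, int length) {
    String s = new String(src, offset, length); // step 1: copy bytes into a String
    s = s.trim();                               // step 2: trim may allocate a second String
    float f = Float.parseFloat(s);              // step 3: parse (slow, allocates)
    return (long) (f * 1000);                   // step 4: scale; also risks precision loss
}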

 

Needless to say, this process is inefficient: the first two steps alone involve two string copies and two object creations, and step 3 is slow. Our optimization adds a get_long_efficiently_and_multiply_1000 method to DBFReader.java that merges the four steps into one and produces the result in a single scan.

public long get_long_efficiently_and_multiply_1000(byte[] src, final int index) {
    long multiplicand = 3;
    long result = 0;
    Field field = getFields()[index];
    boolean in_decimal_part = false;
    boolean negative = false;
    int offset = field.getOffset();
    int length = field.getLength();
    int end = offset + length;
    for (int i = offset; i < end; i++) {
        byte ch = src[i];
        if (ch >= 48 && ch <= 57) { // digit
            result *= 10;
            result += ch - 48;
            if (in_decimal_part)
                multiplicand--;
            if (multiplicand == 0)
                break;
            continue;
        }
        if (ch == 32) // space
            continue;
        if (ch == 46) { // decimal point
            in_decimal_part = true;
            continue;
        }
        if (ch == '-') { // minus sign
            negative = true;
            continue;
        }
        throw new NumberFormatException();
    }
    if (multiplicand == 3)
        result *= 1000;
    else if (multiplicand == 2)
        result *= 100;
    else if (multiplicand == 1)
        result *= 10;
    if (negative) {
        result = 0 - result;
    }
    return result;
}

 

This algorithm converts a field to a number and multiplies it by 1000 in a single pass; the code keeps the steps as tight as possible.
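
To see the algorithm in action outside of DBFReader, here is a small self-contained harness of ours that runs the same single scan over raw bytes:

public class ScanDemo {
    static long parseScaledBy1000(byte[] src, int offset, int length) {
        long multiplicand = 3, result = 0;
        boolean inDecimalPart = false, negative = false;
        for (int i = offset; i < offset + length; i++) {
            byte ch = src[i];
            if (ch >= '0' && ch <= '9') {
                result = result * 10 + (ch - '0');
                if (inDecimalPart && --multiplicand == 0) break; // 3 decimals read
            } else if (ch == '.') {
                inDecimalPart = true;
            } else if (ch == '-') {
                negative = true;
            } else if (ch != ' ') {
                throw new NumberFormatException();
            }
        }
        while (multiplicand-- > 0) result *= 10; // scale up the missing decimals
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        byte[] a = "  12.34 ".getBytes();
        System.out.println(parseScaledBy1000(a, 0, a.length)); // 12340
        byte[] b = "-0.5".getBytes();
        System.out.println(parseScaledBy1000(b, 0, b.length)); // -500
    }
}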

 

We also optimized the reading of integer fields by adding a get_long_efficiently method:

public long get_long_efficiently(byte[] src, final int index) {
    long result = 0;
    boolean negative = false;
    Field field = getFields()[index];
    int end = field.getOffset() + field.getLength();
    for (int i = field.getOffset(); i < end; i++) {
        byte ch = src[i];
        if (ch >= 48 && ch <= 57) { // digit
            result = result * 10 + (ch - 48);
            continue;
        }
        if (ch == 32) // space
            continue;
        if (ch == '-') { // minus sign
            negative = true;
            continue;
        }
        throw new NumberFormatException();
    }
    if (negative) {
        result = 0 - result;
    }
    return result;
}

 

The two algorithms above are not complex, but they are critical. A dbf file contains about 5,000 rows, and each row has 20 to 30 Float or Int fields, so this optimization affects more than 100,000 field reads per update. In our tests it cut the read time from 50-70 ms to 15-20 ms. The devil is in the details, and this was the single biggest improvement.

(The optimization 5 code is included in the improved DBFReader made available for download earlier; here is the download link: DBFReader Library.)

 

Optimization 6: Concurrent Processing with a Thread Pool

Reading the fields of more than 5,000 rows and converting them into objects is naturally suited to multithreading, the most obvious optimization here.

 

The usual approach is to split the work into equal parts, one per thread: with five threads, the 5,000 rows are divided into five batches of 1,000, and each thread processes one batch. This looks fair, but it is not, because we run on a time-sharing operating system: each thread's start time, CPU time slices, and workload intensity all differ, so an apparently even split often leaves some threads finished while others still have work to do (sketched below).
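
For contrast, the static split looks something like this (a hypothetical sketch of ours; process() stands in for the per-row conversion):

import java.util.List;
import java.util.concurrent.ExecutorService;

class StaticSplitDemo {
    // Static equal partitioning: every thread gets a fixed contiguous slice,
    // so one slow thread becomes the bottleneck while the others sit idle.
    static void staticSplit(ExecutorService executor, List<byte[]> rows, int threads) {
        int chunk = rows.size() / threads;
        for (int t = 0; t < threads; t++) {
            final int start = t * chunk;
            final int end = (t == threads - 1) ? rows.size() : start + chunk;
            executor.execute(() -> {
                for (int i = start; i < end; i++) {
                    process(rows.get(i));
                }
            });
        }
    }

    static void process(byte[] row) { /* convert one row to an object */ }
}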

 

Here is a work allocation method I like: each thread repeatedly claims a small batch from the 5,000-row job, say 16 rows, and claims 16 more whenever it finishes a batch. Fast threads end up doing more work and slow threads less, until every row is processed. How do the threads coordinate; doesn't task allocation require a lock? No lock is needed: we use the CAS mechanism (in practice an AtomicInteger, which is implemented on top of CAS). Not much more explanation is needed; see the code:

class ReaderTask implements Runnable {
    Collector collector;
    List<byte[]> recordList;
    CountDownLatch countDownLatch;
    AtomicInteger cursor;
    DBFReader reader;

    public ReaderTask(Collector collector, DBFReader dbfreader, List<byte[]> recordList,
            AtomicInteger cursor, CountDownLatch countDownLatch) {
        this.collector = collector;
        this.reader = dbfreader;
        this.recordList = recordList;
        this.cursor = cursor;
        this.countDownLatch = countDownLatch;
    }

    @Override
    public void run() {
        try {
            int length = recordList.size();
            do {
                final int step = 16; // allocate 16 rows at a time to this thread
                int endIndex = cursor.addAndGet(step);
                int startIndex = endIndex - step;
                for (int i = startIndex; i < endIndex && i < length; i++) {
                    byte[] row = recordList.get(i);
                    MarketRealtimeData SHData = new MarketRealtimeData();
                    SHData.setMarketType(Constants.MARKET_SH_STOCK);
                    SHData.setIdNum(reader.get_string_efficiently(row, 0));
                    SHData.setPrefix(reader.get_string_efficiently(row, 1));
                    SHData.setPreClosePrice(reader.get_long_efficiently_and_multiply_1000(row, 2));
                    SHData.setOpenPrice(reader.get_long_efficiently_and_multiply_1000(row, 3));
                    SHData.setTurnover(reader.get_long_efficiently_and_multiply_1000(row, 4));
                    SHData.setHighPrice(reader.get_long_efficiently_and_multiply_1000(row, 5));
                    SHData.setLowPrice(reader.get_long_efficiently_and_multiply_1000(row, 6));
                    SHData.setMatchPrice(reader.get_long_efficiently_and_multiply_1000(row, 7));
                    // read the remaining fields, omitted here
                    // ......
                    if (collector != null) {
                        collector.collect(SHData);
                    }
                }
            } while (cursor.get() < length);
        } finally {
            if (countDownLatch != null)
                countDownLatch.countDown();
        }
    }
}

 

private void readHangqingFile(String path, String name) throws Exception {
    // long t1 = System.nanoTime();
    DBFReader dbfreader_SH = null;
    try {
        dbfreader_SH = new DBFReader(new File(path + File.separator + name));
        List<byte[]> list_sh = dbfreader_SH.recordsWithOutDel_efficiently(cacheManager);
        AtomicInteger cursor = new AtomicInteger(0); // atomic variable used to allocate tasks between threads
        CountDownLatch countDownLatch = new CountDownLatch(WORK_THREAD_COUNT);
        for (int i = 0; i < WORK_THREAD_COUNT - 1; i++) {
            // assign the task to the threads of the thread pool
            ReaderTask task = new ReaderTask(collector, dbfreader_SH, list_sh, cursor, countDownLatch);
            globalExecutor.execute(task);
        }
        // the current thread also serves as a working thread
        new ReaderTask(collector, dbfreader_SH, list_sh, cursor, countDownLatch).run();
        countDownLatch.await();
        // long t2 = System.nanoTime();
        // System.out.println("speed time on read and object: " + (t2 - t1));
    } finally {
        if (dbfreader_SH != null)
            dbfreader_SH.close();
    }
}

 

Tests show that with four threads processing in parallel, the processing time drops from 15-20 ms to 4-7 ms.

 

With all of the optimizations described in this article applied, the entire read takes only a few milliseconds, measured as the total time from the moment the file is updated to the moment all 5,000 rows have been read and converted into objects.

 

If we dig deeper, there are still many details to improve. For example, tests show that latency stability is not yet good enough, probably because of GC. We could further reduce object creation to cut both the overhead and the GC pressure, and we could also control when GC runs so that it executes during idle periods.
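
One possible way to cut allocation, sketched by us since the article only names the direction, is to recycle objects across file updates instead of new-ing 5,000 MarketRealtimeData instances every few seconds:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

// A minimal recycling pool: pre-allocate instances once and reuse them.
class ObjectPool<T> {
    private final ArrayBlockingQueue<T> pool;
    private final Supplier<T> factory;

    ObjectPool(int capacity, Supplier<T> factory) {
        this.pool = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
        for (int i = 0; i < capacity; i++) pool.offer(factory.get());
    }

    T acquire() {
        T obj = pool.poll();
        return obj != null ? obj : factory.get(); // pool exhausted: allocate
    }

    void release(T obj) {
        pool.offer(obj); // caller must reset the object's fields first
    }
}

A reader thread would then call something like new ObjectPool<>(5000, MarketRealtimeData::new), acquire() instead of new, and release() once the quote has been distributed.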

 

Original article by Binhua Liu. Please credit the original address when reprinting: http://www.cnblogs.com/Binhua-Liu/p/5616761.html
