Optimization ideas for improving data import efficiency in Java

Source: Internet
Author: User


The requirements:

1. 100,000 phone numbers in total;

2. The data contains duplicate and malformed phone numbers;

3. Output the valid numbers, with duplicates removed.

 

I. Implementation before optimization:

1. First, use a regular expression to check all 100,000 entries and weed out the malformed ones;

2. Use List.contains to check whether each number is already in the result list, and List.add to append it if it is not;

3. Finally, the result list holds the valid, deduplicated numbers.
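Step 1 (regex filtering) is not shown in the code that follows. Here is a minimal sketch of it. It assumes the numbers are 11-digit mainland China mobile numbers: a leading 1, a second digit from 3 to 9, then nine more digits. The pattern and the names `PhoneFilter` and `filterValid` are hypothetical, so adjust the regex to whatever format your data actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PhoneFilter {
    // Hypothetical format (an assumption): 11 digits, starting with 1, second digit 3-9.
    // Replace this with the rule your data actually follows.
    private static final Pattern PHONE = Pattern.compile("^1[3-9]\\d{9}$");

    // Returns the entries that match PHONE (surrounding whitespace trimmed).
    // Malformed entries are dropped; duplicates are kept and removed in a later step.
    static List<String> filterValid(List<String> raw) {
        List<String> valid = new ArrayList<>(raw.size());
        for (String s : raw) {
            if (s != null && PHONE.matcher(s.trim()).matches()) {
                valid.add(s.trim());
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("13812345678", "1381234567", "abc", "13812345678", "12812345678");
        System.out.println(filterValid(raw)); // [13812345678, 13812345678]
    }
}
```

Compiling the pattern once into a static field avoids recompiling the regex for each of the 100,000 entries.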

```java
import java.util.ArrayList;
import java.util.Random;

public class appMain {
    final static int _capacity = 1000000;
    final static Random rand = new Random(System.currentTimeMillis() + _capacity);
    static ArrayList<String> list = new ArrayList<String>(_capacity);
    static ArrayList<String> newlist = new ArrayList<String>(_capacity);

    public static void main(String[] args) throws InterruptedException {
        long ts = System.currentTimeMillis();
        int modVal = _capacity / 3;
        // Test data: at most modVal distinct values, so most entries are duplicates
        for (int i = 0; i < _capacity; i++) {
            rand.setSeed(i);
            list.add(Integer.toString(Math.abs(rand.nextInt() % modVal)));
        }
        ts = System.currentTimeMillis() - ts;
        System.out.println("generation time: " + ts);

        test1();
    }

    // Insert check: List.contains scans newlist linearly,
    // so the whole loop costs roughly (records x distinct values) comparisons
    static void test1() {
        newlist.clear();
        int repetition = 0;
        long ts = System.currentTimeMillis();
        for (String s : list) {
            if (!newlist.contains(s))
                newlist.add(s);
            else {
                repetition++;
            }
        }
        ts = System.currentTimeMillis() - ts;
        System.out.println("------ insert check method -------");
        System.out.println("search time: " + ts);
        System.out.println("repeated: " + repetition);
        System.out.println("correct: " + newlist.size());
    }
}
```

Execution results before optimization:

```
/*
Condition: capacity = 100000
Result:
generation time: 33
------ insert check method -------
search time: 6612
repetition: 76871
correct: 23129
------ sorting check method -------
search time: 91
repetition: 76871
correct: 23129
*/
```

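With this approach, the import effectively stalls once the data exceeds about 5 million records, so it is not practical. The reason is that List.contains scans the list element by element, so checking every record against a growing result list takes time proportional to the number of records times the number of distinct values. Hence the following optimization.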

 

II. Implementation after optimization:

1. First, sort the 100,000 entries;

2. Compare each entry with the previous distinct value (explained in detail below);

3. Keep only the distinct values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

public class appMain {
    final static int _capacity = 1000000;
    final static Random rand = new Random(System.currentTimeMillis() + _capacity);
    static ArrayList<String> list = new ArrayList<String>(_capacity);
    static ArrayList<String> newlist = new ArrayList<String>(_capacity);

    public static void main(String[] args) throws InterruptedException {
        long ts = System.currentTimeMillis();
        int modVal = _capacity / 3;
        for (int i = 0; i < _capacity; i++) {
            rand.setSeed(i);
            list.add(Integer.toString(Math.abs(rand.nextInt() % modVal)));
        }
        ts = System.currentTimeMillis() - ts;
        System.out.println("generation time: " + ts);

        test2();
    }

    // Sorting check: after sorting, equal values are adjacent, so one linear pass finds duplicates
    static void test2() {
        newlist.clear();
        int repetition = 0;
        long ts = System.currentTimeMillis();

        Collections.sort(list);            // O(n log n); note that this sorts list in place
        String str = list.get(0);          // last distinct value seen
        int max = list.size();
        for (int i = 1; i < max; i++) {
            if (str.equals(list.get(i))) { // same as the previous value: a duplicate
                repetition++;
                continue;
            }
            newlist.add(str);              // value changed: keep the previous distinct value
            str = list.get(i);
        }
        newlist.add(str);                  // keep the final distinct value

        ts = System.currentTimeMillis() - ts;
        System.out.println("------ sorting check method -------");
        System.out.println("search time: " + ts);
        System.out.println("repeated: " + repetition);
        System.out.println("correct: " + newlist.size());
    }
}
```

Results after optimization:

```
/*
Condition: capacity = 1000000
Result:
generation time: 392
------ insert check method -------
search time: 1033818
repetition: 703036
correct: 296964
------ sorting check method -------
search time: 1367
repetition: 703036
correct: 296964
*/
```

With 100,000 records, the sorting method is roughly 70 times faster (91 ms vs. 6612 ms). With 1 million records, test1() effectively stalls on my machine (the run above took over 1,000 seconds), while test2() still finishes in about 1.4 seconds (1367 ms).
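Here is a rough consistency check on the measured numbers, using only the figures reported above. As a simplifying assumption, it treats test1() as costing about n·d comparisons (each of the n records is checked with List.contains against a newlist of up to d distinct values) and test2() as dominated by the O(n log n) sort.

```latex
\text{Speedup at } n = 10^5:\quad \frac{6612}{91} \approx 72.7
\qquad
\text{Speedup at } n = 10^6:\quad \frac{1033818}{1367} \approx 756.3

\text{test1 growth, predicted:}\quad \frac{10^6 \cdot 296964}{10^5 \cdot 23129} \approx 128.4
\qquad \text{measured:}\quad \frac{1033818}{6612} \approx 156.4

\text{test2 growth, predicted:}\quad \frac{10^6 \log 10^6}{10^5 \log 10^5} = 10 \cdot \frac{6}{5} = 12
\qquad \text{measured:}\quad \frac{1367}{91} \approx 15.0
```

These are order-of-magnitude estimates only (constant factors, caching and JIT warm-up are ignored). They still show why the gap widens as the data grows: the insert check grows with n·d, while the sorting check stays close to linear.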

Here is a short walkthrough of the core loop:

```
 1 Collections.sort(list);
 2 String str = list.get(0);
 3 int max = list.size();
 4 for (int i = 1; i < max; i++) {
 5     if (str.equals(list.get(i))) {
 6         repetition++;
 7         continue;
 8     }
 9     newlist.add(str);
10     str = list.get(i);
11 }
```

Line 1: sort the list. Suppose the sorted result is [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5].

Line 2: initialize str to the first element, so str = 1.

Line 4: the loop starts at i = 1.

Line 5: check whether str equals the current element, list.get(i) (think of i as a cursor into the list). If they are equal, the element is a duplicate: lines 6 and 7 count it and skip to the next iteration.

Line 9: otherwise the value has changed, so append str (here 1) to the end of newlist.

Line 10: assign the current element to str (now str = 2), then continue with the next iteration.

...
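The same loop can be packaged as a reusable method. Below is a minimal sketch. `SortedDedup` and `dedupSorted` are hypothetical names, and unlike test2() the sketch copies the input instead of sorting it in place. It also returns an empty list for empty input, because `list.get(0)` in the original would throw an exception there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedDedup {
    // Hypothetical helper: returns the distinct values of input, in sorted order.
    // The input list itself is left unchanged.
    static List<String> dedupSorted(List<String> input) {
        List<String> sorted = new ArrayList<>(input);
        Collections.sort(sorted);                 // equal values become adjacent
        List<String> result = new ArrayList<>();
        if (sorted.isEmpty()) {
            return result;                        // guard: get(0) would throw here
        }
        String str = sorted.get(0);               // last distinct value seen
        for (int i = 1; i < sorted.size(); i++) {
            if (str.equals(sorted.get(i))) {
                continue;                         // duplicate of str: skip it
            }
            result.add(str);                      // value changed: keep str
            str = sorted.get(i);
        }
        result.add(str);                          // keep the last distinct value
        return result;
    }

    public static void main(String[] args) {
        // Unsorted version of the walkthrough data: 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5
        List<String> data = List.of("4", "1", "5", "3", "5", "4", "2", "5", "3", "4", "5");
        List<String> unique = dedupSorted(data);
        System.out.println(unique);                      // [1, 2, 3, 4, 5]
        System.out.println(data.size() - unique.size()); // 6 (number of duplicates)
    }
}
```

Because the elements are strings, Collections.sort orders them lexicographically, so "10" sorts before "2". That does not matter here. Deduplication only requires equal values to end up next to each other, and any consistent ordering guarantees that.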

Describing this in words is cumbersome, so instead here is a program that prints each step as it runs.

```java
import java.util.ArrayList;

public class appList {
    static ArrayList<String> list = new ArrayList<String>();
    static ArrayList<String> newlist = new ArrayList<String>();

    public static void main(String[] args) {
        // Build an already-sorted list in which value i appears i times
        for (int i = 1; i < 5 + 1; i++) {
            for (int j = 0; j < i; j++) {
                list.add(Integer.toString(i));
            }
        }
        System.out.println("Initial list value " + list.toString());
        // prints [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

        String str = list.get(0);
        int max = list.size();
        for (int i = 1; i < max; i++) {
            Print(i);                       // show the cursor position in list
            if (str.equals(list.get(i))) {
                PrintNew();
                continue;
            }
            newlist.add(str);
            System.out.println("add\t" + str);
            str = list.get(i);
            PrintNew();
        }

        newlist.add(str);                   // the last distinct value
        System.out.println("add\t" + str);
        PrintNew();

        System.out.println("newlist value " + newlist.toString());
        // prints [1, 2, 3, 4, 5]
    }

    // Print the current contents of newlist, followed by a blank line
    static void PrintNew() {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("newlist\t");
        for (int i = 0; i < newlist.size(); i++) {
            stringBuilder.append(newlist.get(i));
            stringBuilder.append(",");
        }
        System.out.println(stringBuilder.toString());
        System.out.println();
    }

    // Print list, marking the element at index pos with [ ]
    static void Print(int pos) {
        StringBuilder stringBuilder = new StringBuilder();
        stringBuilder.append("list\t");
        for (int i = 0; i < list.size(); i++) {
            if (i == pos) {
                stringBuilder.append("[");
                stringBuilder.append(list.get(i));
                stringBuilder.append("],");
            } else {
                stringBuilder.append(list.get(i));
                stringBuilder.append(",");
            }
        }
        System.out.println(stringBuilder.toString());
    }
}
```

Execution result:

```
Initial list value [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
list	1,[2],2,3,3,3,4,4,4,4,5,5,5,5,5,
add	1
newlist	1,

list	1,2,[2],3,3,3,4,4,4,4,5,5,5,5,5,
newlist	1,

list	1,2,2,[3],3,3,4,4,4,4,5,5,5,5,5,
add	2
newlist	1,2,

list	1,2,2,3,[3],3,4,4,4,4,5,5,5,5,5,
newlist	1,2,

list	1,2,2,3,3,[3],4,4,4,4,5,5,5,5,5,
newlist	1,2,

list	1,2,2,3,3,3,[4],4,4,4,5,5,5,5,5,
add	3
newlist	1,2,3,

list	1,2,2,3,3,3,4,[4],4,4,5,5,5,5,5,
newlist	1,2,3,

list	1,2,2,3,3,3,4,4,[4],4,5,5,5,5,5,
newlist	1,2,3,

list	1,2,2,3,3,3,4,4,4,[4],5,5,5,5,5,
newlist	1,2,3,

list	1,2,2,3,3,3,4,4,4,4,[5],5,5,5,5,
add	4
newlist	1,2,3,4,

list	1,2,2,3,3,3,4,4,4,4,5,[5],5,5,5,
newlist	1,2,3,4,

list	1,2,2,3,3,3,4,4,4,4,5,5,[5],5,5,
newlist	1,2,3,4,

list	1,2,2,3,3,3,4,4,4,4,5,5,5,[5],5,
newlist	1,2,3,4,

list	1,2,2,3,3,3,4,4,4,4,5,5,5,5,[5],
newlist	1,2,3,4,

add	5
newlist	1,2,3,4,5,

newlist value [1, 2, 3, 4, 5]
```
