Optimization ideas for improving data import efficiency in Java
Requirements, stated up front:
1. There are 100,000 phone numbers in total;
2. Some of the phone numbers are duplicated or invalid;
3. Find the valid numbers, with duplicates removed.
I. Implementation before optimization:
1. First, use a regular expression to filter the 100,000 entries and weed out the invalid ones;
2. Use List.contains to check for duplicates, and List.add to add each non-duplicate entry;
3. Finally, read the correct data from the List.
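The listing that follows covers only steps 2 and 3, so here is a minimal sketch of what step 1 might look like. The class name PhoneFilterSketch, the method filterValid, and above all the pattern itself (exactly 11 digits starting with 1) are assumptions made for illustration; the original does not say which rule it used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PhoneFilterSketch {
    // ASSUMPTION: a valid number is exactly 11 digits and starts with 1.
    // This rule is hypothetical; replace it with whatever your data requires.
    private static final Pattern VALID_PHONE = Pattern.compile("1\\d{10}");

    /** Returns the entries that match the whole pattern, in input order. */
    static List<String> filterValid(List<String> raw) {
        List<String> valid = new ArrayList<>(raw.size());
        for (String s : raw) {
            // matches() requires the entire string to match, not just a prefix
            if (s != null && VALID_PHONE.matcher(s).matches()) {
                valid.add(s);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("13800000000", "1380000", "abc", "13800000000");
        // Duplicates are still present here; removing them is a separate step.
        System.out.println(filterValid(raw)); // prints [13800000000, 13800000000]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every entry, which matters when the list has hundreds of thousands of rows.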
    import java.util.ArrayList;
    import java.util.Random;

    public class appMain {
        // Number of entries; the runs reported in this post use 100000 and 1000000.
        final static int _capacity = 1000000;
        final static Random rand = new Random(System.currentTimeMillis() + _capacity);
        static ArrayList<String> list = new ArrayList<String>(_capacity);
        static ArrayList<String> newlist = new ArrayList<String>(_capacity);

        public static void main(String[] args) throws InterruptedException {
            long ts = System.currentTimeMillis();
            // Values fall in [0, _capacity / 3), so duplicates are guaranteed.
            int modVal = _capacity / 3;
            for (int i = 0; i < _capacity; i++) {
                // Re-seeding with i makes the generated data identical on every run.
                rand.setSeed(i);
                list.add(Integer.toString(Math.abs(rand.nextInt() % modVal)));
            }
            ts = System.currentTimeMillis() - ts;
            System.out.println("generation time:" + ts);

            test1();
        }

        // Insert check: look each entry up in newlist before adding it.
        static void test1() {
            newlist.clear();
            int repetition = 0;
            long ts = System.currentTimeMillis();
            for (String s : list) {
                if (!newlist.contains(s))
                    newlist.add(s);
                else {
                    repetition++;
                }
            }
            ts = System.currentTimeMillis() - ts;
            System.out.println("------ insert check method -------");
            System.out.println("search time:" + ts);
            System.out.println("repeated:" + repetition);
            System.out.println("correct:" + newlist.size());
        }
    }
Execution results before optimization:
/*
   Condition: capacity = 100000
   Result:
   generation time: 33
   ------ insert check method -------
   search time: 6612
   repeated: 76871
   correct: 23129
   ------ sorting check method -------
   search time: 91
   repeated: 76871
   correct: 23129
*/
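If the preceding method is used for import, the program hangs almost immediately once the data volume exceeds 5 million records, so it is not advisable. The cause is that List.contains scans the list from the beginning on every call, so the total work grows roughly with the square of the number of entries. Hence the following optimization.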
II. Implementation after optimization:
1. Sort the 100,000 entries first;
2. Compare each entry with the one before it (I explain why in detail below);
3. Filter out the correct data.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Random;

    public class appMain {
        // Number of entries; the runs reported in this post use 100000 and 1000000.
        final static int _capacity = 1000000;
        final static Random rand = new Random(System.currentTimeMillis() + _capacity);
        static ArrayList<String> list = new ArrayList<String>(_capacity);
        static ArrayList<String> newlist = new ArrayList<String>(_capacity);

        public static void main(String[] args) throws InterruptedException {
            long ts = System.currentTimeMillis();
            // Values fall in [0, _capacity / 3), so duplicates are guaranteed.
            int modVal = _capacity / 3;
            for (int i = 0; i < _capacity; i++) {
                // Re-seeding with i makes the generated data identical on every run.
                rand.setSeed(i);
                list.add(Integer.toString(Math.abs(rand.nextInt() % modVal)));
            }
            ts = System.currentTimeMillis() - ts;
            System.out.println("generation time:" + ts);

            test2();
        }

        // Sorting check: sort first, then compare each entry with the previous distinct one.
        static void test2() {
            newlist.clear();
            int repetition = 0;
            long ts = System.currentTimeMillis();

            Collections.sort(list);
            String str = list.get(0);
            int max = list.size();
            for (int i = 1; i < max; i++) {
                if (str.equals(list.get(i))) {
                    repetition++;
                    continue;
                }
                newlist.add(str);
                str = list.get(i);
            }
            // The last distinct value is still held in str when the loop ends.
            newlist.add(str);

            ts = System.currentTimeMillis() - ts;
            System.out.println("------ sorting check method -------");
            System.out.println("search time:" + ts);
            System.out.println("repeated:" + repetition);
            System.out.println("correct:" + newlist.size());
        }
    }
Result after optimization:
/*
   Condition: capacity = 1000000
   Result:
   generation time: 392
   ------ insert check method -------
   search time: 1033818
   repeated: 703036
   correct: 296964
   ------ sorting check method -------
   search time: 1367
   repeated: 703036
   correct: 296964
*/
At 100,000 entries the search times differ by a factor of about 70 (6612 ms versus 91 ms). At 1 million entries, test1() is effectively stuck on my machine (1,033,818 ms, roughly 17 minutes, in the run above), while test2() still returns in about 1.4 seconds (1367 ms).
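To see why the gap widens as the data grows, it can help to count comparisons instead of measuring time. The sketch below is only an illustration under stated assumptions: the class ComparisonCountSketch and both method names are invented here, the inner loop is a hand-written stand-in for List.contains, and the input is 2,000 distinct strings shuffled with a fixed seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ComparisonCountSketch {
    /** Counts equals() calls for the insert-check approach (manual stand-in for List.contains). */
    static long insertCheckComparisons(List<String> input) {
        long comparisons = 0;
        List<String> unique = new ArrayList<>();
        for (String s : input) {
            boolean found = false;
            for (String u : unique) {          // linear scan, like contains()
                comparisons++;
                if (u.equals(s)) { found = true; break; }
            }
            if (!found) unique.add(s);
        }
        return comparisons;
    }

    /** Counts comparator calls during the sort plus one pass over adjacent pairs. */
    static long sortCheckComparisons(List<String> input) {
        long[] comparisons = {0};
        List<String> sorted = new ArrayList<>(input);  // leave the caller's list untouched
        sorted.sort((a, b) -> { comparisons[0]++; return a.compareTo(b); });
        comparisons[0] += Math.max(0, sorted.size() - 1); // the adjacent-compare loop
        return comparisons[0];
    }

    public static void main(String[] args) {
        int n = 2000;
        List<String> data = new ArrayList<>();
        for (int i = 0; i < n; i++) data.add(Integer.toString(i));
        Collections.shuffle(data, new Random(42));
        // With n distinct values the insert check does exactly n * (n - 1) / 2 comparisons.
        System.out.println("insert check: " + insertCheckComparisons(data));
        System.out.println("sorting check: " + sortCheckComparisons(data));
    }
}
```

With all-distinct input the insert check performs n(n-1)/2 comparisons, so ten times more data means about a hundred times more work, whereas sorting grows on the order of n log n. Real inputs with many duplicates shorten the scans somewhat, but the growth pattern is the same.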
Here is a brief walkthrough of the core code:
     1  Collections.sort(list);
     2  String str = list.get(0);
     3  int max = list.size();
     4  for (int i = 1; i < max; i++) {
     5      if (str.equals(list.get(i))) {
     6          repetition++;
     7          continue;
     8      }
     9      newlist.add(str);
    10      str = list.get(i);
    11  }
Line 1: sort the list. Suppose the sorted result is [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5].
Line 2: initially, str = 1.
From line 4 on, we enter the loop:
Line 5: check whether str equals the current element (think of list.get(i) as a cursor for now). If they are equal, skip the remaining steps and go on to the next iteration.
Line 9: the values differ, so append str (= 1) to the end of newlist.
Line 10: assign the current element to str, so str = 2, then enter the next iteration.
...
Explaining this in words is cumbersome, so I wrote some code that lets the program itself show you how it executes.
    import java.util.ArrayList;

    public class appList {
        static ArrayList<String> list = new ArrayList<String>();
        static ArrayList<String> newlist = new ArrayList<String>();

        public static void main(String[] args) {
            // Build an already sorted list: one 1, two 2s, ... five 5s.
            for (int i = 1; i < 5 + 1; i++) {
                for (int j = 0; j < i; j++) {
                    list.add(Integer.toString(i));
                }
            }
            System.out.println("Initial list value: " + list.toString());
            // prints [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

            String str = list.get(0);
            int max = list.size();
            for (int i = 1; i < max; i++) {
                Print(i);
                if (str.equals(list.get(i))) {
                    PrintNew();
                    continue;
                }
                newlist.add(str);
                System.out.println("add\t" + str);
                str = list.get(i);
                PrintNew();
            }

            // The last distinct value is still held in str when the loop ends.
            newlist.add(str);
            System.out.println("add\t" + str);
            PrintNew();

            System.out.println("newlist value: " + newlist.toString());
            // prints [1, 2, 3, 4, 5]
        }

        // Prints the current contents of newlist, followed by a blank line.
        static void PrintNew() {
            StringBuilder stringBuilder = new StringBuilder();
            stringBuilder.append("newlist\t");
            for (int i = 0; i < newlist.size(); i++) {
                stringBuilder.append(newlist.get(i));
                stringBuilder.append(",");
            }
            System.out.println(stringBuilder.toString());
            System.out.println();
        }

        // Prints list with the element at position pos wrapped in brackets.
        static void Print(int pos) {
            StringBuilder stringBuilder = new StringBuilder();
            stringBuilder.append("list\t");
            for (int i = 0; i < list.size(); i++) {
                if (i == pos) {
                    stringBuilder.append("[");
                    stringBuilder.append(list.get(i));
                    stringBuilder.append("],");
                } else {
                    stringBuilder.append(list.get(i));
                    stringBuilder.append(",");
                }
            }
            System.out.println(stringBuilder.toString());
        }
    }
Execution result:
    Initial list value: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
    list    1,[2],2,3,3,3,4,4,4,4,5,5,5,5,5,
    add     1
    newlist 1,

    list    1,2,[2],3,3,3,4,4,4,4,5,5,5,5,5,
    newlist 1,

    list    1,2,2,[3],3,3,4,4,4,4,5,5,5,5,5,
    add     2
    newlist 1,2,

    list    1,2,2,3,[3],3,4,4,4,4,5,5,5,5,5,
    newlist 1,2,

    list    1,2,2,3,3,[3],4,4,4,4,5,5,5,5,5,
    newlist 1,2,

    list    1,2,2,3,3,3,[4],4,4,4,5,5,5,5,5,
    add     3
    newlist 1,2,3,

    list    1,2,2,3,3,3,4,[4],4,4,5,5,5,5,5,
    newlist 1,2,3,

    list    1,2,2,3,3,3,4,4,[4],4,5,5,5,5,5,
    newlist 1,2,3,

    list    1,2,2,3,3,3,4,4,4,[4],5,5,5,5,5,
    newlist 1,2,3,

    list    1,2,2,3,3,3,4,4,4,4,[5],5,5,5,5,
    add     4
    newlist 1,2,3,4,

    list    1,2,2,3,3,3,4,4,4,4,5,[5],5,5,5,
    newlist 1,2,3,4,

    list    1,2,2,3,3,3,4,4,4,4,5,5,[5],5,5,
    newlist 1,2,3,4,

    list    1,2,2,3,3,3,4,4,4,4,5,5,5,[5],5,
    newlist 1,2,3,4,

    list    1,2,2,3,3,3,4,4,4,4,5,5,5,5,[5],
    newlist 1,2,3,4,

    add     5
    newlist 1,2,3,4,5,

    newlist value: [1, 2, 3, 4, 5]
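Putting the pieces together, the whole import could be sketched as a single method: filter with a regular expression, sort, then keep one copy of each run of equal neighbors. This is a sketch under assumptions, not the code behind the measurements above: the class ImportPipelineSketch, the method validUnique, and the phone pattern (11 digits starting with 1) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class ImportPipelineSketch {
    // ASSUMPTION: hypothetical validity rule, 11 digits starting with 1.
    private static final Pattern VALID_PHONE = Pattern.compile("1\\d{10}");

    /** Returns the valid numbers with duplicates removed, in sorted order. */
    static List<String> validUnique(List<String> raw) {
        // Step 1: drop entries that do not match the pattern.
        List<String> valid = new ArrayList<>();
        for (String s : raw) {
            if (s != null && VALID_PHONE.matcher(s).matches()) valid.add(s);
        }

        List<String> result = new ArrayList<>();
        if (valid.isEmpty()) return result; // get(0) below would throw on an empty list

        // Step 2: sorting places equal numbers next to each other.
        Collections.sort(valid);

        // Step 3: compare each entry with the previous distinct value.
        String str = valid.get(0);
        for (int i = 1; i < valid.size(); i++) {
            if (str.equals(valid.get(i))) continue;
            result.add(str);
            str = valid.get(i);
        }
        result.add(str); // the last distinct value
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "13900000002", "13800000001", "bad", "13900000002", "13800000001", "1380");
        System.out.println(validUnique(raw)); // prints [13800000001, 13900000002]
    }
}
```

One trade-off to keep in mind: the result comes back in sorted order, not in the order the numbers appeared in the input. If the original order matters, this approach needs an extra step to restore it.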