Recently I wrote a small comparison tool in C# to de-duplicate and merge the mobile phone numbers in two files, and found that hash tables make looking up data much more efficient.
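To see why a hash table matters for the lookup, here is a minimal standalone sketch (the LookupDemo class, the fake numbers, and the 200,000 count are all made up for illustration) comparing Dictionary.ContainsKey with a linear List.Contains:

// Hypothetical illustration: Dictionary.ContainsKey is a hash lookup,
// while List.Contains scans the whole list on every call.
using System;
using System.Collections.Generic;
using System.Diagnostics;

class LookupDemo
{
    static void Main()
    {
        var numbers = new List<string>();
        var table = new Dictionary<string, string>();
        for (int i = 0; i < 200000; i++)
        {
            string n = "138" + i.ToString("D8");   // fake mobile numbers
            numbers.Add(n);
            table.Add(n, "");
        }

        Stopwatch sw = Stopwatch.StartNew();
        bool found = table.ContainsKey("13800199999");    // hash lookup
        Console.WriteLine("Dictionary: " + sw.ElapsedTicks + " ticks, found = " + found);

        sw = Stopwatch.StartNew();
        found = numbers.Contains("13800199999");          // linear scan
        Console.WriteLine("List      : " + sw.ElapsedTicks + " ticks, found = " + found);
    }
}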
The initial program was as follows:
string[] fileMobile1 = File.ReadAllText(cb_file1.SelectedItem.ToString(), Encoding.GetEncoding("gb2312")).Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
string[] fileMobile2 = File.ReadAllText(cb_file2.SelectedItem.ToString(), Encoding.GetEncoding("gb2312")).Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> dic1 = new Dictionary<string, string>();
string result = "";
int num = 0;
Stopwatch sw = Stopwatch.StartNew();
foreach (string str in fileMobile1)
{
    if (!dic1.ContainsKey(str))
    {
        dic1.Add(str, "");
        result += str + "\r\n";
        num++;
    }
}
foreach (string str in fileMobile2)
{
    if (!dic1.ContainsKey(str))
    {
        dic1.Add(str, "");
        result += str + "\r\n";
        num++;
    }
}
File.WriteAllText(Application.StartupPath + "\\11.txt", result, Encoding.GetEncoding("gb2312"));
MessageBox.Show("Comparison complete! " + num.ToString() + "; elapsed " + sw.ElapsedMilliseconds.ToString());
Test data: file 1 contains 150,000 numbers, file 2 contains 50,000 numbers, and about 50,000 numbers are duplicated between the two files. This version took about 150 seconds, which is far too long.
Later, a colleague wrote a version, also using Dictionary, that took only about 50 milliseconds. After comparing the two, we found the culprit:
result += str + "\r\n";
As you probably know, strings in C# are immutable, so this statement builds a brand-new string and allocates new memory on every iteration.
Conclusion: do not keep modifying a string value inside a large loop.
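To make the conclusion concrete, here is a minimal standalone sketch (the ConcatDemo class and the 100,000 iteration count are arbitrary, not my test data) contrasting += with StringBuilder.Append in the same kind of loop:

using System;
using System.Diagnostics;
using System.Text;

class ConcatDemo
{
    static void Main()
    {
        const int count = 100000;

        // Version 1: += allocates a new, ever-longer string on every iteration.
        Stopwatch sw = Stopwatch.StartNew();
        string result = "";
        for (int i = 0; i < count; i++)
            result += i.ToString() + "\r\n";
        Console.WriteLine("string +=     : " + sw.ElapsedMilliseconds + " ms");

        // Version 2: StringBuilder appends into an internal buffer that only
        // grows occasionally, so there is no full copy on every iteration.
        sw = Stopwatch.StartNew();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(i).Append("\r\n");
        Console.WriteLine("StringBuilder : " + sw.ElapsedMilliseconds + " ms");
    }
}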
Here is my colleague's code:
string[] fileMobile1 = File.ReadAllText(cb_file1.SelectedItem.ToString(), Encoding.GetEncoding("gb2312")).Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
string[] fileMobile2 = File.ReadAllText(cb_file2.SelectedItem.ToString(), Encoding.GetEncoding("gb2312")).Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> dic1 = new Dictionary<string, string>();
StringBuilder result = new StringBuilder();
int num = 0;
Stopwatch sw = Stopwatch.StartNew();
foreach (string str in fileMobile1)
{
    if (!dic1.ContainsKey(str))
    {
        dic1.Add(str, "");
        // result.Append(str + "\r\n");
        num++;
    }
}
foreach (string str in fileMobile2)
{
    if (!dic1.ContainsKey(str))
    {
        dic1.Add(str, "");
        // result.Append(str + "\r\n");
        num++;
    }
}
string[] content = new string[dic1.Keys.Count];
dic1.Keys.CopyTo(content, 0);
string fileResult = string.Join("\r\n", content);
File.WriteAllText(Application.StartupPath + "\\11.txt", fileResult, Encoding.GetEncoding("gb2312"));
MessageBox.Show("Comparison complete! " + num.ToString() + "; elapsed " + sw.ElapsedMilliseconds.ToString());
With the same test data, this version took only about 40 milliseconds.
PS: Using StringBuilder is far more efficient than plain string concatenation, but it was still about 10 milliseconds slower than my colleague's approach, I assume because it still executes an extra statement on every iteration.
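As a rough sketch of where that difference comes from (the JoinVsBuilder class, the fake numbers, and the 200,000 count are hypothetical, not my actual files), the StringBuilder path pays one Append call per item inside the loop, while string.Join copies the whole array in one bulk operation at the end:

using System;
using System.Diagnostics;
using System.Text;

class JoinVsBuilder
{
    static void Main()
    {
        // Hypothetical de-duplicated result set standing in for dic1.Keys.
        string[] keys = new string[200000];
        for (int i = 0; i < keys.Length; i++)
            keys[i] = "139" + i.ToString("D8");

        // Path 1: one Append call per item inside the loop.
        Stopwatch sw = Stopwatch.StartNew();
        StringBuilder sb = new StringBuilder();
        foreach (string k in keys)
            sb.Append(k).Append("\r\n");
        string byBuilder = sb.ToString();
        Console.WriteLine("StringBuilder: " + sw.ElapsedMilliseconds + " ms");

        // Path 2: a single bulk string.Join over the whole array.
        sw = Stopwatch.StartNew();
        string byJoin = string.Join("\r\n", keys);
        Console.WriteLine("string.Join  : " + sw.ElapsedMilliseconds + " ms");
    }
}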
PPS: In a large loop, keep the number of statements to a minimum and avoid time-consuming operations such as memory allocation; do such work in one step outside the loop rather than bit by bit inside it.
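One way to apply this here, sketched below as an assumption rather than something I benchmarked, is to pay the allocation cost once up front by giving the dictionary and the StringBuilder their final capacity before the loop starts:

using System.Collections.Generic;
using System.Text;

class Presize
{
    static void Demo(string[] fileMobile1, string[] fileMobile2)
    {
        // Allocate once, outside the loop: capacity hints avoid repeated
        // internal resizing (re-hashing and buffer copies) during iteration.
        int capacity = fileMobile1.Length + fileMobile2.Length;
        var dic1 = new Dictionary<string, string>(capacity);
        var result = new StringBuilder(capacity * 13);   // ~11 digits + "\r\n" per number

        foreach (string str in fileMobile1)
        {
            if (!dic1.ContainsKey(str))
            {
                dic1.Add(str, "");
                result.Append(str).Append("\r\n");
            }
        }
        foreach (string str in fileMobile2)
        {
            if (!dic1.ContainsKey(str))
            {
                dic1.Add(str, "");
                result.Append(str).Append("\r\n");
            }
        }
    }
}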
PPPS: The colleague's ID is Ma Pengfei.