Lambda performance

Because the work requires fast validation of large amounts of data, this experiment reads the data into in-memory lists and queries them with lambda expressions. Detailed requirements: the main data set read into memory has 500,000 records, and there are about 20 subsets, the largest of which has 3,000,000 records. Loading into memory and validating the results must complete within 5 minutes. Reading the test data into memory consumes about 2-3 GB. Multi-threaded loading was tested here, but the speed-up was not obvious; SQL Server has its own execution queueing mechanism.

(An aside from the loading stage: reading was slow and used a lot of memory, and I discovered by accident that each person's photo stream was also being read into memory. Photo information is not needed for the validation. After removing it, loading became much faster and memory use dropped considerably. Cases like this should be ruled out in advance in similar work.)

The validation scripts were written by another colleague. When I first dropped them into the test, there was no response after half an hour, so I killed the process. Then came the painful optimization process, during which I doubted at one point whether this approach would work at all. It took almost two weeks to get 5,000 main-set records validated within 10 seconds. 500,000 records now complete in 3-5 minutes. Finally, a test with 100 concurrent requests was completed and the validation results came back normally. Everything is OK and it is now live.

The following points should be noted from this data validation implementation.

1. Move validation from the original database to memory; memory is faster.
2. Load the data using multiple threads.
3. Use integer primary keys to speed up lookups. This is particularly important; the speed gain is large.
4. Use lambda expressions to find data, and use a join query instead of a for loop.
5. Choose linear search or binary search according to the size of the data to improve lookup speed.
6. Load shared data only once and use it globally throughout the validation.

Concurrency tests found that static properties in static classes are not thread-safe, because a static class has only one copy in memory. After removing the statics, the multi-threaded test ran normally.

The following is the test data with related notes. It can be skipped; read on if interested.

1. A01 has 70,000 records.

    A01.FindAll(x => !x.PersonStatus.In("01", "02", "03", "04"))

Loaded 15298 people, time: 0.019519 seconds.

    A01.FindAll(x => !(x.PersonStatus == "01" || x.PersonStatus == "02" || x.PersonStatus == "03" || x.PersonStatus == "04"))

Loaded 15298 people, time: 0.0284169 seconds.

2. codes has 33,000 records, of which 3,300 have CodeID == "ZB01".

    codes.FindAll(x => x.CodeID == "ZB01" && (x.CodeItemName == "District" || x.CodeItemName == "County"))

Loaded 287 people, time: 0.0139286 seconds.

    codes.FindAll(x => x.CodeID == "ZB01" && x.CodeItemName.In("District", "County"))

Loaded 287 people, time: 0.0230568 seconds.

3. A01 has 4,000 records; codeIds has 3,300 entries.

    personTableList.A01.FindAll(x => !x.A0114.In(codeIds));

A01 with 4,000 records: loaded 0 people, time: 0.1066983 seconds.
A01 with 70,000 records: loaded 0 people, time: 1.7386399 seconds.

    foreach (var a01 in personTableList.A01)
    {
        if (!codes.Exists(x => x.CodeItemID == a01.A0114))
        {
            persons.Add(a01);
        }
    }

With code of the form above and both lists at 70,000 records:
Loaded 75601 people, time: 55.4800723 seconds.
Loop lookup, loaded 75601 people, time: 107.4412256 seconds.

4. Comparing against an empty string.

    A01.FindAll(x => x.W0111G == "")

Loaded 183 people, time: 0.0039961 seconds.

    A01.FindAll(x => x.W0111G.IsSame(""))

Loaded 183 people, time: 0.0307353 seconds.

Membership tests, from fastest to slowest:

    A01.FindAll(x => personIds2.IndexOf(x.PersonID) >= 0)       fastest
    A01.FindAll(x => x.PersonID.In(personIds))                  next
    A01.FindAll(x => personIds2.Contains(x.PersonID))           next
    A01.FindAll(x => personIds2.Exists(p => p == x.PersonID))   slowest

Join queries are fast:

    var query = (from a14 in personData.A14
                 join a01 in personData.A01 on a14.PersonID equals a01.PersonID
                 select new { a14.PersonID, a14.A1407, a01.A0141 }).ToList();
    personIds = query.FindAll(x => x.A0141 > x.A1407);

Very important: for primary key fields, integer fields are faster than strings.

Linear lookups: Contains, Find and IndexOf are all linear lookups.
Binary lookup: BinarySearch. Binary search is only valid on an ordered array, so the list's Sort method is called before the lookup.

Conclusion: if the list has relatively few items, linear lookup is faster than binary search; the more items there are, the more obvious the advantage of the binary algorithm. Choose the lookup method according to the actual situation.

Test data 2. Time: 0.0186627 seconds. Binary search time: 0.0356611 seconds.
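The join and binary-search advice above can be sketched as a small, self-contained program. This is only an illustration under stated assumptions, not the code used in the project: the types A01Row and A14Row, the class LookupSketch, and the use of int for PersonID, A0141 and A1407 are all hypothetical; only the field names come from the text. It also assumes PersonID is unique within A01, so that the nested-loop and join versions return the same rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shapes; the real tables have many more columns.
public record A01Row(int PersonID, int A0141);
public record A14Row(int PersonID, int A1407);

public static class LookupSketch
{
    // Nested-loop form: every A14 row triggers a linear scan of A01,
    // so the cost grows with (A14 count) x (A01 count).
    public static List<int> NestedLoop(List<A14Row> a14s, List<A01Row> a01s)
    {
        var result = new List<int>();
        foreach (var a14 in a14s)
        {
            var a01 = a01s.Find(p => p.PersonID == a14.PersonID);
            if (a01 is not null && a01.A0141 > a14.A1407)
            {
                result.Add(a14.PersonID);
            }
        }
        return result;
    }

    // Join form: LINQ to Objects builds a keyed lookup over A01 once,
    // then streams A14 through it.
    public static List<int> Joined(List<A14Row> a14s, List<A01Row> a01s)
    {
        return (from a14 in a14s
                join a01 in a01s on a14.PersonID equals a01.PersonID
                where a01.A0141 > a14.A1407
                select a14.PersonID).ToList();
    }

    // Binary-search membership test; the list must already be sorted ascending.
    public static bool ContainsSorted(List<int> sortedIds, int id)
    {
        return sortedIds.BinarySearch(id) >= 0;
    }

    public static void Main()
    {
        var a01s = new List<A01Row> { new(1, 10), new(2, 5), new(3, 7) };
        var a14s = new List<A14Row> { new(1, 3), new(2, 9), new(3, 7) };

        Console.WriteLine(string.Join(",", NestedLoop(a14s, a01s)));
        Console.WriteLine(string.Join(",", Joined(a14s, a01s)));

        var ids = new List<int> { 42, 7, 19 };
        ids.Sort(); // sort once, then reuse the sorted list for many lookups
        Console.WriteLine(ContainsSorted(ids, 19));
        Console.WriteLine(ContainsSorted(ids, 20));
    }
}
```

Sorting has its own cost, so the binary-search route only pays off when the same sorted list is probed many times, which matches the conclusion above that small lists are better served by a linear lookup.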