Lambda Performance Testing: In-Memory Lookups on Large Data Sets

Source: Internet
Author: User

At work we needed to validate large amounts of data quickly, so this experiment loads the data into in-memory entity lists and queries them with lambda expressions. The actual requirement: load a main data set of 500,000 records into memory, together with about 20 subsets, the largest of which holds 3,000,000 records, and finish loading and validation within 5 minutes. Once loaded, the test data occupies about 2-3 GB of memory.

Multi-threaded loading was tested, but the speed-up was not obvious; SQL Server has its own execution queuing mechanism. (One episode during loading: reads were slow and memory-hungry, and we discovered by accident that each record's photo stream was also being read into memory. The processing does not need the photos at all. Removing them made loading much faster and cut memory use considerably. In similar work, such data should be excluded up front.)

The validation scripts were written by a colleague: about 500 checks, covering entity field validity and association with the main set. When the scripts were first run as they were, there was no response after half an hour and the process had to be killed. A painful optimization process followed, and at one point I doubted that this approach would work at all. After almost two weeks, 5,000 main-set records validate within 10 seconds and 500,000 records in 3-5 minutes. Finally, a test with 100 concurrent runs completed and the check results came back normally. Everything is OK and the system is now live.

The following points are worth noting from this data validation implementation.

1. Move validation from the database into memory. Memory is faster, and validating in the database also brings concurrency waits, deadlocks, and similar problems.
2. Data can be loaded with multiple threads.
3. Use an integer type for primary keys to speed up queries. This is particularly important: the difference in speed is a few thousand times.
4. Use lambda expressions to find data, and use a join query instead of a for loop.
5. Choose linear search or binary search according to the size of the data.
6. Load shared data only once and use it globally for the whole validation run.

Concurrency testing showed that static properties in static classes are not thread-safe, because a static class has only one copy in memory. After static was removed, the multithreaded tests ran normally.

The test data and notes below can be skipped. Read on if you are interested.

1. A01 has 70,000 records.

    A01.FindAll(x => !x.PersonStatus.In("01", "02", "03", "04"))

   Loaded 15298 people, time: 0.019519 seconds.

    A01.FindAll(x => !(x.PersonStatus == "01" || x.PersonStatus == "02" || x.PersonStatus == "03" || x.PersonStatus == "04"))

   Loop lookup, loaded 15298 people, time: 0.0284169 seconds.

2. codes has 33,000 records, of which 3,300 have x.CodeId == "ZB01".

    codes.FindAll(x => x.CodeId == "ZB01" && (x.CodeItemName == "District" || x.CodeItemName == "County"))

   Loop lookup, loaded 287 people, time: 0.0139286 seconds.

    codes.FindAll(x => x.CodeId == "ZB01" && x.CodeItemName.In("District", "County"))

   Lookup, loaded 287 people, time: 0.0230568 seconds.

3. A01 has 4,000 records, codeIds has 3,300 records.

    personTableList.A01.FindAll(x => !x.A0114.In(codeIds));

   A01 with 4,000 records: loop lookup, loaded 0 people, time: 0.1066983 seconds.
   A01 with 70,000 records: loop lookup, loaded 0 people, time: 1.7386399 seconds.

    foreach (var a01 in personTableList.A01)
    {
        if (!codes.Exists(x => x.CodeItemId == a01.A0114))
        {
            persons.Add(a01);
        }
    }

   With code of the form above, when both lists have 70,000 records: loop lookup, loaded 75601 people, time: 55.4800723 seconds.
   A further loop lookup loaded 75601 people, time: 107.4412256 seconds.

4. Comparing against an empty string.

    A01.FindAll(x => x.W0111G == "")

   Loop lookup, loaded 183 people, time: 0.0039961 seconds.

    A01.FindAll(x => x.W0111G.IsSame(""))

   Loop lookup, loaded 183 people, time: 0.0307353 seconds.

Membership lookups, ranked by speed:

    A01.FindAll(x => ids2.IndexOf(x.PersonId) >= 0)         // fastest
    A01.FindAll(x => x.PersonId.In(personIds))              // second
    A01.FindAll(x => ids2.Contains(x.PersonId))             // second
    A01.FindAll(x => ids2.Exists(p => p == x.PersonId))     // slowest

A join query is fast:

    var query = (from a14 in dataList.A14
                 join a01 in dataList.A01
                   on a14.ID equals a01.ID
                 select new { a14.ID, a14.A1407, a01.A0141 }).ToList();
    personIds = query.FindAll(x => x.A0141 > x.A1407);

Very important: for primary key fields, integer fields are faster than strings.

Linear lookups: Contains, Find, and IndexOf are all linear lookups.
Binary lookup: BinarySearch. Binary search is only valid on an ordered array, so the list's Sort method is called before the lookup.

Conclusion: when the list has relatively few items, linear lookup is faster than binary search. The more items there are, the more obvious the advantage of the binary algorithm becomes. Choose the lookup method according to the actual situation.

Test data 2 time: 0.0186627 seconds. Binary search time: 0.0356611 seconds.
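The advice to use a join query instead of a for loop can be illustrated with a small, self-contained sketch. It is not the author's code: the A01Row and A14Row classes, their int field types, and the sample values are assumptions made only for illustration. In LINQ to Objects, a join builds a lookup over the inner sequence once, while the loop version rescans the whole list for every outer record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types; the field names mirror the article's query,
// but the int types and the sample values are assumptions.
class A01Row { public int Id; public int A0141; }
class A14Row { public int Id; public int A1407; }

static class JoinSketch
{
    // Loop version: one linear scan of a01Rows per A14 record.
    public static List<int> WithLoop(List<A01Row> a01Rows, List<A14Row> a14Rows)
    {
        var ids = new List<int>();
        foreach (var a14 in a14Rows)
        {
            var a01 = a01Rows.Find(x => x.Id == a14.Id);
            if (a01 != null && a01.A0141 > a14.A1407)
                ids.Add(a14.Id);
        }
        return ids;
    }

    // Join version: gives the same result as the loop when Id is unique in a01Rows.
    public static List<int> WithJoin(List<A01Row> a01Rows, List<A14Row> a14Rows)
    {
        return (from a14 in a14Rows
                join a01 in a01Rows on a14.Id equals a01.Id
                where a01.A0141 > a14.A1407
                select a14.Id).ToList();
    }

    static void Main()
    {
        var a01Rows = new List<A01Row>
        {
            new A01Row { Id = 1, A0141 = 2010 },
            new A01Row { Id = 2, A0141 = 2000 },
            new A01Row { Id = 3, A0141 = 2015 },
        };
        var a14Rows = new List<A14Row>
        {
            new A14Row { Id = 1, A1407 = 2005 },
            new A14Row { Id = 2, A1407 = 2005 },
            new A14Row { Id = 3, A1407 = 2012 },
            new A14Row { Id = 4, A1407 = 1990 },
        };

        Console.WriteLine(string.Join(",", WithLoop(a01Rows, a14Rows))); // 1,3
        Console.WriteLine(string.Join(",", WithJoin(a01Rows, a14Rows))); // 1,3
    }
}
```

On a handful of rows the two versions are indistinguishable. The difference only shows at sizes like the 70,000-record lists measured above.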
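The difference between linear and binary lookup described above can be sketched as follows. This is a minimal example under assumed data: the SearchSketch class and the sample ids are hypothetical, and only the standard List<T> methods Contains, Sort, and BinarySearch are used.

```csharp
using System;
using System.Collections.Generic;

static class SearchSketch
{
    // Linear lookup: Contains checks items one by one, O(n) per call.
    public static bool LinearContains(List<int> ids, int id)
    {
        return ids.Contains(id);
    }

    // Binary lookup: only valid on a sorted list, O(log n) per call.
    // List<T>.BinarySearch returns a negative number when the item is absent.
    public static bool BinaryContains(List<int> sortedIds, int id)
    {
        return sortedIds.BinarySearch(id) >= 0;
    }

    static void Main()
    {
        var ids = new List<int> { 42, 7, 19, 3, 88 };

        // Sort a copy once, then reuse it for many lookups.
        var sorted = new List<int>(ids);
        sorted.Sort();

        Console.WriteLine(LinearContains(ids, 19));    // True
        Console.WriteLine(BinaryContains(sorted, 19)); // True
        Console.WriteLine(BinaryContains(sorted, 20)); // False
    }
}
```

The sort is a one-time cost, so binary search pays off only when the list is large or is searched many times, which matches the conclusion above.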
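The note about static properties failing under concurrency can be illustrated with a sketch in which each validation run owns its own state instead of sharing one static copy. The ValidationContext class and the RunConcurrently helper are hypothetical names introduced for this example. The article does not show how the original code was restructured.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical per-run state. Because it is an instance rather than a static
// member, concurrent runs never write to the same list.
class ValidationContext
{
    public List<string> Errors { get; } = new List<string>();
}

static class ConcurrencySketch
{
    // Runs several validation passes in parallel, each with its own context,
    // and returns the number of errors recorded by each run.
    public static int[] RunConcurrently(int runs, int errorsPerRun)
    {
        var counts = new int[runs];
        Parallel.For(0, runs, i =>
        {
            var ctx = new ValidationContext(); // created per run, never shared
            for (int k = 0; k < errorsPerRun; k++)
                ctx.Errors.Add($"run {i} error {k}");
            counts[i] = ctx.Errors.Count;      // each run writes only its own slot
        });
        return counts;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", RunConcurrently(4, 3))); // 3,3,3,3
    }
}
```

Had Errors been a static List<string>, all runs would append to the same unsynchronized list, which is the kind of shared-copy problem the concurrency test exposed.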
