Performance Test for querying millions of data and EF query performance test

Source: Internet
Author: User

Performance Test for querying millions of data and EF query performance test

I. Cause

I personally prefer EF. After all, I don't need to write SQL statements. It is highly efficient in development and easy to operate. However, I have always heard that EF has poor performance and has been tested by others, but I thought it was true. But I really don't know what it is. I only know what framework I chose is based on the actual situation. The blogger works in a startup company and chooses the EF framework. Just after a project is created, there will be less than tens of thousands of data, I feel that the performance is not that bad. So I want to test the data. Again, based on the principle of seeking truth and being pragmatic, It is tested for actual business needs, not for single-order performance. If you want to make an ERP system and consider the architecture of tens of millions of concurrent jobs, I will not say that. After all, it is not based on the framework of the actual project.

Ii. Statement

Based on actual projects, considering the addition, deletion, and modification operations on data for online projects by bloggers, the operation data is generally one, two, and more than a dozen, if you submit dozens of data records for addition, deletion, and modification at the same time, forgive the bloggers for not having seen them. What's more, you have never thought about submitting hundreds of data records for addition, deletion, and modification. However, I believe that EF is competent for the addition, deletion, modification, and modification operations under this order of magnitude. Therefore, this article will not test the addition, deletion, and modification capabilities of EF, because I feel that it can fully meet the needs of general projects. This article only tests the Single Table query function of EF, and then tests complex link queries.

Iii. Test conditions

 

    Common People's configuration, their own work computers.

SQL Server 2012, Entity Framework 6.1.3.

Iv. Test Data

  In view of the two or three fields in the previous tests, and the data is too simple, in order to avoid this impact, because the actual project may have many fields and the data volume is also complex, I simulated a relatively close data table and again tested it against realistic projects based on a realistic and pragmatic revolutionary policy.

    

100 million data records

    

5. Start testing

A WinForm test is conducted. The interface is as follows:

    

   1. Perform the Find test and generate a random id. When querying the code, run the code first.

    

1 private PortalContext db = new PortalContext (); 2 private int count = 0; 3 private TimeSpan ts = new TimeSpan (); 4 private void btnFind_Click (object sender, EventArgs e) 5 {6 7 count ++; 8 Random r = new Random (); 9 var id = r. next (0, 1000000); 10 txtId. text = id. toString (); 11 12 Stopwatch sw = new Stopwatch (); 13 sw. start (); 14 var user = db. users. find (id); 15 sw. stop (); 16 17 txtUserInfo. text = UserToString (user); 18 ts + = sw. elapsed; 19 string time = sw. elapsed + "(" + sw. elapsed. seconds + "s" + sw. elapsed. milliseconds + "ms)"; 20 txtDisplay. appendText ("Find query id (" + id + "):" + time + Environment. newLine); 21 txtData. text = "executed" + count + "Times, average time consumption" + new TimeSpan (ts. ticks/count); 22}

 

 

 

The result is as follows:

    

It can be seen that in the case of million data, there is no pressure to use Find to query based on the primary key id. For the first time, it takes some time to connect to the data.

2. Perform the Where test. The Code is as follows.

1 private void btnWhere_Click (object sender, EventArgs e) 2 {3 4 5 bool [] valids = new bool [] {false, true }; 6 string [] works = new [] {"programmer", "siege Lion", "Product Wang", "keyboard man", "code dog "}; 7 UserType [] userTypes = new [] {UserType. partner, UserType. common User, UserType. attorney}; 8 Random r = new Random (); 9 10 int num = r. next (0, 4680); 11 int num2 = r. next (0, 4680); 12 13 int max = Math. max (num, num2); 14 int min = Math. min (num, num2); 15 16 bool isValid = valids [num % 2]; 17 string work = works [num % 5]; 18 UserType type = userTypes [num % 2]; 19 20 txtIsValid. text = isValid. toString (); 21 txtWork. text = work; 22 txtUserType. text = type. toString (); 23 txtAmountMin. text = min. toString (); 24 txtAmountMax. text = max. toString (); 25 26 Stopwatch sw = new Stopwatch (); 27 sw. start (); 28 var query = db. users. where (u => true); 29 var queryWhere = query. where (u => u. userType = type & u. isValid = isValid & u. work = work & (u. amount> = min & u. amount <= max )). take (1000); 30 var list = queryWhere. toList (); 31 sw. stop (); 32 33 labelWhere. text = string. format ("where (u => u. userType = {0} & u. isValid =={ 1} & u. work = {2} u. amount >={ 3} & u. amount <= {4 }). take (1000) ", 34 type, isValid, work, min, max); 35 36 string time = sw. elapsed + "(" + sw. elapsed. seconds + "s" + sw. elapsed. milliseconds + "ms)"; 37 txtDisplay. appendText ("Where query" + list. count () + "data, time:" + time + Environment. newLine); 38 39}

 

 

 

Here we use Where to get the first 1000 pieces of data, which is basically impossible in the actual project, or all ToList (). Considering that in some cases, all ToList () is required () some data is generated, but it should be enough to get 1000 entries. In other cases, this test does not make much sense. We will wait to see the paging performance.

  

Attach some tests for listing:

  

Of course, it is impossible to play like this. Let's take a look at the memory, and more than pieces of data will be about 30 mb.

3. Any, First, Count test

The code is basically the same. Here only some picture references are attached.

  

  

  

All the above queries do not exist, but in comparison, Any and First have good performance in the case of existence, while count has good performance in the case of absence, and I do not know why. I feel that sometimes the Count query does not exist. After all, the average effect is good. PS: I used to read an article about how many times the Count is worse than Any. We recommend that you use Any if the query does not exist. Now, it seems similar.

 4. query by page.

  From the perspective of actual projects, when users look at the paging data, they usually look at the first 10 pages, and the data volume on each page is about 10-30, too much is unnecessary. The pageIndex and pageSize of all pages are set between the data. The page size may be pageIndex. If the pageSize is too large, it will also affect the performance. We will test it later.

  

About ms. In the past, it may have been too much time spent on sorting.

Attached to a relatively large pageIndex test result (the pageIndex is between and), it takes longer time when the page number is relatively large, not to mention pageSize, and certainly it will take longer.

  

  5. Contains Query

   The code is slightly changed here, and it does not matter. 

Private void btnContains_Click (object sender, EventArgs e) {string [] usernames = new [] {"zhao", "wang", "li", "san ", "zhaoliu"}; bool [] valids = new bool [] {false, true}; string [] works = new [] {"xuanyuan", "Siege ", "Product Wang", "pan Xia", "code "};.... // The full name is changed to some names, which can be a fuzzy query .. [Smile]}

 

 

 

    

It seems a little slow, around Ms. After all, Contains, after all, like. After all, million data is acceptable under some conditions. After all, it is convenient to make a self-used query.

  Vi. Increased data volume

Since it is millions, it cannot be only 1 million.

   1. 2 million of data

   

   

   Summary:

There is no pressure on Find, and there is no difference, probably because of the primary key index.

Any, First, and Count are still around ms and can be used.

The page size has reached 400 ms, and it is unacceptable. However, I have never seen thousands of pages. here we can use Where to filter out some old data or unwanted data before paging.

Contains has reached 1 s, which is unacceptable to users. However, at this level of data, the search engine should be used. This will not be taken into consideration.

   2. 3 million of data

  

  

  

  

Summary:

 

There is no pressure on Find, but there is no difference, probably because of the primary key index.

 

Any And First queries the results very quickly, and Count feels better here.

 

The page size is 500 ms. here we can use Where to filter out some old data or unwanted data before paging. You can see that the total number of records on the page is one, 2 million. Forget it and try to optimize it.

 

Contains is not mentioned.

 

4. 4 million of data

  

  

  

  

Summary:

There is no pressure on Find, but there is no difference, probably because of the primary key index.

Any And First queries will slow down, and Count feels better here.

No more paging.

Contains is not mentioned.

 VII. Conclusion

When I write it here, I feel that I am wrong. It seems that there is no half-cent relationship with EF. For such a simple query, it should not take any time for EF to generate SQL statements. At all, it does not play out EF's linq syntax or something, and generates various complex query statements and connection statements. Nana !!!

However, since all of them have reached this point, forget it as a test of SQL Server performance. In the case of million data, EF can still be used as needed.

It seems that EF is not fast, but it is related to the statements written by programmers. How to obtain data, how to query, and how to splice them? After all, SQL statements are generated to query at the end, therefore, the bottleneck is how to quickly generate efficient SQL statements.

For a startup company that has just started a project, the data is less than 100,000 of the data, so it is sure to use EF decisively. It is easy to get started and easy to develop. It is most important not to write SQL, after all, Microsoft has iterated so many versions, so it should be optimized.

   PS: I wrote a blog for the first time. I don't know if the testing posture is correct or the direction is correct. If there is something wrong, I 'd like to point it out. Please don't spray me. I will cry. [Haha], I am just a small code farmer who can only write add, delete, modify, and query information.

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.