LINQ Performance Profiling


Orcas (Visual Studio 2008 and .NET Framework 3.5) brings us many exciting new features, most notably LINQ, which is a real convenience for a large number of developers and framework designers. In the past, when using O/R mapping frameworks, the thing we worried about most was the performance of complex queries and database operations. So what does LINQ give us? Let's start with the LINQ query.

One: LINQ queries

In a project, most database operations are queries; in complex projects especially, a query often fetches data from many tables and "pieces it together" for the user. Without O/R mapping, we handle this in one of two ways: a complex SQL statement or a stored procedure.

With O/R mapping, however, to ensure that the extracted data is strongly typed, we have had to piece queries together with a number of methods; this is not only troublesome, but the performance loss is real, which leads to the awkward scene of an O/R mapping framework being used side by side with raw, unencapsulated ADO.NET. So how does a LINQ query handle this? First, let's look at some complex query syntax.

Design two tables, representing the department (Department) and the employee (Employee), respectively:
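
A minimal sketch of how those two tables might be mapped as LINQ to SQL entity classes. The table names, column names, and column types here are assumptions inferred from the SQL statements shown later in this article; in practice the O/R designer generates these classes (together with a `DbDataClassesDataContext`) for you.

```csharp
using System.Data.Linq;
using System.Data.Linq.Mapping;

[Table(Name = "dbo.Department")]
public class Department
{
    [Column(IsPrimaryKey = true)]
    public string DepId { get; set; }

    [Column]
    public string DepName { get; set; }

    // One department has many employees, joined on DepId.
    private EntitySet<Employee> _employees = new EntitySet<Employee>();
    [Association(Storage = "_employees", OtherKey = "DepId")]
    public EntitySet<Employee> Employees
    {
        get { return _employees; }
        set { _employees.Assign(value); }
    }
}

[Table(Name = "dbo.Employee")]
public class Employee
{
    [Column(IsPrimaryKey = true)]
    public string EmployeeId { get; set; }

    [Column]
    public string DepId { get; set; }

    [Column]
    public string EmployeeName { get; set; }

    [Column]
    public decimal EmployeeSalary { get; set; }

    // Each employee belongs to one department, joined on DepId.
    private EntityRef<Department> _department;
    [Association(Storage = "_department", ThisKey = "DepId", IsForeignKey = true)]
    public Department Department
    {
        get { return _department.Entity; }
        set { _department.Entity = value; }
    }
}
```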

1. Fuzzy queries:

Fuzzy queries are difficult to implement in an O/R mapping framework, and even when implemented the efficiency is not very high. LINQ provides us with rich fuzzy query methods; for example, to find everyone surnamed Zhang in the Employee table, you can write:

    DbDataClassesDataContext dbdata = new DbDataClassesDataContext();

    var query = from employee in dbdata.Employees
                where employee.EmployeeName.StartsWith("Zhang")
                select employee;

When actually executed, this statement is translated into:

    SELECT [t0].[EmployeeId], [t0].[DepId], [t0].[EmployeeName], [t0].[EmployeeSalary]
    FROM [dbo].[Employee] AS [t0]
    WHERE [t0].[EmployeeName] LIKE @p0

In this SQL statement, the @p0 parameter is passed at execution time with a "%" appended, so every record whose name begins with @p0 is returned. Besides StartsWith, LINQ queries also offer Contains, EndsWith, and so on.
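
The Contains and EndsWith variants follow the same pattern; a sketch, assuming the same generated DbDataClassesDataContext as in the example above:

```csharp
DbDataClassesDataContext dbdata = new DbDataClassesDataContext();

// Name contains "Zhang" anywhere -> LIKE with '%' on both sides.
var containsQuery = from employee in dbdata.Employees
                    where employee.EmployeeName.Contains("Zhang")
                    select employee;

// Name ends with "san" -> LIKE with a leading '%'.
var endsWithQuery = from employee in dbdata.Employees
                    where employee.EmployeeName.EndsWith("san")
                    select employee;
```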

2. Nested queries:

Suppose we need to fetch all the employees in the marketing department. In earlier O/R mapping frameworks (ignoring any mapped relationship for the moment), this often took two round trips: first fetch the marketing department's number, then use that number to fetch all its employees. LINQ queries provide a better solution; we can do this:

    var query = from employee in dbdata.Employees
                where employee.DepId ==
                    (from department in dbdata.Departments
                     where department.DepName == "Marketing Department"
                     select department).Single().DepId
                select employee;

Here we use a nested query, which, when executed, is converted into a nested SQL query:

    SELECT [t0].[EmployeeId], [t0].[DepId], [t0].[EmployeeName], [t0].[EmployeeSalary]
    FROM [dbo].[Employee] AS [t0]
    WHERE [t0].[DepId] =
        (SELECT [t1].[DepId] FROM [dbo].[Department] AS [t1]
         WHERE [t1].[DepName] = @p0)

Of course, we don't have to go to that much trouble here; we can simply use the existing entity relationship and write:

    var query = from employee in dbdata.Employees
                where employee.Department.DepName == "Marketing Department"
                select employee;

However, this is fundamentally different from the previous query, so let's look at what SQL statement this LINQ query translates into:

    SELECT [t0].[EmployeeId], [t0].[DepId], [t0].[EmployeeName], [t0].[EmployeeSalary]
    FROM [dbo].[Employee] AS [t0]
    LEFT OUTER JOIN [dbo].[Department] AS [t1] ON [t1].[DepId] = [t0].[DepId]
    WHERE [t1].[DepName] = @p0

Here the result is not a nested SQL statement; it is converted into a left outer join. You can see that the nested query is really a special case, and the relationship-based query is a handy tool we can reach for.

3. Projection

When a query only needs a few fields, combining anonymous types with projection is another tool LINQ queries provide. This approach is very flexible and satisfies the vast majority of our query requirements. Let's look at an example: how do we find each department's total salary?

    var query = from department in dbdata.Departments
                select new { DepId = department.DepId,
                             DepName = department.DepName,
                             DepSalary = department.Employees.Sum(e => e.EmployeeSalary) };

In this query, the new keyword generates an anonymous type with three properties: DepId, DepName, and DepSalary, where DepSalary is computed. The query is converted to:

    SELECT [t0].[DepId] AS [DepId], [t0].[DepName] AS [DepName],
        (SELECT SUM([t1].[EmployeeSalary]) FROM [dbo].[Employee] AS [t1]
         WHERE [t1].[DepId] = [t0].[DepId]) AS [DepSalary]
    FROM [dbo].[Department] AS [t0]

Elegant code, and very convenient.

4. Using the extension methods of LINQ queries

LINQ queries provide a number of extension methods that let us express all kinds of queries. Let's look at several typical ones:

a) Average, Max

    decimal x = dbdata.Employees.Average(e => e.EmployeeSalary);
    decimal y = dbdata.Employees.Max(e => e.EmployeeSalary);

This computes the employees' average salary and maximum salary without writing any SQL. Orcas provides the LINQ to SQL object-relational mapper, which handles retrieving, tracking, and updating objects that map to your database tables and stored procedures.

You simply filter and shape the results with any of the LINQ query extension methods, and LINQ to SQL executes the SQL needed to get the data. (Note that the Average and Max calls above obviously do not pull back every row of the table; they use the T-SQL aggregate functions to compute the values inside the database and return only a single scalar value.)

b) Where, OrderBy

Sometimes we only need a simple filter and sort over one table, so rather than writing a lengthy LINQ query expression we can use the extension methods directly, such as:

    var query = dbdata.Employees.Where(e => e.EmployeeSalary > 2000)
                                .OrderBy(e => e.EmployeeName);

Lambda syntax is used here, and the statement is converted to the following SQL:

    SELECT [t0].[EmployeeId], [t0].[DepId], [t0].[EmployeeName], [t0].[EmployeeSalary]
    FROM [dbo].[Employee] AS [t0]
    WHERE [t0].[EmployeeSalary] > @p0
    ORDER BY [t0].[EmployeeName]

The above is accomplished by the expression-tree support behind lambda expressions together with the IQueryable interface, and the code stays clean and tidy.

c) Take, Skip

Handling large volumes of data has always been difficult for developers. From .NET 1.1 to 2.0, the way the GridView and other controls page through large data sets has never been ideal; LINQ queries solve this problem nicely.

    var query = dbdata.Employees.Skip(10).Take(10);

This statement skips the first 10 records of the table and then takes 10, i.e., it fetches records 11 through 20. It is converted to the following SQL:

    SELECT [t1].[EmployeeId], [t1].[DepId], [t1].[EmployeeName], [t1].[EmployeeSalary]
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY [t0].[EmployeeId], [t0].[DepId],
            [t0].[EmployeeName], [t0].[EmployeeSalary]) AS [ROW_NUMBER],
            [t0].[EmployeeId], [t0].[DepId], [t0].[EmployeeName], [t0].[EmployeeSalary]
        FROM [dbo].[Employee] AS [t0]
    ) AS [t1]
    WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
    ORDER BY [t1].[ROW_NUMBER]

As the SQL above shows, the paging operation, the most performance-intensive part, is handed entirely to the database. The mechanism is no longer like GridView paging, which fetches all the data and then pages it for display, so the efficiency is much higher.
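
The Skip/Take pair makes a paging helper almost trivial. A sketch; the method name and parameters are my own, and a stable OrderBy is included so that ROW_NUMBER() assigns rows consistently across pages:

```csharp
using System.Linq;

public static class EmployeePaging
{
    // Returns one page of employees, ordered by name.
    // pageIndex is zero-based; pageSize is the number of rows per page.
    public static IQueryable<Employee> GetEmployeePage(
        DbDataClassesDataContext dbdata, int pageIndex, int pageSize)
    {
        return dbdata.Employees
                     .OrderBy(e => e.EmployeeName)
                     .Skip(pageIndex * pageSize)
                     .Take(pageSize);
    }
}
```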

d) ToList and ToArray

By default, the query result is of type IEnumerable. Many developers are not used to this type and prefer a collection or an array, and that is fine: use ToList or ToArray to convert the query results into a List or an array.

What we need to understand here is that building the query expression does not actually touch the database; deferred execution is at work. If you don't want the deferred behavior and need the query results immediately, you can force execution with ToList or ToArray. This is a very useful mechanism.

For example, when we need to display the employees of two departments, we can fetch the department list once and then take each department's employees in turn; access is then more efficient, because we don't have to go back to the database every time to fetch the departments.
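
The deferred behavior is easy to see even without a database, using plain LINQ to Objects over an in-memory list (a minimal, self-contained sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static void Main()
    {
        var salaries = new List<decimal> { 1000m, 3000m };

        // Defining the query executes nothing yet.
        IEnumerable<decimal> high = salaries.Where(s => s > 2000m);

        // ToList() forces immediate execution and snapshots the current results.
        List<decimal> snapshot = high.ToList();

        salaries.Add(5000m);

        Console.WriteLine(high.Count());   // 2 -- the deferred query sees the element added later
        Console.WriteLine(snapshot.Count); // 1 -- the snapshot was taken before the add
    }
}
```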

Two: Performance analysis of data operations

Once we are proficient with the query methods above, we should also understand the mechanics behind these database operations, so that we can adjust our data-manipulation statements in time and keep operations efficient. So let's look at LINQ's database operations and see what it is actually doing.

Compared with NHibernate, LINQ really is better in both the performance and the controllability of its O/R mapping. First, LINQ's default data mapping is implemented with attributes, a .NET-specific syntax, so the various properties of a data object are fixed at compile time. Most O/R mapping tools, NHibernate included, still describe a data object's properties in an XML mapping file and read them from that external file, which obviously loses efficiency at run time.

Second, the way data is fetched differs: the powerful SQL generation machinery in LINQ can analyze and optimize the SQL statements for its various data operations, and the efficiency gain there is also obvious.

Of course, as an O/R mapping tool, its efficiency can never reach that of accessing the database directly with SQL statements, that is, what we typically do with SqlDataReader/SqlDataAdapter, but LINQ's performance gave us a very big surprise.

I ran a test using SqlDataReader and LINQ for the same large-scale data query, and LINQ lagged by less than 10%, while NHibernate's query efficiency was much lower, almost twice as slow. For such powerful data-mapping functionality, this efficiency is acceptable. Unfortunately, LINQ to SQL currently supports only SQL Server on the database side (LINQ itself also supports XML, entities, and so on).

For queries, we can optimize by analyzing the SQL statements that the LINQ query generates, which is very convenient. For update efficiency, though, we have to talk about LINQ's data update mechanism. In general, a data update looks like this:

    var query = from emp in dbdata.Employees where emp.DepId == "1001" select emp;
    Employee employee = query.First();
    employee.EmployeeName = "John Doe";
    dbdata.SubmitChanges();

The code above fetches all employees with department number 1001 from the Employee table; we then take the first record (for simplicity only the first; in practice we could use Where to pick exactly the record that satisfies the condition), change the name to "John Doe", and push the update to the database.

What does LINQ to SQL do with this code? The query pulls several records from the database into memory and marks them as unchanged; when the employee's name is modified, the modified object is marked as dirty (changed).

When SubmitChanges is called, SQL statements are automatically generated and executed for every in-memory object whose state is dirty. In other words, completing a data update costs at least one query plus one update.

Thanks to lazy loading, the statements above actually pull only one record from the database and update only that one, so the efficiency is still high. In my testing, randomly picking one record out of 250,000 and updating it was practically no slower than picking one out of 10, because the change-state comparison happens in memory. Let's look at the SQL statement the update actually generates:

    UPDATE [dbo].[Employee] SET [EmployeeName] = @p4
    WHERE ([EmployeeId] = @p0) AND ([DepId] = @p1)
        AND ([EmployeeName] = @p2) AND ([EmployeeSalary] = @p3)

We only modified the EmployeeName field, and the generated SQL statement updates only that field. But look at the WHERE clause: why does it include the other columns besides the primary key? This is the rigor of the SQL that LINQ generates automatically: it guards against errors when multiple transactions update the same record concurrently, so that if transaction A has updated the record, transaction B's update fails.

We can't help asking: what if I want to update the primary key field? Could the update then hit other records by mistake? The answer is yes, it certainly could, so LINQ stipulates that the primary key field cannot be updated; if you really must change it, delete the record and insert a new one. Such rigorous SQL can, however, cause us some trouble. Consider the following scenario:
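
When that strict WHERE clause is unwanted, LINQ to SQL lets you relax the concurrency check per column through the UpdateCheck setting on the Column attribute. A sketch against the Employee mapping assumed in this article:

```csharp
using System.Data.Linq.Mapping;

[Table(Name = "dbo.Employee")]
public class Employee
{
    [Column(IsPrimaryKey = true)]
    public string EmployeeId { get; set; }

    // Never include this column in the generated WHERE concurrency check.
    [Column(UpdateCheck = UpdateCheck.Never)]
    public decimal EmployeeSalary { get; set; }

    // Include this column in the check only when this DataContext changed it.
    [Column(UpdateCheck = UpdateCheck.WhenChanged)]
    public string EmployeeName { get; set; }
}
```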

Suppose a table has a counter column. In plain SQL the update would be:

    UPDATE CountTable SET CountColumn = CountColumn + 1 WHERE CountId = @CountId

However, the SQL statement generated by the LINQ query is:

    UPDATE [dbo].[CountTable] SET [CountColumn] = @p2
    WHERE ([CountId] = @p0) AND ([CountColumn] = @p1)

The @p2 parameter is computed in memory and passed in, and @p1 is the original value of CountColumn. In other words, CountColumn + 1 is not computed by the database, so under high concurrency the update tends to fail. I ran a test simulating counting with many threads and users: the count accumulated in the database came out smaller than the count kept in a static variable, which means some of the database updates failed.

In addition, every update has to complete two steps, a lookup followed by the update, which also has a fairly large impact on efficiency.
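
For cases like the counter above, one workaround is to bypass the change tracker and let the database do the increment atomically via DataContext.ExecuteCommand. A sketch; countId is assumed to hold the key of the row to increment:

```csharp
DbDataClassesDataContext dbdata = new DbDataClassesDataContext();

// A single atomic UPDATE computed by the database,
// instead of read-modify-write through the object model.
dbdata.ExecuteCommand(
    "UPDATE CountTable SET CountColumn = CountColumn + 1 WHERE CountId = {0}",
    countId);
```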
