Since I used EF, I have been very concerned about whether there are potential performance problems every time. So every time I write a LINQ query, I will use SQL profiler to check the actually generated SQL statements to find potential performance problems. We strongly recommend that you do this to avoid problems with the software in the future.
1. Select only one or more columns
Sometimes, writing LINQ in C # Looks comfortable, but the performance is not necessarily good, so it is necessary to make some adjustments. For example:
I need to know the number of clicks in an article. I may write:
context.Post.FirstOrDefault(p => p.Id == postId).Hits;
Or:
context.Post.Find(postId).Hits;
I expect that they will only filter the data in the hits column in the database. However, through SQL profiler, we will find that these two statements actually gave all the columns to the SELECT statement, the access to hits is actually performed in the memory.
Although small tables cannot see performance problems, in case a column in your table stores file byte streams, such operations may be slow and consume additional network transmission, therefore, we cannot ignore this issue.
In fact, I only need to make a slight adjustment to avoid this problem, but the LINQ statement is ugly:
context.Post.Where(p => p.Id == postId).Select(p => p.Hits).FirstOrDefault();
The final native SQL generated by LINQ to SQL is as follows:
exec sp_executesql N‘SELECT TOP (1) [Extent1].[Hits] AS [Hits]FROM [dbo].[Post] AS [Extent1]WHERE [Extent1].[Id] = @p__linq__0‘,N‘@p__linq__0 uniqueidentifier‘,@p__linq__0=‘850C3A86-6C3D-408B-8099-61EDA559F804‘
Only one hits field is selected.
Ii. tolist () Problems
In fact, many EF performance problems are related to the query execution time. Our general intention is to first create a query expression, instead of execute, instead of build. Execution is performed only when the expression result is used.
When I was in the company code program, I saw many colleagues using EF. After I wrote the query, I liked to directly call the tolist () method. Sometimes this will cause a lot of performance problems. Because the declaration of a LINQ expression does not immediately execute the SQL query, however, once tolist () is added to the end, it will be executed immediately. If you only want to select some of the data based on the conditions, the tolist () will be filtered later, that is, it will be executed from the memory, it does not convert your condition into the SQL where statement for execution.
VaR query = from... // create a query, but do not execute... var result = query. tolist (); // execute the query immediately
Therefore, you should try to avoid looking for the desired element from the result of tolist.
Iii. iqueryable and ienumerable
I prefer iqueryable in the selection of these two interfaces. Most of the time, these two interfaces are consistent in usage, but if you want to perform an uncertain query, it means that the query expression is not one-time determined, for its results, it may be decided by other classes to select the items. In this case, iqueryable is used.
For example, I have a data layer method:
public IEnumerable<EdiBlog.Core.Entities.Post> GetAllPost(){ return context.Post;}
Obviously, it will be called by other methods in the system, and these callers expect different results. The common operation is to merge a where statement:
var myResult = postDa.GetAllPost().Where(...)
However, unfortunately, the conditions in the where statement are not converted to native SQL and are filtered in the memory. This is a relatively negative performance issue. So at the beginning of this article, I suggest you use SQL profiler to see how your own LINQ is executed.
If you change the return type to iqueryable, your where statement can be converted to SQL Execution.
public IQueryable<EdiBlog.Core.Entities.Post> GetAllPost(){ return context.Post;}
There is a good post on the two interfaces on stackoverflow at the diaosi Forum. You can take a look at them by yourself:
Http://stackoverflow.com/questions/252785/what-is-the-difference-between-iqueryablet-and-ienumerablet
"Ienumerable: ienumerable is best suitable for working with in-memory collection. ienumerable doesn't move between items, it is forward only collection.
Iqueryable: iqueryable best suits for remote data source, like a database or web service. iqueryable is a very powerful feature that enables a variety of interesting deferred execution scenarios (like paging and composition based queries )."
There is also an intuitive answer on the msdn Forum:
Iqueryable returns a "queryable" that is a query you cocould still be enriched before really sending it to the server.
Ienumerable returns a list that is the actual querying took place and you get the results. tolist is isued to force running the query and returning these enumerable results...
So in short:
-Use iqueryable if you want to return a base query that cocould be further enhanced before running it server side (by enumerating its items )..
-Use ienumerable/tolist if you want to return a list that has been retrieved from the DB
4. Number of computations, count () and count
This is the most vulnerable and a very serious performance issue. When we need to count the number of records that meet a certain condition, we want the SQL statement to be in the form of select count. However, the following seemingly natural statement will lead to undesirable results:
context.Category.FirstOrDefault(p => p.Name == categoryName).Posts.Count;
This is the statement used in my blog to count the number of articles under a certain category. Of course, this is not the case because of performance problems. The SQL statement is not select count, but divided into two. The following is what SQL profiler caught:
exec sp_executesql N‘SELECT TOP (1) [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[DisplayName] AS [DisplayName]FROM [dbo].[Category] AS [Extent1]WHERE [Extent1].[Name] = @p__linq__0‘,N‘@p__linq__0 nvarchar(4000)‘,@p__linq__0=N‘ASPNET‘exec sp_executesql N‘SELECT [Extent2].[Id] AS [Id], [Extent2].[Title] AS [Title], [Extent2].[Slug] AS [Slug], [Extent2].[PubDate] AS [PubDate], [Extent2].[PostContent] AS [PostContent], [Extent2].[Author] AS [Author], [Extent2].[CommentEnabled] AS [CommentEnabled], [Extent2].[IsPublished] AS [IsPublished], [Extent2].[Hits] AS [Hits], [Extent2].[Rators] AS [Rators], [Extent2].[Rating] AS [Rating], [Extent2].[ExposedToSiteMap] AS [ExposedToSiteMap], [Extent2].[DisplayFrom] AS [DisplayFrom], [Extent2].[DisplayTill] AS [DisplayTill], [Extent2].[LastModifyOn] AS [LastModifyOn], [Extent2].[PublishToRss] AS [PublishToRss]FROM [dbo].[PostCategory] AS [Extent1]INNER JOIN [dbo].[Post] AS [Extent2] ON [Extent1].[PostId] = [Extent2].[Id]WHERE [Extent1].[CategoryId] = @EntityKeyValue1‘,N‘@EntityKeyValue1 uniqueidentifier‘,@EntityKeyValue1=‘3FEB11A2-6E36-4DCE-8C02-614BEF7ACC62‘
We can see that EF has done two things. The first thing is to find the category whose name is "ASPnet", and then use the ID of this category to find all its posts, the last count is actually. net in the memory. This clearly gives select all the information we don't need. We only need one count. Is it so complicated for Mao?
Review the first article I mentioned. It is not difficult to find out. Attributes accessed after firstordefault (...) are all in the memory. Therefore, when we access category. firstordefault (P => P. Name = categoryname), the first SQL statement is generated. Followed by ". Posts" is the navigation attribute of the category object. EF will use lazy load to load all the posts of this category, so the second SQL statement is generated. Then the count following it is executed in the memory.
If you want the code to generate the LINQ to SQL statement as much as possible, there is a very simple principle, that is, try to use the LINQ and lambda expressions so that EF can help us translate. There are two types of count in C. Enumerable. Count () is a method, and list. Count is an attribute. Once a thing becomes a list, you can count it again, and it must be in the memory.
Therefore, in EF, to perform the Count operation, write as follows:
context.Post.Count(p => p.Categories.Any(q => q.Name == categoryName));
In this case, count () accepts a Lambda expression, so that the "select count" clause can be accurately translated into "select count:
SELECT [GroupBy1].[A1] AS [C1]FROM ( SELECT COUNT(1) AS [A1] FROM [dbo].[Post] AS [Extent1] WHERE EXISTS ( SELECT 1 AS [C1] FROM [dbo].[PostCategory] AS [Extent2] INNER JOIN [dbo].[Category] AS [Extent3] ON [Extent3].[Id] = [Extent2].[CategoryId] WHERE ([Extent1].[Id] = [Extent2].[PostId]) AND ([Extent3].[Name] = ‘ASPNET‘) ) ) AS [GroupBy1]
The performance is much better now ~
Some performance issues when using Entity Framework