In SQL Server, COUNT (*) or COUNT (1) or count ([column]) may be the most commonly used aggregate function. A lot of people actually have a clear distinction between the three. This article will describe the roles, relationships, and principles behind the three.
As always, I often see some so-called optimizations that don't use COUNT (*) instead of count (1), which can improve performance, given the reason that count (*) brings a full table scan. And actually how to write count is no different.
Count (1) and COUNT (*) actually mean that the evaluation of the expression in count () is NULL, if NULL is not counted, and not null. For example, we look at code 1 and specify NULL in count (the optimizer does not allow explicit null, so it needs to be assigned to a variable to specify).
DECLARE @xx INT
SET @xx =null
SELECT COUNT (@xx) from [AdventureWorks2012]. [Sales]. [SalesOrderHeader]
Code Listing 1. Null specified in Count
Since all rows are null, the result is not counted as 0, as shown in result 1.
Figure 1. Obviously, the result is 0.
So when you specify count (*) or COUNT (1) or count (' anything '), the result will be the same, because these values are not shown in null,2.
Figure 2. As long as a non-null expression is specified in count, the result is no different
What about the Count column?
For Count (column), the same applies to the above rule, evaluates whether the value of each row in the column is NULL, if NULL is not counted, not NULL is counted. Therefore count (column) calculates the count of columns or combinations of this column that are not empty.
How does the count (*) be executed specifically?
When you mentioned that count () has a non-null value, in SQL Server you only need to find the number of rows that are not NULL in the specific table-that is, all the rows (if the row value is all null, the line does not exist). The simplest way to do this is to find a column that is not NULL, and if the column has an index, the index is used, and of course, for performance, SQL Server chooses the narrowest index to reduce IO.
We are in the ADVENTUREWORKS2012 sample database [person]. Delete all nonclustered indexes on the [Address] table, and on the modifydate this data type is a datetime column, we look at the execution plan, as shown in 3:
Figure 3: An index using the CreateDate
We continue to build an index on the StateProvinceID column, which is an int column that accounts for 4 bytes and is shorter than the previous 8-byte datetime type, so SQL Server chooses the StateProvinceID index. As shown in 4.
Figure 4: A shorter StateProvinceID index is selected
Therefore, if the count (*) on a table is used for a long period of comparison, it can greatly improve performance by considering establishing a single-column index in the shortest column.
Differences Between SQL Select count (*) and COUNT (1) and how to perform and SQL performance optimizations