Is null and is not null in SQL SERVER cause index failure ?, Sqlnull
In fact, there is nothing to say about this problem. I encountered an SQL statement during optimization today. Because it is interesting, I have cut and simplified the SQL statement and demonstrated it for you, as shown below:
declare @bamboo_Code varchar(3);
set @bamboo_Code='-01';
SELECT DISTINCT yarn_lot
FROM dbo.rsjob WITH ( nolock )
WHERE RIGHT(ges_no, 3) = @bamboo_Code
AND Isnull(yarn_lot, '') <> '';
As shown above, Isnull (yarn_lot, '') <>'' is used for the column yarn_log in SQL, I guess the person who wrote this SQL statement should be convinced that "is null and is not null will cause index failure" is a widely-spread rule on the Internet. Where can this suggestion be spread, cannot be verified. In this case, we verify whether is null or is not null will cause index failure.
Table rsjob is a heap table with an index yarn_lot on the column yarn_lot. Let's verify it through experiment.
SELECT DISTINCT yarn_lot
FROM dbo.rsjob WITH(nolock)
WHERE yarn_lot IS NOT NULL;
SELECT DISTINCT yarn_lot
FROM dbo.rsjob WITH(nolock)
WHERE yarn_lot IS NULL
As shown in the preceding figure, no matter whether it is null or is not null, an index IS used for search.
declare @bamboo_Code varchar(3);
set @bamboo_Code='-01';
SELECT DISTINCT yarn_lot
FROM dbo.rsjob WITH ( nolock )
WHERE RIGHT(ges_no, 3) = @bamboo_Code
AND Isnull(yarn_lot, '') <> '';
SELECT DISTINCT yarn_lot
FROM dbo.rsjob WITH ( nolock )
WHERE RIGHT(ges_no, 3) = @bamboo_Code
AND yarn_lot IS NOT NULL;
In addition, let's take a look at the overhead ratio of the two original SQL Execution plans as 52: 48. That IS to say, the performance of using is not null is better. Because the first SQL statement IS converted, as a result, it performs an index scan, while is not null performs an index search.
"Is null and is not null will lead to index failure. So those who believe in this dogma should verify it by themselves.
Next, we can verify through experiments that the actual environment may be complicated. We can build the following scenarios. In fact, the real environment is more complicated. However, there are basically the following scenarios:
Case 1: Separate index columns on the heap table predicates
USE Test;
GO
DROP TABLE TEST;
GO
CREATE TABLE TEST (OBJECT_ID INT, NAME VARCHAR(12));
CREATE INDEX PK_TEST ON TEST(OBJECT_ID) INCLUDE(NAME);
DECLARE @Index INT =0;
WHILE @Index < 10000
BEGIN
INSERT INTO TEST
SELECT @Index, 'kerry'+ CAST(@Index AS VARCHAR);
SET @Index = @Index +1;
END
INSERT INTO TEST
SELECT NULL, 'only test1' UNION ALL
SELECT NULL, 'only test2'
UPDATE STATISTICS TEST WITH FULLSCAN;
SELECT * FROM TEST WHERE OBJECT_ID IS NULL;
SELECT * FROM TEST WHERE OBJECT_ID IS NOT NULL;
Delete the index and create the following index. As shown below
Drop index PK_TEST on test;
Create index PK_TEST on test (OBJECT_ID)
It can be seen that the execution plan of is null or is not null is related to the index and data distribution.
Case 2: No index on the predicate of the heap table
USE Test;
GO
DROP TABLE TEST;
GO
CREATE TABLE TEST (OBJECT_ID INT, NAME VARCHAR(12));
DECLARE @Index INT =0;
WHILE @Index < 10000
BEGIN
INSERT INTO TEST
SELECT @Index, 'kerry'+ CAST(@Index AS VARCHAR);
SET @Index = @Index +1;
END
INSERT INTO TEST
SELECT NULL, 'only test1' UNION ALL
SELECT NULL, 'only test2'
UPDATE STATISTICS TEST WITH FULLSCAN;
SELECT * FROM TEST WHERE OBJECT_ID IS NULL;
SELECT * FROM TEST WHERE OBJECT_ID IS NOT NULL;
As shown above, if a heap table does NOT have any indexes, the use of is null or is not null must undergo a full table scan, but this is not covered in our discussion. Then let's take a look at the index built on other fields (mainly to compare it with the clustered index table), which still scans the entire table.
CREATE INDEX PK_TEST ON TEST(OBJECT_ID) INCLUDE(NAME);
INSERT INTO TEST
SELECT 10000, NULL UNION ALL
SELECT 10001, NULL ;
SELECT * FROM TEST WHERE NAME IS NULL;
SELECT * FROM TEST WHERE NAME IS NOT NULL;
Case 3: Composite Index columns in the heap table
USE Test;
GO
DROP TABLE TEST;
GO
CREATE TABLE TEST (OBJECT_ID INT, NAME VARCHAR(12), AGE INT);
CREATE INDEX IDX_TEST_N1 ON TEST(NAME, AGE);
DECLARE @Index INT =0;
WHILE @Index < 10000
BEGIN
INSERT INTO TEST
SELECT @Index, 'kerry'+ CAST(@Index AS VARCHAR), floor(rand()*100) ;
SET @Index = @Index +1;
END
INSERT INTO TEST
SELECT NULL, 'only test1', 12 UNION ALL
SELECT NULL, 'only test2',24
UPDATE STATISTICS TEST WITH FULLSCAN;
SELECT * FROM TEST WHERE NAME IS NULL;
SELECT * FROM TEST WHERE NAME IS NOT NULL;
If the predicate in the Union index is at the second or later position of the Union Index, what is the situation? We can see from the following that SQL is scanned in the full table.
DROP INDEX IDX_TEST_N1 ON TEST;
CREATE INDEX IDX_TEST_N1 ON TEST( AGE,NAME);
UPDATE STATISTICS TEST WITH FULLSCAN;
4. Separate index columns in the clustered index table
USE Test;
GO
DROP TABLE TEST;
GO
CREATE TABLE TEST (OBJECT_ID INT, NAME VARCHAR(12));
CREATE CLUSTERED INDEX PK_TEST ON TEST(OBJECT_ID)
DECLARE @Index INT =0;
WHILE @Index < 10000
BEGIN
INSERT INTO TEST
SELECT @Index, 'kerry'+ CAST(@Index AS VARCHAR);
SET @Index = @Index +1;
END
INSERT INTO TEST
SELECT NULL, 'only test1' UNION ALL
SELECT NULL, 'only test2'
SELECT * FROM TEST WHERE OBJECT_ID IS NULL;
SELECT * FROM TEST WHERE OBJECT_ID IS NOT NULL;
If I use IS NULL or IS NOT NULL for query on the column NAME, you will find that the execution plan IS changed from clustered index search to clustered index scan.
INSERT INTO TEST
SELECT 10000, NULL UNION ALL
SELECT 10001, NULL ;
SELECT * FROM TEST WHERE NAME IS NULL;
SELECT * FROM TEST WHERE NAME IS NOT NULL;
4. Composite Index columns in the clustered index table
USE Test;
GO
DROP TABLE TEST;
GO
CREATE TABLE TEST (OBJECT_ID INT, NAME VARCHAR(12), AGE INT);
CREATE CLUSTERED INDEX PK_TEST ON TEST(OBJECT_ID)
DECLARE @Index INT =0;
WHILE @Index < 10000
BEGIN
INSERT INTO TEST
SELECT @Index, 'kerry'+ CAST(@Index AS VARCHAR), floor(rand()*100) ;
SET @Index = @Index +1;
END
INSERT INTO TEST
SELECT 10001, 'NULL', 12 UNION ALL
SELECT 10002, 'NULL',24
CREATE INDEX IDX_TEST_N2 ON TEST(NAME,AGE);
UPDATE STATISTICS TEST WITH FULLSCAN;
If the predicate in the Union index is not in the first column, will it be indexed if it is null or is not null?
DROP INDEX IDX_TEST_N2 ON TEST;
CREATE INDEX IDX_TEST_N2 ON TEST(AGE,NAME);
UPDATE STATISTICS TEST WITH FULLSCAN;
As shown above, it changes from index search to index scan.
Conclusion: 1: "is null and is not null will lead to index failure." This dogma is completely shit. The SQL Server Index contains the null value, oracle indexes do not contain null values. Different databases are different.
2: if there is an index on the predicate, the index will basically be taken, so there is a certain relationship between index search or index scan and the index type, it is also related to the position where the field is located in the Union index. In addition, the wide skew of data distribution will also lead to full table scanning without indexing, but this does NOT mean that is null and is not null cause index failure. It is very important to infer the execution plan by observing the SQL statement. You need to comprehensively examine the index, data distribution, and statistical information of the table involved in the SQL statement to make a comprehensive judgment, in plain words, it should be combined with specific scenarios.