The statistics on the uniqueidentifier column are interesting, and there are some nasty behaviors on it. Let's see.
problem Recurrence (the repro)
To show you the behavior we just complained about, I created a database with the following simple table definitions, and I forced the primary key constraint on the uniqueidentifier column. This means that SQL Server generates a unique clustered index in the background, and the clustered index itself has a statistics object that describes the data distribution for that column. Of course, the data distribution is linear, because each value in the uniqueidentifier column is unique to itself.
1 --Create A new table with a uniqueidentifier column as primary key.2 --SQL Server would enforce the primary KEY constraint through unique clustered index in the background.3 CREATE TABLECustomerstableguid4 (5Iduniqueidentifier not NULL PRIMARY KEY,6FirstNameVARCHAR( -),7LastNameVARCHAR( -)8 )9 GO
Next i insert 1 million records into the table.
1 --Insert 1 million records2 DECLARE @i INT = 03 while(@i <= 1000000)4 BEGIN5 INSERT intoCustomerstableguid (ID, FirstName, LastName)6 VALUES7 (8 NEWID(),9 'FirstName' + CAST(@i as VARCHAR),Ten 'LastName' + CAST(@i as VARCHAR) One ) A - SET @i +=1 - END the GO
Now we use FULLSCAN to update our statistics on the table. FULLSCAN means that SQL Server scans the entire table to update the statistics object within the data.
1 -- Let's update the Statistics with a FULLSCAN. 2 UPDATE STATISTICS with FULLSCAN 3 GO
But when you look at the stats object now, you'll see that in the histogram, SQL Server only generates 4 steps.
1 ' dbo. Customerstableguid'23DBCC show_statistics ('dbo. Customerstableguid'pk__customer__3214ec271273c1cd')
In the header information you can see that during the statistics update, 1 million lines are sampled, but the histogram shows only 4 steps! But when you now update the statistics object with a smaller sampling interval, things will change:
1 -- Let's update the Statistics with a smaller sampling interval. 2 UPDATE STATISTICS with - PERCENT 3 GO
When you look at the histogram now, you will see that we have a lot of different steps:
Summary
Remember this when you use the uniqueidentifier column in your database design. As long as these values are unique, you will have a performance problem, because for the histogram, you have a huge range,avg_range_row can only make 1 rows of the correct estimate.
Thanks for your attention!
Statistics on the uniqueidentifier column