Do not use unordered GUIDs as both primary keys and clustered indexes

http://www.cnblogs.com/zhouruifu/archive/2012/04/18/2454088.html
I had always taken it for granted that using a GUID as the primary key is no big deal. It is only 12 bytes wider than an INT, isn't it? And in the era of SQL Server 2008 and 2012, how could that be a problem? Besides, plenty of Microsoft products use GUIDs as primary keys: SharePoint, the default tables of the ASP.NET SQL Server membership provider, and many more.

Is that really true? After reading two articles, "GUIDs as PRIMARY KEYs and/or the clustering key" and "That's not the point!!!", I was surprised, even shocked, by their conclusion.

To be exact, the bad result comes not so much from using a GUID as the primary key as from using it as the clustered index. SQL Server makes the primary key the clustered index by default, and we tend to accept the defaults. Note: this applies to random, unordered GUIDs, such as those produced by SQL Server's NEWID() or generated on the client, for example by .NET's Guid.NewGuid(). A sequential GUID, such as one generated by SQL Server's NEWSEQUENTIALID() function, does not cause the clustered index problem (although the size problem remains). Then again, who actually uses sequential GUIDs? I only learned about this function today.

The authors of the two articles concluded that using an unordered GUID as both the primary key and the clustered index causes the following problems:

    1. Wasted space, and the resulting reduction in read/write efficiency.
    2. More importantly, fragmentation, and the resulting severe drop in read/write efficiency.

Therefore, do not use a GUID (unordered or ordered) as the primary key, and do not use an unordered GUID as the clustered index.

I was curious: would Microsoft's developers really make such a basic mistake? I opened the default tables of the ASP.NET SQL Server membership provider and found that, although the primary keys of these tables are GUIDs, the clustered index is not on the primary key. For example, the clustered index of aspnet_Applications is on the LoweredApplicationName column, and the clustered index of aspnet_Users is a composite of ApplicationId and LoweredUserName.

GUIDs as PRIMARY KEYs and/or the clustering key
By: Kimberly Tripp
Posted on: March 5, 2009

Expanding on the topic of "Are you kidding me"... One of the most prevalent problems I see today is the dreaded "GUIDs as PKs" problem. However, just to be clear, it's not [as much of] a problem that your primary key is a GUID as much as it is a problem that the primary key is probably your clustering key. They really are two things, but the default behavior in SQL Server is that a primary key uses a unique clustered index to enforce entity integrity. So, I thought I'd take this post to really dive into why this is a problem and how you can hope to minimize it.

Relational concepts: what is a primary key? (A quick and basic reminder of what is what and why)

Starting at the very beginning... A primary key is used to enforce entity integrity. Entity integrity is the very basic concept that every row is uniquely identifiable. This is especially important in a normalized database because you usually end up with several tables and a need to reference rows across those tables (i.e. relationships). Relational theory says that every table must have a primary key. SQL Server does not have this requirement. However, many features, like replication, often require a primary key so that they can guarantee which row to modify on a related database/server (like the subscriber in a replication environment). So, most people think to create one. However, not always...

What happens when a column (or columns) is defined as a primary key in SQL Server?

The first thing that SQL Server checks is that none of the columns that make up the primary key constraint allow NULLs. This is a requirement of a primary key but not of a unique key. It also checks (if the table has data) that the existing data meets the uniqueness requirement. If there are any duplicate rows, the addition of the constraint will fail. And, to check this as well as to enforce it for [future] new rows, SQL Server builds a unique index. More specifically, if you don't specify the index type when adding the constraint (and the table does not already have a clustered index), SQL Server makes the index a unique clustered index. So, why is that interesting...

What is a clustered index?

In SQL Server 7.0 and higher, the internal dependencies on the clustering key changed. (Yes, it's important to know that things changed in 7.0. Why? Because there are still some folks out there who don't realize how radical a change occurred in the internals, with respect to the clustering key, in SQL Server 7.0.) It has always been true, in all releases of SQL Server, that the clustered index defines the order of the data in the table itself (yes, the data of the table becomes the leaf level of the clustered index), and it has always been a [potential] source of fragmentation. That's really not new, although it does seem to be more of a hot topic in recent releases. That may be solely because there are more and more databases out there in general and they've gotten bigger and bigger, and you feel the effects of fragmentation more when databases get really large.

What changed is that the clustering key gets used as the "lookup" value from the nonclustered indexes. Prior to SQL Server 7.0, SQL Server used a volatile RID structure. This was problematic because as records moved, all of the nonclustered indexes would need to get updated. Imagine a page that "splits", where half of the records are relocated to a new page. If that page has 20 rows, then 10 rows have new RIDs, which means that 10 rows in each (and all) of your nonclustered indexes would need to get updated. The more nonclustered indexes you had, the worse it got (this is also where the idea that nonclustered indexes are terribly expensive comes from).

In 7.0, the negative effects of record relocation were addressed in both clustered tables and heaps. In heaps they chose to use forwarding pointers. The idea is that the row's fixed RID is defined at insert, and even if the data for the row has to relocate because the row no longer fits on the original page, the row's RID does not change. Instead, SQL Server just uses a forwarding pointer to make one extra hop (never more) to get to the data. In a clustered table, SQL Server uses the clustering key to look up the data.

As a result, this puts some strain on the clustering key that was never there before. It should be narrow (otherwise it can make the nonclustered indexes unnecessarily wide). The clustering key should be unique (otherwise the nonclustered indexes wouldn't know "which" row to look up; if the clustering key is not defined as unique, then SQL Server will internally add a 4-byte uniquifier to each duplicate key value, which wastes time and space both in the base table and in the nonclustered indexes). And the clustering key should be static (otherwise it will be costly to update because the clustering key is duplicated in all nonclustered indexes).
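The page-split scenario above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not SQL Server's actual storage format: pages are plain lists, a RID is a `(page, slot)` tuple, and the function names (`build_table`, `rid_of`, `split_page`, `stale_entries`) are hypothetical helpers invented for this sketch. The key-based check is trivially zero by construction, which is the point: a clustering key does not change when the row moves.

```python
def build_table(n_rows, rows_per_page):
    """Return pages as lists of clustering-key values, in key order."""
    keys = list(range(n_rows))
    return [keys[i:i + rows_per_page] for i in range(0, n_rows, rows_per_page)]


def rid_of(pages, key):
    """Physical row ID of a key: (page number, slot within the page)."""
    for page_no, page in enumerate(pages):
        if key in page:
            return (page_no, page.index(key))
    raise KeyError(key)


def split_page(pages, page_no):
    """Relocate the upper half of a page's rows to a newly added page."""
    page = pages[page_no]
    half = len(page) // 2
    pages[page_no] = page[:half]
    pages.append(page[half:])


def stale_entries(index, pages, by_rid):
    """Count nonclustered index entries whose bookmark is now out of date."""
    stale = 0
    for key, bookmark in index.items():
        current = rid_of(pages, key) if by_rid else key
        if bookmark != current:
            stale += 1
    return stale


pages = build_table(20, 20)                            # one full page of 20 rows
rid_index = {k: rid_of(pages, k) for k in range(20)}   # bookmark = volatile RID
key_index = {k: k for k in range(20)}                  # bookmark = clustering key

split_page(pages, 0)

stale_rid = stale_entries(rid_index, pages, by_rid=True)
stale_key = stale_entries(key_index, pages, by_rid=False)
print(stale_rid, stale_key)  # prints: 10 0
```

With RID bookmarks, the 10 relocated rows would have to be fixed up in every nonclustered index, so 6 indexes would mean 60 entry updates for a single split. With clustering-key bookmarks, none of the index entries need to change.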

In summary, the clustering key really has all of these purposes:

    1. It defines the lookup value used by the nonclustered indexes (it should be unique, narrow and static)
    2. It defines the table's order (physically at creation and logically maintained through a linked list after that), so we need to be careful of fragmentation
    3. It can be used to answer a query, either as a table scan or, if the query wants a subset of data (a range query) and the clustering key supports that range, to reduce the cost of the scan (it can seek with a partial scan)

However, the first two are the two that I think about most when I choose a clustering key. The third is just one that I *might* be able to leverage if my clustering key also happens to be good for that. So, some examples of good clustering keys are:

    • An identity column
    • A composite key of date and identity, in that order (date, identity)
    • A pseudo-sequential GUID (using the NEWSEQUENTIALID() function in SQL Server or a "homegrown" function that builds sequential GUIDs, like Gert's "built originally to use in SQL 2000" xp_guid here: http://sqldev.net/xp/xpguid.htm)

But a GUID that is not sequential, like one that has its values generated in the client (using .NET) or generated by the NEWID() function (in SQL Server), can be a horribly bad choice, primarily because of the fragmentation that it creates in the base table but also because of its size. It's unnecessarily wide (it's 4 times wider than an INT-based identity, which can give you 2 billion (really, 4 billion) unique rows). And if you need more than 2 billion, you can always go with a BIGINT (8-byte integer) and get 2^63-1 rows. And if you don't really think that 12 bytes wider (or 8 bytes wider) is a big deal, estimate how much this costs on a bigger table and one with a few indexes...

    • Base table with 1,000,000 rows (3.8 MB for a 4-byte INT key vs. 15.26 MB for a 16-byte GUID key)
    • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

So, we're looking at roughly 27 MB vs. 107 MB, and, just to be clear, this is just for 1 million rows and this is really just overhead. If you create an even wider clustering key (something horrible like LastName, FirstName, MiddleInitial, which let's say is 64 bytes), then you're looking at 427.25 MB *just* in overhead... And then think about how bad that gets with 10 million rows and 6 nonclustered indexes: yes, you'd be wasting over 4 GB with a key like that.
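The figures above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that counts only the bytes of the clustering key itself, stored once in the base table and once in each nonclustered index. It ignores row headers, page overhead and fill factor, and the function name `key_overhead_mb` is made up for this example.

```python
MB = 1024 * 1024


def key_overhead_mb(rows, key_bytes, nonclustered_indexes):
    """MB used by the clustering key alone: one copy in the base table
    plus one copy in every nonclustered index."""
    copies = 1 + nonclustered_indexes
    return rows * key_bytes * copies / MB


int_total = key_overhead_mb(1_000_000, 4, 6)      # 4-byte INT identity
guid_total = key_overhead_mb(1_000_000, 16, 6)    # 16-byte GUID
wide_total = key_overhead_mb(1_000_000, 64, 6)    # 64-byte composite name key
wide_10m_gb = key_overhead_mb(10_000_000, 64, 6) / 1024

print(f"INT:  {int_total:.2f} MB")                    # INT:  26.70 MB
print(f"GUID: {guid_total:.2f} MB")                   # GUID: 106.81 MB
print(f"64-byte key: {wide_total:.2f} MB")            # 64-byte key: 427.25 MB
print(f"64-byte key, 10M rows: {wide_10m_gb:.2f} GB") # 64-byte key, 10M rows: 4.17 GB
```

The totals are in line with the rough figures quoted above, and the overhead grows in proportion to the row count, the key width, and the number of nonclustered indexes.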

And fragmentation costs you even more in wasted space and time because of splitting. Paul's covered a lot about fragmentation on his blog, so I'll skip that discussion for now, but if your clustering key is prone to fragmentation then you need a solid maintenance plan, and this has its own costs (and potential for downtime).

So... choosing a good clustering key early is very important!
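To get a feel for why random keys fragment a clustered index while ever-increasing keys do not, here is a small Python simulation. It is a deliberately simplified model, not SQL Server's real allocation logic. It assumes a fixed `PAGE_CAPACITY` of 100 rows, a 50/50 split when a row lands in the middle of a full page, and a fresh page (no split) when a row is appended past the end of a full last page. The random 128-bit integers stand in for NEWID()-style GUIDs.

```python
import bisect
import random

PAGE_CAPACITY = 100  # rows per page; an arbitrary figure for this sketch


def simulate(keys):
    """Insert keys into ordered fixed-size pages.

    Returns (number of 50/50 page splits, average page fullness 0..1).
    """
    pages = [[]]   # pages in key order; each page is a sorted list
    lows = []      # lows[j] is the first key of pages[j + 1]
    splits = 0
    total = 0
    for key in keys:
        i = bisect.bisect_right(lows, key)   # page whose key range covers key
        page = pages[i]
        bisect.insort(page, key)
        total += 1
        if len(page) <= PAGE_CAPACITY:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # Append at the very end: start a new page, nothing else moves.
            pages.append([page.pop()])
            lows.append(key)
        else:
            # Insert into the middle of a full page: move half the rows.
            mid = len(page) // 2
            new_page = page[mid:]
            del page[mid:]
            pages.insert(i + 1, new_page)
            lows.insert(i, new_page[0])
            splits += 1
    return splits, total / (len(pages) * PAGE_CAPACITY)


n = 10_000
rng = random.Random(42)  # fixed seed so the run is repeatable
random_keys = [rng.getrandbits(128) for _ in range(n)]   # NEWID()-like
sequential_keys = range(n)                               # identity-like

seq_splits, seq_fill = simulate(sequential_keys)
rand_splits, rand_fill = simulate(random_keys)
print(f"sequential: {seq_splits} splits, pages {seq_fill:.0%} full")
print(f"random:     {rand_splits} splits, pages {rand_fill:.0%} full")
```

In this model the sequential keys never cause a split and leave every page full, while the random keys split pages constantly and leave them only partly full, so the same data occupies more pages. Real systems differ in the details, but the direction of the effect is the one described above.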

Otherwise, the problems can start piling up!

KT
