Test Using guid as the primary key of the database

Last Update:2018-12-06 Source: Internet

Author: User

Tags dell laptop

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Today, I listened to the msdn webcast lecture about Data Access to entlib. At the end of the lecture, I asked two questions that I was concerned about:

In a large application, if you need more than two sets of databases (such as SQL Server and Oracle), Can You encapsulate all the required SQL queries in the stored procedure, in this way, you only need a set of accessCodeIs there a better way to solve this problem?
It is easy to directly use guid as the primary key when setting the primary key of a database (multiple databases are supported at the same time). But there are not many factors that affect the performance during query. Is there any better solution?

The above two questions are briefly answered by Microsoft engineers due to the time. The first question should be taken into consideration for specific applications. However, the second question must have a certain performance impact, but the impact is not big. With this problem, I did this small experiment.

Note: If you have better suggestions, we will discuss them!

Test environment:

Dell laptop 1.5 GB
Win XP Professional
512 MB DDR RAM
SQL Server 2000 Personal Edition

Test method:

Create a database [test_guid] with 10 fields. Use guid as the primary key and other common field types to simulate the actual usage. The SQL code for table creation is as follows:

Create Table [DBO]. [test_guid] (
[guid] [varchar] (50) Collate chinese_prc_ci_as not null,
[test1] [varchar] (50) Collate chinese_prc_ci_as null,
[Test2] [datetime] Null,
[test3] [varchar] (50) Collate chinese_prc_ci_as null
[test4] [varchar] (100) collate partition null,
[test5] [varchar] (100) Collate chinese_prc_ci_as null,
[test6] [varchar] (50) Collate chinese_prc_ci_as null,
[test7] [text] collate chinese_prc_ci_as null,
[test8] [int] Null,
[test9] [int] Null
) on [primary] textimage_on [primary]
go

alter table [DBO]. [test_guid] With nocheck add
constraint [pk_test_guid] primary key clustered
(
[guid]
) on [primary]
go

Create a database [test_iidd] with 10 fields, use iidd as the primary key, and other common field types to simulate the actual usage. The SQL code for table creation is as follows:

Create Table [DBO]. [test_iidd] (
[iidd] [numeric] (9) Identity (1, 1) not null,
[test1] [varchar] (50) Collate chinese_prc_ci_as null,
[Test2] [datetime] Null,
[test3] [varchar] (50) Collate chinese_prc_ci_as null
[test4] [varchar] (100) collate partition null,
[test5] [varchar] (100) Collate chinese_prc_ci_as null,
[test6] [varchar] (50) Collate chinese_prc_ci_as null,
[test7] [text] collate chinese_prc_ci_as null,
[test8] [int] Null,
[test9] [int] Null
) on [primary] textimage_on [primary]
go

alter table [DBO]. [test_iidd] With nocheck add
constraint [pk_test_iidd] primary key clustered
(
[iidd]
) on [primary]
go

As you can see, the first table uses a globally unique identifier (guid) as the primary key, while the second table uses a normal numeric (similar to int type) data type as the primary key, here is a brief introduction to guid:

Guid, a globally unique identifier, is often used in the COM component identifier. Because it is almost impossible to generate duplicate two values, it is often used in various fields. The specific values are as follows: as shown in the A89C9547-032B-4860-ABB5-6EAEAVE934D5, you must have seen a similar string, ^ _ ^, using the newid () function in SQL Server2000 to get a unique guid

Run the following two SQL statements to insert 0.1 million statements to the two tables respectively. I am concerned about the effect of large data volumes, so don't blame me for choosing 0.1 million data records when I start.

declare @ num int
set @ num = 0
while (@ num <100000)
begin

insert into test_guid
values (
newid (),
'x222222222222222222222222 ',
getdate (),
'aaaaaaaaaaaaaaaaaaaaa',
'bbbbbbbbbbbbbbbbbbbbbbbb',
'cccccccccccccccccccccccccccccccccccccccc ',
'479c8aad-3040-4fc5-b53a-d6af085ad38a
average
',
'1',
'0'
)

set @ num = @ num + 1
end

Declare @ num int
Set @ num = 0
While (@ num <100000)
Begin

Insert into test_iidd
Values (
'X22222222222222222222222222 ',
Getdate (),
'Aaaaaaaaaaaaaaaaaaa ',
'Bbbbbbbbbbbbbbbbbbbbbb ',
'Cccccccccccccccccccccccccccccccc ',
'Ddddddddddddddd ',
'479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
479c8aad-3040-4fc5-b53a-d6af085ad38a
',
'1 ',
'0'
)

Set @ num = @ num + 1
End

Start the test. The test code and result are as follows:

# Test 1 (guid)
------------------ declare @ times datetime set @ times = getdate () -------------------- select * From test_guid where guid = 'a89c9547-hangzhou' or guid = 'fffa8619-BC9F-4B76-ACE8-B3324105BBDE 'or guid = 'fffc26d5-hangzhou' or guid = 'fff9fa53-E115-450A-A52D-B0AET36FF539 'or guid = 'a89c9547-032b-4860-abb5-6eaeave934d5' or guid = 'fff90a0b-CB5B-446F-81FC-CFA661D03CF8' or guid = 'fff85f4a-hangzhou' or guid = 'ffff354a-ED3E-4C3A-A033-3406F229EB34 ' order by guid DESC ------------------- select datediff (second, @ times, getdate () as Second, datediff (MS, @ times, getdate () as millisecond -------------------
0 s, 0 ms, sometimes 10 Ms

# Test 2 (iidd)
-------------------- Declare @ times datetime Set @ times = getdate () -------------------- Select * From test_iidd Where Iidd = '1' or Iidd = '2' or Iidd = '000000' or Iidd = '000000' or Iidd = '000000' or Iidd = '3' or Iidd = '8' or Iidd = '000000' Order by iidd DESC --------------------- Select datediff (second, @ times, getdate () as Second, datediff (MS, @ times, getdate () as millisecond ---------------------
0 seconds, 0 milliseconds, sometimes 10 milliseconds

We can see that the efficiency of normal select queries is not significant in the case of 0.1 million data records.

# Test 3 (guid)
-------------------- declare @ times datetime set @ times = getdate () ------------------ select count (*) from test_guid ------------------- select datediff (second, @ times, getdate () as Second, datediff (MS, @ times, getdate ()) as millisecond -------------------
29 seconds, 28793 milliseconds, poor performance!

# Test 4 (iidd)
-------------------- Declare @ times datetime Set @ times = getdate () -------------------- Select count (*) from test_iidd --------------------- Select datediff (second, @ times, getdate () as Second, datediff (MS, @ times, getdate () as millisecond ---------------------
The first running is 3 seconds, the second running is 1 second, and the third running is 0 seconds, 50 milliseconds, my God!

How is this good? guids suffer a big loss when there is no WHERE clause for aggregation operations

# Test 5 (guid)
-------------------- declare @ times datetime set @ times = getdate () ------------------ select count (*) from test_guid where Test2> '2017-06-03 21:05:33. 330 ' ------------------- select datediff (second, @ times, getdate () as Second, datediff (MS, @ times, getdate ()) as millisecond -------------------
29 seconds, 29093 milliseconds, although the query only found more than 200 pieces of data, the speed has not changed!

# Test 6 (iidd)
-------------------- declare @ times datetime set @ times = getdate () ------------------ select count (*) from test_iidd where Test2> '2017-06-03 21:05:33. 330 ' ------------------- select datediff (second, @ times, getdate () as Second, datediff (MS, @ times, getdate ()) as millisecond -------------------
the first operation is 2 seconds, and the second operation is 0 seconds, 160 milliseconds, which is a little slower than the case where there is no where

As shown in the results, the results are not ideal.

# Test (guid)
Add the Test2 column (datetime) of the table test_guid as the index column.
Run Test 3: 0 seconds, 50 milliseconds... Run Test 5: 0 seconds, 0 milliseconds. It is very obvious.

# Test 8 (iidd)
Add the Test2 column (datetime) of the table test_iidd as the index column.
Run Test 4: 0 s, 40 ms Run Test 6: 0 s, 40 ms

Test 7 and Test 8 have different return values, which may cause minor differences. This can be ignored (because I tested that the difference is very small when the return values are the same)
It can be seen that adding a time-type or Int-type index to the table with guid as the primary key can compensate for the performance loss caused by using guid as the primary key.

Summary:

Due to the time, this test is one-sided and superficial. We also hope that we can supplement and improve the deficiencies and omissions, after this test, I think I will perform more tests on performance. Please do it again.

This test only produces such superficial things, hoping that your precious time will not be wasted ^_^!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More