Thinking about the primary key problem in database design


In relational database design, it is usually necessary to specify a primary key for each table. A primary key is an attribute, or group of attributes, that uniquely identifies a row in the table; a table can have only one primary key, though it may have several candidate keys. Because the primary key uniquely identifies a row, it ensures that updates and deletes touch exactly the intended records. (Other fields can also help resolve conflicts during these operations, but that is not the focus of this article.) In addition, primary keys often form referential integrity constraints with foreign keys to prevent data inconsistency. The primary key therefore plays a very important role in database design. Common ways to choose a primary key are: the automatic growth type, manual growth, uniqueidentifier, composite (compound) business keys, time series + random number, and the "COMB (Combine)" type.

One, automatic growth type

Many database designers prefer automatic growth fields because they are simple to use. With an automatic growth field we can add data without worrying about the primary key value; after the record is inserted, the database system assigns a value automatically and guarantees it will never be duplicated. If you use SQL Server, you can also read the @@IDENTITY global variable after the insert to get the primary key value the system assigned, as sketched below.
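A minimal T-SQL sketch (the table T and its columns are hypothetical):

-- assume: CREATE TABLE T (ID int IDENTITY PRIMARY KEY, Name varchar(20))
INSERT INTO T (Name) VALUES ('abc')
SELECT @@IDENTITY -- the value just assigned to ID; SCOPE_IDENTITY() is safer on SQL Server 2000 and later, since @@IDENTITY can pick up identities generated by triggers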
While the automatic growth field saves us a lot of tedious work, it has a potential problem: in a buffered (batch) data-entry mode, it is difficult to fill in the primary key and foreign key values in advance. Suppose there are two tables, a master and a detail:

Order (OrderID, OrderDate) -- the orders table
OrderDetail (OrderID, LineNum, ProductID, Price) -- the order detail table

OrderID in the Order table is an automatic growth field. Suppose we now need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID grows automatically, we cannot know its value before the record is formally inserted into the database; only after the insert can we see what value the database assigned it. This leads to the following contradictions:
First, to put the correct value into the OrderID column of OrderDetail, you must first insert into the Order table to learn the OrderID the system assigned, then use that OrderID to populate the OrderDetail rows, and finally insert them. However, to ensure consistency, Order and OrderDetail must be updated inside one transaction, that is, either both tables update successfully or both fail together; having to write the master row first just to learn its key clearly conflicts with preparing all the buffered rows up front. A sketch of the ordering this forces follows.
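A T-SQL sketch of that forced ordering (the sample values 1, 100, and 9.99 are hypothetical; Order is bracketed because it is a reserved word):

BEGIN TRANSACTION
INSERT INTO [Order] (OrderDate) VALUES (GETDATE())
DECLARE @OrderID int
SET @OrderID = @@IDENTITY -- cannot be known before the INSERT above has run
INSERT INTO OrderDetail (OrderID, LineNum, ProductID, Price)
VALUES (@OrderID, 1, 100, 9.99)
COMMIT TRANSACTION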
Second, when we need to replicate data between multiple databases (SQL Server's publication and subscription mechanisms allow us to replicate data between libraries), automatic growth fields can cause primary key conflicts and lost table associations when the data is merged. Imagine an Order table replicated from one database into another: should its OrderID continue to grow automatically there? If it does, the association with its OrderDetail child rows is lost; if it keeps its original values, it may duplicate primary keys already present. Either way there is a contradiction.
Third, an incrementing value requires the system to maintain a single global counter that is incremented on every insert. In a concurrent environment, each increment must take a lock on this global value to guarantee uniqueness, which creates a concurrency bottleneck and reduces performance.
Finally, when the table grows large enough, or frequent deletes and inserts exhaust the range of the key's integer type, the value can run out of range. This is rarely encountered, but it is still something we must consider when designing the table.

Two, the manual growth field

Since the automatic growth field can be such a hassle, we might consider a manually grown field instead, meaning that we maintain the primary key value ourselves, usually in a separate table that stores the current key value. To continue with the example above, suppose the new table is called IntKey and contains two fields, KeyName and KeyValue. Just like a Hashtable, given a KeyName you can look up the current KeyValue, and the key value is then incremented by hand. In SQL Server, a stored procedure can automate taking a key value. The code is as follows:

CREATE PROCEDURE [GetKey]
@KeyName char(10),
@KeyValue int OUTPUT
AS
UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO

In this way, by invoking the stored procedure, we get the latest key value and are guaranteed no duplication. If the OrderID field is made a manual growth field, our program can first call the stored procedure to obtain an OrderID, use that OrderID to populate both the Order table and the OrderDetail table, and then update the two tables under one transaction, as sketched below.
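A usage sketch (it assumes an IntKey row with KeyName = 'OrderID' already exists; the detail values are hypothetical):

DECLARE @OrderID int
EXEC GetKey 'OrderID', @OrderID OUTPUT -- the extra round trip happens here
BEGIN TRANSACTION
INSERT INTO [Order] (OrderID, OrderDate) VALUES (@OrderID, GETDATE())
INSERT INTO OrderDetail (OrderID, LineNum, ProductID, Price)
VALUES (@OrderID, 1, 100, 9.99)
COMMIT TRANSACTION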
When you use manual growth fields as primary keys and replicate data between databases, you can ensure that no key conflicts occur during data consolidation, as long as different databases are assigned different primary key value segments. However, a manual growth field increases the burden on the network: every key requires an extra database access to obtain the current value, which adds load on both the network and the database and is a big disadvantage in slow or disconnected network environments. At the same time, maintaining the key by hand means you must also handle concurrency conflicts yourself, which increases the complexity of the system.

Three, using uniqueidentifier

SQL Server provides a uniqueidentifier data type, together with a generator function NEWID() that produces a unique uniqueidentifier value. A uniqueidentifier occupies 16 bytes in the database, and the probability of duplication is almost 0, so it is described as globally unique. The values such as 768427bf-9b37-4776-97ca-000365e160d5 or {45F0EB02-0727-4F2E-AAB5-E8AEDEE0CEC5} that we often see in the registry, or when debugging a Windows program, are in fact uniqueidentifiers; Windows uses them to identify COM components and interfaces and prevent duplication. In .NET, a uniqueidentifier is called a GUID (Globally Unique Identifier). You can generate a GUID in C# with the following statement:
Guid u = System.Guid.NewGuid();
For the Order and OrderDetail scenario above, choosing uniqueidentifier as the primary key avoids the extra network round trip just mentioned: the program generates the GUID directly and uses it to populate the primary key, with no worry about duplicates. But the uniqueidentifier field also has serious flaws. First, it is 16 bytes long, four times the length of an integer, and consumes a lot of storage space. More seriously, generated uniqueidentifier values have no regularity, that is, they are unordered, so maintaining an index on the column is time-consuming (most databases index the primary key). Experiments have shown that when tables are large, inserting the same amount of data at various data volumes is slower with uniqueidentifier keys than with integer keys, and that is before table joins are even considered. For efficiency, avoid uniqueidentifier primary keys where you can; still, as modern computers grow faster, they remain an option in small and medium projects. A declaration sketch follows.
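For reference, a sketch of declaring such a key on the sample Order table (shown with a server-side default; a client can equally supply the value from Guid.NewGuid()):

CREATE TABLE [Order] (
OrderID uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
OrderDate datetime NOT NULL
)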

Four, using business fields as a composite primary key

Database designers of C/S systems built on tools such as Delphi and PowerBuilder are accustomed to combining fields with business meaning into a composite primary key. Using a business key does have inherent benefits. By default, the database builds a clustered index on the primary key, sorted in ascending key order. When the amount of data is small we do not feel the difference, but when it is large the advantage of a clustered index defined on the business key shows: each data access can determine the exact physical disk location of the row being inserted or updated, reducing head-seek time and improving database performance; it also guarantees the integrity of the data in the business sense, increasing the reliability of the program. However, a composite index over business fields occupies more disk space, and its index pages consume more memory pages, lowering the query hit rate. And with a business key, modifying key data means the program must record both the new and original values for the related tables and compare them when updating, which increases programming complexity. A sketch of such a key follows.
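A sketch of a composite business key on the sample OrderDetail table (the column types are assumed):

CREATE TABLE OrderDetail (
OrderID int NOT NULL,
LineNum int NOT NULL,
ProductID int NOT NULL,
Price money NOT NULL,
CONSTRAINT PK_OrderDetail PRIMARY KEY CLUSTERED (OrderID, LineNum)
)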

Five, time series + random number primary key

Using a time accurate to the millisecond, or even the nanosecond, plus a randomly generated two-digit number, such as 200911282311528 plus two random digits, is an effective way to solve the primary key problem. A key produced this way avoids the disorder of uniqueidentifier values and also avoids the replication and data-import troubles of automatic growth keys. But in a real-time network system with many users, it still cannot guarantee uniqueness across time and space. A generation sketch follows.
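A minimal T-SQL generation sketch (assumes the key is stored as a string; CONVERT style 121 yields yyyy-mm-dd hh:mi:ss.mmm, and the separators are then stripped):

DECLARE @stamp varchar(17), @key varchar(19)
SET @stamp = REPLACE(REPLACE(REPLACE(REPLACE(CONVERT(varchar(23), GETDATE(), 121), '-', ''), ':', ''), ' ', ''), '.', '')
SET @key = @stamp + RIGHT('0' + CAST(CAST(RAND() * 100 AS int) AS varchar(2)), 2)
SELECT @key -- e.g. a 17-digit timestamp followed by 2 random digits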

Six, using the "COMB (Combine)" type

Since each of the five primary key strategies above has its shortcomings, is there a good solution after all? The answer is yes: the COMB type. (There is no COMB type in the database; it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys".) It strikes a good balance among the many primary key strategies above.

The basic design idea of the COMB type is this: since uniqueidentifier data is inefficient to index because of its lack of regularity, which hurts system performance, why not keep the first 10 bytes of the uniqueidentifier and use the last 6 bytes to represent the time the GUID was generated (a DateTime)? Combining time information with the uniqueidentifier increases indexing efficiency while largely preserving uniqueness. Some may worry that shrinking the uniqueidentifier portion to 10 bytes will produce duplicate data. In fact there is no need to worry: the 6 bytes of time are accurate to 1/300 of a second, so two COMB values can collide only if they are generated within the same 1/300 second and the first 10 bytes of the two GUIDs happen to be exactly the same, which is almost impossible. In SQL Server, this idea can be implemented with the following SQL:

DECLARE @aGuid uniqueidentifier
SET @aGuid = CAST(CAST(NEWID() AS binary(10))
+ CAST(GETDATE() AS binary(6)) AS uniqueidentifier)

Tests show that COMB keys are still slower than int keys for SELECT, INSERT, UPDATE, DELETE, and other operations, but faster than uniqueidentifier keys. Besides implementing COMB generation in a stored procedure, we can also generate COMB data in C#, so that all primary key generation work is done on the client.

The C# code is as follows:


//================================================
/// <summary>
/// Returns a COMB GUID for database operations; the embedded time code can improve retrieval efficiency
/// </summary>
/// <returns>A COMB (GUID-time mixed) GUID value</returns>
public static Guid NewComb()
{
    byte[] guidArray = System.Guid.NewGuid().ToByteArray();
    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;
    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = new TimeSpan(now.Ticks - new DateTime(now.Year, now.Month, now.Day).Ticks);
    // Convert to byte arrays
    // Note that SQL Server's datetime is accurate to 1/300th of a second, so we divide by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));
    // Reverse the bytes to match SQL Server's ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
    // Copy the date bytes into the last 6 bytes of the GUID
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);
    return new System.Guid(guidArray);
}

//================================================
/// <summary>
/// Recovers the time information embedded in a COMB GUID returned by SQL Server
/// </summary>
/// <param name="guid">A COMB value containing time information</param>
/// <returns>The embedded time</returns>
public static DateTime GetDateFromComb(System.Guid guid)
{
    DateTime baseDate = new DateTime(1900, 1, 1);
    byte[] daysArray = new byte[4];
    byte[] msecsArray = new byte[4];
    byte[] guidArray = guid.ToByteArray();
    // Copy the date parts of the GUID to the respective byte arrays
    Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2);
    Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4);
    // Reverse the arrays to put them into the appropriate order
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
    // Convert the bytes to ints
    int days = BitConverter.ToInt32(daysArray, 0);
    int msecs = BitConverter.ToInt32(msecsArray, 0);
    DateTime date = baseDate.AddDays(days);
    date = date.AddMilliseconds(msecs * 3.333333);
    return date;
}

Of the six primary key selection strategies above, I think the "COMB (Combine)" type is, in many cases, the most appropriate one. In actual use, however, the choice of primary key should be based on the concrete circumstances; never apply a strategy mechanically, or it will backfire.

Reference documents:

1. "System Analyst Course", edited by Zhang Yusheng
2. "SQL Server 2000 Development and Management Application Examples (Chinese Edition)", edited by Jiang Jian
3. "A Performance Comparison of Identity and GUID Primary Key Fields in the Database", author unknown
4. "A Brief Discussion of Primary Key Selection Strategies", author unknown
