Summary of relational database primary keys

When designing a relational database, you typically specify a primary key for each table: a column or group of columns that uniquely identifies each row. A table can have only one primary key, although it may have several candidate keys. Because the primary key uniquely identifies a row, it ensures that updates and deletions affect exactly the intended records. (Other fields can also help resolve conflicts during these operations, but that is not the focus of this article and will not be repeated here.) Beyond that, the primary key often participates in referential integrity constraints with foreign keys to prevent inconsistent data. The primary key therefore plays a very important role in database design. Common strategies for choosing a primary key are: auto-increment, manual increment, uniqueidentifier, composite business-field keys, time series plus random number, and the "COMB (combine)" type.
I. The auto-increment type
Many database designers prefer the auto-increment field because it is simple to use. With auto-increment, we can add rows to a table without supplying a primary key value: when a record is inserted, the database system assigns the value automatically and guarantees it will never be duplicated. With SQL Server, we can also read the @@IDENTITY global variable after the insert to obtain the primary key value the system assigned.
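A minimal T-SQL sketch of this pattern (the table layout follows the Order example used below; SCOPE_IDENTITY() is mentioned as an aside because, unlike @@IDENTITY, it is not affected by inserts performed inside triggers):

```sql
-- An auto-increment primary key: the database assigns 1, 2, 3, ... on insert.
CREATE TABLE [Order]
(
    OrderID   int IDENTITY(1,1) PRIMARY KEY,
    OrderDate datetime NOT NULL
);

INSERT INTO [Order] (OrderDate) VALUES (GETDATE());

-- Read back the key the system just assigned.
SELECT SCOPE_IDENTITY() AS NewOrderID;
```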
Although the auto-increment field saves us a lot of tedious work, it has a potential problem: it is difficult to fill in primary and foreign key values ahead of time when working with buffered (disconnected) data. Assume there are two related tables:

Order (OrderID, OrderDate) -- order header table
OrderDetail (OrderID, LineNum, ProductID, Price) -- order detail table
The OrderID in the Order table is an auto-increment field. Suppose we now need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID is auto-incremented, we cannot know its value until the record is actually inserted into the database; only after the insert can we tell what value the database assigned. This causes the following contradictions:
First, to put the correct value into the OrderDetail OrderID column, you must first update the Order table to obtain the OrderID assigned by the system, then use that OrderID to populate the OrderDetail rows, and finally update the OrderDetail table. However, to guarantee data consistency, Order and OrderDetail must be updated inside a single transaction, that is, either both tables are updated successfully or both fail, and these two requirements obviously conflict.
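The two-step pattern described above can be sketched in T-SQL (the product IDs and prices are illustrative):

```sql
BEGIN TRANSACTION;

INSERT INTO [Order] (OrderDate) VALUES (GETDATE());

-- The key only becomes known after the parent insert has executed.
DECLARE @OrderID int = SCOPE_IDENTITY();

INSERT INTO OrderDetail (OrderID, LineNum, ProductID, Price)
VALUES (@OrderID, 1, 1001, 9.99),
       (@OrderID, 2, 1002, 19.99);

COMMIT TRANSACTION;
```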
Second, when we need to replicate data between multiple databases (SQL Server's data distribution and subscription mechanism allows us to replicate data between libraries), the auto-increment field may cause primary key conflicts and broken table associations when the data is merged. Imagine replicating the Order table from one database into the Order table of another: should OrderID keep auto-incrementing? If it does, the association with its child OrderDetail rows is lost; if it does not, it may duplicate primary keys already present. Either way there is a contradiction.
Again, the auto-increment value requires the system to maintain a global counter that is incremented on every insert. In a concurrent environment, each increment must take a lock on this global value to guarantee uniqueness of the generated identifiers, which creates a concurrency bottleneck and reduces insert performance.
Also, when a table grows large enough, or frequent deletions and insertions exhaust the range of the key type, the value can overflow. This situation is rarely encountered, but it is still something we must consider when designing the table.

In real development, auto-increment keys are also prone to primary key conflicts, that is, duplicate values appearing in the primary key column. For example, suppose T_user and T_admin each use an auto-increment ID as the primary key, and we run:

INSERT INTO T_user (ID)
SELECT ID FROM T_admin;

If the value 7 exists as a primary key in both tables, the insert fails with a primary key conflict. One workaround is to first move the source keys out of the conflicting range:

UPDATE T_admin SET ID = -ID
WHERE ID IS NOT NULL AND ID > 0;

COMMIT;

II. The manual-increment field
Since the auto-increment field can be such a hassle, we might consider a manually incremented field, meaning the primary key value is maintained by the application itself, usually with a separate table storing the current key values. For narrative convenience we reuse the example above. Create a new table named IntKey containing two fields, KeyName and KeyValue. Just like a hash table, given a KeyName you can look up the current KeyValue, then increment it by hand. In SQL Server, you can write a stored procedure to automate fetching the next key value. The code is as follows:
CREATE PROCEDURE [GetKey]
    @KeyName char(10),
    @KeyValue int OUTPUT
AS
    UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO
Thus, by invoking this stored procedure, we obtain the latest key value and ensure no duplicates occur. If the OrderID field is a manually incremented field, our program proceeds in these steps: first call the stored procedure to get an OrderID, then use that OrderID to populate the Order table and the OrderDetail table, and finally update the two tables inside one transaction.
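Calling the procedure might look like this (the 'OrderID' counter row and its starting value are illustrative; the compound UPDATE inside GetKey increments and captures the value in a single atomic statement):

```sql
-- Seed the counter once.
INSERT INTO IntKey (KeyName, KeyValue) VALUES ('OrderID', 0);

-- Fetch the next key value.
DECLARE @NewID int;
EXEC GetKey @KeyName = 'OrderID', @KeyValue = @NewID OUTPUT;
SELECT @NewID AS NextOrderID;
```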
Using a manual-increment field as the primary key, you can replicate data between databases without key conflicts during the merge, as long as different data tables are assigned different key-value segments. However, the manual-increment field increases the burden on the network: every new key requires an extra database round trip, which adds load on both the network and the database and is a major disadvantage in a low-speed or disconnected network environment. At the same time, maintaining keys by hand must account for concurrency conflicts and similar factors, which increases the complexity of the system.
III. Using uniqueidentifier
SQL Server provides the uniqueidentifier data type along with a generation function, NEWID(), which produces a unique uniqueidentifier value. A uniqueidentifier occupies 16 bytes in the database, with a repetition probability of practically zero, and claims to be a globally unique identifier. Values such as 768427bf-9b37-4776-97ca-000365e160d5 or {45F0EB02-0727-4F2E-AAB5-E8AEDEE0CEC5}, which we often see in the registry or in Windows programs when debugging errors, are in fact uniqueidentifiers; Windows uses them to identify COM components and interfaces and prevent duplication. In .NET a uniqueidentifier is called a GUID (Globally Unique Identifier). In C#, you can generate a GUID with the following statement:
Guid u = System.Guid.NewGuid();
For the Order and OrderDetail scenario above, choosing uniqueidentifier as the primary key avoids the extra network round trip mentioned earlier: the program generates the GUID and fills in the primary key directly, without worrying about duplicates. But the uniqueidentifier field also has serious flaws. First, it is 16 bytes long, four times the size of an integer, and consumes much more storage space. More seriously, generated uniqueidentifiers are irregular, that is, unordered, and maintaining an index on them is a time-consuming operation (most databases build an index on the primary key). Experiments show that with large tables, at every data volume tested, uniqueidentifier keys are slower than integer keys, and that is before table joins are even taken into account. For efficiency, avoid using uniqueidentifier as the primary key value where possible, although as modern computers grow faster it remains an acceptable option in small and medium-sized projects.
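A table keyed this way might be declared as follows (a sketch; note that SQL Server also offers NEWSEQUENTIALID(), usable only as a column default, which generates ordered GUIDs and mitigates the index fragmentation that random NEWID() values cause):

```sql
-- GUID primary key: the application (or the DEFAULT) supplies the value,
-- so no round trip is needed to learn the key after an insert.
CREATE TABLE [Order]
(
    OrderID   uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    OrderDate datetime NOT NULL
);
```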
IV. Using a composite business-field primary key
Database designers who build C/S systems with tools such as Delphi and PowerBuilder are accustomed to combining several business-meaningful fields into a composite primary key. A business key does have its inherent benefits. By default, the database system builds a clustered index on the primary key, sorted in ascending key order. When the data volume is small we notice no difference, but when it is large, this clustered index lets the table locate the exact physical disk position of inserted or updated data through the index on every access, reducing head-seek time and improving database performance; it also guarantees the integrity of the data in the business sense and increases the reliability of the program. However, a composite index over business fields occupies more disk space when those fields are long, and its index pages occupy more memory pages, lowering the query hit ratio. Moreover, with a business key, whenever the key data is modified, the program must record both the new and the original values across the related tables and compare them, which increases programming complexity.
V. The time series + random number primary key
Using a timestamp accurate to the millisecond or even the nanosecond, plus a randomly generated two-digit suffix, such as 200911282311528 followed by a two-digit random number, is an effective way to solve the primary key problem. Keys generated this way avoid the disorder of the uniqueidentifier field and also avoid the replication and data-import troubles of auto-increment keys. However, in a real-time networked system with a large number of users, uniqueness in time and space still cannot be guaranteed.
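One way to build such a key in T-SQL (the format string and the two-digit random width are illustrative; FORMAT is available from SQL Server 2012 onward):

```sql
-- Timestamp accurate to the millisecond, followed by a two-digit random number.
DECLARE @Key varchar(19) =
    FORMAT(SYSDATETIME(), 'yyyyMMddHHmmssfff') +
    RIGHT('0' + CAST(CAST(RAND() * 100 AS int) AS varchar(2)), 2);
SELECT @Key AS TimeRandomKey;
```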
VI. Using the "COMB (combine)" type
Since each of the five strategies above has its own shortcomings, is there a good way out? The answer is yes: the COMB type. (There is no COMB type in the database; it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys".) It strikes a good balance among the many primary key strategies above.
The basic design of the COMB type is this: since uniqueidentifier data is indexed inefficiently because it lacks order, which hurts system performance, we can keep the first 10 bytes of the uniqueidentifier and use the last 6 bytes to encode the time (DateTime) at which the GUID was generated. Combining the time information with the uniqueidentifier makes the values broadly ordered, improving index efficiency while preserving the near-uniqueness of the uniqueidentifier. Some may worry that shrinking the unique portion to 10 bytes will cause duplicate data. In fact there is no need to worry: the 6 time bytes are accurate to 1/300 of a second, so two COMB values can be identical only if two GUIDs generated within the same 1/300-second interval have exactly the same first 10 bytes, which is practically impossible. In SQL Server this idea can be implemented with the following commands:
DECLARE @aGuid uniqueidentifier
SET @aGuid = CAST(CAST(NEWID() AS binary(10)) +
                  CAST(GETDATE() AS binary(6)) AS uniqueidentifier)
Testing shows that COMB keys are still slower than int keys for searches, inserts, updates, and deletes, but faster than plain uniqueidentifier keys. Besides generating COMB values with a stored procedure, we can also generate them in C#, so that all primary key generation is done on the client. The C# code is as follows:

//================================================
/// <summary>
/// Returns a GUID for database use; the embedded time code improves
/// retrieval efficiency.
/// </summary>
/// <returns>A COMB (GUID and time mixed) value</returns>
public static Guid NewComb()
{
    byte[] guidArray = System.Guid.NewGuid().ToByteArray();
    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;
    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = new TimeSpan(now.Ticks - new DateTime(now.Year, now.Month, now.Day).Ticks);
    // Convert to a byte array
    // Note that SQL Server is accurate to 1/300 of a second, so we divide by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));
    // Reverse the bytes to match SQL Server's ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
    // Copy the bytes into the GUID
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);
    return new System.Guid(guidArray);
}
//================================================
/// <summary>
/// Extracts the time information from a COMB GUID returned by SQL Server.
/// </summary>
/// <param name="guid">A COMB value carrying time information</param>
/// <returns>The embedded time</returns>
public static DateTime GetDateFromComb(System.Guid guid)
{
    DateTime baseDate = new DateTime(1900, 1, 1);
    byte[] daysArray = new byte[4];
    byte[] msecsArray = new byte[4];
    byte[] guidArray = guid.ToByteArray();
    // Copy the date parts of the GUID to the respective byte arrays
    Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2);
    Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4);
    // Reverse the arrays to put them into the appropriate order
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);
    // Convert the bytes to ints
    int days = BitConverter.ToInt32(daysArray, 0);
    int msecs = BitConverter.ToInt32(msecsArray, 0);
    DateTime date = baseDate.AddDays(days);
    date = date.AddMilliseconds(msecs * 3.333333);
    return date;
}
Of the six primary key selection strategies above, I consider the COMB key the most broadly appropriate choice. But in actual use the decision must follow objective practice: choose the key that suits the matter at hand, and never apply a rule mechanically, or the result will be self-defeating.
