[Database] About primary key and foreign key

Last Update:2017-05-17 Source: Internet

Author: User

This article is referenced from: http://blog.csdn.net/bingqingsuimeng/article/details/51595560

I. What are primary keys and foreign keys

A record in a relational database has several attributes. If one attribute group (note: a group, which may be a single attribute or several) uniquely identifies a record, that attribute group can serve as the primary key.
For example:
Student table (student number, name, gender, class)
Each student's student number is unique, so the student number is a primary key.
Course table (course number, course name, credits)
The course number is unique, so the course number is a primary key.
Score table (student number, course number, score)
No single attribute in the score table uniquely identifies a record. Only the combination of student number and course number does, so the attribute group (student number, course number) is the primary key.

The student number in the score table is not, on its own, the primary key of the score table. It does, however, correspond to the student number in the student table, where it is the primary key. The student number in the score table is therefore a foreign key referencing the student table.

Likewise, the course number in the score table is a foreign key referencing the course table.
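The three tables above can be written down as a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a server database; the column names (student_no, course_no, class_name) and the sample row are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database; table and column names mirror the example above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE student (
    student_no INTEGER PRIMARY KEY,      -- single-attribute primary key
    name       TEXT NOT NULL,
    gender     TEXT,
    class_name TEXT
);
CREATE TABLE course (
    course_no   INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL,
    credits     INTEGER
);
CREATE TABLE score (
    student_no INTEGER NOT NULL REFERENCES student(student_no),  -- foreign key
    course_no  INTEGER NOT NULL REFERENCES course(course_no),    -- foreign key
    score      INTEGER,
    PRIMARY KEY (student_no, course_no)  -- attribute group as primary key
);
""")

# Illustrative sample data (made up for this sketch).
conn.execute("INSERT INTO student VALUES (1, 'Li Lei', 'M', 'Class 1')")
conn.execute("INSERT INTO course VALUES (101, 'Databases', 3)")
conn.execute("INSERT INTO score VALUES (1, 101, 90)")


def insert_duplicate_score():
    """Try to insert a second score row for the same (student, course) pair."""
    try:
        conn.execute("INSERT INTO score VALUES (1, 101, 75)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

Calling insert_duplicate_score() should be refused by the database, because the pair (student number, course number) already identifies an existing record.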

Primary and foreign keys are defined mainly to maintain the integrity of a relational database. To summarize:
1. A primary key is the unique identifier that determines a record. For example, a record might include an ID number, a name, and an age.

Only the ID number identifies you unambiguously; the other attributes may be duplicated among people, so the ID number is the primary key.
2. A foreign key associates a table with another table. It is a field that identifies a record in another table and is used to keep data consistent.

For example, if a field in table A is the primary key of table B, that field can be a foreign key of table A.
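Point 2 can be tried out with a small sketch, again using Python's sqlite3 module rather than any particular server database. The names table_a, table_b, and b_id are hypothetical; the point is only that a foreign key value must match an existing primary key in the other table, or be null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to check foreign keys
conn.executescript("""
CREATE TABLE table_b (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE table_a (
    id   INTEGER PRIMARY KEY,
    b_id INTEGER REFERENCES table_b(id)   -- foreign key: may repeat and may be NULL
);
""")
conn.execute("INSERT INTO table_b VALUES (1, 'first')")


def try_insert(a_id, b_id):
    """Return True if the row is accepted, False if the foreign key check rejects it."""
    try:
        conn.execute("INSERT INTO table_a VALUES (?, ?)", (a_id, b_id))
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "existing parent": try_insert(1, 1),
    "same parent again": try_insert(2, 1),
    "null reference": try_insert(3, None),
    "missing parent": try_insert(4, 99),
}
```

Only the last insert fails: there is no row in table_b with id 99, so accepting it would leave table_a pointing at a record that does not exist.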

II. Differences between primary keys, foreign keys, and indexes

What is the difference between a primary key, a foreign key, and an index?

	Primary key	Foreign key	Index
Definition:	Uniquely identifies a record; cannot be duplicated and cannot be null	A foreign key of a table is the primary key of another table; it can contain duplicates and can be null	A unique index does not allow duplicate values in the field, but the field can contain a null value
Purpose:	Ensures data integrity	Establishes relationships with other tables	Speeds up queries and sorting
Number:	A table can have only one primary key	A table can have more than one foreign key	A table can have multiple unique indexes
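The "Index" column of the table can be checked with a short sketch in Python's sqlite3 module. The person table and its index names are hypothetical. One caveat: SQLite accepts any number of nulls in a unique index, whereas SQL Server by default accepts only one, so the null behavior here is specific to the stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id    INTEGER PRIMARY KEY,   -- the single primary key of the table
    email TEXT,
    phone TEXT
);
-- A table can carry several unique indexes next to its one primary key.
CREATE UNIQUE INDEX idx_person_email ON person(email);
CREATE UNIQUE INDEX idx_person_phone ON person(phone);
""")
conn.execute("INSERT INTO person VALUES (1, 'a@example.com', NULL)")


def accepts(row):
    """Return True if the insert succeeds, False if a uniqueness check rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_email_ok = accepts((2, "a@example.com", "555-0100"))  # repeats an indexed value
null_phone_ok = accepts((3, "b@example.com", None))             # second NULL in phone
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('person')"))
```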

What is the difference between a clustered index and a nonclustered index?

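A table can have only one clustered index. In SQL Server a clustered index does not have to be declared unique (for duplicate key values the engine adds a hidden uniquifier), and a unique index is not necessarily a clustered index.

In a clustered index, the data rows themselves are stored in the leaf pages of the index, in key order. In a nonclustered index, the index pages store only the key values together with pointers to the rows, which live in separate data pages.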
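The difference can be pictured with a toy model in plain Python. This is only a conceptual sketch, not how any database engine is implemented: a sorted list of rows plays the role of a clustered index, and a sorted list of (key, position) pairs plays the role of a nonclustered index over rows stored elsewhere. All names and sample rows are made up.

```python
import bisect

# Clustered: the rows themselves are kept in key order, so the index is the data.
clustered_rows = [(1, "Ann"), (3, "Bob"), (7, "Cid")]  # sorted by id

# Nonclustered: rows sit in arbitrary order; a separate sorted structure
# holds (key, position) entries that point at them.
heap_rows = [(7, "Cid"), (1, "Ann"), (3, "Bob")]
name_index = sorted((row[1], pos) for pos, row in enumerate(heap_rows))


def find_by_id_clustered(key):
    """Binary search lands directly on the row."""
    keys = [row[0] for row in clustered_rows]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return clustered_rows[i]
    return None


def find_by_name_nonclustered(name):
    """Binary search finds a pointer, then one more hop fetches the row."""
    keys = [entry[0] for entry in name_index]
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return heap_rows[name_index[i][1]]
    return None
```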

III. Design principles for primary keys and foreign keys

Primary keys and foreign keys are the glue that organizes multiple tables into an effective relational database. Their design has a decisive impact on the performance and usability of the physical database.

The database schema must be converted from a theoretical logical design into an actual physical design, and the structure of the primary and foreign keys is the crux of that process. Once the database is in production, these keys are difficult to change, so it is both necessary and worthwhile to design them carefully during development.

Primary key:

A relational database depends on its primary keys: they are the cornerstone of the physical schema.

The primary key has only two uses at the physical level:

1. To uniquely identify a row.

2. To serve as an object that foreign keys can reference effectively.

Based on these two uses, here are some of the principles I follow when designing primary keys at the physical level:

1. The primary key should be meaningless to the user. If a user sees the data in a join table that represents a many-to-many relationship and complains that it is of no use, that proves its primary key is well designed.

2. The primary key should be a single column, to make joins and filtering more efficient.

Note: People who use composite keys usually justify themselves with two reasons, both of which are wrong. The first is that the primary key should have practical meaning. However, giving the primary key meaning only makes it easier for people to damage the database. The second is that, in a join table describing a many-to-many relationship, the two foreign keys can be used together as the primary key. I oppose this practice as well, because composite primary keys tend to produce degenerate foreign keys: when the join table becomes the parent of another child table, its composite key becomes part of that child table's primary key. The child table may in turn become the parent of yet another table and pass its key on in the same way, so the further down the chain a table sits, the more columns its primary key contains.

3. Never update the primary key. Because the primary key has no purpose other than uniquely identifying a row, there is no reason to update it. If the primary key needs updating, the principle that it should be meaningless to the user has been violated.

Note: This principle does not apply to data that has to be reorganized during data conversion or when multiple databases are merged.

4. The primary key should not contain dynamically changing data such as timestamps, creation-time columns, or modification-time columns.

5. The primary key should be generated automatically by the computer. If a person intervenes in creating the primary key, it takes on a meaning beyond identifying a row. Once that line is crossed, people may be motivated to modify the primary key, and the key mechanism the system uses to link and manage rows falls into the hands of people who do not understand the database design.
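Principle 2 and its note can be illustrated with a minimal sketch in Python's sqlite3 module. The enrollment and attendance tables are hypothetical: the join table gets its own single-column key while a UNIQUE constraint still keeps each pair from repeating, and a table that hangs off the join table then needs just one foreign key column instead of inheriting a growing composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT);

-- Join table with a single-column, meaningless key; the pair stays unique.
CREATE TABLE enrollment (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES student(id),
    course_id  INTEGER NOT NULL REFERENCES course(id),
    UNIQUE (student_id, course_id)
);

-- A child of the join table references one column, not a composite key.
CREATE TABLE attendance (
    id            INTEGER PRIMARY KEY,
    enrollment_id INTEGER NOT NULL REFERENCES enrollment(id),
    day           TEXT
);
""")


def foreign_key_columns(table):
    """Return the names of the referencing columns declared on a table."""
    return [row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
```

With a composite key on enrollment, attendance would have to carry both student_id and course_id; here foreign_key_columns("attendance") reports a single column.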

IV. Primary key selection strategies

When we build a database, we need to specify a primary key for each table. The primary key is an attribute or attribute group that uniquely identifies a row in the table. A table can have only one primary key, but it can have several candidate keys. Because the primary key uniquely identifies a row, it ensures that updates and deletions do not hit the wrong row. Other fields can also help us resolve concurrency conflicts during these operations, but that is not discussed here. In addition, primary keys and foreign keys together often form referential integrity constraints that prevent inconsistent data. The primary key therefore plays a very important role in database design.

Common ways to choose a primary key are:

· Auto-increment field

· Manually incremented field

· uniqueidentifier

· "COMB (Combine)" type

1 Auto-increment field

Many database designers prefer the auto-increment field because it is simple to use. It lets us add data without worrying about the primary key value: when a record is inserted, the database system assigns it a value automatically and guarantees there are no duplicates. With SQL Server, we can also read the @@IDENTITY global variable after the insert to get the primary key value the system assigned.
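The same pattern can be sketched in Python with the built-in sqlite3 module, which is used here only as a stand-in: SQLite's AUTOINCREMENT column plays the role of the auto-increment field, and the cursor's lastrowid attribute plays the role that @@IDENTITY plays in SQL Server. The orders table and add_order helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- value assigned by the database
    order_date TEXT
)""")


def add_order(order_date):
    """Insert a row without supplying a key and return the key the database chose."""
    cur = conn.execute("INSERT INTO orders (order_date) VALUES (?)", (order_date,))
    return cur.lastrowid


first_id = add_order("2017-05-17")
second_id = add_order("2017-05-18")
```

Note that the key is known only after the insert has run, which is exactly the property the following paragraphs discuss.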

Although the auto-increment field saves a lot of tedious work, it has a potential problem: in data buffering mode it is hard to fill in primary and foreign key values in advance. Suppose there are two tables:

Order (OrderID, OrderDate)
OrderDetail (OrderID, LineNum, ProductID, Price)

OrderID in the Order table is an auto-increment field. Now we need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID is an auto-increment field, we cannot know its value until the record has actually been inserted into the database; only after the update do we learn what value the database assigned. This leads to the following contradiction:

To put the correct value in the OrderID field of OrderDetail, we must first update the Order table to obtain the OrderID the system assigned, then fill in the OrderDetail rows with that OrderID, and finally update the OrderDetail table. However, to ensure data consistency, Order and OrderDetail must be updated together under transaction protection, so that both updates succeed or fail as a unit. The two requirements clearly conflict.

In addition, when data has to be replicated between several databases (SQL Server's publish and subscribe mechanism allows this), auto-increment fields can cause primary key conflicts when the data is merged. Imagine copying the Order table of one database into the Order table of another: should OrderID auto-increment or not?
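The merge conflict is easy to reproduce in a sketch. Two in-memory SQLite databases stand in for two sites, each numbering its own orders from 1; site_a, site_b, and merge_into are hypothetical names, and the merge is a naive row-by-row copy rather than a real replication mechanism.

```python
import sqlite3


def new_order_db():
    """Create an independent database with an auto-increment order table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT, order_date TEXT)"
    )
    return db


site_a, site_b = new_order_db(), new_order_db()
site_a.execute("INSERT INTO orders (order_date) VALUES ('2017-05-17')")  # gets order_id 1
site_b.execute("INSERT INTO orders (order_date) VALUES ('2017-05-18')")  # also gets order_id 1


def merge_into(target, source):
    """Copy rows keeping their keys; return the keys that collide in the target."""
    conflicts = []
    rows = source.execute("SELECT order_id, order_date FROM orders").fetchall()
    for order_id, order_date in rows:
        try:
            target.execute("INSERT INTO orders VALUES (?, ?)", (order_id, order_date))
        except sqlite3.IntegrityError:
            conflicts.append(order_id)
    return conflicts


conflicting_ids = merge_into(site_a, site_b)
```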

ADO.NET lets us set a field in a DataSet as an auto-increment field, but remember that this field is only a placeholder: when the database is updated, the values generated by the database replace the values ADO.NET assigned. To avoid confusing users, it is therefore recommended to set both the initial value and the increment of the ADO.NET auto-increment field to -1. In ADO.NET we can also create a DataRelation between two tables, so that when one table in a cascading relationship is updated, the corresponding key values in the other table change automatically. This greatly reduces the trouble that auto-increment fields cause when updating two tables with a cascading relationship.
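The placeholder idea can be sketched without ADO.NET. The following Python model is not the DataSet API; it only imitates the behavior described above with plain dictionaries and sqlite3. Buffered rows carry a negative placeholder key, and when the buffer is flushed inside one transaction, the key assigned by the database replaces the placeholder in the parent row and is cascaded to the child rows. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT, order_date TEXT);
CREATE TABLE order_detail (order_id INTEGER, line_num INTEGER, product_id INTEGER, price REAL);
""")

# Client-side buffer: negative keys are placeholders that cannot be mistaken for real ones.
pending_orders = [{"order_id": -1, "order_date": "2017-05-17"}]
pending_details = [
    {"order_id": -1, "line_num": 1, "product_id": 10, "price": 9.5},
    {"order_id": -1, "line_num": 2, "product_id": 11, "price": 4.0},
]


def flush():
    """Write the buffer in one transaction, swapping placeholders for real keys."""
    with conn:  # commits on success, rolls back if anything raises
        for order in pending_orders:
            placeholder = order["order_id"]
            cur = conn.execute(
                "INSERT INTO orders (order_date) VALUES (?)", (order["order_date"],)
            )
            order["order_id"] = cur.lastrowid
            for detail in pending_details:
                if detail["order_id"] == placeholder:
                    detail["order_id"] = order["order_id"]  # cascade the real key
        conn.executemany(
            "INSERT INTO order_detail VALUES (:order_id, :line_num, :product_id, :price)",
            pending_details,
        )


flush()
```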

2 Manually incremented field

Since the auto-increment field brings this much trouble, we might consider a manually incremented field instead. This means we maintain the primary key values ourselves, usually with a separate table that stores the current key value. Continuing the example above, we create a new table called IntKey with two fields, KeyName and KeyValue. Like a Hashtable, given a KeyName it tells us the current KeyValue, and we then increment the key value ourselves. In SQL Server, a stored procedure can automate fetching the key value. The code is as follows:

CREATE PROCEDURE [GetKey]
    @KeyName char(10),
    @KeyValue int OUTPUT
AS
    -- Increment the stored value and return the new value in one atomic statement
    UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO

By calling this stored procedure we obtain the latest key value and are guaranteed no duplicates. If OrderID is a manually incremented field, the program can proceed in these steps: first call the stored procedure to obtain an OrderID, then use that OrderID to fill in the Order and OrderDetail rows, and finally update both tables under transaction protection.
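The key table approach can also be sketched in Python's sqlite3 module. SQLite has no stored procedures, so the hypothetical get_key function below plays the role of GetKey: it increments and reads the value inside one explicit transaction so that two callers cannot receive the same key.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are controlled explicitly with BEGIN / COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE IntKey (KeyName TEXT PRIMARY KEY, KeyValue INTEGER NOT NULL)")
conn.execute("INSERT INTO IntKey VALUES ('OrderID', 0)")


def get_key(key_name):
    """Increment the stored value for key_name and return the new value."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the row
    try:
        conn.execute(
            "UPDATE IntKey SET KeyValue = KeyValue + 1 WHERE KeyName = ?", (key_name,)
        )
        row = conn.execute(
            "SELECT KeyValue FROM IntKey WHERE KeyName = ?", (key_name,)
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    if row is None:
        raise KeyError(key_name)
    return row[0]


keys = [get_key("OrderID") for _ in range(3)]
```

Each call costs one extra database access before the order itself can be saved.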

When data is replicated between databases, using manually incremented fields as primary keys avoids key conflicts during the merge, as long as each database is assigned a different range of key values. However, a manually incremented field adds network round trips: every new key costs one extra database access, which increases the load on the network and the database. This is a serious drawback on a slow or disconnected network. Maintaining the primary key by hand also means dealing with concurrency conflicts and similar issues, which adds to the complexity of the system.
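Assigning a separate key range to each database can be sketched in a few lines of Python. The SegmentedKeys class and the range boundaries are assumptions chosen for illustration; a real system would persist the counter and handle concurrent callers.

```python
class SegmentedKeys:
    """Hands out keys from a range reserved for one database (hypothetical helper)."""

    def __init__(self, start, end):
        self.next_value = start
        self.end = end  # exclusive upper bound of the reserved range

    def next_key(self):
        if self.next_value >= self.end:
            raise RuntimeError("key segment exhausted")
        value = self.next_value
        self.next_value += 1
        return value


# Each site draws from its own non-overlapping range.
site_a = SegmentedKeys(1, 1_000_000)
site_b = SegmentedKeys(1_000_000, 2_000_000)

a_keys = [site_a.next_key() for _ in range(3)]
b_keys = [site_b.next_key() for _ in range(3)]
```

Because the ranges never overlap, rows from both sites can later be merged into one table without key conflicts.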

3 Using uniqueidentifier

SQL Server provides the uniqueidentifier data type and a generation function, NEWID(), which produces a unique uniqueidentifier value. A uniqueidentifier occupies 16 bytes in the database, and the probability of a duplicate is so small that it can be treated as zero. Values like the following, which we often see in the registry,

{45F0EB02-0727-4F2E-AAB5-E8AEDEE0CEC5}

are in fact uniqueidentifiers; Windows uses them to identify COM components and interfaces and prevent duplicates. In .NET, a uniqueidentifier is called a GUID (Globally Unique Identifier). In C#, the following statement generates a GUID:

Guid u = System.Guid.NewGuid();

For the Order and OrderDetail program described above, choosing uniqueidentifier as the primary key avoids the extra network round trips. The program fills in the primary key with a GUID it generates itself and does not need to worry about duplicates.
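A minimal sketch of this flow in Python, using the standard uuid module in place of System.Guid and sqlite3 in place of SQL Server. The key is created by the program before anything is sent to the database, so the parent row and its child rows can be written in a single transaction. Table names and the save_order helper are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT PRIMARY KEY, order_date TEXT);
CREATE TABLE order_detail (
    order_id   TEXT REFERENCES orders(order_id),
    line_num   INTEGER,
    product_id INTEGER,
    price      REAL
);
""")


def save_order(order_date, lines):
    """Save an order and its lines; lines is a list of (product_id, price) pairs."""
    order_id = str(uuid.uuid4())  # generated by the program, no database round trip
    with conn:  # one transaction for the parent row and all child rows
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, order_date))
        conn.executemany(
            "INSERT INTO order_detail VALUES (?, ?, ?, ?)",
            [
                (order_id, line_num, product_id, price)
                for line_num, (product_id, price) in enumerate(lines, start=1)
            ],
        )
    return order_id


order_id = save_order("2017-05-17", [(10, 9.5), (11, 4.0)])
```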

The uniqueidentifier field also has serious drawbacks. First, it is 16 bytes long, four times the size of an integer, and consumes a lot of storage. More seriously, uniqueidentifier values are generated without any regular order, so maintaining an index on them is time-consuming (most databases index the primary key). Some people have run experiments showing that inserting the same amount of data is slower with uniqueidentifier keys than with integer keys. For efficiency reasons, therefore, avoid using uniqueidentifier values as primary keys where possible.

4 Using the "COMB (Combine)" type

Since each of the three primary key strategies above has its own shortcomings, is there a better way? The answer is yes. The COMB type (databases have no such type; it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys") strikes a good balance among the three.

The basic idea of the COMB data type is this: since uniqueidentifier values index poorly because they have no regular order, and this hurts system performance, we can keep the first 10 bytes of the uniqueidentifier and use the last 6 bytes to hold the time (DateTime) at which the GUID was generated. Combining the time information with the uniqueidentifier in this way adds ordering, which improves index efficiency, while preserving uniqueness. Some may worry that cutting the uniqueidentifier down to 10 bytes will cause duplicates. There is no need to worry: the 6 time bytes are accurate to 1/300 of a second, so two COMB values can be identical only if two GUIDs generated within the same 1/300 of a second have exactly the same first 10 bytes, which is almost impossible. In SQL Server, this idea can be implemented with the following SQL:

DECLARE @aGuid uniqueidentifier

SET @aGuid = CAST(CAST(NEWID() AS binary(10))
+ CAST(GETDATE() AS binary(6)) AS uniqueidentifier)
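The byte layout described above (10 random bytes followed by 6 time bytes) can be sketched in Python. This is an illustration of the idea, not a byte-for-byte reproduction of what SQL Server stores: it assumes 2 bytes for the days since 1900-01-01 and 4 bytes for the time of day in 1/300-second ticks, both big-endian, and the function names new_comb and date_from_comb are hypothetical.

```python
import uuid
from datetime import datetime, timedelta

BASE_DATE = datetime(1900, 1, 1)  # day zero for the embedded date


def new_comb(now=None):
    """Build a COMB: 10 random GUID bytes followed by 6 bytes of time."""
    now = now or datetime.now()
    elapsed = now - BASE_DATE
    # Time of day in 1/300-second ticks, the precision quoted in the text.
    ticks = (elapsed.seconds * 1_000_000 + elapsed.microseconds) * 300 // 1_000_000
    time_bytes = elapsed.days.to_bytes(2, "big") + ticks.to_bytes(4, "big")
    return uuid.UUID(bytes=uuid.uuid4().bytes[:10] + time_bytes)


def date_from_comb(comb):
    """Recover the embedded time from the last 6 bytes of a COMB."""
    raw = comb.bytes
    days = int.from_bytes(raw[10:12], "big")
    ticks = int.from_bytes(raw[12:16], "big")
    return BASE_DATE + timedelta(days=days, seconds=ticks / 300)


earlier = new_comb(datetime(2017, 5, 17, 12, 0, 0))
later = new_comb(datetime(2017, 5, 17, 12, 0, 1))
```

The trailing 6 bytes of a later value compare greater than those of an earlier one, which is the ordering property the COMB type relies on.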

In tests, COMB keys are still slower than int keys for search, insert, update, delete and other operations, but faster than uniqueidentifier keys. The test data can be found in my essay of July 21, 2004.

Besides generating COMB data in stored procedures, we can also generate it in C#, so that all primary key generation can be done on the client. The C# code is as follows:

//================================================================
// Requires: using System;
/// <summary>
/// Returns a GUID for database operations, with a time code embedded to improve retrieval efficiency
/// </summary>
/// <returns>COMB (GUID mixed with time) type GUID data</returns>
public static Guid NewComb()
{
    byte[] guidArray = System.Guid.NewGuid().ToByteArray();
    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;
    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = new TimeSpan(now.Ticks - (new DateTime(now.Year, now.Month, now.Day).Ticks));

    // Convert to a byte array
    // Note that SQL Server is accurate to 1/300th of a second so we divide by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));

    // Reverse the bytes to match SQL Server's ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the bytes into the GUID
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new System.Guid(guidArray);
}

//================================================================
/// <summary>
/// Generates time information from the GUID returned by SQL Server
/// </summary>
/// <param name="guid">COMB containing time information</param>
/// <returns>Time</returns>
public static DateTime GetDateFromComb(System.Guid guid)
{
    DateTime baseDate = new DateTime(1900, 1, 1);
    byte[] daysArray = new byte[4];
    byte[] msecsArray = new byte[4];
    byte[] guidArray = guid.ToByteArray();

    // Copy the date parts of the GUID to the respective byte arrays.
    Array.Copy(guidArray, guidArray.Length - 6, daysArray, 2, 2);
    Array.Copy(guidArray, guidArray.Length - 4, msecsArray, 0, 4);

    // Reverse the arrays to put them into the appropriate order
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Convert the bytes to ints
    int days = BitConverter.ToInt32(daysArray, 0);
    int msecs = BitConverter.ToInt32(msecsArray, 0);

    DateTime date = baseDate.AddDays(days);
    date = date.AddMilliseconds(msecs * 3.333333);

    return date;
}

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
