Primary keys, foreign keys, and indexes

(This article is reposted from http://www.cnblogs.com/longyi1234/archive/2010/03/24/1693738.html)

One, what primary keys and foreign keys are

A record in a relational database has several attributes. If one attribute group (note: a group, which may be a single attribute or several) uniquely identifies a record, that attribute group can serve as the primary key.
For example:
Student table (student number, name, gender, class)
Each student's number is unique, so the student number is the primary key.
Course table (course number, course name, credits)
The course number is unique, so the course number is the primary key.
Score table (student number, course number, score)
No single attribute in the score table uniquely identifies a record. Only the combination of student number and course number does, so the attribute group (student number, course number) is the primary key.

The student number in the score table is not the primary key of the score table, but it corresponds to the student number in the student table, where it is the primary key. The student number in the score table is therefore a foreign key referencing the student table.

Similarly, the course number in the score table is a foreign key referencing the course table.
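
The three tables above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (student, course, score, student_no, course_no) are assumed English renderings of the names in the text, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

def build_school_db():
    """Create the three example tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE student (
            student_no INTEGER PRIMARY KEY,
            name TEXT, gender TEXT, class_name TEXT
        );
        CREATE TABLE course (
            course_no INTEGER PRIMARY KEY,
            course_name TEXT, credits INTEGER
        );
        CREATE TABLE score (
            student_no INTEGER NOT NULL REFERENCES student(student_no),
            course_no  INTEGER NOT NULL REFERENCES course(course_no),
            result     INTEGER,
            PRIMARY KEY (student_no, course_no)  -- composite primary key
        );
    """)
    return conn

def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_school_db()
try_insert(conn, "INSERT INTO student VALUES (?, ?, ?, ?)", (1, "Ann", "F", "Class 1"))
try_insert(conn, "INSERT INTO course VALUES (?, ?, ?)", (10, "Databases", 3))

first = try_insert(conn, "INSERT INTO score VALUES (?, ?, ?)", (1, 10, 90))
duplicate = try_insert(conn, "INSERT INTO score VALUES (?, ?, ?)", (1, 10, 75))  # same key pair
orphan = try_insert(conn, "INSERT INTO score VALUES (?, ?, ?)", (2, 10, 80))     # no student 2
print(first, duplicate, orphan)  # True False False
```

The second score row is rejected by the composite primary key, and the third by the foreign key to the student table.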

Primary keys and foreign keys are defined mainly to maintain the integrity of a relational database. In summary:
A primary key is the unique identifier that determines a record. For example, a record about a person may include an ID number, a name, and an age. Only the ID number is guaranteed to identify you; the other values may be duplicated across people, so the ID number is the primary key.
A foreign key associates a table with another table. It is a field that identifies a record in another table, and it keeps the data consistent. For example, if a field in table A is the primary key of table B, that field can be a foreign key of table A.

Two, differences among primary keys, foreign keys, and indexes

The differences among a primary key, a foreign key, and an index (here, a unique index):

Definition:
  Primary key: uniquely identifies a record; duplicates and null values are not allowed.
  Foreign key: a field of this table that is the primary key of another table; it may contain duplicates and may be null.
  Index: the indexed field has no duplicate values, but may contain a null value.

Role:
  Primary key: ensures data integrity.
  Foreign key: establishes relationships with other tables.
  Index: speeds up queries and sorting.

Number:
  Primary key: a table can have only one.
  Foreign key: a table can have more than one.
  Index: a table can have multiple unique indexes.

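The difference between a clustered index and a nonclustered index:

A table can have only one clustered index. In SQL Server a clustered index does not have to be declared unique (the engine adds a hidden uniquifier to duplicate key values so that each row can still be located), and a unique index is not necessarily a clustered index.

In a clustered index, the data rows are stored directly in the leaf pages of the index. A nonclustered index stores only index entries in its index pages, and each entry points to the row in a separate data page.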


Three, design principles for primary keys and foreign keys in a database

Primary keys and foreign keys are the glue that binds multiple tables into an effective relational database. Their design has a decisive impact on the performance and availability of the physical database.

The database schema must be converted from a theoretical logical design into an actual physical design, and the structure of the primary and foreign keys is the crux of that process. Once the database is in production, these keys are difficult to change, so it is both necessary and worthwhile to design them carefully during development.

Primary key:

A relational database relies on primary keys; they are the cornerstone of the physical schema of the database. At the physical level, a primary key serves only two purposes:

1. To uniquely identify a row.

2. To serve as an object that a foreign key can reference effectively.

Based on these two uses, here are some of the principles I follow when designing a primary key at the physical level:

1. The primary key should be meaningless to the user. If a user looks at the data in a join table that represents a many-to-many relationship and complains that it is useless, that proves the primary key is well designed.

2. The primary key should be a single column, to make join and filter operations more efficient.

Note: people who use composite keys usually justify themselves with two reasons, both of which are wrong. The first is that the primary key should have practical meaning; however, a meaningful primary key only makes it easier to corrupt the database by hand. The second is that a join table describing a many-to-many relationship can use its two foreign keys as its primary key. I object to this as well, because composite primary keys often lead to degenerate foreign keys: when the join table becomes the master table of another detail table, its composite key becomes part of that detail table's primary key; that table may in turn become the master of yet another detail table, whose primary key again absorbs it, and so on. The further down the chain a detail table sits, the more columns its primary key contains.

3. Never update a primary key. Because a primary key has no purpose other than to uniquely identify a row, there is no reason to update it. If a primary key needs updating, the principle that the primary key should be meaningless to the user has been violated.

Note: this principle does not apply to data that has to be cleaned up during data conversion or when multiple databases are merged.

4. Primary keys should not contain dynamically changing data, such as timestamps, creation time columns, modification time columns, and so on.

5. The primary key should be generated automatically by the computer. If a person intervenes in creating a primary key, the key acquires a meaning beyond uniquely identifying a row. Once that line is crossed, people may be motivated to modify the primary key, and the key mechanism that the system uses to link and manage rows falls into the hands of people who do not understand database design.
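
As a rough sketch of principles 1, 2 and 5 applied to a join table, the following Python/sqlite3 example gives the join table a generated single-column key and keeps the pair of foreign keys unique through a separate constraint. All table and column names here (enrollment, attendance, and so on) are hypothetical and do not come from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Join table: the primary key is a single, generated, meaningless column.
    CREATE TABLE enrollment (
        enrollment_id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        UNIQUE (student_id, course_id)  -- duplicate pairs are still forbidden
    );

    -- A detail table of the join table references it with one column only.
    CREATE TABLE attendance (
        attendance_id INTEGER PRIMARY KEY,
        enrollment_id INTEGER NOT NULL REFERENCES enrollment(enrollment_id),
        day TEXT
    );
""")

conn.execute("INSERT INTO student (name) VALUES ('Ann')")
conn.execute("INSERT INTO course (title) VALUES ('Databases')")
enrollment_id = conn.execute(
    "INSERT INTO enrollment (student_id, course_id) VALUES (1, 1)"
).lastrowid
conn.execute(
    "INSERT INTO attendance (enrollment_id, day) VALUES (?, '2010-03-24')",
    (enrollment_id,),
)
conn.commit()
```

With a composite key on enrollment, the attendance table would need both student_id and course_id in its foreign key, which is the column growth the note under principle 2 warns about.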

Four, database primary key selection policy

When we build a database, we need to specify a primary key for each table. A primary key is an attribute or group of attributes that uniquely identifies a row in a table. A table can have only one primary key, but it can have several candidate keys (unique indexes). Because a primary key uniquely identifies a row, it ensures that updates and deletions hit exactly the intended record. Other fields can also help us resolve concurrency conflicts during these operations, but that is not the topic here. In addition, primary keys usually form referential integrity constraints together with foreign keys, which prevents inconsistent data. The primary key therefore plays a very important role in database design.

Common ways to choose a primary key are: auto-increment fields, manually incremented fields, uniqueidentifier, and the "COMB (combine)" type.

1 Auto-increment fields

Many database designers prefer auto-increment fields because they are simple to use. With an auto-increment field we can add data without caring about the primary key value: after the record is inserted, the database system assigns a value automatically and guarantees that it is not duplicated. In SQL Server, we can also read the @@IDENTITY global variable after the insert to get the primary key value that the system assigned.

Although auto-increment fields save us a lot of tedious work, they have a potential problem: in buffered (disconnected) data mode, it is difficult to fill in primary key and foreign key values in advance. Suppose there are two tables:

Order (OrderID, OrderDate)
OrderDetail (OrderID, LineNum, ProductID, Price)

OrderID in the Order table is an auto-increment field. Now we need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID is auto-incremented, we cannot know its value before the record is actually inserted into the database; only after the update can we see which value the database assigned. This leads to the following contradiction:

First, to put the correct value into the OrderID field of OrderDetail, we must update the Order table first to obtain the OrderID that the system assigns, then fill this OrderID into the OrderDetail rows, and finally update the OrderDetail table. However, to ensure data consistency, Order and OrderDetail must be updated together under transaction protection, so that both updates succeed or fail as a unit. In buffered mode these two requirements conflict.
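
For a client that stays connected, the sequence described above looks roughly like the following Python/sqlite3 sketch. It is an illustration under assumptions: the tables are named orders and order_detail (ORDER is a reserved word in SQL), and SQLite's lastrowid plays the role that @@IDENTITY plays in SQL Server. The point to notice is that the generated key only becomes available after the parent insert has run, which is exactly the step a disconnected buffer cannot perform in advance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        order_date TEXT NOT NULL
    );
    CREATE TABLE order_detail (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        line_num   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        price      REAL NOT NULL,
        PRIMARY KEY (order_id, line_num)
    );
""")

def save_order(conn, order_date, lines):
    """Insert an order and its detail lines in one transaction; return the generated key."""
    with conn:  # commits on success, rolls back if any statement fails
        cur = conn.execute("INSERT INTO orders (order_date) VALUES (?)", (order_date,))
        order_id = cur.lastrowid  # known only after the parent row has been inserted
        conn.executemany(
            "INSERT INTO order_detail (order_id, line_num, product_id, price) "
            "VALUES (?, ?, ?, ?)",
            [(order_id, n, pid, price) for n, (pid, price) in enumerate(lines, start=1)],
        )
    return order_id

order_id = save_order(conn, "2010-03-24", [(501, 9.5), (502, 20.0)])

try:
    save_order(conn, "2010-03-25", [(503, None)])  # NULL price violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the parent row inserted in the same transaction is rolled back as well
```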

In addition, when we need to replicate data between multiple databases (SQL Server's publish and subscribe mechanism allows us to replicate data between databases), auto-increment fields can cause primary key conflicts when the data is merged. Imagine that the Order table of one database is replicated into the Order table of another database: should OrderID still be auto-incremented there or not?

ADO.NET allows us to mark a DataSet field as auto-increment, but remember that such a value is only a placeholder: when the database is updated, the value generated by the database replaces the value assigned by ADO.NET. To prevent users from mistaking the placeholders for real keys, we recommend setting both the auto-increment seed and the step in ADO.NET to -1. Furthermore, in ADO.NET we can create a DataRelation between two tables. When two tables in a cascading relationship are updated, the corresponding key values in the child table change automatically after the parent table is updated, which greatly reduces the trouble that auto-increment fields cause when updating such tables.

2 Manually incremented fields

Since auto-increment fields can cause so much trouble, we might consider a manually incremented field, meaning that we maintain the primary key value ourselves, usually in a separate table that stores the current key value. For the example above, we create a new table called IntKey with two fields, KeyName and KeyValue. Like a Hashtable, given a KeyName it tells us the current KeyValue, and we then increment the key value ourselves. In SQL Server, a stored procedure can automate fetching the key value. The code is as follows:

CREATE PROCEDURE [GetKey]
    @KeyName char(10),
    @KeyValue int OUTPUT
AS
-- Increment the stored value and return the new value in a single statement
UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO

By calling this stored procedure we obtain the latest key value and ensure that no duplicates occur. If OrderID is a manually incremented field, our program can proceed in these steps: first call the stored procedure to obtain an OrderID, then use this OrderID to populate the Order and OrderDetail tables, and finally update both tables under transaction protection.
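
The key table idea can be sketched outside SQL Server as well. The following Python/sqlite3 version is a minimal illustration, assuming a table int_key(key_name, key_value) that mirrors the IntKey table in the text; get_key is a hypothetical helper, not the article's stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE int_key (
        key_name  TEXT PRIMARY KEY,
        key_value INTEGER NOT NULL
    );
    INSERT INTO int_key VALUES ('OrderID', 0);
""")

def get_key(conn, key_name):
    """Increment the counter stored under key_name and return the new value."""
    with conn:  # the increment and the read happen in one transaction
        cur = conn.execute(
            "UPDATE int_key SET key_value = key_value + 1 WHERE key_name = ?",
            (key_name,),
        )
        if cur.rowcount != 1:
            raise KeyError(key_name)  # unknown counter name
        row = conn.execute(
            "SELECT key_value FROM int_key WHERE key_name = ?", (key_name,)
        ).fetchone()
    return row[0]

first_id = get_key(conn, "OrderID")
second_id = get_key(conn, "OrderID")
print(first_id, second_id)  # 1 2
```

Because the key is obtained before any order rows exist, the client can fill in both the parent and the child rows first and then write them in a single transaction.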

When manually incremented fields are used as primary keys and data is replicated between databases, key conflicts during the merge can be avoided as long as each database is assigned a different range of key values. However, manually incremented fields add network round trips: obtaining the current key value costs an extra database access, which increases the load on the network and the database and is a serious drawback on a slow or disconnected network. In addition, maintaining the primary key by hand means dealing with concurrency conflicts and similar issues, which increases the complexity of the system.

3 Using uniqueidentifier

SQL Server provides the uniqueidentifier data type and a built-in function NEWID() that generates a unique uniqueidentifier value. A uniqueidentifier occupies 16 bytes in the database, and the probability of duplication is so small that it can be treated as 0. We often see values like this in the registry:

{45F0EB02-0727-4F2E-AAB5-E8AEDEE0CEC5}

This is in fact a uniqueidentifier. Windows uses such values to identify COM components and interfaces and to prevent duplicates. In .NET, a uniqueidentifier is called a GUID (Globally Unique Identifier). In C#, you can generate a GUID with the following statement:

Guid u = System.Guid.NewGuid();

For the Order and OrderDetail program above, choosing uniqueidentifier as the primary key avoids the extra network round trip mentioned earlier. The program generates the GUID itself and fills in the primary key directly, without worrying about duplicates.
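
A small Python sketch of the same idea, using uuid.uuid4() in place of System.Guid.NewGuid(). The helper names (build_order_rows, save_rows) and the orders and order_detail tables are assumptions made for illustration. The parent and child rows are completed in memory before the database is touched.

```python
import sqlite3
import uuid

def build_order_rows(order_date, lines):
    """Build parent and child rows in memory; the key is known before any database access."""
    order_id = str(uuid.uuid4())  # generated on the client
    order_row = (order_id, order_date)
    detail_rows = [
        (order_id, n, pid, price) for n, (pid, price) in enumerate(lines, start=1)
    ]
    return order_row, detail_rows

def save_rows(conn, order_row, detail_rows):
    """Write the prepared rows in one transaction."""
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", order_row)
        conn.executemany("INSERT INTO order_detail VALUES (?, ?, ?, ?)", detail_rows)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, order_date TEXT NOT NULL);
    CREATE TABLE order_detail (
        order_id   TEXT NOT NULL REFERENCES orders(order_id),
        line_num   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        price      REAL NOT NULL,
        PRIMARY KEY (order_id, line_num)
    );
""")

order_row, detail_rows = build_order_rows("2010-03-24", [(501, 9.5), (502, 20.0)])
save_rows(conn, order_row, detail_rows)
```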

The uniqueidentifier field also has serious drawbacks. First, it is 16 bytes long, 4 times the length of a 4-byte integer, and so consumes a lot of storage space. More seriously, uniqueidentifier values are generated with no regularity (they are not sequential), so maintaining an index on them is time-consuming (most databases index the primary key). Experiments have shown that inserting the same amount of data is slower with uniqueidentifier keys than with integer keys. For efficiency reasons, therefore, avoid using uniqueidentifier as the primary key where possible.

4 Using the "COMB (combine)" type

Since each of the three strategies above has its own shortcomings, is there a better way? The answer is yes. The COMB type (no such type exists in the database; it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys") strikes a good balance among the three.

The basic idea of the COMB data type is this: uniqueidentifier data is inefficient because it has no regularity, which hurts system performance. So we combine: keep the first 10 bytes of the uniqueidentifier and use the last 6 bytes to represent the time (DateTime) at which the GUID was generated. Combining time information with the uniqueidentifier improves indexing efficiency while preserving uniqueness. Some may worry that shrinking the random part to 10 bytes will cause duplicates. It will not: the 6-byte time value has a precision of 1/300 second, so two COMB values are identical only if the first 10 bytes of two GUIDs generated within the same 1/300 second are exactly the same, which is almost impossible. In SQL Server, this idea can be implemented with the following SQL:

DECLARE @aGuid uniqueidentifier

SET @aGuid = CAST(CAST(NEWID() AS binary(10))
    + CAST(GETDATE() AS binary(6)) AS uniqueidentifier)
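
The byte layout described above (10 random bytes followed by 6 time bytes) can be sketched in Python as follows. This is an illustration only: new_comb and date_from_comb are hypothetical names, the time part is encoded as a 2-byte day count since 1900-01-01 plus a 4-byte count of 1/300-second ticks since midnight, and the sketch does not try to reproduce how SQL Server or .NET order the bytes of the first ten bytes internally.

```python
import uuid
from datetime import datetime, timedelta

BASE_DATE = datetime(1900, 1, 1)

def new_comb(now=None):
    """Return a UUID whose last 6 bytes encode the generation time."""
    now = now or datetime.now()
    days = (now - BASE_DATE).days                        # fits in 2 bytes until 2079
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    ticks = int((now - midnight).total_seconds() * 300)  # 1/300-second units
    raw = uuid.uuid4().bytes[:10] + days.to_bytes(2, "big") + ticks.to_bytes(4, "big")
    return uuid.UUID(bytes=raw)

def date_from_comb(comb):
    """Recover the generation time (to 1/300 second) from a COMB value."""
    raw = comb.bytes
    days = int.from_bytes(raw[10:12], "big")
    ticks = int.from_bytes(raw[12:16], "big")
    return BASE_DATE + timedelta(days=days, seconds=ticks / 300)

earlier = new_comb(datetime(2010, 3, 24, 9, 0, 0))
later = new_comb(datetime(2010, 3, 24, 9, 0, 1))
# The time suffix grows with the clock, so values generated later sort later on it.
print(earlier.bytes[10:] < later.bytes[10:])  # True
```

The random prefix keeps values generated in the same tick distinct, while the increasing suffix is what gives the index some regularity to work with.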

In tests, COMB keys were still slower than int keys for search, insert, update, delete, and other operations, but faster than uniqueidentifier keys. For the test data, see my essay of July 21, 2004.

Besides generating COMB data in stored procedures, we can also generate it in C#, so that all primary key generation is done on the client. The C# code is as follows:

//================================================================
/// <summary>
/// Returns a GUID for database operations; the embedded time code improves retrieval efficiency
/// </summary>
/// <returns>A COMB (GUID and time mixed) type GUID</returns>
public static Guid NewComb()
{
    byte[] guidArray = System.Guid.NewGuid().ToByteArray();
    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;
    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = new TimeSpan(now.Ticks - (new DateTime(now.Year, now.Month, now.Day).Ticks));

    // Convert to a byte array
    // Note that SQL Server is accurate to 1/300th of a second, so we divide the milliseconds by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));

    // Reverse the bytes to match SQL Server's ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the bytes into the GUID
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new System.Guid(guidArray);
}

//================================================================
/// <summary>
/// Extracts the time information from a COMB GUID returned by SQL Server
/// </summary>
/// <param name="guid">COMB value containing time information</param>
/// <returns>Time</returns>
public static DateTime GetDateFromComb(System.Guid guid)
{
    DateTime baseDate = new DateTime(1900, 1, 1);
