I. What are primary keys and foreign keys?
A record in a relational database has several attributes. If a group of attributes (note that it is a group, which may contain one attribute or several) uniquely identifies a record, that attribute group can serve as the primary key. For example:
Student table (student ID, name, gender, class)
The student ID is unique, so the student ID is the primary key.
User table (username, password, login level)
The username is unique, so the username is the primary key.
Computer usage record table (card number, student ID, name, serial number)
No single attribute in the computer usage record table can uniquely identify a record; only the combination of student ID and name can. Therefore, the attribute group (student ID, name) is the primary key.
The student ID in the computer usage record table is not by itself the primary key of that table, but it corresponds to the student ID in the student table, which is the student table's primary key. The student ID in the computer usage record table is therefore a foreign key that references the student table.
Primary keys and foreign keys are defined to maintain the integrity of a relational database. In summary:
A primary key is a unique identifier that determines exactly one record. For example, if a record contains an ID card number, a name, and an age, only the ID card number identifies you uniquely; names and ages can be duplicated. Therefore, the ID card number is the primary key.
A foreign key is used to link to another table. It is a field that identifies a record in another table and is used to maintain data consistency. For example, if a field in table A is the primary key of table B, that field can be a foreign key of table A.
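To make the student and computer-record example concrete, here is a minimal sketch using Python's built-in sqlite3 module, with an in-memory SQLite database standing in for SQL Server (an assumption; the table and column names are illustrative translations of the example above). It shows the primary key rejecting a duplicate student ID and the foreign key rejecting a record for a student who does not exist.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("""
    CREATE TABLE student (
        student_id TEXT NOT NULL PRIMARY KEY,  -- identifies exactly one student
        name TEXT, gender TEXT, class TEXT
    )""")
conn.execute("""
    CREATE TABLE computer_record (
        card_number   TEXT,
        student_id    TEXT NOT NULL REFERENCES student(student_id),  -- foreign key
        name          TEXT NOT NULL,
        serial_number TEXT,
        PRIMARY KEY (student_id, name)  -- attribute group used as the primary key
    )""")

conn.execute("INSERT INTO student VALUES ('S001', 'Alice', 'F', 'Class 1')")
# The referenced student exists, so this record is accepted.
conn.execute("INSERT INTO computer_record VALUES ('C01', 'S001', 'Alice', 'SN-1')")


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_pk = rejected("INSERT INTO student VALUES ('S001', 'Bob', 'M', 'Class 2')")
orphan_fk = rejected("INSERT INTO computer_record VALUES ('C02', 'S999', 'Zed', 'SN-2')")
print(duplicate_pk, orphan_fk)  # True True
```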
II. Differences between primary keys, foreign keys, and indexes
What are the differences between primary keys, foreign keys, and indexes?
Definition:
• Primary key: uniquely identifies a record; duplicates are not allowed, and neither are null values.
• Foreign key: a table's foreign key is the primary key of another table; foreign key values can repeat and can be null.
• Unique index: the field has no duplicate values, but it can contain a null value.
Role:
• Primary key: ensures data integrity.
• Foreign key: establishes relationships with other tables.
• Index: speeds up queries and sorting.
Number per table:
• Primary key: only one.
• Foreign key: a table can have several.
• Unique index: a table can have several.
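The comparison above can be checked with a small SQLite sketch (SQLite stands in for SQL Server; the class and student tables and the email column are hypothetical). It shows that foreign key values may repeat or be NULL, that repeated primary key and unique index values are rejected, and that the query planner uses the unique index for lookups. NULL handling in unique indexes differs between products (SQL Server allows one NULL, SQLite allows several), so it is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE class (class_id INTEGER PRIMARY KEY);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,                -- primary key: only one per table
        email      TEXT,                               -- covered by the unique index below
        class_id   INTEGER REFERENCES class(class_id)  -- foreign key into class
    );
    CREATE UNIQUE INDEX ux_student_email ON student(email);
    INSERT INTO class VALUES (1);
""")


def accepted(sql: str) -> bool:
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return False
    return True


checks = {
    "foreign key value repeated":
        accepted("INSERT INTO student VALUES (1, 'a@example.com', 1)")
        and accepted("INSERT INTO student VALUES (2, 'b@example.com', 1)"),
    "foreign key NULL": accepted("INSERT INTO student VALUES (3, 'c@example.com', NULL)"),
    "primary key repeated": accepted("INSERT INTO student VALUES (1, 'd@example.com', 1)"),
    "unique index value repeated": accepted("INSERT INTO student VALUES (4, 'a@example.com', 1)"),
}
for name, ok in checks.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")

# Role of the index: the planner can look rows up through it instead of scanning the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE email = ?", ("a@example.com",)
).fetchall()
print(plan[0][-1])
```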
What is the difference between a clustered index and a non-clustered index?
A unique index is not necessarily a clustered index, and a clustered index is not necessarily unique either: SQL Server allows a non-unique clustered index and distinguishes duplicate key values internally with a hidden uniquifier.
A clustered index stores the data rows themselves in its leaf-level index pages, whereas a non-clustered index stores only index entries on its pages, and each entry points to the data row stored elsewhere.
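The difference can be modeled in a few lines of Python. This is a deliberately simplified sketch, not how any engine is implemented (real databases use B-trees and pages): ClusteredTable keeps the rows themselves in key order, while HeapWithIndex stores rows in arrival order and keeps a separate sorted list of (key, position) entries that must be followed to reach the data. Both class names are hypothetical.

```python
import bisect


class ClusteredTable:
    """Rows are stored in key order; finding the key in the index is finding the row."""

    def __init__(self):
        self.keys, self.rows = [], []

    def insert(self, key, row):
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.rows.insert(pos, row)  # the data itself moves to keep physical key order

    def find(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.rows[pos]
        return None


class HeapWithIndex:
    """Rows stay where they were written; a separate sorted index holds (key, row position)."""

    def __init__(self):
        self.heap, self.index = [], []

    def insert(self, key, row):
        self.heap.append(row)                                 # data pages: arrival order
        bisect.insort(self.index, (key, len(self.heap) - 1))  # index pages: key plus pointer

    def find(self, key):
        pos = bisect.bisect_left(self.index, (key,))
        if pos < len(self.index) and self.index[pos][0] == key:
            return self.heap[self.index[pos][1]]  # extra hop through the pointer
        return None


clustered, nonclustered = ClusteredTable(), HeapWithIndex()
for key, name in [(3, "Carol"), (1, "Alice"), (2, "Bob")]:
    clustered.insert(key, name)
    nonclustered.insert(key, name)
print(clustered.rows)     # ['Alice', 'Bob', 'Carol']
print(nonclustered.heap)  # ['Carol', 'Alice', 'Bob']
```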
III. Primary key and foreign key design principles
Primary keys and foreign keys are the glue that organizes multiple tables into an effective relational database. Their design has a decisive impact on the performance and usability of the physical database.
A database schema has to be converted from a logical design into a physical design, and the structure of primary and foreign keys is the crux of that process. Once a database is in production, these keys are hard to change, so it is necessary and worthwhile to design them carefully during development.
Relational databases depend on primary keys; they are the cornerstone of the physical database schema. At the physical level, a primary key has only two purposes:
1. To uniquely identify a row.
2. To serve as an object that foreign keys can reference efficiently.
Based on these two purposes, here are the principles I follow when designing primary keys at the physical level:
1. Primary keys should be meaningless to users. If users look at the data in a junction table that represents a many-to-many relationship and complain that it is useless, that proves its primary key is well designed.
2. Primary keys should be single columns, which makes join and filter operations more efficient.
Note: People who use composite keys usually justify them with two arguments, and both are wrong. The first is that a primary key should carry real-world meaning; however, giving the key meaning only makes it easier for people to damage the database. The second is that in a junction table describing a many-to-many relationship, the two foreign keys can together serve as the primary key. I oppose this too, because composite primary keys tend to produce bad foreign keys. When the junction table becomes the parent of another (child) table, then by the second argument its composite key becomes part of the child table's primary key. That child table may in turn become the parent of yet another table, whose primary key then absorbs all of those columns as well. The further down the chain a table sits, the more columns its primary key contains.
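The chain of growing keys described in the note can be seen by building the same hypothetical schema (student, course, enrollment, exam, exam_answer) twice in SQLite, once with composite keys inherited down the chain and once with single-column surrogate keys, and then listing each table's primary key columns with PRAGMA table_info. This is a sketch under those assumptions; the table names are invented for illustration.

```python
import sqlite3


def pk_columns(conn, table):
    """Primary key columns; PRAGMA table_info rows are (cid, name, type, notnull, default, pk)."""
    # The table name is interpolated directly because it is a trusted literal in this sketch.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]


composite = sqlite3.connect(":memory:")
composite.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY);
    -- junction table keyed by its two foreign keys
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student,
        course_id  INTEGER REFERENCES course,
        PRIMARY KEY (student_id, course_id));
    -- child of the junction table inherits its whole key and adds a column
    CREATE TABLE exam (
        student_id INTEGER, course_id INTEGER, exam_no INTEGER,
        PRIMARY KEY (student_id, course_id, exam_no),
        FOREIGN KEY (student_id, course_id) REFERENCES enrollment);
    -- grandchild inherits all three columns and adds another
    CREATE TABLE exam_answer (
        student_id INTEGER, course_id INTEGER, exam_no INTEGER, question_no INTEGER,
        PRIMARY KEY (student_id, course_id, exam_no, question_no),
        FOREIGN KEY (student_id, course_id, exam_no) REFERENCES exam);
""")

surrogate = sqlite3.connect(":memory:")
surrogate.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY);
    CREATE TABLE enrollment (
        enrollment_id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student,
        course_id  INTEGER NOT NULL REFERENCES course,
        UNIQUE (student_id, course_id));  -- the pair stays unique, just not as the key
    CREATE TABLE exam (
        exam_id INTEGER PRIMARY KEY,
        enrollment_id INTEGER NOT NULL REFERENCES enrollment,
        exam_no INTEGER NOT NULL);
    CREATE TABLE exam_answer (
        answer_id INTEGER PRIMARY KEY,
        exam_id INTEGER NOT NULL REFERENCES exam,
        question_no INTEGER NOT NULL);
""")

for table in ("enrollment", "exam", "exam_answer"):
    print(table, pk_columns(composite, table), pk_columns(surrogate, table))
```

In the surrogate version, each (student_id, course_id) pair is still unique, but that rule is enforced by a UNIQUE constraint rather than by the primary key, so it does not leak into the tables further down the chain.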
3. Never update a primary key. Because a primary key only identifies a row and has no other purpose, there is no reason to update it. If a primary key needs updating, the principle that primary keys should be meaningless to users has been violated.
Note: This principle does not apply to data that frequently has to be reorganized during data conversion or when merging multiple databases.
4. Primary keys should not contain data that changes, such as timestamps, creation-time columns, or modification-time columns.
5. Primary keys should be generated automatically by the computer. If people get involved in creating primary keys, the keys acquire meaning beyond uniquely identifying a row. Once that boundary is crossed, people may find reasons to modify the keys, and the mechanism the system relies on to link and manage rows falls into the hands of people who do not understand database design.
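Principles 1, 3, and 5 can be sketched together, again using SQLite as a stand-in and hypothetical table names (app_user, login_log). The database generates user_id, which means nothing to users; the user-visible username is an ordinary unique column, so renaming a user changes one row and leaves every referencing row untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE app_user (
        user_id     INTEGER PRIMARY KEY,   -- surrogate key: generated, never shown to users
        username    TEXT NOT NULL UNIQUE,  -- meaningful to users, so it is allowed to change
        login_level INTEGER NOT NULL
    );
    CREATE TABLE login_log (
        log_id  INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES app_user(user_id)
    );
""")

# Nobody supplies the key: SQLite assigns it on insert.
user_id = conn.execute(
    "INSERT INTO app_user (username, login_level) VALUES (?, ?)", ("alice", 1)
).lastrowid
conn.executemany("INSERT INTO login_log (user_id) VALUES (?)", [(user_id,), (user_id,)])

# Renaming the user touches one row; the log rows keep pointing at the same key.
conn.execute("UPDATE app_user SET username = ? WHERE user_id = ?", ("alice.smith", user_id))
linked = conn.execute("SELECT COUNT(*) FROM login_log WHERE user_id = ?", (user_id,)).fetchone()[0]
print(user_id, linked)  # 1 2
```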
IV. Primary key selection strategies
When creating a database, we need to specify a primary key for each table. The primary key is the attribute or attribute group that uniquely identifies a row in the table. A table can have only one primary key, but it can have multiple candidate keys. Because the primary key uniquely identifies a row, it ensures that updates and deletions affect exactly the intended record. Other fields can also help eliminate concurrency conflicts during these operations, but we will not discuss them here. In addition, the primary key often works with foreign keys to form referential integrity constraints that prevent data inconsistency. The primary key therefore plays an important role in database design.
Common ways to choose a primary key include:
• Auto-increment fields
• Manually incremented fields
• UniqueIdentifier
• The "COMB (Combine)" type
1. Auto-increment fields
Many database designers prefer auto-increment fields because they are easy to use. With an auto-increment field, we can ignore the primary key value when adding data; after the record is inserted, the database system assigns it a value automatically and guarantees that there are no duplicates. In SQL Server, you can read the assigned primary key value after the insert through the @@IDENTITY system function (SCOPE_IDENTITY() is generally safer, because @@IDENTITY can also return a value generated by a trigger).
Although auto-increment fields save a lot of tedious work, they have a potential problem: in buffered (disconnected) data mode, it is difficult to fill in primary key and foreign key values in advance.
Assume there are two tables:
Order (OrderID, OrderDate)
OrderDetail (OrderID, LineNum, ProductID, Price)
OrderID in the Order table is an auto-increment field. Now suppose we need to enter an order, which means inserting one record into the Order table and several records into the OrderDetail table. Because OrderID is an auto-increment field, we cannot know its value before the record is actually inserted into the database; we only learn the value the database assigned after the update. This leads to the following conflict:
To put the correct value into the OrderID field of OrderDetail, you must first update the Order table to obtain the OrderID the system assigned, fill that OrderID into the OrderDetail rows, and only then update the OrderDetail table. However, to keep the data consistent, Order and OrderDetail must be updated together under transaction protection, so that both tables stay in sync. These two requirements obviously conflict.
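In a connected setting, the steps above (insert the Order row, read the generated OrderID, then insert the OrderDetail rows) can be placed inside a single transaction, as in the following minimal sketch using SQLite. Here cursor.lastrowid plays the role that @@IDENTITY or SCOPE_IDENTITY() plays in SQL Server, and save_order is a hypothetical helper. The conflict described in the text arises mainly in buffered (disconnected) mode, where the keys are needed before the client talks to the database at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY AUTOINCREMENT, OrderDate TEXT NOT NULL);
    CREATE TABLE OrderDetail (
        OrderID   INTEGER NOT NULL REFERENCES "Order"(OrderID),
        LineNum   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL,
        Price     REAL NOT NULL,
        PRIMARY KEY (OrderID, LineNum)
    );
""")


def save_order(conn, order_date, lines):
    """Insert the header, read its generated key, then insert the lines, in one transaction."""
    with conn:  # commits on success, rolls back both tables if anything raises
        cur = conn.execute('INSERT INTO "Order" (OrderDate) VALUES (?)', (order_date,))
        order_id = cur.lastrowid  # known only after the header row exists
        conn.executemany(
            "INSERT INTO OrderDetail VALUES (?, ?, ?, ?)",
            [(order_id, n, product_id, price) for n, (product_id, price) in enumerate(lines, start=1)],
        )
    return order_id


first = save_order(conn, "2024-01-01", [(101, 9.5), (102, 3.0)])
try:
    save_order(conn, "2024-01-02", [(103, None)])  # NULL price violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the header inserted by the failed call was rolled back with it
orders = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
print(first, orders)  # 1 1
```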
In addition, when we need to replicate data between multiple databases (SQL Server's publish/subscribe replication lets us copy data between databases), auto-increment fields can cause primary key conflicts when the data is merged. Imagine copying the Order table of one database into the Order table of another: should OrderID auto-increment in the target or not?
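The replication problem can be reproduced with three in-memory SQLite databases (an assumed stand-in for replicated SQL Server databases): two sites number their orders independently, and copying both into a central database collides on OrderID 1.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute('CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY AUTOINCREMENT, OrderDate TEXT)')
    return db


site_a, site_b, central = make_db(), make_db(), make_db()
# Each site numbers its own orders from 1, unaware of the other.
site_a.execute('INSERT INTO "Order" (OrderDate) VALUES (?)', ("2024-01-01",))
site_b.execute('INSERT INTO "Order" (OrderDate) VALUES (?)', ("2024-01-02",))

conflicts = []
for site in (site_a, site_b):
    for order_id, order_date in site.execute('SELECT OrderID, OrderDate FROM "Order"').fetchall():
        try:
            central.execute('INSERT INTO "Order" VALUES (?, ?)', (order_id, order_date))
        except sqlite3.IntegrityError:
            conflicts.append(order_id)  # same key already copied from the other site
print(conflicts)  # [1]
```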
ADO.NET lets us mark a column in a DataSet as an auto-increment column, but remember that this value is only a placeholder: when the database is updated, the value generated by the database automatically replaces the ADO.NET value. To prevent confusion, we recommend setting both the auto-increment seed and the increment step in ADO.NET to -1 (the AutoIncrementSeed and AutoIncrementStep properties of the DataColumn), so placeholder values are negative and cannot be mistaken for real keys. In addition, ADO.NET lets us create a DataRelation between the two tables, so that when two tables in a cascading relationship are updated, the corresponding key values in the second table change automatically after the first table is updated. This greatly reduces the trouble auto-increment fields cause when updating two related tables.
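The idea behind ADO.NET's negative placeholders can be sketched in Python without ADO.NET itself. OrderBuffer below is a hypothetical class, not an ADO.NET API: it hands out temporary keys -1, -2, and so on, and on save it inserts the orders into SQLite, collects the keys the database actually generated, and rewrites the child rows' OrderID values, which is roughly what a DataRelation with cascading updates does for you.

```python
import itertools
import sqlite3


class OrderBuffer:
    """Hypothetical client-side buffer that uses negative placeholder keys (seed -1, step -1)."""

    def __init__(self):
        self._placeholders = itertools.count(-1, -1)  # -1, -2, -3, ... never clash with real keys
        self.orders = {}   # OrderID -> OrderDate (placeholder IDs until saved)
        self.details = []  # (OrderID, LineNum, ProductID, Price)

    def new_order(self, order_date):
        temp_id = next(self._placeholders)
        self.orders[temp_id] = order_date
        return temp_id

    def add_line(self, order_id, line_num, product_id, price):
        self.details.append((order_id, line_num, product_id, price))

    def save(self, conn):
        """Insert the orders, then swap each child's placeholder for the generated key."""
        real_ids = {}
        with conn:  # orders and details are committed together or not at all
            for temp_id, order_date in self.orders.items():
                cur = conn.execute('INSERT INTO "Order" (OrderDate) VALUES (?)', (order_date,))
                real_ids[temp_id] = cur.lastrowid
            saved = [(real_ids[order_id], *rest) for order_id, *rest in self.details]
            conn.executemany("INSERT INTO OrderDetail VALUES (?, ?, ?, ?)", saved)
        # Update the buffer only after a successful commit.
        self.orders = {real_ids[t]: d for t, d in self.orders.items()}
        self.details = saved
        return real_ids


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY AUTOINCREMENT, OrderDate TEXT NOT NULL);
    CREATE TABLE OrderDetail (
        OrderID INTEGER NOT NULL, LineNum INTEGER NOT NULL,
        ProductID INTEGER NOT NULL, Price REAL NOT NULL,
        PRIMARY KEY (OrderID, LineNum));
""")
buf = OrderBuffer()
first = buf.new_order("2024-01-01")
second = buf.new_order("2024-01-02")
buf.add_line(first, 1, 101, 9.5)
buf.add_line(first, 2, 102, 3.0)
buf.add_line(second, 1, 103, 7.25)
ids = buf.save(conn)
print(first, second, ids)  # -1 -2 {-1: 1, -2: 2}
```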
2. Manually incremented fields
Since auto-increment fields cause these problems, we might consider manually incremented fields instead, meaning that we maintain the primary key values ourselves. This usually requires a separate table that stores the current key values. For example, create a table named IntKey with two fields, KeyName and KeyValue. Like a Hashtable, given a KeyName it tells you the current KeyValue, which you then increment manually. In SQL Server, you can write a stored procedure that retrieves the next key value automatically. The code is as follows:
CREATE PROCEDURE [GetKey] @KeyName char(10), @KeyValue int OUTPUT
AS
-- Increment the counter and copy the new value into @KeyValue in one atomic statement
UPDATE IntKey SET @KeyValue = KeyValue = KeyValue + 1 WHERE KeyName = @KeyName
GO
By calling this stored procedure, we always obtain the latest key value and avoid duplicates. If OrderID is a manually incremented field, the program can proceed as follows: first call the stored procedure to obtain an OrderID, then use this OrderID to fill both the Order and OrderDetail tables, and finally update both tables under transaction protection.
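A Python counterpart of the GetKey procedure, using SQLite and the same IntKey table, might look like this. It is a sketch under assumptions: get_key is a hypothetical helper, and because SQLite has no output parameters, the procedure's single UPDATE is replaced with an UPDATE followed by a SELECT inside one transaction. The usage part follows the three steps just described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for SQL Server
conn.executescript("""
    CREATE TABLE IntKey (KeyName TEXT PRIMARY KEY, KeyValue INTEGER NOT NULL);
    INSERT INTO IntKey VALUES ('OrderID', 0);
    CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL);
    CREATE TABLE OrderDetail (
        OrderID INTEGER NOT NULL, LineNum INTEGER NOT NULL,
        ProductID INTEGER NOT NULL, Price REAL NOT NULL,
        PRIMARY KEY (OrderID, LineNum));
""")


def get_key(conn, key_name):
    """Hypothetical counterpart of GetKey: increment the counter and read it back in one transaction."""
    with conn:  # after the UPDATE, other connections cannot write to the database until this commits
        cur = conn.execute("UPDATE IntKey SET KeyValue = KeyValue + 1 WHERE KeyName = ?", (key_name,))
        if cur.rowcount != 1:
            raise KeyError(key_name)
        return conn.execute("SELECT KeyValue FROM IntKey WHERE KeyName = ?", (key_name,)).fetchone()[0]


order_id = get_key(conn, "OrderID")  # step 1: the extra round trip that reserves a key
with conn:                           # steps 2 and 3: fill and save both tables together
    conn.execute('INSERT INTO "Order" VALUES (?, ?)', (order_id, "2024-01-01"))
    conn.executemany("INSERT INTO OrderDetail VALUES (?, ?, ?, ?)",
                     [(order_id, 1, 101, 9.5), (order_id, 2, 102, 3.0)])
next_id = get_key(conn, "OrderID")
print(order_id, next_id)  # 1 2
```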
When manually incremented fields are used as primary keys, replication between databases can avoid key conflicts during data merging, as long as each database is assigned a different range of primary key values. However, manually incremented fields add network round trips: an extra database access is needed to obtain the current key value, which increases the load on the network and the database and is a major drawback on slow or disconnected networks. Manual key maintenance also has to handle concurrency conflicts and similar issues, which adds to the complexity of the system.
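Assigning each database its own key range can be sketched as follows; KeySegment and assign_segments are hypothetical helpers, and the segment size of one million is an arbitrary assumption. Because the ranges do not overlap, rows from different databases can be merged without key conflicts.

```python
from dataclasses import dataclass, field


@dataclass
class KeySegment:
    """A block of primary-key values reserved for one database (hypothetical helper)."""
    first: int
    last: int                            # inclusive upper bound
    next_value: int = field(init=False)

    def __post_init__(self):
        self.next_value = self.first

    def allocate(self) -> int:
        if self.next_value > self.last:
            raise RuntimeError("segment exhausted: this database needs a new range")
        value = self.next_value
        self.next_value += 1
        return value


def assign_segments(databases, size):
    """Database i gets keys i*size+1 .. (i+1)*size, so no two databases can collide."""
    return {name: KeySegment(i * size + 1, (i + 1) * size) for i, name in enumerate(databases)}


segments = assign_segments(["site_a", "site_b"], 1_000_000)
keys_a = [segments["site_a"].allocate() for _ in range(3)]
keys_b = [segments["site_b"].allocate() for _ in range(3)]
print(keys_a, keys_b)  # [1, 2, 3] [1000001, 1000002, 1000003]
```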
3. Use UniqueIdentifier
SQL Server provides the UniqueIdentifier data type, along with the NEWID() function that generates a unique UniqueIdentifier value. A UniqueIdentifier occupies 16 bytes in the database, and the probability of a duplicate is so small that it can be treated as zero. The identifiers of the form {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} that we often see in Windows are actually UniqueIdentifiers; Windows uses them to identify COM components and interfaces so that they never collide.
In .NET, a UniqueIdentifier is called a GUID (Globally Unique Identifier). In C#, you can generate a GUID with:
Guid u = System.Guid.NewGuid();
For the Order and OrderDetail scenario above, using UniqueIdentifier as the primary key avoids the extra network round trip mentioned earlier: the program generates a GUID directly to fill in the primary key, without worrying about duplicates.
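In Python, uuid.uuid4() produces the same kind of random GUID as Guid.NewGuid() in C#. The hypothetical build_order helper below assigns the order key and copies it into every detail line entirely on the client, with no database access before the save.

```python
import uuid


def build_order(order_date, lines):
    """Hypothetical helper: all keys are generated on the client, with no database round trip."""
    order_id = uuid.uuid4()  # random (version 4) GUID, comparable to Guid.NewGuid() in C#
    order = {"OrderID": order_id, "OrderDate": order_date}
    details = [
        {"OrderID": order_id, "LineNum": n, "ProductID": product_id, "Price": price}
        for n, (product_id, price) in enumerate(lines, start=1)
    ]
    return order, details


order, details = build_order("2024-01-01", [(101, 9.5), (102, 3.0)])
print(order["OrderID"])             # a new random GUID on every run
print(len(order["OrderID"].bytes))  # 16
```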
However, UniqueIdentifier also has serious drawbacks. First, it is 16 bytes long, four times the size of an integer, so it takes up a lot of storage space. More seriously, UniqueIdentifier values are generated without any order, which makes building and maintaining an index on them very time-consuming (and most databases index the primary key). Some people have run experiments inserting the same amount of data and found UniqueIdentifier primary keys to be slower than integer primary keys. For efficiency, therefore, avoid using UniqueIdentifier as the primary key where possible.
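The cost of random keys can be illustrated by counting how often a new key lands at the end of a sorted index, which is the cheap case for a B-tree; every other insertion has to go into the middle, which in a real database means page splits and fragmentation. This is a simplified model (a Python list stands in for the index, and GUIDs are compared as raw bytes rather than in SQL Server's order), not a benchmark.

```python
import bisect
import uuid


def inserts_at_end(keys):
    """Count how many keys land at the end of a sorted index, the cheap append case."""
    index, at_end = [], 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            at_end += 1
        index.insert(pos, key)  # elsewhere, entries must shift (page splits in a real B-tree)
    return at_end


n = 2000
sequential = inserts_at_end(range(1, n + 1))
random_guids = inserts_at_end(uuid.uuid4().bytes for _ in range(n))
print(sequential, random_guids)
```

With sequential integers every insertion is an append. With random GUIDs the expected number of appends grows only logarithmically with n, so almost every insertion lands in the middle of the index.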
4. Use the "COMB (Combine)" Type
Since each of the three strategies above has its own drawbacks, is there a better solution? Yes. The COMB type (databases have no built-in COMB type; it was designed by Jimmy Nilsson in his article "The Cost of GUIDs as Primary Keys") strikes a good balance among the three.
The basic idea of COMB is this: since UniqueIdentifier values index poorly and hurt system performance, we can keep the first 10 bytes of the UniqueIdentifier and use the last 6 bytes to store the time (DateTime) at which the GUID was generated. Combining time information with the UniqueIdentifier keeps its uniqueness while adding order, which improves index efficiency. Some may worry that cutting the random part of the UniqueIdentifier down to 10 bytes could produce duplicates. There is little need to worry: the time stored in the last 6 bytes has a precision of 1/300 second, so two COMB values can be identical only if the first 10 bytes of two GUIDs generated within the same 1/300 second are identical, which is practically impossible. In SQL Server, this idea can be implemented with the following SQL:
DECLARE @aGuid UNIQUEIDENTIFIER
SET @aGuid = CAST(CAST(NEWID() AS BINARY(10))
    + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)
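The same construction can be sketched in Python to show why COMB values sort by creation time. This is an approximation under stated assumptions: sql_datetime_bytes is a hypothetical helper that mimics SQL Server's 8-byte datetime (a 4-byte day count since 1900-01-01 followed by a 4-byte count of 1/300-second ticks since midnight), and keeping its rightmost 6 bytes mirrors how a CAST to BINARY(6) truncates on the left. SQL Server compares uniqueidentifier values starting with the last 6 bytes, so ordering by those bytes approximates how an index orders COMB keys.

```python
import struct
import uuid
from datetime import datetime, timedelta
from typing import Optional

SQL_SERVER_EPOCH = datetime(1900, 1, 1)  # day 0 of SQL Server's datetime type


def sql_datetime_bytes(moment: datetime) -> bytes:
    """Approximate 8-byte datetime: 4-byte day count, then 1/300-second ticks since midnight."""
    days = (moment - SQL_SERVER_EPOCH).days
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    ticks = int((moment - midnight).total_seconds() * 300)
    return struct.pack(">iI", days, ticks)


def new_comb(moment: Optional[datetime] = None) -> bytes:
    """16 bytes: 10 random bytes from a GUID followed by the last 6 bytes of the timestamp."""
    if moment is None:
        moment = datetime.now()
    random_part = uuid.uuid4().bytes[:10]        # like CAST(NEWID() AS BINARY(10))
    time_part = sql_datetime_bytes(moment)[-6:]  # like CAST(GETDATE() AS BINARY(6))
    return random_part + time_part


start = datetime(2024, 1, 1, 12, 0, 0)
combs = [new_comb(start + timedelta(seconds=s)) for s in (0, 1, 86_400)]
# Ordering by the trailing 6 bytes follows creation time, whatever the random prefix is.
print(sorted(combs, key=lambda c: c[10:]) == combs)  # True
```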
In tests, COMB primary keys were slower than INT primary keys but faster than plain UniqueIdentifier primary keys.
That concludes this brief introduction to primary keys and foreign keys in SQL databases. If anything here is inaccurate, please forgive me!