The goal of this article is to identify an anti-pattern in which a table has a primary key, but its designer has misunderstood what a primary key is for.
i. Establish primary key conventions
Everyone familiar with database design knows that the primary key is an important, even necessary, part of a table, and it is indeed part of good database design. The primary key is the database's guarantee that every row is unique within the table. It is the logical mechanism for addressing a single record and for ensuring that nothing is stored twice. A primary key can also be referenced by a foreign key to establish a relationship between tables.
The difficulty is choosing which column to use as the primary key. In most tables, the value of any given attribute may appear in many rows. Names, email addresses, and so on are not guaranteed to be free of duplicates.
In such a table, you need to introduce a new column that has no meaning in the table's domain model and exists only to hold an artificial value. This column serves as the table's primary key and identifies exactly one record, even if the other columns are allowed to contain duplicates. A primary key column of this kind is commonly called a pseudo-primary key or surrogate key.
Most databases provide a mechanism, independent of the current transaction, that generates a new unique integer for use as a pseudo-primary key every time, even when clients are inserting rows concurrently.
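As a minimal sketch of such a mechanism, the example below uses SQLite through Python's standard sqlite3 module, where an INTEGER PRIMARY KEY column is filled in automatically when no value is supplied. The table People and its columns are hypothetical names chosen for illustration, and the sketch runs on a single connection, so it does not demonstrate concurrent clients.

```python
import sqlite3

# In-memory database; People, person_id and name are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People ("
    " person_id INTEGER PRIMARY KEY,"  # SQLite assigns a value when none is given
    " name TEXT NOT NULL)"
)

# The name column may repeat; the generated key still tells the rows apart.
ids = []
for name in ["Ann", "Ann", "Bob"]:
    cur = conn.execute("INSERT INTO People (name) VALUES (?)", (name,))
    ids.append(cur.lastrowid)  # key generated for the row just inserted

ann_count = conn.execute(
    "SELECT COUNT(*) FROM People WHERE name = 'Ann'"
).fetchone()[0]
```

Other database systems expose the same idea under different names (auto-increment columns, identity columns, sequences), so the exact syntax here should be read as specific to SQLite.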
A primary key serves three purposes:
1. Ensure that a table contains no duplicate rows.
2. Reference a single row in a query.
3. Support foreign keys.
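The three purposes listed above can be sketched with SQLite through Python's sqlite3 module. The tables Accounts and Bugs and the column reported_by are assumptions made for this illustration, not names taken from the article's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Accounts (
    account_id INTEGER PRIMARY KEY,
    email      TEXT NOT NULL
);
CREATE TABLE Bugs (
    bug_id      INTEGER PRIMARY KEY,
    reported_by INTEGER NOT NULL REFERENCES Accounts (account_id)
);
INSERT INTO Accounts (account_id, email) VALUES (1, 'a@example.com');
""")


def try_sql(sql):
    """Run one statement; return None on success, else the constraint error text."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# 1. No duplicate rows: a second row with the same key is rejected.
dup_error = try_sql(
    "INSERT INTO Accounts (account_id, email) VALUES (1, 'b@example.com')"
)

# 2. Reference a single row: the key addresses exactly one record.
row = conn.execute("SELECT email FROM Accounts WHERE account_id = 1").fetchone()

# 3. Support foreign keys: a reference to a missing account is rejected,
#    while a reference to an existing account is accepted.
fk_error = try_sql("INSERT INTO Bugs (bug_id, reported_by) VALUES (10, 999)")
fk_ok = try_sql("INSERT INTO Bugs (bug_id, reported_by) VALUES (11, 1)")
```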
ii. Anti-pattern: the status quo
Many books, articles, and programming frameworks will tell you that every database table requires a primary key with the following three features:
(1) the primary key column is named id;
(2) its data type is a 32-bit or 64-bit integer;
(3) its value is generated automatically to guarantee uniqueness.
A column called id appears in every table so often that id has even become a synonym for primary key. Many programmers picked up this mistaken idea when they first learned SQL. Insisting that every table must have an id column is clearly too arbitrary.
1. Redundant key value
2. Allow duplicates
A compound key (a key made up of a combination of columns) consists of several different columns. A typical use for one is an intersection table such as the Contact table from the previous section. The primary key needs to ensure that a given combination of product_id and account_id occurs only once in the entire table, although the same value may appear in many different pairings.
However, when you use an id column as the primary key, nothing constrains the combination of account_id and product_id to be unique. When you query this intersection table for the relationship between account_id and product_id, duplicate rows can produce unexpected results. To rule out duplicates, you can declare a UNIQUE constraint over those two columns in addition to id. But once a uniqueness constraint covers account_id and product_id, the id column is superfluous and no longer serves the original purpose of a primary key.
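The difference can be sketched with two variants of the intersection table, run in SQLite through Python's sqlite3 module. The table name ContactWithId and the sample values are hypothetical; account_id and product_id follow the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: pseudo-key only; nothing constrains the pair of columns.
CREATE TABLE ContactWithId (
    id         INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL
);
-- Variant 2: the pair itself is the primary key.
CREATE TABLE Contact (
    account_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    PRIMARY KEY (account_id, product_id)
);
""")

pair = (12, 34)

# With only an id column, the same pairing can be stored twice.
for _ in range(2):
    conn.execute(
        "INSERT INTO ContactWithId (account_id, product_id) VALUES (?, ?)", pair
    )
dup_rows = conn.execute(
    "SELECT COUNT(*) FROM ContactWithId WHERE account_id = ? AND product_id = ?",
    pair,
).fetchone()[0]

# With a compound primary key, the second copy of the pairing is rejected.
conn.execute("INSERT INTO Contact (account_id, product_id) VALUES (?, ?)", pair)
try:
    conn.execute("INSERT INTO Contact (account_id, product_id) VALUES (?, ?)", pair)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The same account paired with a different product is still allowed.
conn.execute("INSERT INTO Contact (account_id, product_id) VALUES (12, 35)")
contact_rows = conn.execute("SELECT COUNT(*) FROM Contact").fetchone()[0]
```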
3. The meaning of the key is unclear
The word id is so generic that it conveys no deeper meaning. This is especially awkward when you join two tables and both have a primary key called id.
SELECT * FROM ... AS a JOIN ... ON (... = a.id);
A query like this must give the columns aliases. Otherwise, when the result is accessed by column name, one id column overwrites the value of the other.
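The collision can be reproduced in a small sketch using SQLite through Python's sqlite3 module. The tables Bugs and Accounts, the column assigned_to, and the helper fetch_as_dict are all assumptions made for illustration; the helper simply mimics a client library that exposes a result row keyed by column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Bugs (
    id          INTEGER PRIMARY KEY,
    assigned_to INTEGER NOT NULL REFERENCES Accounts (id)
);
INSERT INTO Accounts VALUES (7, 'ann');
INSERT INTO Bugs VALUES (1234, 7);
""")


def fetch_as_dict(sql):
    """Return the first result row as a dict keyed by column name."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))


# Both tables contribute a column named id, so the later one wins
# and the bug's own id is lost from the dict.
clobbered = fetch_as_dict(
    "SELECT * FROM Bugs AS b JOIN Accounts AS a ON (b.assigned_to = a.id)"
)

# Explicit aliases keep the two keys apart.
aliased = fetch_as_dict(
    "SELECT b.id AS bug_id, a.id AS account_id "
    "FROM Bugs AS b JOIN Accounts AS a ON (b.assigned_to = a.id)"
)
```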
Naming every primary key id also rules out the USING keyword, because the primary key (id) and the foreign key that references it (for example, bug_id) no longer share a name.
SQL supports a concise syntax, USING, for joining two tables. If the join column has the same name in both tables, the join can be written as follows.
SELECT * FROM ... JOIN Bug USING (bug_id); -- bug_id is the primary key in one table and a foreign key in the other
However, if every table is required to define a pseudo-key called id, this shorthand cannot be used.
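A runnable sketch of the USING shorthand, again with SQLite through Python's sqlite3 module. The Bug table and bug_id column come from the text; the BugProduct table and the other columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bug (
    bug_id  INTEGER PRIMARY KEY,
    summary TEXT NOT NULL
);
CREATE TABLE BugProduct (
    bug_id     INTEGER NOT NULL REFERENCES Bug (bug_id),
    product_id INTEGER NOT NULL,
    PRIMARY KEY (bug_id, product_id)
);
INSERT INTO Bug VALUES (1, 'crash on save');
INSERT INTO BugProduct VALUES (1, 100);
""")

# USING works because the key has the same name on both sides of the join,
# and SELECT * then reports the join column only once.
using_cur = conn.execute("SELECT * FROM BugProduct JOIN Bug USING (bug_id)")
using_cols = [d[0] for d in using_cur.description]
using_rows = using_cur.fetchall()

# The equivalent ON form has to spell out both sides and returns bug_id twice.
on_cur = conn.execute(
    "SELECT * FROM BugProduct AS bp JOIN Bug AS b ON (bp.bug_id = b.bug_id)"
)
on_cols = [d[0] for d in on_cur.description]
```

If Bug's primary key were named id instead, the two tables would share no column name for this join and only the longer ON form would be available.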
4. Using compound keys
Some developers feel that compound keys are too hard to use: to compare two key values you must compare every column they contain, and a foreign key that references a compound key must itself be a compound foreign key. Compound keys also take more typing. None of this is a good reason to avoid them. Compound keys should be used when they are appropriate.
iii. Identifying the anti-pattern
1. "I don't think this table needs a primary key."
Anyone who says this has confused "primary key" with "pseudo-primary key". Every table must have a primary key, without question. What this person may actually need is a compound key, or a more natural column name for the key.
2. "How did I end up with duplicate entries in a many-to-many table?"
An intersection table for a many-to-many relationship needs a PRIMARY KEY constraint, or at least a UNIQUE constraint, declared over its foreign key columns.
iv. Legitimate uses of the anti-pattern
There is nothing wrong with using a pseudo-primary key, or with generating it from an auto-incrementing integer, but not every table needs one, and a pseudo-primary key does not have to be named id.
Pseudo-primary keys are a good substitute for natural keys that are too long to be practical. For example, in a table that records the attributes of every file in a file system, the file path is a good natural key, but the cost of indexing such a long string column can be significant.
v. Solution: a tailored design
The primary key is a constraint, not a data type. You can declare any column, or any set of columns, as the primary key, as long as the data types support indexing. It is also worth noting that in SQL Server the primary key and the clustered index are not necessarily tied together: SQL Server simply builds the clustered index on the primary key by default, and you can explicitly place the clustered index on non-primary-key columns instead.
1. Select a more meaningful name for the primary key
For example, the primary key of the product table is called product_id.
2. A foreign key should use the same name as the column it references. This usually means that a primary key's name is unique across the whole database design: no two tables should use the same name for their primary key unless one of them references the other as a foreign key. There are exceptions, however. Sometimes the foreign key needs a name different from the primary key it references, so that the relationship between them is expressed more clearly.
For example, in the referencing table the foreign key might be declared as Create_by (created by whom).
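Both naming rules can be sketched in one small schema, run in SQLite through Python's sqlite3 module. The names product_id and Create_by come from the text; the Account, Product and Bug tables and their remaining columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE Account (account_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Bug (
    bug_id     INTEGER PRIMARY KEY,
    -- Rule: the foreign key has the same name as the key it references.
    product_id INTEGER NOT NULL REFERENCES Product (product_id),
    -- Exception: the name states the role the account plays for this bug.
    Create_by  INTEGER NOT NULL REFERENCES Account (account_id)
);
INSERT INTO Account VALUES (1, 'ann');
INSERT INTO Product VALUES (100, 'editor');
INSERT INTO Bug VALUES (5, 100, 1);
""")

# The identically named key allows USING; the role-named key needs ON.
row = conn.execute("""
    SELECT Bug.bug_id, Product.title, Account.name
    FROM Bug
    JOIN Product USING (product_id)
    JOIN Account ON (Bug.Create_by = Account.account_id)
""").fetchone()
```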
Rules | Natural keys | Surrogate keys
The primary key must uniquely identify each record | Subject to data-entry and human error | System-generated values are unique
The primary key of a record cannot be null | Records can be entered only when the data is known | Created by the system when the record is generated
When a record is generated, the value of the primary key must exist | Records can be entered only when the data is known | Created by the system when the record is generated
The primary key must remain stable: you cannot change the primary key's fields | Natural keys are tied to business rules and other external influences | Surrogate keys are neutral with respect to application functionality and data
The primary key must be compact and contain no unnecessary attributes | A natural key can contain multiple fields | A surrogate key contains only a single field
The value of the primary key cannot be changed | Natural keys are often subject to change | Surrogate keys do not usually change
vi. Embrace natural keys and compound keys
If your table contains a column that is guaranteed to be unique and non-null and that can be used to locate a record, don't feel obliged to add a pseudo-primary key just because of tradition.
Practice has shown that it is normal for any column in a table to change after the initial design. A database design tends to be tuned and optimized throughout the project's life cycle, and decision makers may not treat natural keys as sacrosanct at all. Sometimes a column starts out as a good natural primary key, but legitimate duplicates are later allowed. At that point, a pseudo-primary key becomes the only choice.
A compound primary key can also be used when it fits, for example when a record can be fully identified by a combination of several columns. As with the Contact table mentioned above, create a compound primary key from those columns.
Summary: My personal view is to use a surrogate key wherever possible, unless the surrogate key is truly redundant, as in the intersection table example, where a compound key should be used instead.
Logical Database Design: ID Required