Logical Database Design: ID Required

Last Update:2015-07-17 Source: Internet

Author: User

Tags: one table


The goal of this article is to identify an anti-pattern in which a table has a primary key, but the nature of the primary key has been misunderstood.

i. Goal: Establish Primary Key Conventions

Anyone familiar with database design knows that the primary key is an important, even necessary, part of a table, and that it is part of any good database design. The primary key is how the database guarantees that each row is unique within the table. It is the logical mechanism for addressing a single record and for ensuring that rows are not stored twice. A primary key can also be referenced by a foreign key to establish a relationship between tables.

The difficulty is choosing which column to use as the primary key. In most tables, the value of any given attribute can appear in many rows. Names and email addresses, for example, are not guaranteed to be free of duplicates.

In such a table, you need to introduce a new column that has no meaning in the table's domain model and stores an artificial value. This column serves as the primary key and identifies exactly one record in the table, even if the other columns legitimately contain duplicates. This type of primary key column is commonly called a pseudo-primary key or surrogate key.

Most databases provide a mechanism, independent of the current transaction, that generates a unique integer each time for use as a pseudo-primary key, even when clients are issuing concurrent operations.
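As a minimal sketch of such a mechanism, the following uses Python's built-in sqlite3 module and SQLite's AUTOINCREMENT. The Accounts table and its columns are hypothetical names chosen for illustration; other databases expose the same idea through sequences or identity columns with different syntax.

```python
import sqlite3

# Hypothetical table: "Accounts" and its columns are illustrative, not from the article.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Accounts (
        account_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no domain meaning
        name       TEXT NOT NULL,
        email      TEXT NOT NULL
    )
""")

# Two rows with identical domain values; the database assigns each a distinct key.
people = [("Li Wei", "li@example.com"), ("Li Wei", "li@example.com")]
generated_ids = []
for name, email in people:
    cur = conn.execute(
        "INSERT INTO Accounts (name, email) VALUES (?, ?)", (name, email)
    )
    generated_ids.append(cur.lastrowid)

print(generated_ids)
```

The two rows are indistinguishable by name or email, yet each can still be addressed through its generated key.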

The primary key serves three purposes:

1. Ensure that a table contains no duplicate rows.

2. Reference a single row in a query.

3. Support foreign key references.
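The three roles can be sketched with sqlite3 as follows. The Products and Bugs tables are assumptions made for this example, and note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- Hypothetical tables used only for illustration.
    CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Bugs (
        bug_id     INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES Products (product_id),
        summary    TEXT NOT NULL
    );
    INSERT INTO Products VALUES (1, 'Editor');
""")

def try_insert(sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# 1. Uniqueness: a second row with the same key value is rejected.
duplicate_ok = try_insert("INSERT INTO Products VALUES (?, ?)", (1, "Editor copy"))

# 2. Addressing: the key selects exactly one row.
row = conn.execute("SELECT name FROM Products WHERE product_id = ?", (1,)).fetchone()

# 3. Foreign keys: a reference to a nonexistent key is rejected.
valid_fk_ok = try_insert("INSERT INTO Bugs VALUES (?, ?, ?)", (10, 1, "Crash on save"))
dangling_fk_ok = try_insert("INSERT INTO Bugs VALUES (?, ?, ?)", (11, 99, "Orphan"))

print(duplicate_ok, row, valid_fk_ok, dangling_fk_ok)
```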

ii. Anti-pattern: One Size Fits All

Many books, articles, and programming frameworks will tell you that every database table requires a primary key with the following three characteristics:

(1) The primary key column is named id;

(2) Its data type is a 32-bit or 64-bit integer;

(3) Its value is generated automatically to guarantee uniqueness.

A column named id in every table is so common that id has become a synonym for primary key. Many programmers were taught this mistaken idea when they first learned SQL. Insisting that every table must have an id column is far too arbitrary.

1. Redundant key value

2. Allow duplicates

A compound key consists of several columns. A typical use of a compound key is in an intersection table such as the Contact table from the previous section. The primary key needs to ensure that a given combination of product_id and account_id occurs only once in the entire table, although each individual value may appear in many different pairings.

However, when you use an id column as the primary key, nothing requires the combination of account_id and product_id to be unique. When you query this intersection table for the relationship between account_id and product_id, duplicate rows can produce unexpected results. To make sure there are no duplicates, you can declare a UNIQUE constraint over the two columns in addition to id. But once you apply a uniqueness constraint to account_id and product_id, the id column becomes superfluous, which defeats the original purpose of the primary key.
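The difference can be sketched with sqlite3 as follows. The column names account_id and product_id follow the text; the two table names and the constraint layout are assumptions made so that both variants can sit side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column names follow the article; the table names are hypothetical.
    CREATE TABLE ContactWithId (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL
    );
    CREATE TABLE ContactWithUnique (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        UNIQUE (account_id, product_id)
    );
""")

pair = (7, 42)  # the same (account_id, product_id) pairing, inserted twice

# With only the id key, both inserts succeed and the pairing is stored twice.
for _ in range(2):
    conn.execute(
        "INSERT INTO ContactWithId (account_id, product_id) VALUES (?, ?)", pair
    )
dup_count = conn.execute(
    "SELECT COUNT(*) FROM ContactWithId WHERE account_id = ? AND product_id = ?", pair
).fetchone()[0]

# With the extra UNIQUE constraint, the second insert is rejected.
conn.execute(
    "INSERT INTO ContactWithUnique (account_id, product_id) VALUES (?, ?)", pair
)
try:
    conn.execute(
        "INSERT INTO ContactWithUnique (account_id, product_id) VALUES (?, ?)", pair
    )
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

print(dup_count, second_insert_rejected)
```

In the second table the UNIQUE constraint does all the work of preventing duplicates, and the id column contributes nothing to it.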

3. The meaning of the key is unclear

The name id is so generic that it conveys nothing about what it identifies. This is especially troublesome when you join two tables that both have a primary key called id.

SELECT * FROM ... AS a JOIN ... ON (... = a.id);

Such a query must give the columns aliases. Otherwise, when the results are fetched by column name, one id column overwrites the value of the other.

Nor can you use the USING keyword when every table names its primary key id.

SQL supports a concise syntax, USING, for joining two tables. If the two tables share a column name, a join can be written with the following expression.

SELECT * FROM ... JOIN Bug USING (bug_id); -- bug_id is the primary key in one table and a foreign key in the other.

However, if every table must define a pseudo-key called id, this shorthand cannot be used.
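The overwriting effect can be reproduced with a small sqlite3 sketch. The Bugs and Accounts tables and the assigned_to column are hypothetical, and the helper imitates client libraries that return each row as a mapping keyed by column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; both name their primary key "id".
    CREATE TABLE Accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Bugs (
        id          INTEGER PRIMARY KEY,
        summary     TEXT,
        assigned_to INTEGER REFERENCES Accounts (id)
    );
    INSERT INTO Accounts VALUES (5, 'Li Wei');
    INSERT INTO Bugs VALUES (1234, 'Crash on save', 5);
""")

def fetch_as_dict(sql):
    """Fetch the first row as a dict keyed by column name."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))

# SELECT * yields two columns named "id"; the dict keeps only the later one.
ambiguous = fetch_as_dict(
    "SELECT * FROM Bugs AS b JOIN Accounts AS a ON (b.assigned_to = a.id)"
)

# Column aliases keep both values addressable.
aliased = fetch_as_dict(
    "SELECT b.id AS bug_id, a.id AS account_id "
    "FROM Bugs AS b JOIN Accounts AS a ON (b.assigned_to = a.id)"
)

print(ambiguous)
print(aliased)
```

In the first result the bug's id has been lost, because the account's id was stored under the same dictionary key.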
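A runnable sketch of the USING shorthand, assuming descriptive key names, might look like this. Bug and bug_id come from the query above; the BugProduct table and its columns are assumptions added to give the join a second side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Bug" and bug_id come from the article; BugProduct is a hypothetical second table.
    CREATE TABLE Bug (bug_id INTEGER PRIMARY KEY, summary TEXT);
    CREATE TABLE BugProduct (
        bug_id     INTEGER NOT NULL REFERENCES Bug (bug_id),
        product_id INTEGER NOT NULL,
        PRIMARY KEY (bug_id, product_id)
    );
    INSERT INTO Bug VALUES (1, 'Crash on save'), (2, 'Typo in menu');
    INSERT INTO BugProduct VALUES (1, 100), (1, 101), (2, 100);
""")

# USING works because the foreign key shares the name of the primary key it references.
using_rows = conn.execute(
    "SELECT bug_id, product_id FROM BugProduct JOIN Bug USING (bug_id) "
    "ORDER BY bug_id, product_id"
).fetchall()

# The equivalent ON form must spell out and qualify both columns.
on_rows = conn.execute(
    "SELECT bp.bug_id, bp.product_id FROM BugProduct AS bp "
    "JOIN Bug AS b ON (bp.bug_id = b.bug_id) "
    "ORDER BY bp.bug_id, bp.product_id"
).fetchall()

print(using_rows == on_rows)
```

If Bug named its key id instead, the two tables would share no column name for this relationship and only the longer ON form would remain available.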

4. Using compound keys

Some developers find compound keys too hard to use: to compare two key values you must compare every column they contain, and a foreign key that references a compound key must itself be a compound foreign key. Compound keys also require more typing. This reasoning is mistaken. Compound keys should be used where they are appropriate.
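As a rough illustration of what a compound (multi-column) key and a compound foreign key involve in practice, consider the following sqlite3 sketch. Contact and its two key columns follow the text; the ContactNote table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- Contact and its key columns follow the article; ContactNote is hypothetical.
    CREATE TABLE Contact (
        account_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        PRIMARY KEY (account_id, product_id)
    );
    CREATE TABLE ContactNote (
        note_id    INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        note       TEXT NOT NULL,
        FOREIGN KEY (account_id, product_id)
            REFERENCES Contact (account_id, product_id)
    );
    INSERT INTO Contact VALUES (7, 42);
""")

# A note for an existing (account_id, product_id) pair is accepted.
conn.execute(
    "INSERT INTO ContactNote (account_id, product_id, note) VALUES (?, ?, ?)",
    (7, 42, "Called back"),
)

# A note for a pair that does not exist in Contact is rejected.
try:
    conn.execute(
        "INSERT INTO ContactNote (account_id, product_id, note) VALUES (?, ?, ?)",
        (7, 43, "No such pair"),
    )
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

note_count = conn.execute("SELECT COUNT(*) FROM ContactNote").fetchone()[0]
print(mismatch_rejected, note_count)
```

The extra typing amounts to one more column in each key clause, and in return the database enforces the pairing itself.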

iii. Recognizing the Anti-pattern

1. "I don't think this table needs a primary key."

Anyone who says this has confused "primary key" with "pseudo-primary key". Every table must have a primary key; that is beyond doubt. What this person may actually need is a compound key, or a key with a more natural column name.

2. "How did duplicate entries get into my many-to-many table?"

The intersection table of a many-to-many relationship needs a PRIMARY KEY constraint declared over its foreign key columns, or at least a UNIQUE constraint over those columns.

iv. Legitimate Uses of the Anti-pattern

There is nothing wrong with using a pseudo-primary key, or with generating it from an auto-incrementing integer. But not every table needs a pseudo-primary key, and a pseudo-primary key does not have to be named id.

Pseudo-primary keys are a good substitute for natural keys that are too long to be practical. For example, in a table that records the attributes of every file in a file system, the file path is a good natural key, but the cost of indexing such a long string column can be significant.

v. Solution: Tailored to Fit

The primary key is a constraint, not a data type. You can define any column, or any number of columns, as the primary key, as long as the data types involved support indexing. It is also worth adding that in SQL Server the primary key and the clustered index are not necessarily the same thing. SQL Server simply builds the clustered index on the primary key by default, and you can define the clustered index on a non-primary-key column instead.
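To illustrate that the key need not be an integer, here is a small sqlite3 sketch with a text column as the primary key. The Currency table is a hypothetical example. The explicit NOT NULL is included because SQLite, unlike most databases, would otherwise allow NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table whose primary key is a TEXT column, not an integer.
    CREATE TABLE Currency (
        currency_code TEXT NOT NULL PRIMARY KEY,
        name          TEXT NOT NULL
    );
    INSERT INTO Currency VALUES ('CNY', 'Chinese yuan'), ('USD', 'US dollar');
""")

# The constraint behaves exactly as it would on an integer column.
try:
    conn.execute("INSERT INTO Currency VALUES ('USD', 'Duplicate dollar')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

usd_name = conn.execute(
    "SELECT name FROM Currency WHERE currency_code = ?", ("USD",)
).fetchone()[0]

print(duplicate_rejected, usd_name)
```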

1. Choose a more meaningful name for the primary key

For example, the primary key of the product table could be called product_id.

2. Where possible, a foreign key should use the same name as the column it references. This usually means that a primary key name is unique across the whole database design: no two tables should use the same name for their primary keys unless one references the other as a foreign key. There are exceptions, however. Sometimes a foreign key needs a different name from the primary key it references so that the nature of the relationship is expressed more clearly.

For example, a foreign key that records who created a row might be declared as Create_by.
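Both naming rules can be sketched together with sqlite3. The names product_id and Create_by come from the text; the Accounts table, its account_id key, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- product_id and Create_by follow the article; Accounts is a hypothetical table.
    CREATE TABLE Accounts (account_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Products (
        product_id INTEGER PRIMARY KEY,               -- descriptive primary key name
        name       TEXT NOT NULL,
        Create_by  INTEGER NOT NULL
                   REFERENCES Accounts (account_id)   -- named for the role it plays
    );
    INSERT INTO Accounts VALUES (7, 'Li Wei');
    INSERT INTO Products VALUES (42, 'Editor', 7);
""")

# The join condition reads as a statement of the relationship.
created_rows = conn.execute(
    "SELECT p.product_id, p.name, a.name AS creator "
    "FROM Products AS p JOIN Accounts AS a ON (p.Create_by = a.account_id)"
).fetchall()

print(created_rows)
```

Here the foreign key deliberately differs from account_id, because the column says which account created the product rather than merely that some account is involved.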

Rule	Natural key	Surrogate key
The primary key must uniquely identify each record	Uniqueness depends on data entry and is subject to human error	Values are generated by the system and are unique
The primary key of a record cannot be empty	A record can be entered only once the key data is known	The value is assigned by the system when the record is created
The primary key value must exist when the record is created	A record can be entered only once the key data is known	The value is assigned by the system when the record is created
The primary key must remain stable; its fields cannot be changed	Natural keys are tied to business rules and other external influences	Surrogate keys are neutral with respect to program functions and data
The primary key must be concise and must not contain excessive attributes	A natural key may contain multiple fields	A surrogate key contains only a single field
The value of the primary key cannot be changed	Natural key values often change	Surrogate key values do not normally change

vi. Embrace Natural Keys and Compound Keys

If your table contains a column that is guaranteed to be unique and non-null and that can be used to locate a record, do not feel obliged to add a pseudo-primary key merely out of convention.

Practice has shown that it is normal for any column in a table to change after the initial design. A database design tends to be tuned and optimized throughout the project's life cycle, and decision makers may not treat natural keys as sacrosanct at all. Sometimes a column starts out as a good natural primary key, but later legitimate duplicates have to be allowed. At that point, a pseudo-primary key becomes the only choice.

A compound primary key can also be used where appropriate, namely when a record is fully identified by a combination of several columns. For a table like the Contact table mentioned above, create a compound primary key from those columns.

Summary: my personal view is to use a surrogate key wherever possible, unless the surrogate key is truly redundant, as in the intersection table example above, where a compound key should be used instead.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
