Choosing the unique identifier (ID) for database tables

Last Update: 2015-01-03 Source: Internet

Author: User

Tags: uuid


Background: Over several years of work I have designed the databases for many systems and products, running on single machines, on LANs, and on the Internet. Each environment calls for different design considerations, and even within the same environment the design varies with the business and the data volume. I am currently designing the database (a MySQL service) for an Internet product. Along the way I have analyzed, compared, and studied table ID design in depth and arrived at a more systematic understanding of it, which I summarize here to share with everyone.

Choosing the primary key ID

In a relational database, the first step in designing any table is deciding its primary key, that is, its ID. The usual candidates are an auto-increment integer ID, a string UUID, or some other unique business-related key. So which column of a table should serve as the primary key, and is there a more appropriate choice?

Let's first clarify a few concepts:

Logical primary key (surrogate primary key): a field that carries none of the table's business information and is used only as its primary key; also called a "pseudo-primary key".
Business primary key (natural primary key): a field that has business meaning in the table and is used as its primary key.

A very common example: a user information table has the columns ID, user name, mobile phone number, and so on, where the user name and the phone number are both unique because either can serve as the login account. The ID can be the logical primary key, while either the user name or the phone number can be a business primary key. So can I simply pick one at random? And if I choose a business primary key, can I do without a logical primary key?
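A minimal sketch of the user table described above, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the table name user_info and its column names are hypothetical). The id column is the logical primary key, while the two business keys are still enforced as unique.

```python
import sqlite3

def create_user_table(conn: sqlite3.Connection) -> None:
    # id is the logical (surrogate) primary key and carries no business meaning;
    # username and phone are business keys, kept unique by UNIQUE constraints.
    conn.execute(
        """
        CREATE TABLE user_info (
            id       INTEGER PRIMARY KEY,
            username TEXT NOT NULL UNIQUE,
            phone    TEXT NOT NULL UNIQUE
        )
        """
    )

def add_user(conn: sqlite3.Connection, username: str, phone: str) -> bool:
    """Insert a user; return False if either business key is already taken."""
    try:
        conn.execute(
            "INSERT INTO user_info (username, phone) VALUES (?, ?)",
            (username, phone),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
create_user_table(conn)
print(add_user(conn, "alice", "13800000001"))  # True
print(add_user(conn, "alice", "13800000002"))  # False: user name already used
print(add_user(conn, "bob", "13800000001"))    # False: phone number already used
```

Choosing a logical key does not mean giving up uniqueness on the business fields; the UNIQUE constraints still reject duplicates.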

Let's start by looking at the main arguments on each side, for the logical primary key and for the business primary key:

Arguments for a logical primary key

A table has a primary key to guarantee that each record is unique, and that primary key should have no business meaning, because any column with business meaning may change. One of the most important principles of relational database theory is: do not give keys any business meaning. If a key has business meaning and users later decide to change that meaning, for example by adding a few digits to the key or replacing digits with letters, every related key has to be modified too. A table's primary key is very likely referenced as a foreign key by other tables, so even a simple change, such as adding one digit to a customer number, can cause significant maintenance overhead.

To keep a table's primary key free of business meaning, one approach is to use a surrogate primary key: define an ID field (or a field with some other name) that has no business meaning and exists solely as the table's primary key.
--Sun Weichen, "Mastering Hibernate: Java Object Persistence Technology Explained", p. 8
The main reason for using a logical primary key is that if a business primary key changes, every part of the system tied to that key must change with it, and the more places reference the key, the larger the change. With a logical primary key, only the business logic that actually involves the business field needs to be modified, which limits how far such changes spread through the system. Changes to business logic are unavoidable: the only constant is change, and no company and no business stays the same forever. The most typical example is the change to the * * * number and the driver's license number. In practice duplicate numbers also do occur, so using the * * * number as the primary key creates situations that are hard to handle. Of course, there are many ways to cope with change; one of them is to build a new system to keep up with the times, which is certainly good for software companies.

Another reason for using a logical primary key is that business primary keys are often too large to transmit, process, and store efficiently. As a rule of thumb, if a business primary key is longer than 8 bytes, consider a logical primary key instead: an INT is 4 bytes and a BIGINT is 8 bytes, whereas a business key is usually a string, and even at the same 8 bytes a BIGINT is naturally more efficient than a string to transmit and process. Consider how differently the string "12345678" and the integer 12345678 look at the assembly-code level. Of course, a logical primary key need not be an INT or BIGINT, a business primary key is not always a string (it may be an integer or a DATETIME), and the primary key is not always what gets transmitted, so each case needs its own analysis; the principle is similar, and this is only the general situation. In addition, if other tables reference the primary key, they must store it as well, so the storage cost differs between the two choices. Those referencing columns are usually foreign keys, or are at least indexed to speed up lookups, which widens the storage difference further; this too needs case-by-case analysis.
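The storage argument above is simple arithmetic, sketched below in Python. The 20-character order number, the row count, and the assumption that each row stores the key in four places (primary key, one foreign key column, its index, and one more secondary index) are all hypothetical; row headers, length prefixes, and page overhead are ignored.

```python
import struct

BIGINT_BYTES = struct.calcsize("<q")  # 8, the size of a MySQL BIGINT

def key_bytes_total(key_size: int, rows: int, copies_per_row: int) -> int:
    """Rough bytes spent on key values alone.

    copies_per_row counts every place the key is stored for one row:
    the primary key itself plus each referencing column or index.
    """
    return key_size * rows * copies_per_row

# Hypothetical 20-character business key, e.g. an order number.
order_no = "ORD-2015-0103-000001"
string_key_size = len(order_no.encode("utf-8"))  # 20 bytes (ASCII only)

rows = 1_000_000
copies = 4  # assumed: PK + 1 FK column + its index + 1 secondary index
print(key_bytes_total(BIGINT_BYTES, rows, copies))     # 32000000
print(key_bytes_total(string_key_size, rows, copies))  # 80000000
```

In MySQL's InnoDB engine every secondary index entry also stores the primary key value, so a wide primary key is repeated in each secondary index as well.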

Another reason for using a logical primary key is that a join on an INT or BIGINT foreign key runs faster than a join on a string foreign key. The reasoning is the same as above, so it is not repeated here.

Yet another reason for using a logical primary key is that users or maintainers may enter a business primary key incorrectly. For example, if "RMB" is mistyped as "RXB", every row that references it picks up the wrong value, and correcting it later is troublesome. A logical primary key handles this cleanly, because only the one mistyped field needs to change; with a business primary key, the correction also affects the foreign key data in other tables. Cascading updates can solve this, but not every relationship can be set up to cascade.
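The "RXB" typo can be reproduced in a small sketch. It uses Python's sqlite3 module in place of MySQL (an assumption), and the table names currency_a, payment_a, currency_b, and payment_b are hypothetical: design A references the currency code directly, design B references a logical ID.

```python
import sqlite3

def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        -- Design A: the business key (currency code) is the primary key.
        CREATE TABLE currency_a (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE payment_a (
            id INTEGER PRIMARY KEY, amount INTEGER,
            currency_code TEXT REFERENCES currency_a(code));

        -- Design B: a logical key; the code is just a unique attribute.
        CREATE TABLE currency_b (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE, name TEXT);
        CREATE TABLE payment_b (
            id INTEGER PRIMARY KEY, amount INTEGER,
            currency_id INTEGER REFERENCES currency_b(id));

        -- Both designs start with the same typo: RXB instead of RMB.
        INSERT INTO currency_a VALUES ('RXB', 'Renminbi');
        INSERT INTO payment_a (amount, currency_code) VALUES (100, 'RXB');
        INSERT INTO currency_b (id, code, name) VALUES (1, 'RXB', 'Renminbi');
        INSERT INTO payment_b (amount, currency_id) VALUES (100, 1);
        """
    )
    return conn

def fix_typo(conn: sqlite3.Connection, table: str) -> bool:
    """Try to correct RXB -> RMB in a currency table (table is a trusted constant)."""
    try:
        conn.execute(f"UPDATE {table} SET code = 'RMB' WHERE code = 'RXB'")
        return True
    except sqlite3.IntegrityError:
        # Design A: payment_a still references 'RXB', so the FK check fails.
        return False

conn = build_schema()
print(fix_typo(conn, "currency_a"))  # False
print(fix_typo(conn, "currency_b"))  # True
```

Declaring design A's reference as REFERENCES currency_a(code) ON UPDATE CASCADE would let its update succeed too, which is the cascading-update option mentioned above.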
Arguments for a business primary key

If your table contains a column that is guaranteed to be unique and non-null and can be used to locate a record, don't feel obliged to add a pseudo-primary key just because of tradition.
--Bill Karwin, "SQL Antipatterns", p. 41
The main reason for using a business primary key is that a logical primary key adds a field unrelated to the business, while users usually search by business fields (such as an employee number or a book's ISBN). We must then index those business fields in addition to the logical primary key, which degrades database performance and increases storage overhead. A business primary key is therefore the better choice for basic reference data whose business meaning rarely changes. On the other hand, such basic data sees relatively few inserts, deletes, and updates, so the extra cost of a logical primary key is not large either; if you are worried about future changes to the business logic, you can still consider a logical primary key. The problem needs to be analyzed case by case.
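A toy in-memory model of the extra-index argument, with Python dicts standing in for B-tree indexes (an illustrative simplification, not how a database stores data). The classes and the ISBN value are hypothetical. Both designs answer an ISBN query directly, but the surrogate design must keep a second structure in sync on every insert.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Book:
    isbn: str
    title: str

@dataclass
class NaturalKeyTable:
    # ISBN is the primary key: one structure gives both identity and lookup.
    by_isbn: Dict[str, Book] = field(default_factory=dict)

    def insert(self, book: Book) -> None:
        if book.isbn in self.by_isbn:
            raise ValueError("duplicate ISBN")
        self.by_isbn[book.isbn] = book

    def find_by_isbn(self, isbn: str) -> Optional[Book]:
        return self.by_isbn.get(isbn)

@dataclass
class SurrogateKeyTable:
    by_id: Dict[int, Book] = field(default_factory=dict)      # primary key index
    isbn_to_id: Dict[str, int] = field(default_factory=dict)  # extra unique index on ISBN
    next_id: int = 1

    def insert(self, book: Book) -> int:
        if book.isbn in self.isbn_to_id:
            raise ValueError("duplicate ISBN")
        book_id = self.next_id
        self.next_id += 1
        self.by_id[book_id] = book
        self.isbn_to_id[book.isbn] = book_id  # second structure to keep in sync
        return book_id

    def find_by_isbn(self, isbn: str) -> Optional[Book]:
        book_id = self.isbn_to_id.get(isbn)
        return None if book_id is None else self.by_id[book_id]

natural, surrogate = NaturalKeyTable(), SurrogateKeyTable()
book = Book("978-0-00-000000-0", "Example")  # hypothetical ISBN
natural.insert(book)
surrogate.insert(book)
print(natural.find_by_isbn(book.isbn) == surrogate.find_by_isbn(book.isbn))  # True
```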

Another argument for a business primary key is that users work through business fields, so a logical primary key requires extra mapping between the two. I think this worry is unnecessary: querying directly by the business key returns the result without involving the logical primary key at all, unless the business key itself is not unique. Moreover, if the design uses a logical primary key from the start, the code works with that key throughout, and the same key is used for transmission, processing, and storage inside the system, so no conversion is needed. A conversion problem arises only when an existing system built on a business primary key is changed to use a logical primary key, and so far I have not encountered such a scenario.

A further reason to use a business primary key is that, in a banking system, safety matters more than performance. There the business primary key can serve either as the primary key or as redundant data, avoiding the loss of associations that a logical primary key can lead to. If for some reason the relationship between a master table and its child tables were lost, the bank would face irreparable losses. To prevent this, the business primary key needs to be stored redundantly in important tables, and the simplest way to do that is to use the business primary key directly, such as the * * * number, passbook number, or card number. This is why banking systems usually require business primary keys: not for performance, but for safety.
--Summary by switchblade
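The redundancy argument can be sketched as follows, again with Python's sqlite3 standing in for the bank's database (an assumption). The tables account and txn and the card numbers are hypothetical. Because each transaction row also stores the card number, a lost account_id link can be rebuilt from it.

```python
import sqlite3

def build_bank_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE account (
            id      INTEGER PRIMARY KEY,
            card_no TEXT NOT NULL UNIQUE,   -- business key
            holder  TEXT
        );
        -- Child table stores the card number redundantly next to account_id.
        CREATE TABLE txn (
            id         INTEGER PRIMARY KEY,
            account_id INTEGER,
            card_no    TEXT NOT NULL,
            amount     INTEGER
        );
        INSERT INTO account (id, card_no, holder) VALUES
            (1, '6200000000000001', 'A'), (2, '6200000000000002', 'B');
        INSERT INTO txn (account_id, card_no, amount) VALUES
            (1, '6200000000000001', 500), (2, '6200000000000002', -200);
        """
    )
    return conn

def relink_lost_transactions(conn: sqlite3.Connection) -> int:
    """Rebuild missing account_id links from the redundant card number."""
    cur = conn.execute(
        """
        UPDATE txn
        SET account_id = (SELECT a.id FROM account a WHERE a.card_no = txn.card_no)
        WHERE account_id IS NULL
        """
    )
    conn.commit()
    return cur.rowcount

conn = build_bank_db()
conn.execute("UPDATE txn SET account_id = NULL")  # simulate the lost relationship
print(relink_lost_transactions(conn))  # 2
```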

So the choice between a logical primary key and a business primary key is not made on a whim; it is decided by the application scenario and its requirements.

What if we use an auto-increment integer ID as the primary key?
When the data volume is very large and the table must be split horizontally (sharded), such an auto-increment primary key becomes an obstacle. (There are solutions for this case; see the article "The design of sub-library in the netting architecture".)
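Why auto-increment IDs get in the way of sharding can be seen in a simplified model: if each shard runs its own counter, the same IDs appear on every shard. The sketch below also shows one common workaround, interleaving counters by giving each shard a different start and a shared step. MySQL exposes similar settings as the auto_increment_offset and auto_increment_increment system variables; this is not necessarily the approach of the article cited above, and the function names are hypothetical.

```python
from itertools import islice
from typing import Iterator, List

def shard_ids(offset: int, increment: int) -> Iterator[int]:
    """Yield offset, offset + increment, offset + 2 * increment, ...

    A simplified model of one shard's auto-increment counter.
    """
    value = offset
    while True:
        yield value
        value += increment

def first_ids(offset: int, increment: int, n: int) -> List[int]:
    return list(islice(shard_ids(offset, increment), n))

SHARDS = 2

# Naive: every shard runs its own counter from 1, so IDs collide across shards.
naive = [first_ids(1, 1, 3) for _ in range(SHARDS)]
print(naive)  # [[1, 2, 3], [1, 2, 3]]

# Interleaved: shard k starts at k + 1 and steps by the number of shards.
interleaved = [first_ids(k + 1, SHARDS, 3) for k in range(SHARDS)]
print(interleaved)  # [[1, 3, 5], [2, 4, 6]]
```

The drawback of interleaving is that the step is fixed by the shard count, so adding shards later means re-planning the ID ranges.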

Choosing the ID data type

Next, let's look at another aspect of primary key selection: the data type.

Integer types:
An integer type is often the best choice for an ID column, because it is the most efficient and can use the database's auto-increment primary key.
String types:
String types take more space than integer types and are slower to operate on. I mainly use MySQL; for a detailed discussion of this topic, see "High Performance MySQL", third edition, p. 125.
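To make the size difference concrete, the sketch below measures common ID representations with Python's standard struct and uuid modules. The MySQL column types in the comments are the usual counterparts, stated as an assumption; a CHAR(36) column may reserve even more bytes per character under a multi-byte character set.

```python
import struct
import uuid

INT_BYTES = struct.calcsize("<i")     # 4, like MySQL INT
BIGINT_BYTES = struct.calcsize("<q")  # 8, like MySQL BIGINT

u = uuid.uuid4()
uuid_binary_bytes = len(u.bytes)  # 16: raw bytes, e.g. BINARY(16)
uuid_text_bytes = len(str(u))     # 36: 32 hex digits + 4 hyphens, e.g. CHAR(36)

print(INT_BYTES, BIGINT_BYTES, uuid_binary_bytes, uuid_text_bytes)  # 4 8 16 36
```

A UUID stored as text is 4.5 times the size of a BIGINT, and that cost repeats in every index and foreign key that holds it.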

The scheme I used (MySQL): an auto-increment ID as the primary key, to keep inserts efficient, plus a UUID as the logical ID, which keeps the many advantages of a logical primary key and also makes later horizontal table splitting easier.
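A minimal sketch of that scheme, with Python's sqlite3 standing in for MySQL (an assumption; in MySQL the columns would typically be something like BIGINT AUTO_INCREMENT PRIMARY KEY and BINARY(16) NOT NULL UNIQUE). The table name orders and the helper functions are hypothetical. The auto-increment id stays internal and insertion-ordered, while the 16-byte UUID is the identifier exposed to the rest of the system.

```python
import sqlite3
import uuid
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal, insertion-ordered key
        uid  BLOB NOT NULL UNIQUE,               -- 16-byte UUID used as the logical ID
        note TEXT
    )
    """
)

def create_order(conn: sqlite3.Connection, note: str) -> uuid.UUID:
    uid = uuid.uuid4()
    conn.execute("INSERT INTO orders (uid, note) VALUES (?, ?)", (uid.bytes, note))
    return uid

def find_order(conn: sqlite3.Connection, uid: uuid.UUID) -> Optional[Tuple[int, str]]:
    # Callers look orders up by UUID; the integer id never leaves the database layer.
    return conn.execute(
        "SELECT id, note FROM orders WHERE uid = ?", (uid.bytes,)
    ).fetchone()

first = create_order(conn, "first")
print(find_order(conn, first))  # (1, 'first')
```

Because the UUID is generated independently of any counter, rows can later be moved to another shard without their external identifier changing.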


This article is an English version of an article which is originally in the Chinese language on aliyun.com.
