The unique identifier (ID) of the database.

Source: Internet
Author: User
Tags: database sharding

Background: Over years of work, we have designed databases for many systems and products, including single-host databases, LAN databases, and Internet databases. Different environments call for different design considerations, and even within the same environment, different services or data volumes lead to different designs. Recently we had to design the database for an Internet product (a MySQL service). Building on that experience, we did a lot of analysis, comparison, and study during the table ID design stage, which made our understanding of ID design more systematic. This article summarizes and shares what we learned and put into practice.

Primary Key id Selection

For a relational database, the first step in designing each table is choosing its primary key, that is, its ID. Common knowledge offers several options: an auto-incrementing integer id, a string uuid, or some other unique business-related key. Does this mean that any column with unique values can serve as the primary key? Or is one choice more suitable than the others?

First, let's clarify several concepts:

  • Logical primary key (surrogate primary key, also called proxy primary key): a field that carries no business logic information for the current table is used as its primary key; it is sometimes called a "pseudo primary key";

  • Business primary key (natural primary key): a field that carries business meaning is used as the primary key of the table.

A common example is a user information table with attributes such as id, user name, and mobile phone number, where the user name and the mobile phone number are each unique because they serve as login accounts. The id can be the logical primary key, while the user name or the mobile phone number could each be a business primary key. Can we pick any of them? Can we even choose a business primary key and drop the logical primary key entirely?
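The two options for the user table can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere; the table names (users_logical, users_business), column names, and sample values are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: logical (surrogate) primary key.
# The business columns are kept unique but are not the primary key.
conn.execute("""
    CREATE TABLE users_logical (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL UNIQUE,
        mobile   TEXT NOT NULL UNIQUE
    )
""")

# Option 2: business (natural) primary key, with no separate id column.
conn.execute("""
    CREATE TABLE users_business (
        username TEXT PRIMARY KEY,
        mobile   TEXT NOT NULL UNIQUE
    )
""")

conn.execute(
    "INSERT INTO users_logical (username, mobile) VALUES ('alice', '13800000000')"
)
conn.execute(
    "INSERT INTO users_business (username, mobile) VALUES ('alice', '13800000000')"
)

# Other tables would reference users_logical by id, users_business by username.
logical_row = conn.execute("SELECT id, username FROM users_logical").fetchone()
business_row = conn.execute("SELECT username FROM users_business").fetchone()
```

In the first design, other tables store the small integer id; in the second, they store the user name itself, which is the trade-off the rest of this article discusses.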

First, let's look at the sharp disagreement between supporters of the logical primary key and supporters of the business primary key:

  • In favor of logical primary keys

    A table uses its primary key to guarantee the uniqueness of each record. The primary key should not carry any business meaning, because any column with business meaning may change. One of the most important principles in relational database theory is: do not give a key any business meaning. If a key has business meaning, then when users decide to change that meaning, perhaps by adding a few digits to the key or changing digits to letters, the related keys must be modified as well. A table's primary key may be used as a foreign key by other tables, so even a simple change, such as adding a digit to a customer number, may cause significant maintenance overhead.

    To keep the primary key free of business meaning, one solution is to use a surrogate (proxy) primary key: define an ID field (or a field with another name) that has no business meaning and use it as the primary key of the table.
    -- Sun Weiqin, "Proficient in Hibernate: detailed explanation of Java object persistence technology", p. 8



  • The main reason for using a logical primary key is that once a business primary key changes, every part of the system associated with that key must be modified, and the more references there are, the larger the change. With a logical primary key, you only need to modify the business logic tied to the business key itself, which limits the impact of the change. Changes in business logic are inevitable: nothing stays the same forever, no company stays the same, and no business stays the same. The most typical examples are the *** upgrade and changes to driver's license numbers. In addition, duplicate ** numbers do exist in reality, and if ** numbers are used as the primary key, such cases are hard to handle. Of course, there are many ways to cope with change. One is to build a new system to keep up with the times, which is admittedly good business for software companies.

    Another reason for using a logical primary key is that a business primary key may be too large to transmit, process, and store conveniently. I think a logical primary key should be considered once the business primary key exceeds 8 bytes, because int is 4 bytes and bigint is 8 bytes, while a business primary key is usually a string. Even when an 8-byte bigint is compared with an 8-byte string, the bigint is naturally more efficient to transmit and process. Imagine the difference in the machine code needed to handle the id "12345678" versus the id 12345678. Of course, a logical primary key is not necessarily an int or bigint, a business primary key is not necessarily a string (it may be an int or a datetime), and the data being transmitted is not necessarily the primary key. Each case needs its own analysis, but the principle is similar; here we discuss only the general situation. If other tables reference this primary key, they must store it as well, so their storage overhead also differs. In addition, the referencing column in those tables is usually a foreign key or is indexed to speed up lookups, which adds further storage overhead that also needs case-by-case analysis.

    Another reason for using a logical primary key is that join queries on an int or bigint foreign key are faster than join queries on a string foreign key. The principle is the same as above.

    Another reason for using a logical primary key is that users or maintenance staff may enter wrong data into a business primary key. For example, if RMB is mistakenly entered as RXB, every reference points to the wrong value, and correcting it is troublesome. With a logical primary key, the problem is easy to solve. With a business primary key, the foreign key data in other tables is affected. Cascading updates can also solve this, but not every reference can be cascaded.
    -- Summarized from SwitchBlade

  • In favor of business primary keys

    If your table already contains a column that is unique, non-null, and can be used to locate a record, do not add a pseudo primary key just because tradition says you should.
    -- Bill Karwin, "SQL Antipatterns", p. 41



  • The main reason for using a business primary key is that a logical primary key adds a field unrelated to the business, while users usually search by business fields (such as an employee ID or a book's ISBN). In addition to the index on the logical primary key, we must then also index these business fields, which lowers database performance and increases storage overhead. Therefore, for basic data whose business meaning rarely changes, a business primary key is a good choice. On the other hand, basic data is usually inserted, deleted, and modified infrequently, so this extra overhead will not be large. If you are worried about the business logic changing, you can still consider a logical primary key; this requires case-by-case analysis.

    Another reason given for using a business primary key is that users operate on business fields, so a logical primary key requires an extra mapping and conversion step. I think this worry is unnecessary. You can query directly by the business key to get the result, without touching the logical primary key, unless the business key is not unique. In addition, if a logical primary key is chosen at design time, the code will use it as the primary key throughout: the same key is transmitted, processed, and stored across the system, so no conversion arises. Conversion is needed only when an existing system built on a business primary key has to be migrated to a logical primary key. I cannot think of any other scenario that requires conversion.

    Another reason for using a business primary key is that in a banking system, security matters more than performance. In that case a business primary key is worth considering, because it serves both as the primary key and as redundant data, which avoids the loss of associations that a logical primary key can cause. If the association between a parent table and a child table is lost for some reason, the bank faces irreparable losses. To prevent this, the business key must be stored redundantly in important tables, and the simplest way to do that is to use the business key directly as the primary key, for example the ***, the passbook number, or the card number. This is why banking systems usually require business primary keys. The requirement is for the sake of security, not performance.
    -- Summarized from SwitchBlade
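The RMB/RXB data-entry mistake mentioned above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and all table and column names (currency_b, order_b, currency_l, order_l) are hypothetical. The foreign keys here are declared without cascading updates, which is the case where a business primary key is hardest to correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Business key design: the currency code itself is the primary key.
conn.execute("CREATE TABLE currency_b (code TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE order_b (
        order_no      INTEGER PRIMARY KEY,
        currency_code TEXT NOT NULL REFERENCES currency_b(code)
    )
""")
conn.execute("INSERT INTO currency_b VALUES ('RXB')")  # typo for RMB
conn.execute("INSERT INTO order_b VALUES (1, 'RXB')")

# Correcting the typo changes a key that other rows reference, so it is rejected.
try:
    conn.execute("UPDATE currency_b SET code = 'RMB' WHERE code = 'RXB'")
    business_fix_ok = True
except sqlite3.IntegrityError:
    business_fix_ok = False

# Logical key design: orders reference a meaningless integer id.
conn.execute(
    "CREATE TABLE currency_l (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE)"
)
conn.execute("""
    CREATE TABLE order_l (
        order_no    INTEGER PRIMARY KEY,
        currency_id INTEGER NOT NULL REFERENCES currency_l(id)
    )
""")
conn.execute("INSERT INTO currency_l VALUES (1, 'RXB')")  # same typo
conn.execute("INSERT INTO order_l VALUES (1, 1)")

# Only one row changes; every order sees the corrected code through the join.
conn.execute("UPDATE currency_l SET code = 'RMB' WHERE id = 1")
fixed_code = conn.execute(
    "SELECT c.code FROM order_l o JOIN currency_l c ON c.id = o.currency_id"
).fetchone()[0]
```

Declaring the foreign key with ON UPDATE CASCADE would let the first update succeed, which matches the remark that cascading updates help where they are available.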

Therefore, there is no fixed rule for choosing between a logical primary key and a business primary key; the choice is a decision based on the specific application scenario and requirements.

What problems do we face if we use an auto-incrementing integer ID as the primary key?
For tables with a large amount of data, horizontal sharding of the table is often required at a later stage, and an auto-incrementing primary key then becomes a hindrance. (There are in fact solutions for this situation; see the article "database sharding design in youpai network architecture".)
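Why auto-increment gets in the way of sharding can be shown with a small simulation. This is a sketch in plain Python, not the approach of the article cited above, whose details are not reproduced here. The helper make_shard_ids is hypothetical; the second half imitates one common workaround, giving each shard its own offset and a shared step, which is the idea behind MySQL's auto_increment_offset and auto_increment_increment system variables.

```python
from itertools import count


def make_shard_ids(offset=1, step=1):
    """Simulate an auto-increment column: offset, offset + step, ..."""
    return count(start=offset, step=step)


# Two shards, each with its own default counter, hand out the same ids.
shard_a, shard_b = make_shard_ids(), make_shard_ids()
naive_a = [next(shard_a) for _ in range(3)]
naive_b = [next(shard_b) for _ in range(3)]
collisions = set(naive_a) & set(naive_b)

# Workaround sketch: same step on every shard, a different offset per shard.
shard_a = make_shard_ids(offset=1, step=2)
shard_b = make_shard_ids(offset=2, step=2)
stepped_a = [next(shard_a) for _ in range(3)]
stepped_b = [next(shard_b) for _ in range(3)]
```

With the default counters every id collides across the two shards; with a step of 2 the shards produce odd and even ids respectively. The step has to be chosen with the maximum number of shards in mind, since changing it later means renumbering.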

ID data type selection

Let's consider the choice of primary keys from another perspective: data type.

  • Integer type:
    The integer type is often the best choice for the id column, because it is the most efficient and can use the database's auto-increment feature.

  • String type
    The string type consumes more space than the integer type and is slower to process. I mainly use MySQL; for more on this topic, see p. 125 of High Performance MySQL, third edition.
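The space difference between integer and string ids can be made concrete with raw byte widths. The sketch below uses Python's struct and uuid modules; it only illustrates the size of the values themselves and is not a measurement of MySQL's on-disk row or index format, which adds its own overhead. The sample id 12345678 is the one used earlier in this article.

```python
import struct
import uuid

int_id = 12345678
str_id = "12345678"

int_bytes = struct.pack(">i", int_id)      # 4-byte signed integer, like INT
bigint_bytes = struct.pack(">q", int_id)   # 8-byte signed integer, like BIGINT
str_bytes = str_id.encode("ascii")         # one byte per digit character

uuid_value = uuid.uuid4()
uuid_text_bytes = str(uuid_value).encode("ascii")  # textual form with hyphens
uuid_raw_bytes = uuid_value.bytes                   # compact binary form

sizes = {
    "int": len(int_bytes),
    "bigint": len(bigint_bytes),
    "string '12345678'": len(str_bytes),
    "uuid as text": len(uuid_text_bytes),
    "uuid as binary": len(uuid_raw_bytes),
}
```

An integer keeps the same width however large the number grows within its range, while a string id grows with every extra character, and a uuid stored as text takes 36 bytes before any length prefix or character-set overhead.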


My solution (MySQL): use an auto-incrementing id as the primary key to keep inserts efficient, and use a uuid as the logical ID. The uuid keeps many of the advantages of a logical primary key and also prepares the table for later horizontal sharding.
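One way to read this solution is a table with two identifiers: an auto-incrementing integer primary key plus a unique uuid column. The sketch below is an assumption about how that might look, written with Python's sqlite3 and uuid modules rather than MySQL; the table name item, its columns, and the helper insert_item are all hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- physical key, sequential inserts
        uuid TEXT NOT NULL UNIQUE,               -- logical ID, unique across shards
        name TEXT NOT NULL
    )
""")


def insert_item(name):
    """Insert a row and return the logical ID generated for it."""
    logical_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO item (uuid, name) VALUES (?, ?)", (logical_id, name)
    )
    return logical_id


first = insert_item("first item")
second = insert_item("second item")

rows = conn.execute("SELECT id, uuid FROM item ORDER BY id").fetchall()
```

Other tables and external systems would refer to a row by its uuid, so the integer id can differ from shard to shard without affecting any reference.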
