Database primary key ID generation policy

Source: Internet
Author: User
Tags md5 hash repetition unique id uuid

Objective:

Generating system-wide unique IDs is a problem we often run into when designing a system. Here are some common ID generation strategies:

    • Sequence ID
    • UUID
    • GUID
    • COMB
    • Snowflake

Plain auto-increment IDs were the first approach. To support splitting data across tables or databases, the auto-increment scheme is kept but each table is given a different starting point. The drawback is that expanding the database later is extremely troublesome. For example, suppose the first design of a system has 10 tables. Each table needs its own set of IDs, so every table increments by 10 from a different starting value: the first table uses 1, 11, 21, 31, ...; the second uses 2, 12, 22, 32, ...; the third uses 3, 13, 23, 33, ...; and the tenth uses 10, 20, 30, ....

The problem comes when 10 tables are no longer enough and we want to add another one: how should its primary keys be allocated? Likewise, if data from several databases has to be merged, IDs generated this simply are very likely to collide. This approach clearly scales poorly.

  

Compared with auto-increment IDs, a UUID makes it more convenient to generate a unique primary key (although with a very large amount of data there is still a possibility of duplication). However, because UUIDs are unordered, performance is worse than with auto-increment IDs: they are stored as strings, take more storage space, and are slower to query. Key point: the main disadvantage of UUIDs is low query efficiency.

  

Compared with UUID, COMB makes the generated IDs roughly ordered, which improves insert and query efficiency. This article has a simple analysis.

Snowflake is Twitter's primary key generation strategy. It can be seen as an improvement on COMB that replaces the 128-bit string with a 64-bit long integer. ID composition: a leading bit that is always 0 + a 41-bit timestamp + a 10-bit node identifier + a 12-bit sequence number that keeps concurrently generated IDs distinct.

Part I: Sequence ID

The most common approach: a database sequence or an auto-increment column. The database maintains it, and IDs are unique within that database.

  

Advantages:
    1. Simple, convenient to code, and performance is acceptable.
    2. Numeric IDs sort naturally, which helps with paging and with results that need to be ordered.

Disadvantages:
    1. Syntax and implementation differ between databases, so database migration or supporting several database products needs extra handling.
    2. With a single database, read-write separation, or one master with multiple slaves, only the one master can generate IDs. This is a single point of failure.
    3. It is hard to scale out when performance requirements are not met.
    4. Merging several systems or migrating data can be quite painful.
    5. Splitting data across tables or databases (sharding) causes trouble.


Optimization scenarios:

To address the single-master problem: if you have more than one master, give each master a different starting number and the same step size. The step can be the number of masters.
For example: Master1 generates 1, 4, 7, 10; Master2 generates 2, 5, 8, 11; Master3 generates 3, 6, 9, 12. This produces IDs that are unique across the cluster and significantly reduces the ID generation load on each database.
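
The start/step scheme above can be sketched in a few lines of JavaScript. This is only an in-memory illustration: createSequence is a hypothetical helper, and a real deployment would configure this in the database itself (in MySQL, for instance, through the auto_increment_offset and auto_increment_increment variables).

```javascript
// Each master starts at its own offset and advances by a shared step.
// createSequence is a hypothetical helper for illustration only.
function createSequence(start, step) {
  let next = start;
  return () => {
    const id = next;
    next += step;
    return id;
  };
}

const MASTER_COUNT = 3; // the step equals the number of masters
const masters = [1, 2, 3].map((start) => createSequence(start, MASTER_COUNT));

// Draw four IDs from each master.
const ids = masters.map((nextId) => [nextId(), nextId(), nextId(), nextId()]);
console.log(ids); // [ [ 1, 4, 7, 10 ], [ 2, 5, 8, 11 ], [ 3, 6, 9, 12 ] ]
```

Note that the step is fixed up front: adding a fourth master later means changing the step on every existing master, which is the same expansion problem described earlier.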



Part II: UUID

npm package: https://www.npmjs.com/package/uuid

A common approach, 128 bits. It can be generated by the database or by the program, and is generally unique worldwide.

A UUID is a 128-bit globally unique identifier, typically written as a string of 32 hexadecimal characters. It can guarantee uniqueness across time and space. It is also known as a GUID; the full name is UUID (Universally Unique IDentifier), and the Python module is called uuid.

It guarantees the uniqueness of generated IDs by using the MAC address, timestamp, namespace, random numbers, and pseudo-random numbers.

There are five main UUID algorithms, that is, five ways to generate one.

  

(1) uuid1()

-- Based on the timestamp. Generated from the MAC address, the current timestamp, and a random number. It can guarantee global uniqueness, but using the MAC address also raises security concerns; on a LAN, the IP address can be used instead of the MAC address.

(2) uuid2()

-- Based on the Distributed Computing Environment (DCE); Python does not provide this function. The algorithm is the same as uuid1, except that the first 4 bytes of the timestamp are replaced with the POSIX UID. This method is seldom used in practice.

(3) uuid3()

-- Based on the MD5 hash of a name. By computing the MD5 hash of the namespace and name, it guarantees that different names in the same namespace get different UUIDs, and that different namespaces are distinct as well; the same name in the same namespace always produces the same UUID.

(4) uuid4()

-- Based on random numbers. Obtained from pseudo-random numbers; there is a certain probability of repetition, which can be calculated.

(5) uuid5()

-- Based on the SHA-1 hash of a name. The algorithm is the same as uuid3, except that it uses Secure Hash Algorithm 1 (SHA-1).
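
To make the random and name-based variants more concrete, here is a minimal sketch that uses only Node's built-in crypto module rather than the uuid package. The helper names (nameBasedUuid, v4CollisionProbability) are made up for this illustration, and crypto.randomUUID() is assumed to be available (it is in recent Node.js releases). The collision estimate uses the usual birthday approximation with the 122 random bits of a version 4 UUID.

```javascript
const crypto = require('crypto');

// Standard DNS namespace UUID from the UUID specification.
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Format 16 raw bytes as the canonical 8-4-4-4-12 string.
function bytesToUuid(buf) {
  const hex = buf.toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20, 32)].join('-');
}

// Name-based UUID: hash(namespace bytes + name), keep 16 bytes,
// then stamp the version and variant bits.
// Use ('md5', 3) for uuid3 or ('sha1', 5) for uuid5.
function nameBasedUuid(namespace, name, algorithm, version) {
  const digest = crypto.createHash(algorithm)
    .update(Buffer.from(namespace.replace(/-/g, ''), 'hex'))
    .update(name, 'utf8')
    .digest()
    .subarray(0, 16);
  digest[6] = (digest[6] & 0x0f) | (version << 4); // version nibble
  digest[8] = (digest[8] & 0x3f) | 0x80;           // variant bits "10"
  return bytesToUuid(digest);
}

// Birthday approximation: p = 1 - exp(-n^2 / (2 * 2^122)).
function v4CollisionProbability(n) {
  return -Math.expm1(-(n * n) / Math.pow(2, 123));
}

console.log('uuid4:', crypto.randomUUID());                 // different on every run
console.log('uuid5:', nameBasedUuid(NAMESPACE_DNS, 'example.com', 'sha1', 5));
console.log('uuid3:', nameBasedUuid(NAMESPACE_DNS, 'example.com', 'md5', 3));
console.log('p(collision) for 1e12 uuid4 values:', v4CollisionProbability(1e12));
```

Running the name-based calls twice gives the same strings both times, while the uuid4 line changes on every run.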

Advantages:
    1. Simple and convenient to code.
    2. Globally unique, so data migration, merging data across systems, or changing databases can be handled with ease.
Disadvantages:
    1. No ordering, and no guarantee that IDs trend upward.
    2. UUIDs are often stored as strings, which makes queries less efficient.
    3. Storage space is relatively large; for a huge database, storage needs to be considered.
    4. More data to transmit.
    5. Not readable.

Optimization scenarios:
    1. To address the poor readability of UUIDs, you can convert the UUID to an Int64.
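
The article does not say how the UUID-to-Int64 conversion is done, so the following is only one possible reading, sketched with a hypothetical uuidToInt64Pair helper: split the 128 bits into two unsigned 64-bit integers. Keeping only one of the halves in a single Int64 column discards half of the bits, so the uniqueness guarantee becomes weaker.

```javascript
// Split a UUID into two unsigned 64-bit halves (hypothetical helper).
// Storing only one half as an Int64 loses information.
function uuidToInt64Pair(uuid) {
  const hex = uuid.replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new TypeError('not a UUID: ' + uuid);
  }
  return {
    high: BigInt('0x' + hex.slice(0, 16)),
    low: BigInt('0x' + hex.slice(16)),
  };
}

const pair = uuidToInt64Pair('00000000-0000-0001-0000-000000000002');
console.log(pair.high.toString(), pair.low.toString()); // 1 2
```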

Part III: GUID

GUID: Microsoft's implementation of the UUID standard. UUID has a variety of other implementations as well, not just GUID. The pros and cons are the same as for UUID.

Part IV: Comb

  

The COMB (combined) type is a database design idea of its own and can be understood as an improved GUID. It combines a GUID with the system time so that indexing and retrieval perform better.
Databases have no built-in COMB type; Jimmy Nilsson designed it in his article "The Cost of GUIDs as Primary Keys".
The basic design of the COMB data type is as follows: because uniqueidentifier values lack any regular order, they index inefficiently and hurt system performance. So we combine two things: keep the first 10 bytes of the uniqueidentifier and use the last 6 bytes to represent the time (DateTime) at which the GUID was generated. Combining time information with the uniqueidentifier adds ordering, which improves index efficiency, while preserving the uniqueness of the uniqueidentifier.

Advantages:
    1. Solves the problem of unordered UUIDs by offering the COMB algorithm (combined GUID/timestamp) as a primary key generation mode: 10 bytes of the GUID are kept, and the remaining 6 bytes represent the time (DateTime) at which the GUID was generated.
    2. Performance is better than UUID.
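
The 10 + 6 byte layout can be sketched as follows. This is an illustration under stated assumptions, not Nilsson's exact encoding: createComb is a hypothetical helper, the random part comes from Node's crypto.randomBytes, and the last 6 bytes hold milliseconds since the Unix epoch instead of a SQL Server DateTime value.

```javascript
const crypto = require('crypto');

// COMB-style ID: 10 random bytes followed by a 6-byte timestamp.
// Assumption: the time is stored as milliseconds since the Unix epoch.
function createComb(nowMs = Date.now()) {
  const bytes = crypto.randomBytes(16);
  bytes.writeUIntBE(nowMs, 10, 6); // overwrite the last 6 bytes with the time
  const hex = bytes.toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20, 32)].join('-');
}

const earlier = createComb(1000);
const later = createComb(2000);
console.log(earlier, later);
// The final 12 hex characters carry the time, so they grow with the clock.
console.log(earlier.slice(-12) < later.slice(-12)); // true
```

The time goes at the end because the design targets SQL Server's uniqueidentifier column, the type named in the description above; other databases may compare the bytes in a different order, in which case the timestamp would need to sit elsewhere.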

Part V: Twitter's Snowflake algorithm

Snowflake is Twitter's open-source distributed ID generation algorithm, and the result is a long (64-bit integer) ID. The core idea: use 41 bits for a millisecond timestamp, 10 bits for the machine ID (5 bits for the data center and 5 bits for the machine), and 12 bits for a sequence number within the millisecond (meaning each node can produce 4,096 IDs per millisecond), plus a sign bit that is always 0. The Snowflake algorithm can be adapted to the needs of your own project. For example, estimate the future number of data centers, the number of machines per data center, and the number of concurrent IDs needed within a single millisecond, then adjust the number of bits the algorithm assigns to each field.

Advantages:
    1. Does not depend on the database, is flexible and convenient, and performs better than database-generated IDs.
    2. IDs increase over time on a single machine.
Disadvantages:
    1. IDs increase on a single machine, but in a distributed environment the clocks on different machines may not be fully synchronized, so IDs are sometimes not globally increasing.
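
The bit layout described above can be sketched with BigInt. This is a simplified illustration rather than Twitter's actual code: createSnowflake and decodeSnowflake are hypothetical names, the custom epoch is an arbitrary choice, and the clock is injectable only to make the sketch easy to test.

```javascript
// Hypothetical custom epoch (2020-01-01T00:00:00Z); any fixed past instant works.
const EPOCH = 1577836800000n;
const SEQUENCE_MASK = 4095n; // 12 bits -> 4,096 IDs per millisecond per node

// `now` is injectable so the clock can be faked (an assumption of this sketch).
function createSnowflake(datacenterId, machineId, now = () => BigInt(Date.now())) {
  for (const id of [datacenterId, machineId]) {
    if (!Number.isInteger(id) || id < 0 || id > 31) {
      throw new RangeError('datacenterId and machineId must fit in 5 bits (0-31)');
    }
  }
  const node = (BigInt(datacenterId) << 5n) | BigInt(machineId); // 10-bit node ID
  let lastTimestamp = -1n;
  let sequence = 0n;

  return function nextId() {
    let timestamp = now();
    if (timestamp < lastTimestamp) {
      throw new Error('clock moved backwards; refusing to generate IDs');
    }
    if (timestamp === lastTimestamp) {
      sequence = (sequence + 1n) & SEQUENCE_MASK;
      if (sequence === 0n) {
        // Sequence exhausted for this millisecond: wait for the next one.
        while (timestamp <= lastTimestamp) timestamp = now();
      }
    } else {
      sequence = 0n;
    }
    lastTimestamp = timestamp;
    // sign bit (0) | 41-bit timestamp | 10-bit node | 12-bit sequence
    return ((timestamp - EPOCH) << 22n) | (node << 12n) | sequence;
  };
}

// Split an ID back into its fields.
function decodeSnowflake(id) {
  return {
    timestamp: (id >> 22n) + EPOCH,
    datacenterId: Number((id >> 17n) & 31n),
    machineId: Number((id >> 12n) & 31n),
    sequence: Number(id & SEQUENCE_MASK),
  };
}

const nextId = createSnowflake(1, 2);
const first = nextId();
const second = nextId();
console.log(first.toString(), second.toString(), second > first);
console.log(decodeSnowflake(first));
```

Throwing when the clock moves backwards is one simple way to handle the clock problem listed under the disadvantages; waiting until the clock catches up is another option.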

Part VI: The uuid package is really convenient to use:

  npm install uuid --save

  

Then you can use it!

  // CommonJS import; older releases of the package also accepted require('uuid/v1')
  const { v1: uuidv1 } = require('uuid');
  console.log('uuid string', uuidv1());

In this way, we can print out the UUID string. It's not the same every time.

Reference articles:
https://www.npmjs.com/package/uuid
http://www.jianshu.com/p/d553318498ad
http://www.jianshu.com/p/a0a3aa888a49
