Introduction to ID generators

Source: Internet
Author: User
Tags: cassandra

Background

In ordinary business scenarios, a simple auto-increment value (for example, a MySQL auto-increment key) is enough at first. As the business grows, and especially in distributed scenarios, generating a globally unique ID takes careful thought: the design has to settle how different services coordinate and whether the generated sequence must meet other requirements, such as ordering. The sections below describe several methods for generating unique IDs and the scenarios each one suits.

1. Introduction to Twitter Snowflake

For more information, see announcing-snowflake.

Twitter's problem

Twitter originally stored its online data in MySQL, which grew into a large database cluster as the business developed. For various reasons, Twitter moved some of that data to the distributed database Cassandra or to horizontally sharded MySQL in order to better serve posts worldwide. Cassandra has no built-in equivalent of MySQL's auto-increment primary key, so as the business expands it is hard to provide a one-size-fits-all way of generating sequential IDs. Horizontally sharded MySQL has the same problem. Based on these problems, Twitter set the following requirements:

1. Generate tens of thousands of IDs per second, served in a highly available manner.
2. Generate IDs in an uncoordinated way, so that generators do not have to coordinate with one another.
3. IDs must be roughly sortable: if two posts A and B are published at about the same time, their IDs should be close to each other.
4. IDs must fit in 64 bits.

 

Optional solutions

Twitter considered several existing approaches against these requirements:

1. MySQL-based ticket servers
2. UUIDs
3. ZooKeeper sequential nodes

MySQL-based ticket servers rely on auto-increment IDs. However, it is hard to guarantee that the IDs are generated in order without restructuring the program, and the IDs cannot be sorted by time. UUIDs are 128 bits long, carry a small probability of collision, and contain no timestamp that can be used for sorting. ZooKeeper sequential nodes can hardly reach a throughput of tens of thousands of IDs per second.

Twitter's solution

To generate 64-bit IDs that are roughly sortable, Twitter composes each ID from three fields: a timestamp, a worker number, and a sequence number. Sequence numbers are kept per thread, and the worker number is assigned through ZooKeeper at startup. For the detailed code, see the Snowflake source.

The Snowflake method has several advantages. First, the ID begins with the timestamp, which makes it index-friendly. Second, posts published in the same thread receive IDs that are sorted and close together. In addition, IDs as a whole are roughly ordered by time.

ID implementation

A Twitter ID combines the following parts into a 63-bit integer; stored as a signed 64-bit value, its highest bit is always 0:

id is composed of:
  time - 41 bits (millisecond precision w/ a custom epoch gives us 69 years)
  configured machine id - 10 bits - gives us up to 1024 machines
  sequence number - 12 bits - rolls over every 4096 per machine (with protection to avoid rollover in the same ms)
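The layout above can be illustrated with a little bit arithmetic. The following is a minimal Python sketch, not Twitter's code: the function names `pack_id`, `unpack_id`, and `timestamp_range_years` are made up for illustration, and only the three bit widths come from the layout quoted above.

```python
# Bit widths taken from the layout quoted above.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MACHINE_SHIFT = SEQUENCE_BITS                   # machine id sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS  # timestamp sits above both

MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1        # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1         # 4095


def pack_id(ms_since_epoch, machine_id, sequence):
    """Combine the three fields into one non-negative integer below 2**63."""
    if not 0 <= ms_since_epoch < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine id does not fit in 10 bits")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence does not fit in 12 bits")
    return (ms_since_epoch << TIMESTAMP_SHIFT) | (machine_id << MACHINE_SHIFT) | sequence


def unpack_id(snowflake_id):
    """Split an ID back into (ms_since_epoch, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> MACHINE_SHIFT) & MAX_MACHINE_ID
    ms_since_epoch = snowflake_id >> TIMESTAMP_SHIFT
    return ms_since_epoch, machine_id, sequence


def timestamp_range_years():
    """How many 365-day years of milliseconds fit into 41 bits."""
    ms_per_year = 1000 * 60 * 60 * 24 * 365
    return (1 << TIMESTAMP_BITS) / ms_per_year
```

Because the timestamp occupies the most significant bits, comparing two IDs as plain integers compares their timestamps first, which is what makes the IDs roughly time-ordered.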

The machine ID takes 10 bits (a 5-bit data center ID plus a 5-bit worker ID), allowing up to 1024 distinct machines (values 0 to 1023). The timestamp takes 41 bits and has millisecond precision (for example, 1490842567501). Each time a new ID is generated, the current system time is read, and the sequence number is then produced in one of two ways:

1. If the current time equals the time of the previous ID (the same millisecond), the new sequence number is the previous sequence number plus 1. If the sequence numbers for this millisecond are used up, the generator waits for the next millisecond before continuing (no new ID can be allocated while it waits).
2. If the current time is later than the time of the previous ID, the sequence number starts over from an initial value for the new millisecond (0 in Twitter's implementation; some variants pick a random starting value).

 

Throughout this process, the worker depends on an external service only at startup, when it obtains its worker number from ZooKeeper. After that it works independently, so ID generation is decentralized. One abnormal case also has to be handled:

If the current time is earlier than the time of the previous ID, Snowflake keeps checking the machine's clock and issues no new ID until the clock has caught up with the time of the previous ID.
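The generation rules described in this section can be put together in a short Python sketch. It is an illustration under stated assumptions, not Twitter's implementation: the class name `SnowflakeSketch`, the `EPOCH_MS` value, and the injectable `clock` parameter are all hypothetical, the sequence restarts at 0 in each new millisecond, and the sketch spins on the clock when time moves backwards (an implementation could equally raise an error instead).

```python
import threading
import time

EPOCH_MS = 1_400_000_000_000   # arbitrary custom epoch, chosen only for this sketch
MAX_SEQUENCE = (1 << 12) - 1   # 12-bit sequence number
MAX_MACHINE_ID = (1 << 10) - 1 # 10-bit machine id


class SnowflakeSketch:
    def __init__(self, machine_id, clock=None):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        # clock() returns the current time in milliseconds; injectable for testing.
        self._clock = clock if clock is not None else (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            # Clock went backwards: re-read until it catches up with the last ID's time.
            while now < self._last_ms:
                now = self._clock()
            if now == self._last_ms:
                # Same millisecond: take the next sequence number.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                # New millisecond: restart the sequence.
                self._sequence = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._machine_id << 12) | self._sequence


if __name__ == "__main__":
    generator = SnowflakeSketch(machine_id=1)
    print([generator.next_id() for _ in range(3)])
```

After construction the generator needs nothing but the local clock, which mirrors the point made above: the only external dependency is obtaining the machine ID at startup.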

 

Consequently, if a machine's clock drifts badly, the whole system stops working properly. The Snowflake documentation advises using NTP to keep the system clock synchronized and configuring NTP so that it never moves the clock backwards. For details, see Time_synchronization.

System Clock Dependency

You should use NTP to keep your system clock accurate. Snowflake protects from non-monotonic clocks, i.e. clocks that run backwards. If your clock is running fast and NTP tells it to repeat a few milliseconds, snowflake will refuse to generate ids until a time that is after the last time we generated an id. Even better, run in a mode where ntp won't move the clock backwards. See http://wiki.dovecot.org/TimeMovedBackwards#Time_synchronization for tips on how to do this.

See Unique-ID

2. last_insert_id Method

When MySQL is used as the sequence-number service, UUIDs are not a good choice, for the same reason given in the Snowflake section, and neither are MD5 hashes or GUIDs: these values are too random, which hurts index creation and lookup. Flickr's article describes how to generate sequence numbers with MySQL auto-increment IDs. Many small and medium-sized businesses use this method too, although many of them create the ticket table with the InnoDB engine rather than MyISAM as shown here:

  

CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
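To try the ticket-table idea without a MySQL server, the following Python sketch swaps in SQLite from the standard library. This is a substitution made purely for illustration: SQLite's `AUTOINCREMENT` and `cursor.lastrowid` stand in for MySQL's `auto_increment` and `LAST_INSERT_ID()`, and the helper names `open_ticket_db` and `next_ticket` are hypothetical.

```python
import sqlite3


def open_ticket_db(path=":memory:"):
    """Create the single-row ticket table (SQLite stand-in for the MySQL table)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Tickets64 ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " stub TEXT NOT NULL DEFAULT '' UNIQUE)"
    )
    return conn


def next_ticket(conn):
    """Replace the one row keyed by stub='a' and return its freshly assigned id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
        return cursor.lastrowid


if __name__ == "__main__":
    db = open_ticket_db()
    print(next_ticket(db), next_ticket(db), next_ticket(db))
```

The unique key on `stub` is what keeps the table at a single row: every `REPLACE` removes the old row and inserts a new one, so only the auto-increment counter grows.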

 

When a REPLACE statement hits a unique-key or primary-key conflict, InnoDB takes an exclusive next-key lock to prevent phantom reads during queries and index scans (see innodb-locks-set). This causes a problem of its own: when multiple threads update concurrently, deadlocks can occur. The MyISAM engine behaves better here, but it does not work well with online backups taken by innobackupex; if the table holds few records, switching it to MyISAM is acceptable. For a single business, this method is a good solution. For better performance, a dual-master architecture can be used, provided the offset and the step of the auto-increment key are set appropriately on each master.
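The effect of the offset and step settings in a dual-master setup can be simulated in a few lines of Python. This is an in-memory sketch only: `TicketServerSim` and `RoundRobinTickets` are hypothetical names, and each simulated server mimics a MySQL master whose `auto_increment_offset` and `auto_increment_increment` variables are set to the given values.

```python
class TicketServerSim:
    """In-memory stand-in for one master: IDs start at offset and grow by step."""

    def __init__(self, offset, step):
        self._next = offset
        self._step = step

    def next_id(self):
        value = self._next
        self._next += self._step
        return value


class RoundRobinTickets:
    """Alternate between servers, as a client balancing over both masters might."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._turn = 0

    def next_id(self):
        server = self._servers[self._turn % len(self._servers)]
        self._turn += 1
        return server.next_id()


if __name__ == "__main__":
    odd_master = TicketServerSim(offset=1, step=2)   # issues 1, 3, 5, ...
    even_master = TicketServerSim(offset=2, step=2)  # issues 2, 4, 6, ...
    tickets = RoundRobinTickets([odd_master, even_master])
    print([tickets.next_id() for _ in range(6)])
```

With a step equal to the number of masters and a distinct offset on each, the two ID streams never overlap, so either master can keep issuing IDs on its own if the other is down.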

3. Introduction to MariaDB Sequence

MariaDB 10.0.3 introduced a new storage engine: Sequence. Unlike a PostgreSQL sequence, a MariaDB sequence is a virtual, temporary auto-incrementing series: it disappears when the session ends, is not persisted, and cannot be referenced by other tables as an auto-increment primary key. The bounds and the increment of a sequence are determined by its table name.

How to use

 

SELECT * FROM seq_1_to_5;
+-----+
| seq |
+-----+
|   1 |
|   2 |
|   3 |
|   4 |
|   5 |
+-----+

 

  

SELECT * FROM seq_1_to_15_step_3;
+-----+
| seq |
+-----+
|   1 |
|   4 |
|   7 |
|  10 |
|  13 |
+-----+

 

SELECT * FROM seq_5_to_1_step_2;
+-----+
| seq |
+-----+
|   5 |
|   3 |
|   1 |
+-----+
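How a table name encodes the bounds and the step can be mimicked in Python. The sketch below is an emulation, not MariaDB's parser: the function `seq_values` is hypothetical and has only been checked against the three results shown above. Other corner cases, such as a descending range whose step does not divide the distance evenly, are assumptions and may behave differently in MariaDB.

```python
import re

# seq_<start>_to_<end> with an optional _step_<n> suffix.
_SEQ_NAME = re.compile(r"^seq_(\d+)_to_(\d+)(?:_step_(\d+))?$")


def seq_values(table_name):
    """Return the values a sequence table name describes, per the examples above."""
    match = _SEQ_NAME.match(table_name)
    if match is None:
        raise ValueError(f"not a sequence table name: {table_name!r}")
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3)) if match.group(3) else 1
    if step == 0:
        raise ValueError("step must be positive")
    if start <= end:
        return list(range(start, end + 1, step))   # ascending
    return list(range(start, end - 1, -step))      # descending


if __name__ == "__main__":
    for name in ("seq_1_to_5", "seq_1_to_15_step_3", "seq_5_to_1_step_2"):
        print(name, seq_values(name))
```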

 

Note: when the Sequence engine is enabled, a new table cannot use a name that conflicts with a sequence table name, although a temporary table can share a name with a sequence table.

Common misunderstandings about MariaDB sequences

Unlike the sequence generators of PostgreSQL and Firebird, the MariaDB Sequence engine produces its values only for the duration of the current statement; it has no persistence and no nextval-style functions. It cannot generate negative numbers, and it cannot wrap around on reaching its maximum or minimum bound (as the CYCLE option of PostgreSQL's sequence generator allows).

Application scenarios for MariaDB sequences

For more information, see mariadbs-sequence.

1. Find missing values (gaps) in a column.
2. Generate combinations of numbers.
3. Find the common multiples of two numbers.
4. Generate an ordered series of characters.
5. Generate an ordered series of dates and times.
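The first scenario can be illustrated outside the database. In MariaDB this would typically be a join between the column and a sequence table covering the expected range; the Python sketch below shows the same idea with a plain range, and the function name `missing_values` is made up for this example.

```python
def missing_values(existing, low, high):
    """Return the integers in [low, high] that do not occur in `existing`."""
    present = set(existing)
    # range(low, high + 1) plays the role of a sequence table from low to high.
    return [n for n in range(low, high + 1) if n not in present]


if __name__ == "__main__":
    print(missing_values([1, 2, 4, 7], 1, 7))
```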

4. PostgreSQL sequence generator

PostgreSQL's sequence generator meets the need for sequence numbers in much the same way as MySQL's last_insert_id method. In addition, PostgreSQL sequences have the following features:

1. A sequence can be used for multiple columns of a table.
2. A sequence can be shared by multiple tables.

 

For how to create a sequence, see SQL-createsequence. The syntax is rich and supports many parameters: you can set a sequence's start value, upper limit, cache size, and whether it cycles. The functions that operate on sequences are described in functions-sequence:

Function                           Return type  Description
currval(regclass)                  bigint       Returns the value most recently obtained with nextval for the specified sequence
lastval()                          bigint       Returns the value most recently obtained with nextval for any sequence
nextval(regclass)                  bigint       Advances the sequence and returns the new value
setval(regclass, bigint)           bigint       Sets the sequence's current value
setval(regclass, bigint, boolean)  bigint       Sets the sequence's current value and its is_called flag

 

A session must call nextval on a sequence before it calls currval on it. If setval is called with is_called set to false, the next nextval call returns exactly the specified value, and the sequence starts advancing only from the nextval call after that. The regclass parameter of these functions identifies the sequence; here it is simply the sequence name. For example:

cztest=# create sequence seq1;
CREATE SEQUENCE
cztest=# select nextval('seq1');
 nextval
---------
       1
(1 row)

cztest=# select nextval('seq1');
 nextval
---------
       2
(1 row)

cztest=# select currval('seq1');
 currval
---------
       2
(1 row)

cztest=# select setval('seq1', 1, false);
 setval
--------
      1
(1 row)

cztest=# select nextval('seq1');
 nextval
---------
       1
(1 row)

cztest=# select nextval('seq1');
 nextval
---------
       2
(1 row)
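The interplay of nextval, currval, and the is_called flag in the session above can be modelled with a small Python class. This is a single-session emulation for illustration, not how PostgreSQL implements sequences; the class name `SequenceSketch` and its attributes are hypothetical, and options such as bounds, caching, and cycling are left out.

```python
class SequenceSketch:
    """Single-session model of nextval / currval / setval semantics."""

    def __init__(self, start=1, increment=1):
        self._last_value = start
        self._increment = increment
        self._is_called = False      # False: the next nextval returns last_value as is
        self._session_value = None   # what currval reports; unset until nextval runs

    def nextval(self):
        if self._is_called:
            self._last_value += self._increment
        self._is_called = True
        self._session_value = self._last_value
        return self._last_value

    def currval(self):
        if self._session_value is None:
            raise RuntimeError("currval is not yet defined: call nextval first")
        return self._session_value

    def setval(self, value, is_called=True):
        self._last_value = value
        self._is_called = is_called
        if is_called:
            self._session_value = value
        return value


if __name__ == "__main__":
    seq1 = SequenceSketch()
    print(seq1.nextval(), seq1.nextval(), seq1.currval())
    print(seq1.setval(1, False), seq1.nextval(), seq1.nextval())
```

Replaying the calls from the session above against this model yields the same values, including the repeated 1 after setval with is_called set to false.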

 

  

Problems frequently encountered when using sequence generators

Summary

Of the four ID generation methods described above, the MariaDB Sequence engine is not suitable as a sequence generator.

Many small and medium-sized businesses use the MySQL-based last_insert_id method. It is easy to use for a single business, where you can create as many tables as you need, but it is not intended for services with distributed requirements. Many open-source tools, such as idgo, build on this method and simply add an interface compatible with the Redis protocol. However, creating multiple sequences means mapping multiple MySQL tables, and deadlocks cannot be avoided under high concurrency.

PostgreSQL's sequence generator is built in, comes with a rich set of functions, and handles concurrency better than the MySQL approach. Popular open-source tools such as javasst and prest both expose HTTP interfaces, so existing programs are easy to adapt.

The Snowflake method suits distributed businesses as well as businesses that depend strongly on time ordering, and it should also perform best of the methods discussed. Existing open-source tools, such as sony or goSnowFlake, implement it well and can conveniently offer the service externally over HTTP. Compared with the two methods above, however, these tools implement neither persistence nor high availability, and sequences are hard to generate while the service is interrupted, so secondary development is needed.
