Thoughts on distributed unique IDs: the Snowflake algorithm

Source: Internet
Author: User
Tags: time in milliseconds, unique id

This is an original article by Zero. If you repost it, please credit the original source. Thank you!

Origin

Why talk about distributed unique IDs all of a sudden? Because I have recently been preparing to use RocketMQ, and noticed this in the introduction on the official website:

In short, a message may be delivered more than once, so the consumer needs to be idempotent. Why messages repeat will be covered in detail in later RocketMQ chapters; it is not the focus of this section.

To make the business logic idempotent, there must be an ID that meets the following conditions:

    • It must be globally unique within the same business scenario.
    • It must be generated on the sender side, before the message is sent to MQ.
    • The consumer uses the ID to detect duplicates and so guarantee idempotence.
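
The three conditions above can be sketched as a consumer that remembers which IDs it has already handled. This is only a minimal illustration: the class name IdempotentConsumer, its methods, and the in-memory set are assumptions made for this example and are not RocketMQ API. A real system would keep the processed IDs in a durable store.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical consumer that skips messages whose ID it has already handled. */
class IdempotentConsumer {
    // In-memory stand-in for a durable store of processed message IDs (assumption).
    private final Set<Long> processedIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger handledCount = new AtomicInteger();

    /** Returns true if the message was processed, false if it was a duplicate. */
    boolean consume(long messageId, String body) {
        // Set.add returns false when the ID is already present,
        // so a redelivered message is ignored.
        if (!processedIds.add(messageId)) {
            return false;
        }
        handle(body);
        return true;
    }

    /** Placeholder for the real business logic. */
    private void handle(String body) {
        handledCount.incrementAndGet();
    }

    int handledCount() {
        return handledCount.get();
    }
}
```

Delivering the same message ID twice runs the business logic only once, which is exactly why the ID has to be assigned by the sender before the message reaches MQ.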

How the consumer stores and checks the ID has nothing to do with the ID itself; the only requirement on the ID is that it be locally or globally unique. Because the ID is unique, it can also serve as a database primary key. On that subject I recently posted an article, "MySQL from a developer's point of view (1): the primary key problem", which focuses on why a primary key should be auto-incrementing, or at least trend-increasing (this is related to how MySQL stores data).

So the ID needs two properties:

    • Locally or globally unique.
    • Trend-increasing.

There are many ways to generate distributed unique IDs, and any method that is globally unique is necessarily locally unique as well. See "Summary of unique ID generation schemes for distributed systems": http://www.cnblogs.com/haoxinyue/p/5208136.html
That article covers many approaches along with the advantages and disadvantages of each.
This article focuses on the Snowflake algorithm, whose IDs satisfy both of the properties above.

Snowflake algorithm

Snowflake is Twitter's open-source distributed ID generation algorithm; the result is a 64-bit long ID. The core idea is: use 41 bits for a millisecond timestamp, 10 bits for the machine ID (5 bits for the data center and 5 bits for the worker), and 12 bits for a sequence number within the millisecond (meaning each node can produce 4,096 IDs per millisecond). The remaining highest bit is the sign bit, which is always 0.
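
To make the bit layout concrete, here is a small sketch that packs the four fields into one long and extracts them again. The class SnowflakeLayout and its method names are made up for this illustration; only the field widths (41, 5, 5 and 12 bits) come from the description above.

```java
/** Illustrative helper for the 1 + 41 + 5 + 5 + 12 bit layout (names are hypothetical). */
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;

    static final int WORKER_SHIFT = SEQUENCE_BITS;                          // 12
    static final int DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS;         // 17
    static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS;  // 22

    private SnowflakeLayout() {
    }

    /** Combines the fields; callers must keep each value within its bit width. */
    static long pack(long deltaMillis, long datacenterId, long workerId, long sequence) {
        return (deltaMillis << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    /** Milliseconds since the generator's chosen epoch. */
    static long deltaMillis(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    static long datacenterId(long id) {
        return (id >>> DATACENTER_SHIFT) & ((1L << DATACENTER_BITS) - 1);
    }

    static long workerId(long id) {
        return (id >>> WORKER_SHIFT) & ((1L << WORKER_BITS) - 1);
    }

    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }
}
```

Because the timestamp occupies the highest bits below the sign bit, IDs from a later millisecond compare greater than IDs from an earlier one, which is where the trend-increasing property comes from.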

The implementation consists almost entirely of bitwise operations. If you are not familiar with binary, see my earlier related articles on Java binary basics and practical binary techniques.

In theory, a single node can generate at most 1000 * (2^12) = 4,096,000 IDs per second, which is very fast.

The Java implementation looks roughly like this (it is almost all bit operations):
Reference: https://www.cnblogs.com/relucent/p/4955340.html

/**
 * Twitter_Snowflake<br>
 * The structure of a Snowflake ID is as follows (parts separated by "-"):<br>
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000 <br>
 * 1 sign bit: long is signed in Java and the highest bit is the sign bit (0 for positive,
 * 1 for negative). IDs are generally positive, so the highest bit is 0.<br>
 * 41-bit timestamp (milliseconds): this is not the current timestamp itself but the difference
 * (current timestamp - start timestamp). The start timestamp is normally the time the ID
 * generator was first put into use, and is set by the program (the twepoch field below).
 * A 41-bit timestamp lasts 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69<br>
 * 10 machine bits: allows deployment on 1024 nodes, made up of a 5-bit datacenterId and a
 * 5-bit workerId<br>
 * 12-bit sequence: a counter within the millisecond; each node can generate 4,096 IDs per
 * millisecond (same machine, same timestamp)<br>
 * Together these add up to exactly 64 bits, one long.<br>
 * The advantages of Snowflake are that IDs are ordered by time overall, that no ID collisions
 * occur across the distributed system (the data center ID and machine ID keep nodes apart),
 * and that it is efficient: in testing, Snowflake produced about 260,000 IDs per second.
 */
public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /** Start timestamp (2015-01-01) */
    private final long twepoch = 1420041600000L;

    /** Number of bits for the machine ID */
    private final long workerIdBits = 5L;

    /** Number of bits for the data center ID */
    private final long datacenterIdBits = 5L;

    /** Maximum machine ID, 31 (this shift trick computes the largest value that fits in n bits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Maximum data center ID, 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** Number of bits for the sequence */
    private final long sequenceBits = 12L;

    /** The machine ID is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** The data center ID is shifted left by 17 bits (12+5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted left by 22 bits (5+5+12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Mask for the sequence, 4095 (0b111111111111=0xfff=4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker machine ID (0~31) */
    private long workerId;

    /** Data center ID (0~31) */
    private long datacenterId;

    /** Sequence within the millisecond (0~4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated ID */
    private long lastTimestamp = -1L;

    // ==============================Constructors=====================================
    /**
     * Constructor
     * @param workerId worker ID (0~31)
     * @param datacenterId data center ID (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================
    /**
     * Get the next ID (this method is thread-safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the timestamp of the last ID,
        // the system clock has been rolled back, so throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // Same millisecond as the last ID: advance the sequence
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // Sequence overflowed within this millisecond
            if (sequence == 0) {
                // Block until the next millisecond to obtain a new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // Timestamp changed: reset the sequence
        else {
            sequence = 0L;
        }

        // Remember the timestamp of this ID
        lastTimestamp = timestamp;

        // Shift each field into place and OR them together into a 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }

    /**
     * Block until the next millisecond, that is, until a new timestamp is obtained
     * @param lastTimestamp timestamp of the last generated ID
     * @return current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     * @return current time (ms)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    // ==============================Test=============================================
    /** Test */
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        // The loop bound is missing in the source text; 10 is an arbitrary choice.
        for (int i = 0; i < 10; i++) {
            long id = idWorker.nextId();
            System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}

Advantages:

    • Fast (as the martial arts saying goes, speed is the one thing that cannot be beaten).
    • No external dependencies, and the implementation is very simple.
    • Once you understand the principle, you can adjust the width of each field to suit your situation, which is convenient and flexible.

Disadvantages:

    • IDs are only trend-increasing, not strictly increasing. (Some would not call this a disadvantage: if IDs were strictly incremental, a competitor could place an order at noon, place another at noon the next day, and roughly estimate the company's daily order volume. Dangerous!)
    • It depends on the machine clock; if the clock is rolled back, duplicate IDs can be generated.
      The rest of this article focuses on the clock rollback problem.

Thoughts on clock rollback in the Snowflake algorithm

Snowflake has the clock rollback problem, but it is so fast and simple that it is worth asking whether the problem can be solved. I searched the Internet and did not find a concrete solution, but I did find a good article from Meituan: "Leaf: Meituan-Dianping distributed ID generation system" (https://tech.meituan.com/MT_Leaf.html)
The article is very good, but unfortunately it does not say how to solve clock rollback specifically. Here are some of my own thoughts:

Analyzing the causes of clock rollback

First: manual operation. In a real environment nobody is normally careless enough to do this, so it can basically be ruled out.
Second: for business or other reasons, machines need to synchronize with a time server. A rollback can occur during synchronization; on our servers it was generally within 10 ms, with synchronization every 2 hours.

Solutions
    1. Snowflake normally runs on each machine itself. If you instead generate IDs on a few centralized machines (which do not do time synchronization), there is basically no possibility of a rollback (a roundabout rescue is still a rescue). This does bring a new problem: every node has to access the centralized machines. To preserve performance, Baidu's uid-generator is built on this idea (each call fetches a batch of IDs, which is a very good idea and performs very well): https://github.com/baidu/uid-generator.
      If you adopt this approach, there is basically no problem and you do not need to read further. If you want to see my own thinking, read on (it is only a line of thought and may not be a good one; I look forward to your feedback). I have not studied uid-generator in detail yet, but its test report looks very good, and I want to take a proper look when I have time.

    2. Below is my own thinking, which I also discussed briefly with the author of Meituan's Leaf. It does solve part of the problem, but it introduces other problems and dependencies. I hope more experienced readers can offer some advice.

Approach to the clock rollback problem:

    1. When the rollback is less than 15 ms, wait for the clock to catch up and then continue generating.
    2. When the rollback is greater than 15 ms, switch to a different worker ID, one under which no IDs have been generated for that time range, to get around the rollback.
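
The two rules above can be written down as a small decision function. This is a sketch only: the class RollbackPolicy, the enum, and the method name are invented for illustration. The text does not say what to do when the rollback is exactly 15 ms; this sketch assumes that case switches worker IDs.

```java
/** Hypothetical decision rule for handling a clock rollback (15 ms threshold from the text). */
final class RollbackPolicy {
    enum Action { GENERATE, WAIT_FOR_CLOCK, SWITCH_WORKER_ID }

    /** Rollbacks shorter than this are simply waited out. */
    static final long MAX_WAIT_MILLIS = 15;

    private RollbackPolicy() {
    }

    /**
     * @param lastTimestamp    timestamp of the last generated ID, in milliseconds
     * @param currentTimestamp timestamp just read from the clock, in milliseconds
     */
    static Action decide(long lastTimestamp, long currentTimestamp) {
        long rollbackMillis = lastTimestamp - currentTimestamp;
        if (rollbackMillis <= 0) {
            // The clock did not move backwards: generate as usual.
            return Action.GENERATE;
        }
        if (rollbackMillis < MAX_WAIT_MILLIS) {
            // Small rollback: wait until the clock catches up with lastTimestamp.
            return Action.WAIT_FOR_CLOCK;
        }
        // Large rollback (assumption: 15 ms or more): continue under a new worker ID.
        return Action.SWITCH_WORKER_ID;
    }
}
```

A generator would call decide at the point where the reference implementation throws "Clock moved backwards", and act on the result instead of failing.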

First, the number of worker IDs is increased (15 bits give more than 30,000 values, which is generally enough).

The Snowflake layout is adjusted slightly, as follows:

    • Sign (1 bit)
      A fixed 1-bit sign flag, so the generated distributed unique ID is always a positive number.
    • Delta time (38 bits)
      The current time in milliseconds, as an offset from the time base "2017-12-21"; this lasts at most 8.716 years.
    • Worker ID (15 bits)
      The machine ID, which supports up to 32,768 nodes.
    • Sequence (10 bits)
      The sequence number within each millisecond. With 10 bits, the algorithm can in theory generate up to 1000 * (2^10) = 1,024,000 IDs per second (about 1 million), which meets business needs.
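
The capacity figures of the adjusted layout can be checked with a few lines of arithmetic. The sketch below assumes field widths of 38, 15 and 10 bits: the 15 and 10 come from the text, and the 38 is inferred from the 8.716-year figure and from the fields having to add up to 64 bits with the sign bit. The class and method names are hypothetical.

```java
/** Capacity figures for a Snowflake-style layout, given its field widths (illustrative). */
final class LayoutCapacity {
    /** Same year length as in the reference implementation's comment: 365 days. */
    static final long MILLIS_PER_YEAR = 1000L * 60 * 60 * 24 * 365;

    private LayoutCapacity() {
    }

    /** Years until a millisecond timestamp of the given width runs out. */
    static double yearsOfTimestamps(int timestampBits) {
        return (double) (1L << timestampBits) / MILLIS_PER_YEAR;
    }

    /** Number of distinct worker IDs that fit in the given width. */
    static long maxNodes(int workerBits) {
        return 1L << workerBits;
    }

    /** Theoretical upper bound on IDs per second for one node. */
    static long idsPerSecondPerNode(int sequenceBits) {
        return 1000L * (1L << sequenceBits);
    }

    public static void main(String[] args) {
        System.out.printf("years: %.3f%n", yearsOfTimestamps(38));
        System.out.println("nodes: " + maxNodes(15));
        System.out.println("ids/s: " + idsPerSecondPerNode(10));
    }
}
```

The same helpers applied to the original layout (41 and 12 bits) give roughly 69.7 years and 4,096,000 IDs per second per node, so the adjustment trades lifetime and per-node throughput for a much larger pool of worker IDs.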

Because services are stateless, the worker ID is generally not written into a specific configuration file (for why statelessness matters, see my article "Some thoughts and understanding of high availability"). Here we choose Redis as the central store (ZK or a DB would work just as well); anything centralized will do.

Here is the key point:
I put the 30,000-plus worker IDs into a queue (based on Redis), because the worker IDs need to be managed in one central place. When a node starts, it first checks a local location for a saved worker ID (borrowing the weak-dependency-on-ZK idea of saving a copy locally first). If one is found, that value is used as the worker ID; if not, the node takes one from the queue (once taken, it is no longer in the queue). When a rollback turns out to be too large, the node takes a new worker ID from the queue and puts the worker ID that hit the rollback back into the queue. We always take from the head and insert at the tail, to avoid a worker ID that machine A has just used being handed straight to machine B.
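
The queue discipline described above (take from the head, return to the tail) can be sketched with an in-memory deque. This is not Redis code: the class WorkerIdPool and its methods are assumptions made for illustration, and the deque merely stands in for the centralized queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** In-memory stand-in for the centralized worker ID queue (hypothetical names). */
final class WorkerIdPool {
    private final Deque<Integer> queue = new ArrayDeque<>();

    /** Fills the queue with worker IDs 0 .. size-1, for example size = 32768 for 15 bits. */
    WorkerIdPool(int size) {
        for (int id = 0; id < size; id++) {
            queue.addLast(id);
        }
    }

    /** Takes a worker ID from the head; it is then unavailable to other nodes. */
    synchronized int acquire() {
        Integer id = queue.pollFirst();
        if (id == null) {
            throw new IllegalStateException("no worker IDs left in the pool");
        }
        return id;
    }

    /**
     * Called after a large clock rollback: hands out a fresh worker ID from the head
     * and puts the affected one at the tail, so it is reused as late as possible.
     */
    synchronized int replace(int rolledBackWorkerId) {
        int fresh = acquire();
        queue.addLast(rolledBackWorkerId);
        return fresh;
    }
}
```

With tens of thousands of IDs in the queue, a returned worker ID sits behind all unused ones, so by the time it reaches the head again the clock has almost certainly passed the point where the rollback occurred.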

Several questions are worth thinking about:

    • If I introduce Redis anyway, why not use Redis to issue the IDs directly? (The article "Summary of unique ID generation schemes for distributed systems" has the answer. Here Redis is used only as a consistent queue, and anything that can serve as a consistent queue will do.)

    • Introducing Redis means introducing another third-party component. For a basic building block it is best not to depend on anything (the simpler the better; this is something I am still learning).

    • How is Redis consistency guaranteed? (What happens when Redis goes down, and how is it synchronized? This is indeed debatable, and it may introduce many new small problems.)

Summary

So an approach like Baidu's is the better choice: centralize generation and hand out IDs in batches. My own idea is a line of thought, but it is not particularly suitable for a basic component; still, it is one way of looking at the problem. I look forward to exchanging ideas with more experienced readers.

If you found this helpful, you are welcome to like and follow, and to add the public account "ingenuity Zero" to read more past articles!
