Global ID Generation for shards (sharding)

Last Update:2014-12-13 Source: Internet

Author: User

Tags postgres database

Objective
When data is sharded across multiple databases, generating globally unique IDs becomes a common problem. Producing a unique ID is not hard in itself, but in a sharded system the ID usually has to satisfy some additional requirements:

There must be no single point of failure.
IDs should be ordered by time, or contain a timestamp. This saves an index, since no separate time index is needed, and makes it easy to separate hot and cold data.
The shard ID should be controllable. For example, all of a user's articles should be placed on the same shard, so that queries are efficient and updates are simple.
IDs should not be too long, preferably 64 bits. A 64-bit value fits in a long and is easy to work with; a 96-bit ID needs awkward bit shifting, and some components may not support IDs that large.
Let's look at how several well-known overseas companies have solved this, in chronological order:

Flickr
Flickr cleverly combines MySQL's auto-increment IDs with the REPLACE INTO syntax, which makes shard ID generation very simple to implement.

First, create a table:

CREATE TABLE `Tickets64` (
  `id` bigint unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM

Use the following SQL to get an ID:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
Because REPLACE INTO is used, the Tickets64 table always contains just one row:

+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 | a    |
+-------------------+------+
So how is the single point of failure avoided?
MySQL's auto-increment settings make this easy. For example, with two ID generation servers, configure them as follows:

TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2
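The effect of these two settings can be illustrated with a small simulation. This is only a sketch in Python, not Flickr's code: the TicketServer class and its next_id method are hypothetical stand-ins for a MySQL server configured with the increment and offset shown above.

```python
import itertools


class TicketServer:
    """Hypothetical stand-in for one MySQL ticket server."""

    def __init__(self, offset: int, increment: int) -> None:
        # Mirrors auto-increment-offset and auto-increment-increment:
        # the server hands out offset, offset + increment, and so on.
        self._ids = itertools.count(start=offset, step=increment)

    def next_id(self) -> int:
        # Plays the role of REPLACE INTO followed by SELECT LAST_INSERT_ID().
        return next(self._ids)


server1 = TicketServer(offset=1, increment=2)  # hands out odd IDs
server2 = TicketServer(offset=2, increment=2)  # hands out even IDs

ids_from_1 = [server1.next_id() for _ in range(5)]
ids_from_2 = [server2.next_id() for _ in range(5)]
print(ids_from_1)
print(ids_from_2)
```

Because the two sequences never overlap, a client can ask either server, and if one server fails the other keeps issuing unique IDs.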
Advantages:
Simple and reliable.

Disadvantages:

The ID is just a number; it carries no information such as the shard ID or a timestamp.

Twitter
Twitter built Snowflake (https://github.com/twitter/snowflake), a global ID generation service that relies on ZooKeeper and can generate globally unique 64-bit IDs.

The composition of the generated ID:

Timestamp: the first 41 bits hold the time with millisecond precision, which covers about 69 years.
Machine ID: 10 bits, which means up to 1024 machines can be deployed.
Sequence number: 12 bits, which means each machine can generate at most 4096 IDs per millisecond.
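The layout above can be sketched as plain bit arithmetic. This is a minimal Python illustration of the 41/10/12 split, not Snowflake's actual code; the names compose_id and decompose_id and the sample values are assumptions made for the example.

```python
from typing import Tuple

TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def compose_id(millis: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one integer, timestamp in the high bits."""
    assert 0 <= millis < (1 << TIMESTAMP_BITS)
    assert 0 <= machine_id < (1 << MACHINE_BITS)
    assert 0 <= sequence < (1 << SEQUENCE_BITS)
    return (
        (millis << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(value: int) -> Tuple[int, int, int]:
    """Split an ID back into (millis, machine_id, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (value >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    millis = value >> (MACHINE_BITS + SEQUENCE_BITS)
    return millis, machine_id, sequence


# 2**41 milliseconds expressed in 365-day years (roughly 69.7).
years_covered = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365)

example = compose_id(millis=1_000_000, machine_id=7, sequence=42)
print(example, decompose_id(example), years_covered)
```

Placing the timestamp in the highest bits is what makes IDs sort roughly by creation time.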
Advantages:
The ID carries time information.

Disadvantages:

The architecture is somewhat complex and depends on ZooKeeper.

Shard IDs cannot be assigned flexibly.

Instagram
Building on Flickr's experience, Instagram used features of the Postgres database to implement a simpler and more reliable ID generation service.
This is how Instagram designs its IDs:

41 bits store the time in milliseconds. Instagram's post quotes 41 years of IDs with a custom epoch, although 41 bits of milliseconds span about 69 years.
13 bits hold the logical shard ID.
10 bits store an auto-incrementing sequence, which means each shard can generate at most 1024 IDs per millisecond.
Instagram illustrates this with an example:
Suppose it is 5:00 pm on September 9th and 1387263000 milliseconds have passed since the start of the epoch (Instagram counts from its own custom epoch rather than from 1970). First, put the time into the ID:
id = 1387263000 << (64-41)
Next, add the shard ID after the time. Assuming the user ID is 31341 and there are 2000 logical shards, the shard ID is 31341 % 2000 = 1341:
id |= 1341 << (64-41-13)
Finally, add the auto-increment sequence. Assuming the previous sequence value was 5000, the new value is 5001, taken modulo 1024:
id |= (5001 % 1024)
This gives a global ID that includes the shard.
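The three steps above can be checked with a few lines of Python. This only reproduces the arithmetic of the worked example; the helper name unpack is an assumption introduced here and is not part of Instagram's code.

```python
elapsed_millis = 1387263000   # milliseconds from the worked example
shard_id = 31341 % 2000       # user 31341 across 2000 logical shards
sequence = 5001 % 1024        # next sequence value, wrapped to 10 bits

new_id = elapsed_millis << (64 - 41)
new_id |= shard_id << (64 - 41 - 13)
new_id |= sequence


def unpack(value: int) -> tuple:
    """Split an ID back into (elapsed_millis, shard_id, sequence)."""
    return value >> 23, (value >> 10) & 0x1FFF, value & 0x3FF


print(new_id, unpack(new_id))
```

Since the shard ID sits in fixed bit positions, any component that sees only the ID can recover the shard with a shift and a mask, without a lookup table.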
The following is the PL/pgSQL function Instagram uses to generate IDs in Postgres:

CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;  -- custom epoch, in milliseconds
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;                  -- logical shard for this schema
BEGIN
    -- Next sequence value, wrapped to 10 bits
    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;

    -- Current time in milliseconds
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
Then, when creating the table, simply declare the ID column as follows, and the application no longer needs a separate ID generation step when inserting rows:
CREATE TABLE insta5.our_table (
    "id" bigint NOT NULL DEFAULT insta5.next_id(),
    ... rest of the table schema ...
)
Even if you are not familiar with Postgres, the SQL above should be easy to follow. Porting it to MySQL should not be difficult.
Disadvantages:

There do not seem to be any real drawbacks.

Advantages:

The ID carries time and shard information.

It makes full use of the database's own mechanisms: the application needs no extra processing, and rows are inserted directly into the corresponding shard table.

A solution using Redis
Standing on the shoulders of those who came before, I came up with a solution based on Redis + Lua.

First, Lua's built-in time functions are not accurate to the millisecond, so the Redis source code has to be modified to add a currentmiliseconds function. To keep things simple, I added it directly to the math module.

Modify the scripting.c file in the Redis source code and add the following:

#include <sys/time.h>

int redis_math_currentmiliseconds(lua_State *L);

void scriptingInit(void) {
    ...
    /* Add currentmiliseconds to the math table */
    lua_pushstring(lua, "currentmiliseconds");
    lua_pushcfunction(lua, redis_math_currentmiliseconds);
    lua_settable(lua, -3);

    lua_setglobal(lua, "math");
    ...
}

/* Pushes the current time in milliseconds since the Unix epoch */
int redis_math_currentmiliseconds(lua_State *L) {
    struct timeval now;
    gettimeofday(&now, NULL);
    /* Cast first so the multiplication cannot overflow a 32-bit tv_sec */
    lua_pushnumber(L, (lua_Number)now.tv_sec * 1000 + now.tv_usec / 1000);
    return 1;
}
This scheme returns a triple (time, shard ID, sequence) directly. The Lua script is very flexible, so you can adapt it as needed.

Time: the current time in milliseconds on the Redis server.
Shard ID: computed from the argument passed in, as KEYS[1] % 1024.
Sequence: obtained by running INCRBY on the Redis key made of the "idgenerator_next_" prefix followed by the shard ID.
For example, when a user publishes an article and an article ID must be generated, assuming the user ID is 14532:

time      <-- math.currentmiliseconds();
shardId   <-- 14532 % 1024;                  // that is, 196
articleId <-- INCRBY idgenerator_next_196 1  // 1 is the step size
The Lua script looks like this:

local step = redis.call('GET', 'idgenerator_step');
local shardId = KEYS[1] % 1024;
local next = redis.call('INCRBY', 'idgenerator_next_' .. shardId, step);
return {math.currentmiliseconds(), shardId, next};
The "idgenerator_step" key stores the step size.
The client runs the script with EVAL; once it has the triple, it can combine the three parts freely into a 64-bit global ID.
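How the triple is combined is left to the client. One possible packing, sketched in Python below, uses a 41-bit time, 10-bit shard ID, and 13-bit sequence. This layout, the custom epoch value, and the function name pack_triple are all assumptions made for illustration; the scheme itself does not prescribe them. The triple is passed in directly so that the sketch runs without a Redis server.

```python
EPOCH_MILLIS = 1_388_534_400_000  # assumed custom epoch: 2014-01-01 00:00:00 UTC
SHARD_BITS = 10                   # shard IDs come from KEYS[1] % 1024
SEQUENCE_BITS = 13                # assumed width; the sequence is wrapped to fit


def pack_triple(millis: int, shard_id: int, sequence: int) -> int:
    """Combine the (time, shard ID, sequence) triple into one integer."""
    elapsed = millis - EPOCH_MILLIS
    wrapped_seq = sequence % (1 << SEQUENCE_BITS)
    return (
        (elapsed << (SHARD_BITS + SEQUENCE_BITS))
        | (shard_id << SEQUENCE_BITS)
        | wrapped_seq
    )


# A made-up triple of the kind the Lua script returns for user 14532.
triple = (1_392_000_000_000, 14532 % 1024, 4)
article_id = pack_triple(*triple)
print(article_id)
```

Wrapping the sequence is needed because INCRBY keeps growing without bound; the width chosen limits how many IDs one shard can safely issue within a single millisecond.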

The above uses only one server, so how is the single point of failure solved?

This is where the "idgenerator_step" key comes in.

For example, deploy three Redis instances as ID generation servers: A, B, and C. At startup, set the following keys on redis-A:

idgenerator_step = 3
idgenerator_next_0, idgenerator_next_1, idgenerator_next_2 ... idgenerator_next_1023 = 1
Set the following keys on redis-B:

idgenerator_step = 3
idgenerator_next_0, idgenerator_next_1, idgenerator_next_2 ... idgenerator_next_1023 = 2
Set the following keys on redis-C:

idgenerator_step = 3
idgenerator_next_0, idgenerator_next_1, idgenerator_next_2 ... idgenerator_next_1023 = 3
The three ID generation servers are then completely independent, equal peers. If any one of them goes down, the others are unaffected; the client simply picks one at random and runs EVAL to get a triple.

In my tests, a single Redis server generated 30,000 IDs per second, so three ID servers should be enough for most applications.

The test program is available here:

https://gist.github.com/hengyunabc/9032295
Disadvantages:
If you are not familiar with Lua scripting, customizing your own ID rules may be somewhat cumbersome.

Note that the server clock must not be set to synchronize automatically; otherwise a clock adjustment could cause duplicate IDs to be generated.

Advantages:

Very fast, and scales out linearly by adding servers.

You can customize the Lua script to generate IDs for different business needs.

Other approaches:
MongoDB's ObjectId is simply too long at 12 bytes:

ObjectId is a 12-byte BSON type, constructed using:

A 4-byte value representing the seconds since the Unix epoch,
A 3-byte machine identifier,
A 2-byte process ID, and
A 3-byte counter, starting with a random value.
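To make the 12-byte layout concrete, the sketch below splits an ObjectId hex string into the four fields listed above using only the Python standard library. The function name split_object_id and the sample value are made up for this example, and big-endian byte order is assumed for every numeric field.

```python
def split_object_id(hex_string: str) -> dict:
    """Split a 24-character ObjectId hex string into its four fields."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return {
        # Big-endian is an assumption of this sketch for all numeric fields.
        "seconds": int.from_bytes(raw[0:4], "big"),
        "machine": raw[4:7].hex(),
        "process_id": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


sample = "52fc9ec0a1b2c3d4e5000001"  # made-up value, not a real document ID
fields = split_object_id(sample)
print(fields)
```

At 12 bytes (96 bits), such a value does not fit in a 64-bit integer column, which is the drawback noted above.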
Summary:
Generating a global ID is not very hard to implement, but there is a lot to learn from how each company approached it and how those approaches evolved. Sometimes simple, off-the-shelf components are enough to solve the problem; what is missing is the idea.

Reference:
http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
https://github.com/twitter/snowflake/
http://docs.mongodb.org/manual/reference/object-id/
http://www.redisdoc.com/en/latest/script/eval.html (Redis script reference)
