This article will mainly introduce some common global primary key generation strategies, and then highlight a very good global primary key generation scheme used by Flickr. For the split strategy and implementation details of the sub-list (sharding), refer to the previous article in the series: Database Library sub-table (sharding) series (i) Split implementation strategy and sample demo the original text connection: http://blog.csdn.net/ bluishglc/article/details/7710738, reprint please specify the source!
The first part: Some common key generation strategies
Once the database is sliced into multiple physical nodes, we will no longer be able to rely on the primary key generation mechanism of the database itself. On the one hand, the self-generated ID of a partitioned database is not guaranteed to be globally unique; On the other hand, the application needs to obtain the ID before inserting the data for SQL routing. At present, several feasible primary key generation strategies are:
1. UUID: Using UUID as the primary key is the simplest solution, but the disadvantage is very obvious. Because the UUID is very long, in addition to consuming a lot of storage space, the main problem is that on the index, there are performance problems in indexing and index-based querying.
2. Maintain a sequence table with a database: The idea of this scenario is also very simple, to create a sequence table in the database, the table structure is similar to:
[SQL]View Plaincopy
- create table ' sequence ' (
- ' tablename ' varchar (in) not null,
- ' NextID ' bigint ( not null,
- primary key (' tablename ')
- ) engine=innodb
Whenever you need to generate an ID for a new record for a table, remove the corresponding table from the sequence table and update the value of NextID to the database for the next use, after adding 1 to the NextID. This scenario is also simpler, but the downside is equally obvious: because all inserts require access to the table, the table can easily become a system performance bottleneck, and it also has a single point of failure, and the entire application will not work once the table database fails. It is proposed to use Master-slave for master-slave synchronization, but this can only solve the single point problem, and can not solve the read-write ratio of 1:1 access pressure problem.
In addition to this, there are some scenarios, such as partitioning the ID of each database node, and some ID generation algorithms on the web, which are not recommended because of the lack of operability and practical testing. In fact, next, we're going to introduce a primary key generation scheme used by Fickr, which is one of the best scenarios I've known, and has withstood the test of practice, and can be used for most application systems.
Part II: A very good primary key generation strategy
In 2010, the Flickr development team introduced a primary key generation measurement strategy used by Flickr, and said the actual performance of the program on Flickr was also very satisfying, with the original connection: Ticket servers:distributed Unique Primary Keys on the Cheap this is the best solution I know at the moment, it's similar to the General Sequence table scheme, but it solves the performance bottleneck and the single point problem, and is a very reliable and efficient global primary key generation scheme.
Figure 1: The Sharding primary key generation scheme used by Flickr (Click to view larger image)
The overall idea of the Flickr program is to build more than two database ID generation servers, each with a sequence table that records the current ID of each table, but the number of steps in the ID growth in sequence is the amount of servers, and the starting values are staggered sequentially. This is equivalent to hashing the generation of IDs on each server node. For example: If we set up two database ID generation server, then let one of the sequence table ID start value is 1, each growth step is 2, the other sequence table ID starting value is 2, each growth step is also 2, Then the result is that the odd ID will be generated from the first server, even the ID is generated from the second server, so that the pressure to generate the ID is evenly dispersed to two servers, while cooperating with the application control, when one server fails, the system can automatically switch to another server to obtain the ID, Thus, the system's fault tolerance is ensured.
Here are a few details about this program:
1. Flickr's database ID generation server is a dedicated server, there is only one database on the server, and the tables in the database are used to generate sequence, This is also because the two database variables, Auto-increment-offset and auto-increment-increment, are variables at the database instance level.
2. The stub field in the table in the Flickr scenario is just a个char(1) NOT NULL存根字段,并非表名,因此,一般来说,一个Sequence表只有一条纪录,可以同时为多张表生成ID,如果需要表的ID是有连续的,需要为该表单独建立Sequence表
。
3. The program uses the MySQL last_insert_id () function, which also determines that the sequence table can have only one record.
4. Using replace into to insert data is very flattering, and it is very important to use MySQL's own mechanism to generate the ID, not only because it is so simple, but also because we need the ID to be generated in the way we set it (initial value and step size).
5. SELECT last_insert_id () must be in the replace into statement under the same database connection to get the new ID just inserted, otherwise the returned value is always 0
6. The sequence table in this scenario uses the MyISAM engine for higher performance, note: The MyISAM engine uses a table-level lock, and theMyISAM reads and writes to the table are serial , So you don't have to worry that two reads will get the same ID when concurrency occurs (in addition, the program does not need to be synchronized, each requested thread gets a new connection, there is no shared resource that needs to be synchronized). After the actual comparison test, using the same sequence table for ID generation, the MyISAM engine is much higher than the INNODB performance!
7. You can use pure JDBC to perform operations on the sequence table for greater efficiency, and experiments have shown that even if you use only spring JDBC performance is not as fast as pure JDBC!
To implement this scenario, the application also needs to do some processing, mainly two aspects of the work:
1. Automatic equalization of database ID generation server access
2. Make sure that if a database ID generation server fails, the request can be forwarded to another server for execution.
Related reading:
Database sub-list (sharding) series (v) a sharding scale-up scheme for free planning without data migration and modification of routing code database sub-list (sharding) series (iv) Transaction database sub-list of multiple data sources (sharding) series (iii) Considerations regarding the use of frameworks or self-development and sharding implementation level database sub-list (sharding) series (ii) Global primary key generation Strategy database sub-list (sharding) series (i) Split implementation strategy and example demo about vertical Segmentation vertical The basic idea and the segmentation strategy of Sharding's granular database sharding