"Database primary key scheme in distributed environment"
An auto-increment primary key ID is usually the best fit when there is only a single database. In a cluster or a master-slave architecture, however, it causes problems, chiefly that primary keys must be globally unique.
Schemes for creating a primary key in a clustered environment, other than a plain auto-increment ID:
1. Generate a GUID in the application and insert it, together with the data, into the sharded cluster.
The advantages are simple maintenance and easy implementation. The disadvantages are that generating the GUID costs the application some computation, the GUID is relatively long and takes up a lot of database storage, and application development is required.
Description: The main advantage is simplicity; the disadvantage is wasted storage space.
2. Use a separate application or database to generate a series of unique IDs in advance. The application obtains them through an interface, or reads them itself, and then inserts the data into the sharded cluster.
The advantage is that the globally unique primary key is simple and maintenance is relatively easy.
The disadvantage is that implementation is complex and requires application development.
Description: The ID table is queried and updated frequently, which hurts performance when inserting data.
3. Use a "central database server" that relies on the database's own auto-increment type (such as a MySQL auto_increment column) or sequence object (such as an Oracle Sequence) to produce a unique ID, which is then inserted with the data into the sharded cluster. There seems to be no particularly obvious advantage. The disadvantages are that the implementation is more complex, that overall availability depends on the central database server (once it crashes, no cluster can insert), and that application development is required.
Description: Not recommended.
4. Combine a cluster number with an auto_increment column inside each cluster, so that the "two fields" together form a unique primary key. The advantages are simple implementation, simple maintenance, and transparency to the application.
The disadvantages are that references and joins are relatively complex because they need two fields, that the primary key takes more space, and that the side effects are obvious with InnoDB, since every InnoDB secondary index also stores the primary key.
Note: Although it uses two fields, this scheme takes the least storage space of the options here: just one extra two-byte smallint.
5. Set a different ID starting point (auto_increment_offset) in each cluster, so that the clusters' IDs fall into completely separate ranges and are therefore globally unique. Note that MySQL ignores auto_increment_offset when it is larger than auto_increment_increment, so widely separated starting points are normally set per table with ALTER TABLE ... AUTO_INCREMENT = N. When one cluster's data grows too fast, you can use the same kind of command to move its next ID starting position and skip over possible conflicts. The advantages are that the implementation is simple, that the size of an ID tells you which cluster the data is in, and that it is transparent to the application. The disadvantage is that maintenance is relatively complex and requires close attention to ID growth in each cluster.
Description: When a range fills up, the adjustment is too troublesome.
6. Set both the ID starting point (auto_increment_offset) and the ID step (auto_increment_increment) in each cluster, so that the clusters' starting points are staggered by 1 and the step is larger than the number of shards the system is ever likely to reach. The IDs are then interleaved rather than split into ranges, which still makes them globally unique. The advantages are simple implementation, simple maintenance, and transparency to the application. The disadvantage is that the initial setup is relatively complex.
Description: Avoiding overlap requires combining several of these schemes.
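The staggered offset and step of scheme 6 can be simulated without a database. This is a minimal sketch, not MySQL itself: it assumes 4 clusters and a step of 10 (both numbers are made up for illustration), and the helper names `auto_increment_ids` and `cluster_of` are hypothetical.

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    next_id = offset
    while True:
        yield next_id
        next_id += increment


# Assumed layout: 4 clusters today, step of 10 so 6 more can be added later.
STEP = 10
clusters = {offset: auto_increment_ids(offset, STEP) for offset in range(1, 5)}

# Take the first five IDs from every cluster.
generated = {offset: list(islice(gen, 5)) for offset, gen in clusters.items()}
all_ids = [i for ids in generated.values() for i in ids]


def cluster_of(record_id, step=STEP):
    """Recover the cluster offset from an ID (valid while offsets are 1..step)."""
    return (record_id - 1) % step + 1


print(generated[1])       # IDs handed out by the cluster with offset 1
print(cluster_of(34))     # which cluster produced ID 34
```

Because the step is larger than the current number of clusters, a fifth cluster can start at offset 5 later without touching the existing ones. The ID modulo the step also identifies the originating cluster.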
"Problems with UUID as primary key"
For an engine with a clustered primary key, such as InnoDB, rows are stored in primary key order. Because UUIDs are unordered, using one as the primary key puts heavy IO pressure on InnoDB, so a UUID is not suitable as the physical primary key. Use the UUID as the logical primary key and keep an auto-increment ID as the physical primary key.
InnoDB physically orders rows by primary key. That is good news for an auto_increment INT, because each new key is always inserted at the end. It is bad news for a UUID: UUIDs are random, so the insert position is different every time. It may be at the beginning or in the middle, and keeping the rows physically ordered then takes a large number of IO operations, which is bound to hurt efficiency.
There are not many solutions to this problem. The most common is to keep an auto_increment INT as the primary key and add a UUID column with a unique index. Foreign key associations between tables also use the UUID. In other words, the auto_increment INT is only the formal primary key and the UUID is the de facto primary key. The INT primary key does not waste much space, and the UUID can still be used.
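The layout described above can be sketched as follows. This is only an illustration that uses Python's built-in sqlite3 as a stand-in for MySQL/InnoDB; the table and column names (`orders`, `order_items`, `order_uuid`, `sku`) are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE orders (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- formal (physical) key
        uuid TEXT NOT NULL UNIQUE                -- de facto (logical) key
    );
    CREATE TABLE order_items (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        order_uuid TEXT NOT NULL REFERENCES orders (uuid),  -- association by UUID
        sku        TEXT NOT NULL
    );
    """
)


def create_order(conn):
    """Insert an order and return its application-generated UUID."""
    order_uuid = str(uuid.uuid4())
    conn.execute("INSERT INTO orders (uuid) VALUES (?)", (order_uuid,))
    return order_uuid


order_uuid = create_order(conn)
conn.execute(
    "INSERT INTO order_items (order_uuid, sku) VALUES (?, ?)",
    (order_uuid, "sku-1"),
)
row = conn.execute(
    "SELECT id, uuid FROM orders WHERE uuid = ?", (order_uuid,)
).fetchone()
print(row)
```

New rows are always appended in `id` order, while the application and the child table only ever refer to the UUID.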
1. The simplest way
With 4 databases, the first MySQL instance starts its primary key at 1 and adds 4 each time, the second starts at 2 and adds 4 each time, and so on.
2. Build a sequence server
2.1. Use N MySQL instances as sequence servers to prevent a single point of failure.
2.2. Each table on each server represents one sequence, and each table holds only one record (table-level locking, so choose the MyISAM engine).
2.3. With N servers, the first server's sequence starts at 1 and adds N each time, and the second starts at 2 and adds N each time.
2.4. Get nextval from the first server and update nextval by adding N. If the first server fails, get it from the second server.
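The failover in steps 2.1 to 2.4 can be sketched in memory. This is a simplified model under stated assumptions: `SequenceServer` is a hypothetical stand-in for one MySQL sequence table, N is taken to be 2, and a crash is simulated with a flag rather than a real connection error.

```python
class SequenceServer:
    """In-memory stand-in for one MySQL sequence server (hypothetical)."""

    def __init__(self, start, step):
        self.nextval = start
        self.step = step
        self.available = True

    def next_id(self):
        if not self.available:
            raise ConnectionError("sequence server unavailable")
        value = self.nextval
        self.nextval += self.step  # step 2.4: update nextval by adding N
        return value


def next_id(servers):
    """Try each server in order and return the first ID obtained."""
    for server in servers:
        try:
            return server.next_id()
        except ConnectionError:
            continue
    raise RuntimeError("no sequence server available")


N = 2
# Step 2.3: server k starts at k and adds N each time.
servers = [SequenceServer(start=k + 1, step=N) for k in range(N)]

first = [next_id(servers) for _ in range(3)]   # served by the first server
servers[0].available = False                   # simulate a crash
second = [next_id(servers) for _ in range(3)]  # served by the second server
print(first, second)
```

Because each server walks its own residue class modulo N, IDs handed out before and after the failover never collide. They are not globally ordered, however.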
Modify MySQL's "default auto-increment step size"
SET GLOBAL auto_increment_increment = 1; -- set the step of the auto-increment sequence
SHOW GLOBAL VARIABLES; -- show all global variables
SHOW GLOBAL VARIABLES LIKE '%test%'; -- query global variables whose names contain the string "test"
How to generate a globally unique ID in a highly concurrent distributed system
Using the database auto-increment ID
Advantages: Simple coding; there is no need to think about how records are uniquely identified.
Disadvantages:
1) When a large table is split horizontally into several tables, the auto-increment ID can no longer be used. Each inserted record is routed to a table according to the splitting rules, so with auto-increment the same ID appears in more than one table, and queries and deletes by ID go wrong.
2) When a table receives highly concurrent single-row inserts, a transaction mechanism must be added, otherwise duplicate IDs can occur.
3) When inserting into parent and child tables (that is, associated tables), you need to get MAX(ID) before inserting in order to link the parent and child rows, and under concurrency another thread may fetch the same MAX(ID).
4) And so on.
Conclusion: Suitable for small applications with no table splitting and no high-concurrency performance requirements.
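The first problem above, repeated IDs after a horizontal split, is easy to reproduce. This is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for the real database; the shard tables `user_0` and `user_1` and the routing rule are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for shard in ("user_0", "user_1"):
    conn.execute(
        f"CREATE TABLE {shard} ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
    )


def insert_user(conn, name):
    """Route the row to a shard and return (shard, auto-increment id)."""
    # Hypothetical routing rule: shard by the parity of the name length.
    shard = f"user_{len(name) % 2}"
    cur = conn.execute(f"INSERT INTO {shard} (name) VALUES (?)", (name,))
    return shard, cur.lastrowid


placements = [insert_user(conn, n) for n in ("ann", "bo", "cleo", "dan")]
ids = [row_id for _, row_id in placements]
print(placements)
```

Each shard counts from 1 on its own, so the same ID shows up in both tables and an ID alone no longer identifies a row.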
Use a separate database to obtain a globally unique auto-increment sequence number or each table's MAXID
1) Use an auto-increment sequence number table
Dedicate a database to generating sequence numbers. Open a transaction, and on every insert operation first insert a row into the sequence table and return its auto-increment sequence number, which is used as the unique ID when inserting the business data.
Note: You need to clean up the data in the sequence table periodically to keep fetching sequence numbers efficient, and you must open a transaction when inserting a sequence table record.
The problem with this scheme is that fetching a sequence number costs performance every time. If the sequence numbers ever get out of order, you are in real trouble, because you do not know which table used which sequence number, so you have to switch to another kind of unique ID, such as a GUID.
2) Use a MAXID table to store the MAXID value of each table
Dedicate a database to recording the MAXID value of each table, and write a stored procedure to fetch IDs. The logic is roughly: open a transaction; if the table has no record yet, return the default key value 1 and insert that record into the Table_key table; if a record already exists, add 1 to the stored key value, write it back to the MAXID table, and return the key.
The problem with this scheme is that every MAXID query costs performance. Unlike the auto-increment sequence table, however, the MAXID table does not grow large easily, because it is divided up by table.
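The fetch logic described above can be sketched as a plain function. This is an illustration under assumptions, not the referenced article's stored procedure: it uses Python's built-in sqlite3 instead of a dedicated database server, and the function name `next_key` is hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction below is opened and closed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE table_key ("
    "table_name TEXT NOT NULL PRIMARY KEY, key_value INTEGER NOT NULL)"
)


def next_key(conn, table_name):
    """Return the next ID for table_name, creating its row on first use."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT key_value FROM table_key WHERE table_name = ?",
            (table_name,),
        ).fetchone()
        if row is None:
            key = 1  # no record yet: start from the default value 1
            conn.execute(
                "INSERT INTO table_key (table_name, key_value) VALUES (?, ?)",
                (table_name, key),
            )
        else:
            key = row[0] + 1  # existing record: previous key plus 1
            conn.execute(
                "UPDATE table_key SET key_value = ? WHERE table_name = ?",
                (key, table_name),
            )
        conn.execute("COMMIT")
        return key
    except Exception:
        conn.execute("ROLLBACK")
        raise


order_keys = [next_key(conn, "orders") for _ in range(3)]
user_key = next_key(conn, "users")
print(order_keys, user_key)
```

The table holds one row per business table, which is why it stays small. Every ID still costs a round trip and a write lock, which is the performance loss mentioned above.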
Detailed reference: "Use the Maxid table to store the MAXID value of each table to obtain a globally unique ID"
I excerpted the SQL from that article as follows:
Step one: Create a table
CREATE TABLE Table_key (table_name varchar NOT NULL PRIMARY KEY, Key_value int NOT NULL)
Step two: Create a stored procedure that fetches the next auto-increment ID