At Weibo we have always called the ID generator the "number issuer" (发号器). Weibo stores its data under IDs of this kind, and when I have written about the topic I have tagged it the same way. Its main purpose, as commonly understood, is to "generate a unique identifier for a data object in a distributed system", but in a real system it can take on more roles. They can be summed up as follows:
1. Uniqueness
2. Time-related
3. Roughly ordered
4. Decodable
5. Can be crafted
I will discuss the considerations and tradeoffs behind each of these roles separately, and will also compare some of the industry-known ID designs.
1. Uniqueness: does it have to be global?
When global uniqueness comes up, most people think of a dedicated generator service. Global uniqueness usually needs a larger ID space, and a centralized generator has to converge at one suitable point. That may involve locking, which raises cost and lowers performance. So whether the current system really needs global uniqueness is a question worth asking.
For example, in a messaging system, chat messages may not need globally unique IDs: a message is sent by exactly one person, so the system only has to guarantee uniqueness within that person's scope. In essence this exploits the uniqueness of the user ID. Once one kind of uniqueness can be relied upon, other designs can be built on it, and systems are often designed on similar properties, such as the time-based approach discussed below.
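As a minimal sketch of the per-sender idea, assuming user IDs are already unique: each sender gets an independent counter, and the pair (user ID, counter) identifies a message. The class name `PerUserSequence` and the in-memory counters are illustrative assumptions, not part of any system described here.

```python
from collections import defaultdict
from itertools import count


class PerUserSequence:
    """Hands out message numbers that are unique only within one sender."""

    def __init__(self):
        # One independent counter per user; no coordination across users.
        self._counters = defaultdict(lambda: count(1))

    def next_id(self, user_id):
        # (user_id, n) is unique overall because user_id already is.
        return (user_id, next(self._counters[user_id]))


seq = PerUserSequence()
first = seq.next_id(1001)
second = seq.next_id(1001)
other = seq.next_id(2002)
```

Two different users can both hold message number 1, yet the pairs never collide, so no global lock or central service is needed.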
2. Time-related: thousands of years is too long, seize the day?
Before relying on uniqueness, we have to choose what that uniqueness is built on. The usual choice is a database auto-increment column, which many databases implement with ACID guarantees. The drawback is that the database becomes a performance bottleneck, and multi-data-center deployments are hard to handle. You can work around this by giving each instance a different increment step, but for a generator that makes operations and maintenance rather heavy.
Time, on the other hand, is naturally unique, which is why so many designs are built on it. But within an 8-byte ID there is not much room for time. At second precision, about 30 years takes 30 bits; going to milliseconds adds another 10 bits, leaving only 20-odd bits for everything else. The reason for squeezing everything into 8 bytes is that 8 bytes is a long, which is handled well at the processor, compiler, and language level.
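The bit budget above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the helper name `years_covered` is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_covered(bits, ticks_per_second=1):
    """How many years a counter of `bits` bits lasts at the given tick rate."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_YEAR


seconds_30bit = years_covered(30)        # second precision, 30 bits
millis_40bit = years_covered(40, 1000)   # millisecond precision, 30 + 10 bits
bits_left = 64 - 30 - 10                 # what remains of a raw 64-bit word
```

Both `seconds_30bit` and `millis_40bit` come out at roughly 34 years, and a raw 64-bit word has 24 bits left after a 40-bit millisecond timestamp, in the same range as the figure quoted above.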
But is 30 years enough? It may not be enough for a person, but it may well be enough for a system. We often joke: how many Internet systems survive for 30 years? Long before 30 years have passed, your system may have been rewritten N times. The confidence also comes from Moore's Law: 30 years from now computing performance will have grown by many thousands of times, and spending more bytes on an ID will no longer be a problem.
3. Roughly ordered: seconds or milliseconds?
One ID per second or per millisecond is obviously not enough, which is why the 20 bits mentioned above include a sequence ID. Achieving a precise order means serializing concurrent access to the Sequence, and performance will certainly suffer. So a common choice is to give up ordering within the same second and guarantee only that IDs are ordered by time. An ID from a later second is always greater than one from an earlier second, but within the same second an ID issued later may be smaller than one issued earlier. This matters a great deal in use: you have to understand it, and the system has to be able to accept it.
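A small sketch of what "ordered across seconds, unordered within a second" looks like. The layout used here (seconds, then a 10-bit sequence, then a 10-bit machine number) is an assumption chosen only for the demonstration.

```python
SEQ_BITS = 10
MACHINE_BITS = 10


def make_id(second, seq, machine):
    """Pack seconds | per-machine sequence | machine number into one integer."""
    return (second << (SEQ_BITS + MACHINE_BITS)) | (seq << MACHINE_BITS) | machine


# Within second 100, machine 1 has already issued six IDs (seq 0..5);
# machine 2 issues its first ID a moment later, still inside second 100.
earlier = make_id(100, 5, 1)
later_same_second = make_id(100, 0, 2)

# Any ID from second 101 is larger than every ID from second 100.
next_second = make_id(101, 0, 1)
```

Here `later_same_second` is numerically smaller than `earlier` even though it was issued afterwards, while `next_second` is larger than both. Each machine keeps its own sequence, so no cross-machine coordination is required.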
Seconds or milliseconds, then? Dropping milliseconds frees 10 bits for the Sequence, at the cost of coarser time precision for the whole ID. Peak rate is the more practical consideration. The Sequence space determines the peak rate, and a peak by definition does not last long. From that angle, a cap of about 1 million IDs per second is less restrictive than a cap of about 1,000 per millisecond.
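The difference between the two caps shows up with bursty traffic. The sketch below uses a deliberately simplified model with made-up traffic numbers: it counts how many requests arrive after the sequence for their time window is already used up.

```python
PER_SECOND_CAPACITY = 2 ** 20       # 20-bit sequence, reset every second
PER_MILLISECOND_CAPACITY = 2 ** 10  # 10-bit sequence, reset every millisecond


def overflow(arrivals_per_ms, capacity, window_ms):
    """Requests that exceed the sequence capacity of their time window."""
    excess = 0
    for start in range(0, len(arrivals_per_ms), window_ms):
        total = sum(arrivals_per_ms[start:start + window_ms])
        excess += max(0, total - capacity)
    return excess


# Hypothetical second of traffic: a 5,000-request spike in the first
# millisecond, then 10 requests per millisecond for the rest of the second.
traffic = [5000] + [10] * 999

ms_overflow = overflow(traffic, PER_MILLISECOND_CAPACITY, 1)
second_overflow = overflow(traffic, PER_SECOND_CAPACITY, 1000)
```

The sustained rates are nearly identical (1,048,576 versus 1,024,000 per second), but under this model the millisecond layout overflows during the spike while the second layout absorbs it.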
4. Decodable: what can be read back out?
Once an ID is generated, it stays with the data for life, and during troubleshooting and analysis we need to look things up by it. An ID that can be decoded helps a great deal here: it tells you where the record came from and when it was created. This is somewhat like a national ID card number; in fact, the ID card number is a typical product of a distributed ID generator.
If the ID already contains the time and can be decoded, timestamp-style fields may no longer be needed at the storage level. Weibo's ID also carries a good deal of business information, as described later.
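Decoding is just shifting and masking. The following is a sketch under an assumed layout (seconds since a custom epoch, then a 20-bit sequence, then a 10-bit machine number); the `EPOCH` value and the field order are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = 1_400_000_000  # hypothetical custom epoch, in Unix seconds
SEQ_BITS = 20
MACHINE_BITS = 10


def decode(id_value):
    """Split an ID back into creation time, sequence, and machine number."""
    machine = id_value & ((1 << MACHINE_BITS) - 1)
    seq = (id_value >> MACHINE_BITS) & ((1 << SEQ_BITS) - 1)
    seconds = id_value >> (MACHINE_BITS + SEQ_BITS)
    created = datetime.fromtimestamp(EPOCH + seconds, tz=timezone.utc)
    return {"created": created, "sequence": seq, "machine": machine}


# An ID built by hand: one day after the epoch, sequence 7, machine 42.
sample = (86_400 << 30) | (7 << 10) | 42
info = decode(sample)
```

With the creation time recoverable from the ID itself, a separate created-at column becomes optional for many lookups.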
5. Can be crafted: why not UUID?
Availability is always a top-priority metric for an Internet system. But distributed systems are fragile: with unstable networks or an unavailable underlying storage system, a business system can face failure at any moment. To keep responding to the front end, we need to tolerate failure as far as possible. For example, when storage fails, you may need to temporarily dump requests aside for later processing, and by the time they are processed the original point in time has passed and their order is out of step with other systems. We then need to craft the ID so that the system appears to have been running normally. A craftable ID lets you control the "production date" (admittedly this smells a bit like forgery) and then carry on with the rest of the processing.
Another important scenario is data cleansing. It is less frequent but not rare: the original ID design may have been unreasonable, or the underlying storage may have changed. In such cases a craftable ID brings a lot of operational convenience.
That is one reason we do not use UUID. A standard UUID library can generate an ID for the current moment, but if you want to create a UUID for a specific time, you may need to modify the underlying library. Experience tells us that a problem that can be solved at the upper layer should not be pushed down to a lower layer; maintaining such a modified library is very costly.
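A minimal sketch of a craftable ID, under the same kind of assumed layout (30-bit seconds since a hypothetical epoch, 20-bit sequence, 10-bit machine number). The optional `when` parameter is the illustrative part: it lets the caller choose the "production date". For comparison, Python's standard `uuid.uuid1()` accepts only `node` and `clock_seq`, with no argument for the timestamp.

```python
import time

EPOCH = 1_400_000_000  # hypothetical custom epoch, in Unix seconds
SEQ_BITS = 20
MACHINE_BITS = 10


def make_id(seq, machine, when=None):
    """Build an ID; `when` (Unix seconds) lets the caller backdate it."""
    if when is None:
        when = time.time()
    seconds = int(when) - EPOCH
    if not 0 <= seconds < (1 << 30):
        raise ValueError("timestamp outside the 30-bit range")
    if not (0 <= seq < (1 << SEQ_BITS) and 0 <= machine < (1 << MACHINE_BITS)):
        raise ValueError("sequence or machine number out of range")
    return (seconds << (SEQ_BITS + MACHINE_BITS)) | (seq << MACHINE_BITS) | machine


# A request received one hour after the epoch but only stored much later
# still gets an ID that sorts at its original time.
backdated = make_id(seq=0, machine=1, when=EPOCH + 3600)
before = make_id(seq=0, machine=1, when=EPOCH + 10)
after = make_id(seq=0, machine=1, when=EPOCH + 11)
```

Because the timestamp is an ordinary argument, replaying delayed requests or re-issuing IDs during data cleansing needs no change to any lower-level library.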
#Design details
Leaving UUID aside, the publicly known designs covered here are Snowflake, Weibo, and Ticktick.
1. Snowflake
41 bits hold the time in milliseconds and 10 bits hold the MachineID, which means each machine has to be configured in advance; the remaining 12 bits go to the sequence. Although the code has been published, it is not currently usable; reportedly it is being reworked internally.
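The 41/10/12 split can be sketched as a pack and unpack pair. This is an illustration of the bit layout only, not Twitter's code; the function names are made up, and the epoch handling is left out.

```python
TIME_BITS, MACHINE_BITS, SEQ_BITS = 41, 10, 12


def snowflake_pack(millis, machine, seq):
    """Combine milliseconds since some epoch, machine ID, and sequence."""
    if not (0 <= millis < (1 << TIME_BITS)
            and 0 <= machine < (1 << MACHINE_BITS)
            and 0 <= seq < (1 << SEQ_BITS)):
        raise ValueError("field out of range")
    return (millis << (MACHINE_BITS + SEQ_BITS)) | (machine << SEQ_BITS) | seq


def snowflake_unpack(value):
    """Inverse of snowflake_pack."""
    seq = value & ((1 << SEQ_BITS) - 1)
    machine = (value >> SEQ_BITS) & ((1 << MACHINE_BITS) - 1)
    millis = value >> (MACHINE_BITS + SEQ_BITS)
    return millis, machine, seq


packed = snowflake_pack(123_456_789, 17, 4095)
largest = snowflake_pack((1 << TIME_BITS) - 1, 1023, 4095)
```

The three fields add up to 63 bits, so even the largest value fits in a signed 64-bit long, and each machine can issue up to 4,096 IDs per millisecond.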
2. Weibo
Weibo uses second-level time in 30 bits and a 15-bit sequence, which in theory allows about 32,000 IDs per second. 4 bits distinguish the IDC, supporting up to 16 IDCs, which is enough for the core data centers. A further 2 bits distinguish the business type and, because the service is currently deployed centrally in each data center, 1 bit distinguishes the hot standby. Yes, it does not use the full 64 bits.
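The field widths quoted for Weibo can be tallied quickly. The dictionary below only restates the widths from the text; the key names are illustrative, and nothing is assumed about the actual bit order.

```python
WEIBO_FIELDS = {
    "seconds": 30,
    "sequence": 15,
    "idc": 4,
    "business": 2,
    "hot_standby": 1,
}

total_bits = sum(WEIBO_FIELDS.values())
peak_per_second = 2 ** WEIBO_FIELDS["sequence"]
max_idcs = 2 ** WEIBO_FIELDS["idc"]
unused_bits = 64 - total_bits
```

The fields use 52 of the 64 bits, the 15-bit sequence gives 32,768 IDs per second, and 4 bits cover 16 IDCs.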
3. Ticktick
This is what the Huanxin (Easemob) system currently uses. It takes 30 bits for seconds and 20 bits for the sequence. One consideration here: the first version of the implementation still aims for millisecond precision, so the first 10 of those 20 bits hold milliseconds and the remaining 10 bits hold the Sequence. When peak traffic grows, it can fall back to second precision.
To address the 30-year problem mentioned earlier, I reserved the top 2 bits as a Version field. When the time comes, the system can either switch to a longer ID and use the leading bits to tell the formats apart, or lend these 2 bits to the timestamp, which buys the system some extra time.
The remaining 10 bits go to the MachineID, which means ID generation can be embedded directly in the business service, supporting up to about a thousand servers. Finally, there are 2 tag bits, which may be used to distinguish group messages from one-to-one chat messages. You can also see that this ID supports up to 1 billion messages a day. In case the system grows too fast, these 2 bits can be moved to the Sequence to support message volumes on the order of 4 billion, or combined with the Version bits described above to go further still.
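The Ticktick layout described above can be sketched as follows. The field widths come from the text; the high-to-low field order and all function names are assumptions made for illustration, so the demo repository remains the reference for the real implementation.

```python
# Field widths from the text; the high-to-low order is an assumption.
LAYOUT = (("version", 2), ("seconds", 30), ("sequence", 20),
          ("machine", 10), ("tag", 2))


def pack(**fields):
    """Pack the named fields into one 64-bit integer, highest field first."""
    value = 0
    for name, width in LAYOUT:
        part = fields[name]
        if not 0 <= part < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | part
    return value


def millisecond_sequence(ms, seq):
    """First-version use of the 20 sequence bits: 10 bits ms, 10 bits counter."""
    if not (0 <= ms < 1000 and 0 <= seq < 1024):
        raise ValueError("ms must be 0..999 and seq 0..1023")
    return (ms << 10) | seq


a = pack(version=0, seconds=10, sequence=millisecond_sequence(5, 1023),
         machine=1, tag=0)
b = pack(version=0, seconds=10, sequence=millisecond_sequence(6, 0),
         machine=1, tag=0)
```

Because the millisecond sits in the upper half of the sequence field, IDs stay ordered by millisecond in this mode, and falling back to second precision only changes how those 20 bits are filled, not the overall layout.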
#Postscript
Implementing a generator is very simple, so how Ticktick is implemented is not the important part. Still, the demo source is available at https://github.com/ericliang/ticktick
What kind of globally unique ID does a business system need? #Ticktick# (Huanxin chief architect: Yile)