Objective
With more than 25 photos and 90 likes uploaded every second, we store a lot of data at Instagram. To make sure that all of our important data fits into memory and is available quickly for our users, we've begun to shard our data: in other words, to place the data in many smaller buckets, each holding a part of the data.
Our application servers run Django, with PostgreSQL as the back-end database. After deciding to shard, our first question was whether to keep PostgreSQL as our primary data store or switch to something else. We evaluated a few NoSQL solutions, but ultimately decided that the best fit was to shard our data across a set of PostgreSQL servers.
Before writing data to this set of servers, however, we had to solve the problem of how to assign a unique identifier to each piece of data in the database (for example, each photo posted in our system). The typical solution for a single database, an auto-incrementing primary key, no longer works when data is being inserted into many databases at the same time. The rest of this post describes how we tackled this issue.
Before starting out, we listed the features that were essential in our system:
Generated IDs should be sortable by time (so a list of photo IDs, for example, can be sorted without fetching more information about the photos)
IDs should ideally be 64 bits (for smaller indexes and better storage in systems like Redis)
The system should introduce as few new "moving parts" as possible. A large part of how we've been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust.
Existing Solutions
Many solutions to the ID generation problem already exist; here are a few we considered:
Generate IDs in the web application
This approach leaves ID generation entirely to the application rather than the database. For example, MongoDB's ObjectId is 12 bytes long and encodes the timestamp as its first component. Another popular approach is to use UUIDs.
Advantages:
Each application server generates IDs independently, minimizing points of failure and contention for ID generation;
If you use a timestamp as the first component of the ID, the IDs remain sortable by time.
Disadvantages:
Generally requires more storage space (96 bits or more) to make reasonable uniqueness guarantees;
Some UUID types are completely random and have no natural sort order.
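As a rough illustration of the trade-off above, the following Python sketch builds a 96-bit ID with a timestamp in its leading bits and compares it with a random UUID. The function name `timestamp_first_id` and the 32-bit/64-bit split are assumptions made for this example only; this is not MongoDB's actual ObjectId layout.

```python
import os
import time
import uuid


def timestamp_first_id(now_seconds: int) -> int:
    """Return a 96-bit integer: a 32-bit timestamp followed by 64 random bits.

    Hypothetical layout, for illustration only.
    """
    random_part = int.from_bytes(os.urandom(8), "big")
    return ((now_seconds & 0xFFFFFFFF) << 64) | random_part


earlier = timestamp_first_id(int(time.time()))
later = timestamp_first_id(int(time.time()) + 1)

# An ID created in a later second always compares greater.
print(earlier < later)
# The ID does not fit in a 64-bit column.
print(earlier.bit_length() > 64)
# A version-4 UUID is 128 bits wide and carries no time ordering.
print(len(uuid.uuid4().bytes) * 8)
```

The timestamp prefix gives time ordering without any coordination between servers, but the price is an ID wider than 64 bits.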
Generate IDs through a dedicated service
For example, Twitter's Snowflake is a Thrift service that uses Apache ZooKeeper to coordinate nodes and then generates unique 64-bit IDs.
Advantages:
Snowflake IDs are 64 bits, half the size of a UUID;
Can use time as the first component and remain sortable;
It is a distributed system that can survive nodes dying.
Disadvantages:
Would introduce additional complexity and more "moving parts" (ZooKeeper, Snowflake servers) into our architecture.
Database ticket servers
This approach uses the database's auto-increment feature to enforce uniqueness. Flickr uses this approach, but with two ticket servers (one generating odd numbers, the other even) to avoid a single point of failure.
Advantages:
Databases are well understood and have fairly predictable scaling factors.
Disadvantages:
Can eventually become a write bottleneck (though Flickr reports that, even at huge scale, it is not an issue);
Two additional machines (or EC2 instances) to administer;
If using a single database, it becomes a single point of failure. If using multiple databases, there is no longer a guarantee that IDs are sortable by time.
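The last drawback can be seen in a small Python sketch. It models the two ticket servers as plain in-memory counters (an assumption for illustration; real ticket servers are databases) and shows that the IDs stay unique but stop reflecting the order in which they were issued.

```python
import itertools

# Two independent counters: one hands out odd IDs, the other even IDs.
odd_server = itertools.count(start=1, step=2)
even_server = itertools.count(start=2, step=2)

# Suppose the odd server receives three requests before the even server gets any.
issued_in_order = [
    next(odd_server),
    next(odd_server),
    next(odd_server),
    next(even_server),
]

print(issued_in_order)                                     # [1, 3, 5, 2]
print(len(set(issued_in_order)) == len(issued_in_order))   # True: still unique
print(issued_in_order == sorted(issued_in_order))          # False: not time-ordered
```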
Of all the approaches above, Twitter's Snowflake came the closest, but the additional complexity of running an ID service counted against it. Instead, we took a conceptually similar approach, but built it inside PostgreSQL.
Our Solutions
Our sharded system consists of several thousand logical shards that are mapped in code to far fewer physical shards. Using this approach, we can start with just a few database servers and eventually move to many more, simply by moving a set of logical shards from one database to another, without having to re-bucket any of our data. We used PostgreSQL's schema feature to make this easy to script and administer.
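A minimal Python sketch of such a logical-to-physical mapping might look like the following. The shard count, the host names, and the `move_shard` helper are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical sizes and host names, chosen only for illustration.
LOGICAL_SHARDS = 4096
physical_hosts = ["db-a", "db-b"]

# Initially, spread the logical shards evenly over the physical servers.
shard_to_host = {
    shard: physical_hosts[shard % len(physical_hosts)]
    for shard in range(LOGICAL_SHARDS)
}


def move_shard(shard: int, new_host: str) -> None:
    """Re-point one logical shard at a different physical server."""
    shard_to_host[shard] = new_host


# Adding capacity: move every fourth logical shard to a new server.
for shard in range(0, LOGICAL_SHARDS, 4):
    move_shard(shard, "db-c")

# Logical shard 1341 was not part of the move, so its host is unchanged.
print(shard_to_host[1341])
```

Because a row's logical shard never changes, only the lookup table has to be updated when shards move between servers; no data needs to be re-bucketed.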
Schemas (not to be confused with the SQL schema of an individual table) are a logical grouping feature in PostgreSQL. Each PostgreSQL database can have several schemas, each of which can contain one or more tables. Table names must be unique only per schema, not per database, and by default PostgreSQL places everything in a schema named public.
Each logical shard in our system is a PostgreSQL schema, and each sharded table (for example, likes on our photos) exists inside each schema.
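To make the schema-per-shard idea concrete, here is a small Python helper that builds a schema-qualified table name for a logical shard. The `insta<N>` naming pattern is an assumption extrapolated from the `insta5` schema used in the code later in this post, and `qualified_table` is a hypothetical helper.

```python
def qualified_table(shard_id: int, table: str) -> str:
    """Return the schema-qualified name of a table on one logical shard.

    Assumes schemas are named insta<shard_id>, which is a guess based on
    the insta5 example schema.
    """
    return f"insta{shard_id}.{table}"


# The same table name exists once per schema without clashing.
print(qualified_table(5, "our_table"))  # insta5.our_table
print(qualified_table(6, "our_table"))  # insta6.our_table
```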
We delegate ID creation to each table inside each shard, using PL/pgSQL (PostgreSQL's internal programming language) and PostgreSQL's existing auto-increment functionality.
Each ID consists of:
41 bits for time in milliseconds (2^41 ms gives roughly 69 years of IDs from a custom epoch);
13 bits that represent the logical shard ID;
10 bits that represent an auto-incrementing sequence, modulo 1024. This means each shard can generate 1024 IDs per millisecond.
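The layout can be written down as a few constants. This Python sketch uses the 13-bit shard field implied by the shift arithmetic in the worked example (41 + 13 + 10 = 64); the constant names are assumptions for illustration.

```python
TIME_BITS = 41
SHARD_BITS = 13
SEQ_BITS = 10

# The three fields fill a 64-bit integer exactly.
assert TIME_BITS + SHARD_BITS + SEQ_BITS == 64

TIME_SHIFT = SHARD_BITS + SEQ_BITS  # 23: the timestamp sits above shard and sequence
SHARD_SHIFT = SEQ_BITS              # 10: the shard ID sits above the sequence

print(2 ** SHARD_BITS)  # 8192 distinct logical shard IDs
print(2 ** SEQ_BITS)    # 1024 sequence values per shard per millisecond
```

Placing the timestamp in the most significant bits is what makes the IDs sort by creation time when compared as plain integers.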
Let's walk through an example:
Say it is September 9, 2011, at 5:00, and our epoch begins on September 1, 2011. 1387263000 milliseconds have passed since the beginning of our epoch, so to start our ID, we fill the left-most 41 bits with this value using a left shift:
id = 1387263000 << (64-41)
Next, we take the shard ID for the piece of data we're inserting. Say we're sharding by user ID and there are 2000 logical shards; if our user ID is 31341, then the shard ID is 31341 % 2000 = 1341. We fill the next 13 bits with this value:
id |= 1341 << (64-41-13)
Finally, we take the next value of our auto-increment sequence (this sequence is unique to each table in each schema) and fill the remaining bits. Say we have already generated 5,000 IDs for this table; the next value is 5,001, which we take modulo 1024 (so that it fits in 10 bits) and include as well:
id |= (5001 % 1024)
We now have our ID, which we can return to the application server using the RETURNING keyword as part of the INSERT.
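The three steps above can be checked in Python. This is a sketch that mirrors the arithmetic in the text; the helper names `make_id` and `split_id` are hypothetical, and `split_id` is added only to show that each field can be recovered from the packed value.

```python
from typing import Tuple


def make_id(millis_since_epoch: int, shard_id: int, sequence: int) -> int:
    """Pack timestamp, shard ID and sequence into one 64-bit integer."""
    new_id = millis_since_epoch << (64 - 41)
    new_id |= shard_id << (64 - 41 - 13)
    new_id |= sequence % 1024
    return new_id


def split_id(packed: int) -> Tuple[int, int, int]:
    """Recover (millis_since_epoch, shard_id, sequence) from a packed ID."""
    millis = packed >> 23
    shard_id = (packed >> 10) & 0x1FFF  # 13-bit mask
    sequence = packed & 0x3FF           # 10-bit mask
    return millis, shard_id, sequence


user_id = 31341
photo_id = make_id(1387263000, user_id % 2000, 5001)

print(split_id(photo_id))  # (1387263000, 1341, 905)
```

Because the timestamp occupies the high bits, an ID generated one millisecond later is always numerically larger, whatever its shard ID or sequence value.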
Here is the complete PL/pgSQL code (for an example schema named insta5):
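<code>CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;  -- custom epoch, in ms since the Unix epoch
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;                  -- logical shard ID for this schema
BEGIN
    -- Next value of the per-table sequence, reduced to 10 bits
    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
    -- Current time in milliseconds
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;  -- timestamp in the high 41 bits
    result := result | (shard_id << 10);       -- 13-bit shard ID
    result := result | (seq_id);               -- 10-bit sequence
END;
$$ LANGUAGE PLPGSQL;</code>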
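For readers who want to experiment without a database, the same logic can be emulated in Python. This is a sketch under stated assumptions: an in-memory counter stands in for the PostgreSQL sequence, and the optional `now_ms` parameter is an addition for testing that the PL/pgSQL function does not have.

```python
import itertools
import time

OUR_EPOCH_MS = 1314220021721  # same custom epoch as the PL/pgSQL function
SHARD_ID = 5                  # same shard ID as the insta5 example

# Stand-in for the per-table PostgreSQL sequence (assumed to start at 1).
_table_id_seq = itertools.count(1)


def next_id(now_ms=None):
    """Emulate insta5.next_id(); now_ms can be injected for testing."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    seq_id = next(_table_id_seq) % 1024
    result = (now_ms - OUR_EPOCH_MS) << 23
    result |= SHARD_ID << 10
    result |= seq_id
    return result


first = next_id()
second = next_id()
print((first >> 10) & 0x1FFF)  # 5, the shard ID embedded in the ID
print(first != second)         # True: the sequence differs even within one millisecond
```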
Create the table with the following code:
<code>CREATE TABLE insta5.our_table (
    "id" bigint NOT NULL DEFAULT insta5.next_id(),
    ...rest of table schema...
)</code>
And that's it! We get primary keys that are unique across our application (and, as a bonus, contain the shard ID for easier mapping). We have been rolling this approach into production and are happy with the results so far. Interested in helping us figure out these problems at scale? We're hiring!
Mike Krieger, co-founder.
English original: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
<Translator: Zhu Yu, 2015.7.29>
Sharding and ID design for Instagram architecture