Original work. If you repost it, please credit the source: http://blog.csdn.net/Xiejingfa/article/details/51231967
This week has been busy, and my plan of writing two articles a week looks like it will not be met. Today I bring you a source code analysis of the sorted set, zset.
The zset in Redis primarily supports the following commands:
- ZADD, ZINCRBY
- ZREM, ZREMRANGEBYRANK, ZREMRANGEBYSCORE, ZREMRANGEBYLEX
- ZRANGE, ZREVRANGE, ZRANGEBYSCORE, ZREVRANGEBYSCORE, ZRANGEBYLEX, ZREVRANGEBYLEX
- ZCOUNT, ZCARD, ZSCORE, ZRANK, ZREVRANK
- ZUNIONSTORE, ZINTERSTORE
The zset source code mainly lives in two files: redis.h and t_zset.c.
1. The skip list (skiplist)
The zset in Redis is implemented on top of the skip list data structure. A skip list is a randomized data structure built from several parallel linked lists. It was first proposed by William Pugh in the paper "Skip Lists: A Probabilistic Alternative to Balanced Trees", and it supports insert, delete, and find operations with an average complexity of O(log N).
Here we mainly introduce how the skip list is implemented and used in Redis. For the principles of the skip list and a general code implementation, I found a good article online; if you are not familiar with skip lists, you can read it first: "skiplist jumping table detailed and its programming implementation".
1.1. Memory structure of the skip list
The skip list implementation in Redis is basically the same as the one William Pugh describes in "Skip Lists: A Probabilistic Alternative to Balanced Trees", with the following three modifications:
- The skip list in Redis allows duplicate scores, so that multiple elements in a sorted set can share the same score.
- Comparing two nodes compares not only their scores but also their associated element values.
- Each node also has a backward pointer (equivalent to the prev pointer in a doubly linked list), which lets us traverse the list from tail to head. This is what makes reverse-order commands such as ZREVRANGE possible.
The skip list node is defined in the redis.h header file:
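The first two modifications amount to ordering nodes by the pair (score, element value). The following is a minimal sketch of that ordering, not the Redis code itself: the `zelem` type and `zelem_compare` function are hypothetical names, and `strcmp` stands in for the string-object comparison Redis performs on `robj` values.

```c
#include <string.h>

/* Hypothetical simplified element; Redis stores the member as robj *obj. */
typedef struct {
    double score;
    const char *member;
} zelem;

/* Returns <0, 0 or >0 if a sorts before, equal to, or after b.
 * Scores are compared first; equal scores fall back to the member string. */
int zelem_compare(const zelem *a, const zelem *b) {
    if (a->score < b->score) return -1;
    if (a->score > b->score) return 1;
    return strcmp(a->member, b->member);
}
```

Because ties on the score are broken by the member, two elements with the same score still have a well-defined position in the list.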
    /* Skip list node */
    typedef struct zskiplistNode {
        // Element value stored in the node
        robj *obj;
        // Score of the node, the basis for sorting
        double score;
        // Backward pointer
        struct zskiplistNode *backward;
        // Levels
        struct zskiplistLevel {
            // Forward pointer
            struct zskiplistNode *forward;
            // Number of nodes this pointer skips over
            unsigned int span;
        } level[];
    } zskiplistNode;
The skip list itself is defined as follows:
    /* Skip list */
    typedef struct zskiplist {
        // Head and tail nodes of the skip list
        struct zskiplistNode *header, *tail;
        // Number of nodes
        unsigned long length;
        // Current maximum level of the skip list
        int level;
    } zskiplist;
The main skip list operations are implemented in t_zset.c and include the following:
    // Create a skip list node with the given number of levels
    zskiplistNode *zslCreateNode(int level, double score, robj *obj);
    // Create a skip list
    zskiplist *zslCreate(void);
    // Free the given skip list node
    void zslFreeNode(zskiplistNode *node);
    // Free the skip list
    void zslFree(zskiplist *zsl);
    // Insert a new node into the skip list
    zskiplistNode *zslInsert(zskiplist *zsl, double score, robj *obj);
    // Node removal helper, called by zslDelete and the range deletion functions
    void zslDeleteNode(zskiplist *zsl, zskiplistNode *x, zskiplistNode **update);
    // Delete the node with the given score and obj from the skip list
    int zslDelete(zskiplist *zsl, double score, robj *obj);
    // Return 1 if range overlaps the score range of the skip list, otherwise 0
    int zslIsInRange(zskiplist *zsl, zrangespec *range);
    // Return the first node whose score falls within range
    zskiplistNode *zslFirstInRange(zskiplist *zsl, zrangespec *range);
    // Return the last node whose score falls within range
    zskiplistNode *zslLastInRange(zskiplist *zsl, zrangespec *range);
    // Delete all nodes whose score falls within range
    unsigned long zslDeleteRangeByScore(zskiplist *zsl, zrangespec *range, dict *dict);
    // Delete all nodes whose member falls within the given lexicographic range
    unsigned long zslDeleteRangeByLex(zskiplist *zsl, zlexrangespec *range, dict *dict);
    // Delete all nodes within the given rank range
    unsigned long zslDeleteRangeByRank(zskiplist *zsl, unsigned int start, unsigned int end, dict *dict);
    // Return the rank of the given element in the skip list
    unsigned long zslGetRank(zskiplist *zsl, double score, robj *o);
    // Return the node at the given rank
    zskiplistNode *zslGetElementByRank(zskiplist *zsl, unsigned long rank);
Next, we look at two examples to get a better understanding of how the skip list is laid out in memory.
(1) Creating an empty skip list
zslCreate creates and initializes a new skip list. An empty skip list looks like this:
Here level0 to level31 is an array of zskiplistLevel structures with a length of 32 (defined by ZSKIPLIST_MAXLEVEL, whose value is 32). The zskiplistLevel structure has two members, span and forward; span is left out here to keep the picture simple.
(2) Insertion
The elements in the skip list are sorted by score. If we insert four elements a, b, c, and d with scores 3, 5, 7, and 9 into the skip list, the resulting structure is as follows:
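To make the two examples concrete, here is a simplified sketch of creation and insertion. It is not the Redis implementation: the names `node`, `skiplist`, `sl_create`, and `sl_insert` are hypothetical, members are plain C strings instead of `robj`, the `span` field is omitted as in the description above, memory is never freed, and the level generator (each extra level kept with probability of about 1/4) is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define MAXLEVEL 32   /* mirrors ZSKIPLIST_MAXLEVEL */

typedef struct node {
    const char *member;
    double score;
    struct node *backward;
    struct node *forward[];   /* one forward pointer per level; span omitted */
} node;

typedef struct {
    node *header, *tail;
    unsigned long length;
    int level;
} skiplist;

static node *node_create(int height, double score, const char *member) {
    node *n = malloc(sizeof(*n) + height * sizeof(node *));
    n->member = member;
    n->score = score;
    n->backward = NULL;
    for (int i = 0; i < height; i++) n->forward[i] = NULL;
    return n;
}

/* Empty list: a header with MAXLEVEL levels, all forward pointers NULL. */
skiplist *sl_create(void) {
    skiplist *sl = malloc(sizeof(*sl));
    sl->header = node_create(MAXLEVEL, 0, NULL);
    sl->tail = NULL;
    sl->length = 0;
    sl->level = 1;
    return sl;
}

/* Assumed level generator: keep adding a level with probability ~1/4. */
static int random_level(void) {
    int level = 1;
    while (rand() % 4 == 0 && level < MAXLEVEL) level++;
    return level;
}

node *sl_insert(skiplist *sl, double score, const char *member) {
    node *update[MAXLEVEL], *x = sl->header;
    /* Walk down from the top level, recording the last node visited
     * on each level; the new node is linked in after these nodes. */
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] &&
               (x->forward[i]->score < score ||
                (x->forward[i]->score == score &&
                 strcmp(x->forward[i]->member, member) < 0)))
            x = x->forward[i];
        update[i] = x;
    }
    int level = random_level();
    if (level > sl->level) {
        for (int i = sl->level; i < level; i++) update[i] = sl->header;
        sl->level = level;
    }
    node *n = node_create(level, score, member);
    for (int i = 0; i < level; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    /* Maintain the backward pointers and the tail. */
    n->backward = (update[0] == sl->header) ? NULL : update[0];
    if (n->forward[0]) n->forward[0]->backward = n;
    else sl->tail = n;
    sl->length++;
    return n;
}
```

Inserting a, b, c, and d with scores 3, 5, 7, and 9 in any order leaves them linked in score order on level 0, with `tail` pointing at d and each `backward` pointer leading to the previous element.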
2. Zset encodings
The sorted set zset has two encodings: REDIS_ENCODING_ZIPLIST and REDIS_ENCODING_SKIPLIST. The ziplist encoding can represent smaller sorted sets, while the skiplist encoding can represent a sorted set of any size.
When we covered the list data type earlier, we saw that a list uses ziplist as its default encoding. Zset takes a different approach: it chooses the encoding at creation time based on the length of the first element added by the ZADD command. Specifically, the ziplist encoding is used only if both of the following conditions are met:
- The value of server.zset_max_ziplist_entries is not 0.
- The length of the first element value does not exceed server.zset_max_ziplist_value (the default value is 64).
Otherwise, the skiplist encoding is used.
This logic is implemented in the zaddGenericCommand function; only the relevant part of the code is shown here:
    ...
    /* Lookup the key and create the sorted set if does not exist. */
    // Fetch the sorted set object
    zobj = lookupKeyWrite(c->db,key);
    // If the sorted set specified by key does not exist, create one
    if (zobj == NULL) {
        // The default value of server.zset_max_ziplist_entries is 128
        // The default value of server.zset_max_ziplist_value is 64
        if (server.zset_max_ziplist_entries == 0 ||
            server.zset_max_ziplist_value < sdslen(c->argv[3]->ptr))
        {
            zobj = createZsetObject();
        } else {
            zobj = createZsetZiplistObject();
        }
        dbAdd(c->db,key,zobj);
    // If the object specified by key exists, check that its type is zset
    } else {
        if (zobj->type != REDIS_ZSET) {
            addReply(c,shared.wrongtypeerr);
            goto cleanup;
        }
    }
    ...
If a zset currently uses the REDIS_ENCODING_ZIPLIST encoding, it is converted to the REDIS_ENCODING_SKIPLIST encoding when either of the following two conditions is met:
- The new string to be added is longer than server.zset_max_ziplist_value (the default value is 64).
- The number of elements saved in the ziplist exceeds server.zset_max_ziplist_entries (the default value is 128).
The conversion between the two encodings is implemented by the zsetConvert function.
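The two conversion conditions can be sketched as a single predicate. This is only an illustration under stated assumptions: `needs_skiplist` is a hypothetical helper, and the thresholds are passed in as parameters rather than read from the `server` structure as Redis does.

```c
#include <stddef.h>

/* Returns 1 if a ziplist-encoded zset should switch to the skiplist
 * encoding: either the element count or the length of the newly added
 * member exceeds its configured limit. */
int needs_skiplist(size_t entry_count, size_t new_member_len,
                   size_t max_ziplist_entries, size_t max_ziplist_value) {
    return entry_count > max_ziplist_entries ||
           new_member_len > max_ziplist_value;
}
```

With the default limits of 128 and 64, a zset holding 128 short members keeps the ziplist encoding; one more element, or a single 65-byte member, triggers the conversion.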
2.1. Storage structure of a ziplist-encoded zset
In a zset, each element has two parts: an element value and a score. How are these two parts stored under the ziplist encoding? Redis uses two adjacent ziplist nodes to hold one zset element: the first node holds the element value and the second holds the score. For convenience, we call these the "element value node" and the "score node". To keep the zset ordered, the ziplist nodes are grouped in pairs, and the pairs are sorted by score from smallest to largest.
So the storage structure of a ziplist-encoded zset is as follows:
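The paired layout can be illustrated with a flat array in which even positions hold element values and odd positions hold scores. This is a hedged sketch only: the `entry` type and `flat_zset_score` function are hypothetical stand-ins, and a real ziplist is a packed byte buffer rather than an array of structs.

```c
#include <string.h>

/* Hypothetical stand-in for a ziplist entry: it holds either an
 * element value (str) or a score (num). */
typedef struct {
    const char *str;   /* used by element value nodes */
    double num;        /* used by score nodes */
} entry;

/* Walk the flat list two entries at a time. Returns 1 and stores the
 * score if the member is found, otherwise returns 0. */
int flat_zset_score(const entry *list, size_t n_entries,
                    const char *member, double *score) {
    for (size_t i = 0; i + 1 < n_entries; i += 2) {
        if (strcmp(list[i].str, member) == 0) {
            *score = list[i + 1].num;   /* score node follows its value node */
            return 1;
        }
    }
    return 0;
}
```

Finding a member in this layout requires a linear scan over the pairs, which is acceptable only while the set stays small.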
2.2. Storage structure of a skiplist-encoded zset
The skiplist-encoded sorted set is defined in the redis.h header file:
    /* Sorted set structure */
    typedef struct zset {
        // Dictionary that maps element values to scores
        dict *dict;
        // Skip list that orders element values by score and supports O(log N) lookups
        zskiplist *zsl;
    } zset;
You may be surprised at this point: this is supposed to be the skiplist encoding, so why is there a dict in it? Let us explain the role of the dict. As mentioned above, the skip list supports insert, delete, and find operations with an average complexity of O(log N), which is the basis for efficient zset operations. However, a zset also needs to support commands that get the score of a given element value or check whether an element value exists in the zset, and implementing these on the skip list alone does not perform well. So the author adds a dict that maps each element value to its score (the key is the element value, the value is the score). This makes it possible to fetch the score of a given member quickly and makes up for the weakness of the skip list.
At this point we have covered the storage structure of both zset encodings. As with the list type described earlier, one of the main jobs of the zset functions is to provide a uniform operating interface over both ziplist and skiplist, hiding the underlying differences. These operations are not difficult and are not repeated here; you can refer to the annotated source linked at the end of this article.
3. Range operation commands
Many zset commands operate on ranges. They fall into the following three groups:
- Get or delete elements within a given rank interval, such as the ZRANGE, ZREVRANGE, and ZREMRANGEBYRANK commands.
- Get or delete elements within a given score interval, such as the ZRANGEBYSCORE, ZREVRANGEBYSCORE, and ZREMRANGEBYSCORE commands.
- Get or delete elements within a given lexicographic interval, such as ZRANGEBYLEX and ZREMRANGEBYLEX. Note that these commands are meaningful only when all elements in the sorted set have the same score. In that case the elements are ordered lexicographically, and ZRANGEBYLEX or ZREMRANGEBYLEX returns or deletes all elements whose value lies between min and max. If the elements in the sorted set have different scores, the returned or deleted elements are unspecified.
To make range operations easier, Redis defines a score range structure and a lexicographic range structure in the redis.h header file:
    /* Struct to hold a inclusive/exclusive range spec by score comparison. */
    /* Score range with open or closed endpoints */
    typedef struct {
        // Minimum and maximum values
        double min, max;
        // Whether min and max are excluded (0 means inclusive, 1 means exclusive)
        int minex, maxex; /* are min or max exclusive? */
    } zrangespec;

    /* Struct to hold a inclusive/exclusive range spec by lexicographic comparison. */
    /* Lexicographic range with open or closed endpoints */
    typedef struct {
        robj *min, *max;  /* May be set to shared.(minstring|maxstring) */
        // Whether min and max are excluded (0 means inclusive, 1 means exclusive)
        int minex, maxex; /* are min or max exclusive? */
    } zlexrangespec;
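The minex and maxex flags decide whether each endpoint is compared strictly. A minimal sketch of the two endpoint checks follows; the struct is redeclared so the block is self-contained, and the helper names `value_gte_min` and `value_lte_max` are chosen for this example.

```c
/* Score range with inclusive or exclusive endpoints. */
typedef struct {
    double min, max;
    int minex, maxex; /* 1 means the endpoint is excluded */
} zrangespec;

/* Is value on or above the lower bound of the range? */
int value_gte_min(double value, const zrangespec *spec) {
    return spec->minex ? (value > spec->min) : (value >= spec->min);
}

/* Is value on or below the upper bound of the range? */
int value_lte_max(double value, const zrangespec *spec) {
    return spec->maxex ? (value < spec->max) : (value <= spec->max);
}
```

For the half-open interval (3, 10], a score of 3 fails the lower check while a score of 10 passes the upper check.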
We take the zslDeleteRangeByScore and zzlDeleteRangeByScore functions as examples to summarize the general process of this kind of range operation.
The zslDeleteRangeByScore function deletes the elements within a given score range from a skiplist-encoded sorted set. It is implemented as follows:
    unsigned long zslDeleteRangeByScore(zskiplist *zsl, zrangespec *range, dict *dict) {
        // update[] records, for each level, the node where the search drops down a level
        zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
        unsigned long removed = 0;
        int i;

        x = zsl->header;
        // Traverse from the top level down, recording the drop-down nodes
        // so that their pointers can be fixed up after deletion
        for (i = zsl->level-1; i >= 0; i--) {
            while (x->level[i].forward && (range->minex ?
                x->level[i].forward->score <= range->min :
                x->level[i].forward->score < range->min))
                    x = x->level[i].forward;
            update[i] = x;
        }

        /* Current node is the last with score < or <= min. */
        // Move to the first node to be deleted
        x = x->level[0].forward;

        /* Delete nodes while in range. */
        // Delete all nodes that fall within range
        while (x &&
               (range->maxex ? x->score < range->max : x->score <= range->max))
        {
            // Remember the next node
            zskiplistNode *next = x->level[0].forward;
            // Unlink the node from the skip list
            zslDeleteNode(zsl,x,update);
            // Delete the corresponding entry from the dict
            dictDelete(dict,x->obj);
            zslFreeNode(x);
            // Count the deleted nodes
            removed++;
            // Advance to the next node
            x = next;
        }
        return removed;
    }
The zzlDeleteRangeByScore function deletes the elements within a given score range from a ziplist-encoded sorted set. It is implemented as follows:
    unsigned char *zzlDeleteRangeByScore(unsigned char *zl, zrangespec *range, unsigned long *deleted) {
        unsigned char *eptr, *sptr;
        double score;
        unsigned long num = 0;

        if (deleted != NULL) *deleted = 0;

        // Point to the first ziplist node within the given range
        eptr = zzlFirstInRange(zl,range);
        if (eptr == NULL) return zl;

        /* When the tail of the ziplist is deleted, eptr will point to the sentinel
         * byte and ziplistNext will return NULL. */
        // Keep deleting until a node outside the range is encountered
        while ((sptr = ziplistNext(zl,eptr)) != NULL) {
            score = zzlGetScore(sptr);
            if (zslValueLteMax(score,range)) {
                /* Delete both the element and the score. */
                zl = ziplistDelete(zl,&eptr);
                zl = ziplistDelete(zl,&eptr);
                num++;
            } else {
                /* No longer in range. */
                break;
            }
        }

        if (deleted != NULL) *deleted = num;
        return zl;
    }
As we can see, the general process for zset range operations is:
- Find the first node in the ziplist or skiplist that falls within the given interval; this is where the iteration starts. For example, if the score interval is [3,10], a forward operation first finds the first node whose score is greater than or equal to 3 and then traverses toward the tail; a reverse operation first finds the first node whose score is less than or equal to 10 and then traverses toward the head.
- Starting from the node found in the previous step, traverse in the forward or reverse direction until a node that no longer falls within the interval is encountered.
That covers the main points of the zset implementation. For the details, it is still best to read the source code yourself; that is the way to really master it.
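The two steps above can be sketched on a plain score-sorted array, which stands in for either encoding. This is an illustration under stated assumptions, not Redis code: `range_spec`, `gte_min`, `lte_max`, and `count_in_range` are hypothetical names, and only the forward direction is shown.

```c
#include <stddef.h>

typedef struct {
    double min, max;
    int minex, maxex; /* 1 means the endpoint is excluded */
} range_spec;

static int gte_min(double v, const range_spec *r) {
    return r->minex ? (v > r->min) : (v >= r->min);
}

static int lte_max(double v, const range_spec *r) {
    return r->maxex ? (v < r->max) : (v <= r->max);
}

/* Forward range scan over scores sorted in ascending order.
 * Returns how many scores fall within the range. */
size_t count_in_range(const double *scores, size_t n, const range_spec *r) {
    size_t i = 0, count = 0;
    /* Step 1: skip ahead to the first element inside the range. */
    while (i < n && !gte_min(scores[i], r)) i++;
    /* Step 2: walk forward until an element falls outside the range. */
    while (i < n && lte_max(scores[i], r)) {
        count++;
        i++;
    }
    return count;
}
```

In the skiplist encoding the first step uses the level pointers to reach the starting node quickly instead of scanning linearly; the second step is the same walk along the bottom level.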
Annotated version Source: https://github.com/xiejingfa/the-annotated-redis-2.8.24/blob/master/t_zset.c
"Redis Source Code Analysis": the sorted set (zset) data type