Redis source code analysis (6) --- ziplist compression list

Source: Internet
Author: User
Tags 0xc0

The name ziplist resembles adlist, which I analyzed earlier, but the two work completely differently. adlist is an ordinary linked list for general-purpose list operations; ziplist is a compressed list. Why "compressed"? An ordinary linked list stores prev and next pointers in every node, pointing to the previous and next node, and these pointers take up a significant share of memory. ziplist replaces them with lengths. The whole ziplist is really one very long byte string, and each entry records its own length and the length of the previous entry, which is enough to locate neighbouring entries quickly and implement the usual operations. In addition, the number of bytes used to store a length varies with the length itself. If every string in the list is short, spending several bytes on each length field would be an obvious waste, so short lengths are stored in a single byte. The compression in ziplist therefore lies in how lengths are represented. Where is ziplist used? It stores the data added to a list by commands such as the commonly used rpush and lpush. We will see the corresponding implementation below.

To learn ziplist, you must first understand its layout. This takes some thought; without it, the author's design is hard to follow. Below is the layout description from the source file, with my notes added to help:

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 *
 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT:
 * The general layout of the ziplist is as follows:
 * <zlbytes><zltail><zllen><entry><entry><zlend>
 *
 * <zlbytes> is an unsigned integer to hold the number of bytes that the
 * ziplist occupies. This value needs to be stored to be able to resize the
 * entire structure without the need to traverse it first.
 *
 * <zltail> is the offset to the last entry in the list. This allows a pop
 * operation on the far side of the list without the need for full traversal.
 *
 * <zllen> is the number of entries. When this value is larger than 2**16-2,
 * we need to traverse the entire list to know how many items it holds.
 *
 * <zlend> is a single byte special value, equal to 255, which indicates the
 * end of the list.
 * Note: 255 is 11111111 in binary.
 *
 * ZIPLIST ENTRIES:
 * Every entry in the ziplist is prefixed by a header that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the encoding with an
 * optional string length of the entry itself is stored.
 * Note: because each entry knows the length of the one before it, the list
 * can be walked backwards starting from any entry.
 *
 * The length of the previous entry is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte that takes the length as value. When the length is greater than or
 * equal to 254, it will consume 5 bytes. The first byte is set to 254 to
 * indicate a larger value is following. The remaining 4 bytes take the
 * length of the previous entry as value.
 * Note: the 1-byte form covers lengths 0 to 253, because 254 is the marker
 * for the 5-byte form and 255 is <zlend>.
 *
 * The other header field of the entry itself depends on the contents of the
 * entry. When the entry is a string, the first 2 bits of this header will hold
 * the type of encoding used to store the length of the string, followed by the
 * actual length of the string. When the entry is an integer the first 2 bits
 * are both set to 1. The following 2 bits are used to specify what kind of
 * integer will be stored after this header. An overview of the different
 * types and encodings is as follows:
 * Note: for strings the first 2 bits are 00, 01 or 10. For integers they are
 * 11, and the next 2 bits select the integer type: 00 is int16_t, 01 is
 * int32_t, 10 is int64_t, 11 is 24 bit signed. Two cases are special:
 * 11111110 is 8 bit signed, and 11110001 to 11111101 hold the values 0 to 12
 * directly (0000 and 1111 in the low nibble are already taken). Compared with
 * fixed-size length fields, this lets memory be used more sensibly.
 *
 * Strings are encoded in one of three forms:
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 * |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *
 * Integers are encoded as follows:
 * |11000000| - 1 byte
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 1 byte
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 1 byte
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 1 byte
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 1 byte
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0001 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist.
 *
 * All the integers are represented in little endian byte order.
 *
 * ----------------------------------------------------------------------------
 */
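
The "length of the previous entry" rule above can be sketched in a few lines of C. This is only an illustration of the 1-byte and 5-byte forms described in the comment, not the Redis implementation: the names `prevlen_encode`, `prevlen_decode` and `PREVLEN_BIG` are made up for this example, and the 4-byte part is written in little endian order as the comment states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREVLEN_BIG 254 /* marker byte: a 4-byte length follows */

/* Write the "previous entry length" field into buf (at least 5 bytes).
 * Returns the number of bytes written: 1 or 5. */
size_t prevlen_encode(unsigned char *buf, uint32_t len) {
    assert(buf != NULL);
    if (len < PREVLEN_BIG) {
        buf[0] = (unsigned char)len;
        return 1;
    }
    buf[0] = PREVLEN_BIG;
    /* Little endian, written byte by byte so host byte order is irrelevant. */
    buf[1] = (unsigned char)(len & 0xff);
    buf[2] = (unsigned char)((len >> 8) & 0xff);
    buf[3] = (unsigned char)((len >> 16) & 0xff);
    buf[4] = (unsigned char)((len >> 24) & 0xff);
    return 5;
}

/* Read the field back. Stores the decoded length in *len and returns the
 * number of bytes the field occupies. */
size_t prevlen_decode(const unsigned char *buf, uint32_t *len) {
    assert(buf != NULL && len != NULL);
    if (buf[0] < PREVLEN_BIG) {
        *len = buf[0];
        return 1;
    }
    *len = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
           ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
    return 5;
}
```

A length of 253 still fits in one byte, while 254 already needs the 5-byte form, which is why an entry that grows past this boundary can force its successor's header to grow too.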

I hope you will read it carefully and understand the author's design. Next is the definition of the struct that describes an entry:

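/* An entry that actually holds data */
typedef struct zlentry {
    // prevrawlen is the length of the previous entry; prevrawlensize is the
    // number of bytes needed to store that length
    unsigned int prevrawlensize, prevrawlen;
    // len is the length of this entry's data; lensize is the number of bytes
    // needed to store that length
    unsigned int lensize, len;
    // size in bytes of the entry header (prevrawlensize + lensize)
    unsigned int headersize;
    // encoding type
    unsigned char encoding;
    // the entry's bytes (header included), kept as a raw byte string
    unsigned char *p;
} zlentry;

/* Layout of a <zlentry>:
 * <pre_node_len> (length of the previous entry)
 * <node_encode>  (encoding of this entry and length of its encoded data)
 * <node>         (encoded data of this entry) */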
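
To show how the fields of zlentry relate to the raw bytes, here is a rough decoder sketch. It is not the Redis `zipEntry` function: `entry_info` and `entry_decode` are hypothetical names, and the byte order of the 14-bit and 32-bit string lengths (most significant byte first) is taken from my reading of the Redis source rather than from the comment above, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of zlentry, filled in by entry_decode() below. */
typedef struct entry_info {
    unsigned int prevrawlensize, prevrawlen;
    unsigned int lensize, len;
    unsigned int headersize;
    unsigned char encoding;
    const unsigned char *p;
} entry_info;

/* Decode the header of the entry starting at p. Must not be called on the
 * end marker (0xff). */
entry_info entry_decode(const unsigned char *p) {
    entry_info e;
    assert(p != NULL && p[0] != 0xff);
    e.p = p;

    /* Field 1: length of the previous entry, stored in 1 or 5 bytes. */
    if (p[0] < 254) {
        e.prevrawlensize = 1;
        e.prevrawlen = p[0];
    } else {
        e.prevrawlensize = 5;
        e.prevrawlen = (unsigned int)p[1] | ((unsigned int)p[2] << 8) |
                       ((unsigned int)p[3] << 16) | ((unsigned int)p[4] << 24);
    }

    /* Field 2: the encoding, which for strings also carries the length. */
    const unsigned char *q = p + e.prevrawlensize;
    unsigned char top = q[0] & 0xc0;
    if (top == 0x00) {                 /* |00pppppp| */
        e.encoding = top; e.lensize = 1;
        e.len = q[0] & 0x3f;
    } else if (top == 0x40) {          /* |01pppppp|qqqqqqqq| */
        e.encoding = top; e.lensize = 2;
        e.len = ((unsigned int)(q[0] & 0x3f) << 8) | q[1];
    } else if (top == 0x80) {          /* |10______| plus 4 length bytes */
        e.encoding = top; e.lensize = 5;
        e.len = ((unsigned int)q[1] << 24) | ((unsigned int)q[2] << 16) |
                ((unsigned int)q[3] << 8) | q[4];
    } else {                           /* integer: whole byte is the encoding */
        e.encoding = q[0]; e.lensize = 1;
        switch (q[0]) {
        case 0xc0: e.len = 2; break;   /* int16_t */
        case 0xd0: e.len = 4; break;   /* int32_t */
        case 0xe0: e.len = 8; break;   /* int64_t */
        case 0xf0: e.len = 3; break;   /* 24 bit signed */
        case 0xfe: e.len = 1; break;   /* 8 bit signed */
        default:   e.len = 0; break;   /* 4 bit immediate, no payload */
        }
    }
    e.headersize = e.prevrawlensize + e.lensize;
    return e;
}
```

With these fields, the full size of an entry is `headersize + len`, which is the step needed to move from one entry to the next.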

Now let's look at the core operation, insertion. It moves pointers back and forth a great deal; all of these moves are adjustments of memory addresses inside the ziplist:

/* Insert item at "p". */
/* Implementation of the insert operation */
static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted. */
    // work out what precedes the insertion point
    if (p[0] != ZIP_END) {
        // p is an existing entry: read the prevlen stored in it
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        // inserting at the end: go straight to the tail entry; its first
        // byte tells us whether the list is empty
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipEncodeLength will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipPrevEncodeLength(NULL,prevlen);
    reqlen += zipEncodeLength(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;

    /* Store offset because a realloc may change the address of zl. */
    // resize to reserve room for the new entry
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        // not inserting at the end: shift the following entries
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        zipPrevEncodeLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        tail = zipEntry(p+reqlen);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        // inserting at the end: the new entry simply becomes the tail
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    // write the new entry
    p += zipPrevEncodeLength(p,prevlen);
    p += zipEncodeLength(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    // increase the list length by 1
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}
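
The pattern "save the offset, resize, recompute the pointer, then memmove" is the heart of the function above. The sketch below isolates that pattern on a plain heap buffer. It is a simplified illustration, not ziplist code: `open_gap` is a made-up helper, it uses `realloc` directly instead of `ziplistResize`, and it ignores the prevlen adjustment (`nextdiff`).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Open a gap of `gap` bytes at position *pp inside a heap buffer that holds
 * `curlen` bytes. Returns the (possibly moved) buffer, or NULL if realloc
 * fails, in which case the original buffer is left untouched. On success
 * *pp points at the start of the gap in the returned buffer. */
unsigned char *open_gap(unsigned char *buf, size_t curlen,
                        unsigned char **pp, size_t gap) {
    assert(buf != NULL && pp != NULL);
    assert(*pp >= buf && *pp <= buf + curlen);

    /* Remember the position as an offset: after realloc the old pointer
     * value may refer to freed memory. */
    size_t offset = (size_t)(*pp - buf);

    unsigned char *grown = realloc(buf, curlen + gap);
    if (grown == NULL) return NULL;

    unsigned char *p = grown + offset;
    /* Shift everything from p onward to the right. memmove is required
     * because the source and destination ranges overlap. */
    memmove(p + gap, p, curlen - offset);

    *pp = p;
    return grown;
}
```

After the call the caller writes the new bytes into the gap, just as `__ziplistInsert` writes the new entry's header and payload at `p`.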

The delete operation is as follows:

/* Delete "num" entries, starting at "p". Returns pointer to the ziplist. */
/* Deleting slides the p pointer forward; everything after the deleted range
 * has to be moved */
static unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
    unsigned int i, totlen, deleted = 0;
    size_t offset;
    int nextdiff = 0;
    zlentry first, tail;

    first = zipEntry(p);
    for (i = 0; p[0] != ZIP_END && i < num; i++) {
        p += zipRawEntryLength(p);
        deleted++;
    }

    totlen = p-first.p;
    if (totlen > 0) {
        if (p[0] != ZIP_END) {
            /* Storing `prevrawlen` in this entry may increase or decrease the
             * number of bytes required compare to the current `prevrawlen`.
             * There always is room to store this, because it was previously
             * stored by an entry that is now being deleted. */
            nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);
            p -= nextdiff;
            zipPrevEncodeLength(p,first.prevrawlen);

            /* Update offset for tail */
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);

            /* When the tail contains more than one entry, we need to take
             * "nextdiff" in account as well. Otherwise, a change in the
             * size of prevlen doesn't have an effect on the *tail* offset. */
            tail = zipEntry(p);
            if (p[tail.headersize+tail.len] != ZIP_END) {
                ZIPLIST_TAIL_OFFSET(zl) =
                   intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
            }

            /* Move tail to the front of the ziplist */
            memmove(first.p,p,
                intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
        } else {
            /* The entire tail was deleted. No need to move memory. */
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe((first.p-zl)-first.prevrawlen);
        }

        /* Resize and update length */
        // shrink the list to its new size
        offset = first.p-zl;
        zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
        ZIPLIST_INCR_LENGTH(zl,-deleted);
        p = zl+offset;

        /* When nextdiff != 0, the raw length of the next entry has changed, so
         * we need to cascade the update throughout the ziplist */
        if (nextdiff != 0)
            zl = __ziplistCascadeUpdate(zl,p);
    }
    return zl;
}

This function deletes num entries starting at the entry that p points to. It is the lowest-level delete routine; the other delete functions, such as ziplistDelete and ziplistDeleteRange, are wrappers around it.
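
Both the insert and the delete function compute a value called nextdiff. A small sketch makes its meaning concrete: it is the change in size of the next entry's "previous length" field once that field has to hold a different length. The helper names below are hypothetical and only model the arithmetic; they are not the Redis `zipPrevLenByteDiff` function, which reads the current field size from the entry itself.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store a "previous entry length" of `len`. */
int prevlen_field_size(uint32_t len) {
    return (len < 254) ? 1 : 5;
}

/* How many bytes the next entry's prevlen field must grow (positive) or may
 * shrink (negative) so that it can hold `new_prev_len`. `cur_field_size` is
 * the size the field has right now: 1 or 5. */
int prevlen_byte_diff(int cur_field_size, uint32_t new_prev_len) {
    assert(cur_field_size == 1 || cur_field_size == 5);
    return prevlen_field_size(new_prev_len) - cur_field_size;
}
```

The result is always -4, 0 or +4. A non-zero result changes the total size of the next entry, which may in turn change what the entry after it must store, and that is why both functions call `__ziplistCascadeUpdate` in that case.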

Next, let's look at what is called when we run lpush or rpush from the redis command line. The calls look like this:

zl = ziplistPush(zl, (unsigned char*)"foo", 3, ZIPLIST_TAIL);
zl = ziplistPush(zl, (unsigned char*)"quux", 4, ZIPLIST_TAIL);
zl = ziplistPush(zl, (unsigned char*)"hello", 5, ZIPLIST_HEAD);

/* Insert data at either end of the list */
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where) {
    unsigned char *p;
    // locate the insertion point directly
    p = (where == ZIPLIST_HEAD) ? ZIPLIST_ENTRY_HEAD(zl) : ZIPLIST_ENTRY_END(zl);
    // finally, call the insert function
    return __ziplistInsert(zl,p,s,slen);
}
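
To picture the result of the three pushes, the sketch below hand-assembles the bytes that the list [hello, foo, quux] would occupy according to the layout described earlier, then walks it from tail to head using only zltail and the prevlen fields. The byte array was written by hand for illustration and was not dumped from a running Redis; `sample_zl` and `walk_backward` are made-up names, and the walker only handles 1-byte prevlen fields and non-empty strings of at most 63 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-assembled bytes for the list [hello, foo, quux] (an assumption about
 * what the three pushes produce, following the documented layout). */
const unsigned char sample_zl[] = {
    29, 0, 0, 0,                          /* zlbytes = 29, little endian */
    22, 0, 0, 0,                          /* zltail  = 22, offset of "quux" */
    3, 0,                                 /* zllen   = 3 */
    0x00, 0x05, 'h', 'e', 'l', 'l', 'o',  /* prevlen 0, 5-byte string */
    0x07, 0x03, 'f', 'o', 'o',            /* prevlen 7, 3-byte string */
    0x05, 0x04, 'q', 'u', 'u', 'x',       /* prevlen 5, 4-byte string */
    0xff                                  /* zlend */
};

static uint32_t read_le32(const unsigned char *b) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Walk from tail to head, storing the first character of each entry in out.
 * Returns the number of entries visited (at most max). */
size_t walk_backward(const unsigned char *zl, char *out, size_t max) {
    const unsigned char *head = zl + 10;           /* header is 4+4+2 bytes */
    const unsigned char *p = zl + read_le32(zl + 4);
    size_t n = 0;

    if (head[0] == 0xff) return 0;                 /* empty list */
    while (n < max) {
        /* Simple case only: 1-byte prevlen, |00pppppp| string encoding. */
        assert(p[0] < 254 && (p[1] & 0xc0) == 0x00);
        out[n++] = (char)p[2];
        if (p == head) break;
        p -= p[0];                                 /* step back by prevlen */
    }
    return n;
}
```

Walking the sample visits quux, foo and hello in that order, which shows how a tail offset plus per-entry lengths stands in for the prev pointers of an ordinary linked list.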

In the end it is the insert function that does the work. Before writing this article I read several ziplist analyses by other people and found some of them rather rough. I believe a careful reading of the source code gives a clearer picture, although everyone has a different focus. Finally, here are the key macro definitions and the header file:

/* End marker of a ziplist */
#define ZIP_END 255
/* Threshold for the prevlen field: lengths of 254 or more use the 5-byte form */
#define ZIP_BIGLEN 254

/* Different encoding/length possibilities */
#define ZIP_STR_MASK 0xc0
#define ZIP_INT_MASK 0x30
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)
#define ZIP_INT_16B (0xc0 | 0<<4)
#define ZIP_INT_32B (0xc0 | 1<<4)
#define ZIP_INT_64B (0xc0 | 2<<4)
#define ZIP_INT_24B (0xc0 | 3<<4)
#define ZIP_INT_8B 0xfe

/* 4 bit integer immediate encoding */
#define ZIP_INT_IMM_MASK 0x0f   // many later operations AND against this mask
#define ZIP_INT_IMM_MIN 0xf1    /* 11110001 */
#define ZIP_INT_IMM_MAX 0xfd    /* 11111101 */   // the maximum cannot be 11111111, which would collide with the end marker
#define ZIP_INT_IMM_VAL(v) (v & ZIP_INT_IMM_MASK)

#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)

/* Macro to determine type */
#define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK)

/* Utility macros */
/* Offsets used to jump directly to header fields and entries */
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)

The .h file:
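
As a quick illustration of how the masks are used, the sketch below classifies an encoding byte as the macros do. The function names are made up for this example; the constants 0xc0, 0xf1, 0xfd and 0x0f are the values of ZIP_STR_MASK, ZIP_INT_IMM_MIN, ZIP_INT_IMM_MAX and ZIP_INT_IMM_MASK defined above.

```c
#include <assert.h>
#include <stdbool.h>

/* True when the encoding byte describes a string: the top two bits are
 * 00, 01 or 10. Same comparison as ZIP_IS_STR. */
bool enc_is_str(unsigned char enc) {
    return (enc & 0xc0) < 0xc0;
}

/* True when the byte is a 4 bit immediate integer (0xf1 to 0xfd). */
bool enc_is_imm(unsigned char enc) {
    return enc >= 0xf1 && enc <= 0xfd;
}

/* Value stored in an immediate encoding: the low nibble minus one. */
int enc_imm_value(unsigned char enc) {
    assert(enc_is_imm(enc));
    return (enc & 0x0f) - 1;
}
```

For example, 0xf1 decodes to 0 and 0xfd decodes to 12, while 0xf0 (24 bit integer), 0xfe (8 bit integer) and 0xff (end marker) are not immediates.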

/*
 * Copyright (c) 2009-2012, Pieter Noordhuis <pcnoordhuis at gmail dot com>
 * Copyright (c) 2009-2012, Salvatore Sanfilippo <antirez at gmail dot com>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *   * Redistributions of source code must retain the above copyright notice,
 *     this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of Redis nor the names of its contributors may be used
 *     to endorse or promote products derived from this software without
 *     specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/* Flags identifying the head and the tail of the list */
#define ZIPLIST_HEAD 0
#define ZIPLIST_TAIL 1

unsigned char *ziplistNew(void);    // create a new list
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where);  // push data onto the head or tail of the list
unsigned char *ziplistIndex(unsigned char *zl, int index);   // locate the entry at a given index
unsigned char *ziplistNext(unsigned char *zl, unsigned char *p);   // get the entry after p
unsigned char *ziplistPrev(unsigned char *zl, unsigned char *p);   // get the entry before p
unsigned int ziplistGet(unsigned char *p, unsigned char **sval, unsigned int *slen, long long *lval);   // get the value stored at p
unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen); // insert data into the list
unsigned char *ziplistDelete(unsigned char *zl, unsigned char **p); // delete one entry from the list
unsigned char *ziplistDeleteRange(unsigned char *zl, unsigned int index, unsigned int num);   // delete num entries, starting at the entry at index
unsigned int ziplistCompare(unsigned char *p, unsigned char *s, unsigned int slen);   // compare the entry at p with s
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip); // find an entry in the list
unsigned int ziplistLen(unsigned char *zl);   // return the number of entries in the list
size_t ziplistBlobLen(unsigned char *zl);   // return the size of the list blob, in bytes
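
The last two declarations report information that lives in the 10-byte header. As a rough sketch of what that header holds, the code below reads zlbytes, zltail and zllen from a raw buffer. `zl_header` and `zl_read_header` are hypothetical names, not part of the Redis API, and the "exact count" flag simply encodes the rule from the layout comment that a stored count above 2**16-2 means the list has to be traversed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the ziplist header. */
typedef struct zl_header {
    uint32_t bytes;        /* zlbytes: total size of the blob in bytes */
    uint32_t tail_offset;  /* zltail: offset of the last entry */
    uint16_t len;          /* zllen: stored entry count */
    bool len_is_exact;     /* false when the count must be found by traversal */
} zl_header;

/* Read the three header fields, which are stored in little endian order. */
zl_header zl_read_header(const unsigned char *zl) {
    zl_header h;
    assert(zl != NULL);
    h.bytes = (uint32_t)zl[0] | ((uint32_t)zl[1] << 8) |
              ((uint32_t)zl[2] << 16) | ((uint32_t)zl[3] << 24);
    h.tail_offset = (uint32_t)zl[4] | ((uint32_t)zl[5] << 8) |
                    ((uint32_t)zl[6] << 16) | ((uint32_t)zl[7] << 24);
    h.len = (uint16_t)(zl[8] | (zl[9] << 8));
    h.len_is_exact = h.len < UINT16_MAX;
    return h;
}
```

Reading the fields byte by byte avoids the alignment and byte-order concerns that the pointer-cast macros handle with `intrev32ifbe`.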

