Redis Source Analysis (7)--rdb


RDB is Redis's other form of persistence. It is equivalent to a periodic snapshot, and it also serves as the snapshot half of the snapshot + redo log scheme used in master-slave synchronization. Redis does not need a lock while producing an RDB, because the parent and child processes share the same memory. After the parent process forks the child, the two share the same physical memory copy-on-write: a memory page is copied only when one of the two processes writes to it. In the worst case, therefore, the parent and child together use twice the memory during an RDB dump (if Redis consumes 1 GB, the system should have 2 GB available, otherwise swap may be used). Because of copy-on-write, unnecessary memory writes should be avoided. The child process essentially only reads memory, while the parent process keeps serving client requests and has to modify memory. To reduce these modifications, the parent pauses rehashing of the keyspace hash tables, since a rehash moves a large amount of data between buckets. The RDB-related code is examined below.
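The snapshot behaviour of fork can be seen in a few lines of C. The sketch below is not Redis code: the function name and the two pipes are made up for illustration, and it assumes a POSIX system. The parent overwrites a variable after the fork, and the child, which still works on the memory image taken at fork time, reports the old value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, let the parent overwrite `value`, then ask the child what it
 * still sees. Returns the child's view, or -1 on failure.
 * Illustrative helper only; nothing here comes from the Redis sources. */
int child_view_after_parent_write(int initial, int updated) {
    int value = initial;
    int go[2], report[2];   /* parent->child and child->parent pipes */
    int seen = -1, status;
    pid_t pid;

    assert(initial != updated);   /* otherwise the demo shows nothing */
    if (pipe(go) == -1) return -1;
    if (pipe(report) == -1) { close(go[0]); close(go[1]); return -1; }

    pid = fork();
    if (pid == -1) {
        close(go[0]); close(go[1]); close(report[0]); close(report[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent has written, then report `value`. */
        char c;
        close(go[1]);
        close(report[0]);
        if (read(go[0], &c, 1) != 1) _exit(1);
        if (write(report[1], &value, sizeof(value)) != (ssize_t)sizeof(value))
            _exit(1);
        _exit(0);
    }

    /* Parent: modify its own copy first, then release the child. */
    close(go[0]);
    close(report[1]);
    value = updated;
    if (write(go[1], "x", 1) == 1) {
        if (read(report[0], &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
            seen = -1;
    }
    close(go[1]);
    close(report[0]);
    if (waitpid(pid, &status, 0) == -1) return -1;
    return (value == updated) ? seen : -1;
}
```

Calling child_view_after_parent_write(1, 2) returns 1: the parent sees its own write, the child does not. This is the property that lets the RDB child dump a consistent view without locking.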

1. RDB file format

The RDB file format is relatively simple and can be seen as a sequence of instructions. Each instruction consists of:

|-----------------|----------------------|
| OP code: 1 byte | Instruction: n bytes |
|-----------------|----------------------|

When an RDB is loaded, this sequence of instructions is parsed. The op codes are:

REDIS_RDB_OPCODE_EXPIRETIME_MS: expiration time in milliseconds

REDIS_RDB_OPCODE_EXPIRETIME: expiration time in seconds

REDIS_RDB_OPCODE_SELECTDB: the SELECT DB command

REDIS_RDB_OPCODE_EOF: end of the RDB file

The op code can also be any of the data types (REDIS_RDB_TYPE_LIST, REDIS_RDB_TYPE_SET, etc.), in which case it specifies the type of the value in the key-value pair that follows.

The first 5 bytes of the RDB file are the magic number, which identifies the file as an RDB file. The next 4 bytes are the version number. When the RDB is loaded, this version is compared with the RDB version supported by the running Redis to decide whether the file can be processed. After that comes the sequence of instructions, ending with EOF.
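As a minimal sketch of the header check, the function below validates a 9-byte header. The name parse_rdb_header and the max_version parameter are illustrative; in Redis the upper bound is the REDIS_RDB_VERSION constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the version stored in a 9-byte RDB header, or -1 if the magic
 * number is wrong or the version is outside [1, max_version].
 * Illustrative helper, not a Redis function. */
int parse_rdb_header(const char *hdr, size_t len, int max_version) {
    char digits[5];
    int version;

    assert(hdr != NULL);
    if (len < 9) return -1;                       /* 5 magic + 4 version */
    if (memcmp(hdr, "REDIS", 5) != 0) return -1;  /* wrong signature */
    memcpy(digits, hdr + 5, 4);
    digits[4] = '\0';
    version = atoi(digits);
    if (version < 1 || version > max_version) return -1;
    return version;
}
```

With a maximum supported version of 6, the header "REDIS0006" is accepted and yields 6, while "REDIS0007" is rejected because a newer format cannot be parsed by an older server.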

2. RDB Dump

First, look at when a dump is triggered. There are four cases:

1) SAVE command: the client sends the SAVE command and the Redis instance blocks while it performs the dump. See the saveCommand function.

2) BGSAVE command: the dump is done by a child process and the main process can continue to serve requests. See the bgsaveCommand function.

3) Passive trigger: the number of changes and the time since the last dump both exceed the configured thresholds. This is detected and triggered in serverCron.

4) Master-slave sync trigger: when a partial resync is not possible, the master needs to transfer an RDB to the slave. See the syncCommand function.

Here is the actual RDB dump process, which is implemented by the rdbSave function.

    /* Create and open the temporary RDB file */
    snprintf(tmpfile,256,"temp-%d.rdb", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        redisLog(REDIS_WARNING, "Failed opening .rdb for saving: %s",
            strerror(errno));
        return REDIS_ERR;
    }

    rioInitWithFile(&rdb,fp);
    if (server.rdb_checksum)
        rdb.update_cksum = rioGenericUpdateChecksum;

This creates and opens a temporary file. Writing to a temporary file protects the integrity of the existing RDB: the original file is replaced only after the dump succeeds. The rio object used for output is then initialized.

    /* Write the magic number; format, 9 bytes: REDIS[rdb_version] */
    snprintf(magic,sizeof(magic),"REDIS%04d",REDIS_RDB_VERSION);
    if (rdbWriteRaw(&rdb,magic,9) == -1) goto werr;

This writes the magic number and the version number.

Next is a loop that iterates over each Redis DB and generates the corresponding content.

    for (j = 0; j < server.dbnum; j++) {
        // Dump the DB
    }

Dumping a DB amounts to traversing it and writing out every key-value pair.

        redisDb *db = server.db+j;
        dict *d = db->dict;
        if (dictSize(d) == 0) continue;
        di = dictGetSafeIterator(d);
        if (!di) {
            fclose(fp);
            return REDIS_ERR;
        }

First check whether the DB is empty, and skip it if so. Then get an iterator for the DB.

        /* Write the SELECT DB opcode */
        if (rdbSaveType(&rdb,REDIS_RDB_OPCODE_SELECTDB) == -1) goto werr;
        if (rdbSaveLen(&rdb,j) == -1) goto werr;

This writes the SELECT DB opcode followed by the DB number. The format is a 1-byte opcode plus the DB number, which rdbSaveLen writes in a variable-length encoding (a single byte for numbers below 64), not as a fixed 4-byte integer.
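rdbSaveLen does not write a fixed-width integer. In the Redis 2.8 sources a length takes 1, 2 or 5 bytes, and the top two bits of the first byte select the form. The sketch below reproduces that scheme as I read it; encode_rdb_len is an illustrative name, and the layout should be checked against rdb.c for the version you are studying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `len` into `out` and return the number of bytes used.
 *   00xxxxxx                    6-bit length, 1 byte
 *   01xxxxxx xxxxxxxx           14-bit length, 2 bytes
 *   10...... + 4 bytes          32-bit big-endian length, 5 bytes
 * Illustrative re-implementation, not the Redis function itself. */
size_t encode_rdb_len(uint32_t len, unsigned char out[5]) {
    assert(out != NULL);
    if (len < (1u << 6)) {
        out[0] = (unsigned char)len;
        return 1;
    } else if (len < (1u << 14)) {
        out[0] = (unsigned char)((len >> 8) | 0x40);
        out[1] = (unsigned char)(len & 0xFF);
        return 2;
    } else {
        out[0] = 0x80;
        out[1] = (unsigned char)(len >> 24);
        out[2] = (unsigned char)(len >> 16);
        out[3] = (unsigned char)(len >> 8);
        out[4] = (unsigned char)len;
        return 5;
    }
}
```

With the default of 16 databases, every DB number fits the first form, so a SELECT DB instruction occupies two bytes in total: the opcode and one length byte.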

        /* Iterate this DB writing every entry */
        while((de = dictNext(di)) != NULL) {
            sds keystr = dictGetKey(de);
            robj key, *o = dictGetVal(de);
            long long expire;

            initStaticStringObject(key,keystr);
            expire = getExpire(db,&key);
            if (rdbSaveKeyValuePair(&rdb,&key,o,expire,now) == -1) goto werr;
        }
        dictReleaseIterator(di);

Then a while loop iterates over all key-value pairs and dumps them. For each pair it gets the key, the value and the expire time, then calls rdbSaveKeyValuePair to do the dump. That function is shown below.

/* Save a key-value pair, with expire time, type, key, value.
 * On error -1 is returned.
 * On success if the key was actually saved 1 is returned, otherwise 0
 * is returned (the key was already expired). */
int rdbSaveKeyValuePair(rio *rdb, robj *key, robj *val,
                        long long expiretime, long long now)
{
    /* Save the expire time */
    if (expiretime != -1) {
        /* If this key is already expired skip it */
        if (expiretime < now) return 0;
        if (rdbSaveType(rdb,REDIS_RDB_OPCODE_EXPIRETIME_MS) == -1) return -1;
        if (rdbSaveMillisecondTime(rdb,expiretime) == -1) return -1;
    }

    /* Save type, key, value */
    if (rdbSaveObjectType(rdb,val) == -1) return -1;
    if (rdbSaveStringObject(rdb,key) == -1) return -1;
    if (rdbSaveObject(rdb,val) == -1) return -1;
    return 1;
}

If an expire time is set, it is written first: an opcode marking the expire, then the expiration time itself. A key that has already expired is skipped. Next comes the key-value pair: first a type byte, which plays the same role as an opcode and gives the type of the value, then the key as a string, and finally the value object. Dumping the individual object types involves more detail and is not expanded here.
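To make the record layout concrete, here is a toy writer for a single string key-value pair. It is a sketch under simplifying assumptions: the function name is invented, strings are shorter than 64 bytes so each length prefix is one byte, the integer and LZF string encodings that Redis may apply are ignored, and the 8-byte expire time is laid out little-endian (Redis writes the raw int64 in host order).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OPCODE_EXPIRETIME_MS = 252, TYPE_STRING = 0 };

/* Append one string record to `out`.
 * Returns the bytes written, 0 if the key is already expired (nothing is
 * written), or -1 if `out` is too small. Illustrative helper only. */
long save_string_pair(unsigned char *out, size_t cap,
                      const char *key, const char *val,
                      long long expire_ms, long long now_ms) {
    size_t klen = strlen(key), vlen = strlen(val), need, pos = 0;
    int i;

    assert(klen < 64 && vlen < 64);  /* keeps each length prefix at 1 byte */
    if (expire_ms != -1 && expire_ms < now_ms) return 0;
    need = (expire_ms != -1 ? 9 : 0) + 1 + 1 + klen + 1 + vlen;
    if (need > cap) return -1;

    if (expire_ms != -1) {
        out[pos++] = OPCODE_EXPIRETIME_MS;
        for (i = 0; i < 8; i++)      /* little-endian layout: an assumption */
            out[pos++] = (unsigned char)(((uint64_t)expire_ms >> (8 * i)) & 0xFF);
    }
    out[pos++] = TYPE_STRING;        /* type of the value */
    out[pos++] = (unsigned char)klen;
    memcpy(out + pos, key, klen);
    pos += klen;
    out[pos++] = (unsigned char)vlen;
    memcpy(out + pos, val, vlen);
    pos += vlen;
    return (long)pos;
}
```

A pair "k" -> "v" without expiry takes 5 bytes (type, length, key, length, value). With an expiry it takes 14, because the opcode and the 8-byte timestamp come first.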

After all the DBs have been dumped, the finishing work follows.

    di = NULL; /* So that we don't release it again on error. */

    /* EOF opcode */
    if (rdbSaveType(&rdb,REDIS_RDB_OPCODE_EOF) == -1) goto werr;

This writes the EOF opcode, which marks the end of the RDB.

    /* CRC64 checksum. It will be zero if checksum computation is disabled, the
     * loading code skips the check in this case. */
    cksum = rdb.cksum;
    memrev64ifbe(&cksum);
    if (rioWrite(&rdb,&cksum,8) == 0) goto werr;

This writes the CRC64 checksum that rio accumulated while the data was being written. The value is zero when checksums are disabled.

    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

fflush is called first to flush the stdio output buffer into the page cache, then fsync writes the cached contents to disk, and finally the file is closed.

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is ok. */
    if (rename(tmpfile,filename) == -1) {
        redisLog(REDIS_WARNING,"Error moving temp DB file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        return REDIS_ERR;
    }

Renames the temporary file to the target file name.
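The write-to-temp, flush, sync, rename sequence is a general pattern for replacing a file safely. The standalone sketch below follows the same order of calls as the rdbSave excerpts; atomic_save and the temp file name are illustrative, and a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace `filename` with `len` bytes from `data`, going through a temp
 * file in the current directory. Returns 0 on success, -1 on error.
 * Illustrative helper, not part of Redis. */
int atomic_save(const char *filename, const void *data, size_t len) {
    char tmpfile[256];
    FILE *fp;

    assert(filename != NULL && (data != NULL || len == 0));
    snprintf(tmpfile, sizeof(tmpfile), "temp-%d.tmp", (int)getpid());
    fp = fopen(tmpfile, "w");
    if (!fp) return -1;

    if (len > 0 && fwrite(data, len, 1, fp) != 1) goto werr;
    if (fflush(fp) == EOF) goto werr;         /* stdio buffer -> page cache */
    if (fsync(fileno(fp)) == -1) goto werr;   /* page cache -> disk */
    if (fclose(fp) == EOF) { fp = NULL; goto werr; }
    fp = NULL;

    /* Only now does the new content become visible under the final name. */
    if (rename(tmpfile, filename) == -1) goto werr;
    return 0;

werr:
    if (fp) fclose(fp);
    unlink(tmpfile);
    return -1;
}
```

Because rename replaces the target in a single step, a reader opening the file sees either the complete old content or the complete new content, never a half-written dump. The temp file and the target need to be on the same filesystem for this to hold.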

    redisLog(REDIS_NOTICE,"DB saved on disk");
    server.dirty = 0;
    server.lastsave = time(NULL);
    server.lastbgsave_status = REDIS_OK;
    return REDIS_OK;

Finally, it prints a log message and resets dirty and lastsave. These two values determine when the next passive RDB dump is triggered.
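How dirty and lastsave feed the passive trigger can be sketched as a small predicate. This is my reading of the check in serverCron, not a copy of it: the struct mirrors the seconds/changes pairs of the save directive, the function name is invented, and the exact comparison operators should be verified against the source.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One "save <seconds> <changes>" rule. */
struct saveparam {
    time_t seconds;
    int changes;
};

/* Return 1 if any rule is satisfied: at least `changes` modifications since
 * the last save AND more than `seconds` elapsed since it.
 * Illustrative helper only. */
int should_start_bgsave(long long dirty, time_t lastsave, time_t now,
                        const struct saveparam *params, size_t n) {
    size_t j;

    assert(params != NULL || n == 0);
    for (j = 0; j < n; j++) {
        if (dirty >= params[j].changes &&
            now - lastsave > params[j].seconds)
            return 1;
    }
    return 0;
}
```

With rules such as {900, 1}, {300, 10} and {60, 10000}, a single change triggers a dump only once more than 900 seconds have passed, while ten changes trigger one after 300 seconds. Resetting dirty to 0 and lastsave to the current time restarts both counters.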

werr:
    fclose(fp);
    unlink(tmpfile);
    redisLog(REDIS_WARNING,"Write error saving DB on disk: %s", strerror(errno));
    if (di) dictReleaseIterator(di);
    return REDIS_ERR;

This is the error handling for the steps above: it closes and deletes the temporary file, prints a log message and releases the iterator.

That is the whole RDB dump process. For a background dump, the steps above run in the child process, and the main process still has some final cleanup to do, described next. In serverCron, if server.rdb_child_pid is not -1 (an RDB dump child exists), wait3 is called to reap the child. If the RDB child has finished, backgroundSaveDoneHandler is called for the final processing.

/* A background saving child (BGSAVE) terminated its work. Handle this. */
void backgroundSaveDoneHandler(int exitcode, int bysignal) {
    if (!bysignal && exitcode == 0) {
        redisLog(REDIS_NOTICE,
            "Background saving terminated with success");
        server.dirty = server.dirty - server.dirty_before_bgsave;
        server.lastsave = time(NULL);
        server.lastbgsave_status = REDIS_OK;
    } else if (!bysignal && exitcode != 0) {
        redisLog(REDIS_WARNING, "Background saving error");
        server.lastbgsave_status = REDIS_ERR;
    } else {
        mstime_t latency;

        redisLog(REDIS_WARNING,
            "Background saving terminated by signal %d", bysignal);
        latencyStartMonitor(latency);
        rdbRemoveTempFile(server.rdb_child_pid);
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("rdb-unlink-temp-file",latency);
        /* SIGUSR1 is whitelisted, so we have a way to kill a child without
         * triggering an error condition. */
        if (bysignal != SIGUSR1)
            server.lastbgsave_status = REDIS_ERR;
    }
    server.rdb_child_pid = -1;
    server.rdb_save_time_last = time(NULL)-server.rdb_save_time_start;
    server.rdb_save_time_start = -1;
    /* Possibly there are slaves waiting for a BGSAVE in order to be served
     * (the first stage of SYNC is a bulk transfer of dump.rdb) */
    updateSlavesWaitingBgsave((!bysignal && exitcode == 0) ? REDIS_OK : REDIS_ERR);
}

Compared with the AOF rewrite, this is simpler: the handling depends mainly on the child's exit status and on whether it was killed by a signal. The last call, updateSlavesWaitingBgsave, covers an RDB dump made for master-slave synchronization: it tells the waiting slaves the result so that the RDB transfer can begin.
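The reaping step that produces the exitcode and bysignal arguments can be sketched as follows. The text says serverCron uses wait3; this sketch swaps in waitpid with WNOHANG, which gives the same non-blocking check for a single known child. Both function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Non-blocking check on child `pid`. Returns 1 and fills exitcode/bysignal
 * if the child terminated, 0 if it is still running, -1 on error. */
int poll_child(pid_t pid, int *exitcode, int *bysignal) {
    int statloc;
    pid_t r;

    assert(exitcode != NULL && bysignal != NULL);
    r = waitpid(pid, &statloc, WNOHANG);
    if (r == 0) return 0;
    if (r == -1) return -1;
    *exitcode = WIFEXITED(statloc) ? WEXITSTATUS(statloc) : 0;
    *bysignal = WIFSIGNALED(statloc) ? WTERMSIG(statloc) : 0;
    return 1;
}

/* Demo helper: fork a child that exits with `code` and poll until it has
 * been reaped, the way a periodic timer would. */
int run_and_reap(int code, int *exitcode, int *bysignal) {
    struct timespec pause = { 0, 1000000 };   /* 1 ms between polls */
    pid_t pid = fork();
    int r;

    if (pid == -1) return -1;
    if (pid == 0) _exit(code);
    while ((r = poll_child(pid, exitcode, bysignal)) == 0)
        nanosleep(&pause, NULL);
    return r;
}
```

A child that exits normally with status 0 corresponds to the first branch of the handler, a non-zero status to the second, and a non-zero bysignal to the third.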

3. RDB Load

RDB loading is used in two places:

1) When Redis starts, it loads the RDB

2) During master-slave synchronization, the slave loads the RDB sent by the master

At startup the RDB is loaded, like the AOF, in the loadDataFromDisk function:

/* Function called at startup to load RDB or AOF file in memory. */
void loadDataFromDisk(void) {
    long long start = ustime();
    if (server.aof_state == REDIS_AOF_ON) {
        if (loadAppendOnlyFile(server.aof_filename) == REDIS_OK)
            redisLog(REDIS_NOTICE,"DB loaded from append only file: %.3f seconds",(float)(ustime()-start)/1000000);
    } else {
        if (rdbLoad(server.rdb_filename) == REDIS_OK) {
            redisLog(REDIS_NOTICE,"DB loaded from disk: %.3f seconds",
                (float)(ustime()-start)/1000000);
        } else if (errno != ENOENT) {
            redisLog(REDIS_WARNING,"Fatal error loading the DB: %s. Exiting.",strerror(errno));
            exit(1);
        }
    }
}

If AOF is not enabled, Redis tries to load an RDB. The loading itself is done by rdbLoad, examined next.

    uint32_t dbid;
    int type, rdbver;
    redisDb *db = server.db+0;
    char buf[1024];
    long long expiretime, now = mstime();
    FILE *fp;
    rio rdb;

    if ((fp = fopen(filename,"r")) == NULL) return REDIS_ERR;

First, the RDB file is opened.

    rioInitWithFile(&rdb,fp);
    rdb.update_cksum = rdbLoadProgressCallback;
    rdb.max_processing_chunk = server.loading_process_events_interval_bytes;

This initializes rio and sets the update_cksum callback and the read chunk size (2 MB by default). The update_cksum callback also does several other things here:

1) Updates the loading progress

2) If the RDB is being loaded as part of master-slave synchronization, the whole load may take a long time, so the slave keeps sending heartbeats to the master. This prevents the master from deciding that the slave has timed out and closing the connection.

3) Handles some IO events
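The mechanism behind this is that a large read is split into chunks and the callback runs after every chunk. The sketch below shows the idea with an in-memory reader. The struct and function names are invented for illustration and only loosely follow rio; the callback here just counts invocations, where Redis would update the checksum, report progress and process events.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct reader reader;
struct reader {
    const unsigned char *src;   /* in-memory stand-in for the RDB file */
    size_t len, pos;
    size_t max_chunk;           /* 0 means no chunking */
    void (*on_chunk)(reader *r, const void *buf, size_t n);
    size_t processed_bytes;
    int callbacks;              /* demo counter */
};

/* Example callback: count how often we were given a chance to do work. */
void count_chunk(reader *r, const void *buf, size_t n) {
    (void)buf;
    (void)n;
    r->callbacks++;
}

/* Read `len` bytes in pieces of at most max_chunk, calling on_chunk after
 * each piece. Returns 1 on success, 0 on a short read. */
int reader_read(reader *r, void *buf, size_t len) {
    assert(r != NULL && buf != NULL);
    while (len) {
        size_t n = (r->max_chunk && r->max_chunk < len) ? r->max_chunk : len;
        if (r->len - r->pos < n) return 0;
        memcpy(buf, r->src + r->pos, n);
        r->pos += n;
        if (r->on_chunk) r->on_chunk(r, buf, n);
        buf = (char *)buf + n;
        len -= n;
        r->processed_bytes += n;
    }
    return 1;
}
```

Reading 10 bytes with a chunk size of 4 invokes the callback three times (4 + 4 + 2 bytes). A single large value therefore cannot block heartbeats or event processing for the whole duration of its read.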

    if (rioRead(&rdb,buf,9) == 0) goto eoferr;
    buf[9] = '\0';
    if (memcmp(buf,"REDIS",5) != 0) {
        fclose(fp);
        redisLog(REDIS_WARNING,"Wrong signature trying to load DB from file");
        errno = EINVAL;
        return REDIS_ERR;
    }
    rdbver = atoi(buf+5);
    if (rdbver < 1 || rdbver > REDIS_RDB_VERSION) {
        fclose(fp);
        redisLog(REDIS_WARNING,"Can't handle RDB format version %d",rdbver);
        errno = EINVAL;
        return REDIS_ERR;
    }

Reads the first 9 bytes of the RDB and verifies the magic number and the version.

    startLoading(fp);

Marks the start of loading: records the load start time and the total number of bytes to load, which is used to update the load progress.

Next is an interpreter loop that reads instructions one after another.

    while(1) {
        // Interpret one instruction
    }

Here is how a single instruction is interpreted:

        robj *key, *val;
        expiretime = -1;

        /* Read type. */
        if ((type = rdbLoadType(&rdb)) == -1) goto eoferr;

First read the type (the opcode).

        if (type == REDIS_RDB_OPCODE_EXPIRETIME) {
            if ((expiretime = rdbLoadTime(&rdb)) == -1) goto eoferr;
            /* We read the time so we need to read the object type again. */
            if ((type = rdbLoadType(&rdb)) == -1) goto eoferr;
            /* the EXPIRETIME opcode specifies time in seconds, so convert
             * into milliseconds. */
            expiretime *= 1000;
        } else if (type == REDIS_RDB_OPCODE_EXPIRETIME_MS) {
            /* Milliseconds precision expire times introduced with RDB
             * version 3. */
            if ((expiretime = rdbLoadMillisecondTime(&rdb)) == -1) goto eoferr;
            /* We read the time so we need to read the object type again. */
            if ((type = rdbLoadType(&rdb)) == -1) goto eoferr;
        }

If the opcode is an expire instruction, the expire time is parsed and the type is read again (this time it is the type of the value in the key-value pair that follows).

        if (type == REDIS_RDB_OPCODE_EOF)
            break;

If the opcode is EOF, the RDB has been fully loaded and the loop exits.

        /* Handle SELECT DB opcode as a special case */
        if (type == REDIS_RDB_OPCODE_SELECTDB) {
            if ((dbid = rdbLoadLen(&rdb,NULL)) == REDIS_RDB_LENERR)
                goto eoferr;
            if (dbid >= (unsigned)server.dbnum) {
                redisLog(REDIS_WARNING,"FATAL: Data file was created with a Redis server configured to handle more than %d databases. Exiting\n", server.dbnum);
                exit(1);
            }
            db = server.db+dbid;
            continue;
        }

If the opcode is SELECTDB, read the DB number and switch to that DB.

With the special instructions handled, the key-value pair is parsed next.

        /* Read key */
        if ((key = rdbLoadStringObject(&rdb)) == NULL) goto eoferr;

Reads the key, which is a string.

        /* Read value */
        if ((val = rdbLoadObject(type,&rdb)) == NULL) goto eoferr;

Reads the value. rdbLoadObject loads the right kind of object for the given type; it is not expanded here.

        /* Check if the key already expired. This function is used when loading
         * an RDB file from disk, either at startup, or when an RDB was
         * received from the master. In the latter case, the master is
         * responsible for key expiry. If we would expire keys here, the
         * snapshot taken by the master may not be reflected on the slave. */
        if (server.masterhost == NULL && expiretime != -1 && expiretime < now) {
            decrRefCount(key);
            decrRefCount(val);
            continue;
        }

Checks whether the key has expired. On a master, an expired key-value pair is not added to the DB. A slave loads it anyway and leaves expiry to the master.

        /* Add the new object in the hash table */
        dbAdd(db,key,val);

Adds the key-value pair to the DB.

        /* Set the expire time if needed */
        if (expiretime != -1) setExpire(db,key,expiretime);

        decrRefCount(key);

If an expire time is set, it is added to the expires dict.
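Putting the steps of the loop together, here is a toy interpreter over an in-memory instruction stream. It is a sketch under stated assumptions, not the Redis loader: names are invented, only string values with one-byte length prefixes are supported, the header and checksum are left out, timestamps are read as little-endian, and keys are counted instead of being stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OPCODE_EXPIRETIME_MS = 252,
    OPCODE_EXPIRETIME    = 253,
    OPCODE_SELECTDB      = 254,
    OPCODE_EOF           = 255,
    TYPE_STRING          = 0
};

typedef struct { int loaded; int expired; int last_db; } load_stats;

/* Walk a toy RDB body. Returns 0 when EOF is reached, -1 on truncated or
 * unsupported input. */
int load_toy_rdb(const unsigned char *p, size_t n, long long now_ms,
                 load_stats *st) {
    size_t pos = 0;

    assert(p != NULL && st != NULL);
    st->loaded = st->expired = st->last_db = 0;

    while (1) {
        long long expire = -1;
        int type, i;

        if (pos >= n) return -1;
        type = p[pos++];

        if (type == OPCODE_EXPIRETIME || type == OPCODE_EXPIRETIME_MS) {
            int width = (type == OPCODE_EXPIRETIME) ? 4 : 8;
            uint64_t t = 0;
            if (n - pos < (size_t)width + 1) return -1;
            for (i = 0; i < width; i++) t |= (uint64_t)p[pos++] << (8 * i);
            expire = (long long)t;
            if (type == OPCODE_EXPIRETIME) expire *= 1000;  /* s -> ms */
            type = p[pos++];            /* the value type follows the time */
        }
        if (type == OPCODE_EOF) break;
        if (type == OPCODE_SELECTDB) {
            if (pos >= n) return -1;
            st->last_db = p[pos++];     /* assumes a DB number below 64 */
            continue;
        }
        if (type != TYPE_STRING) return -1;

        /* Key, then value, each stored as <1-byte length><bytes>. */
        for (i = 0; i < 2; i++) {
            size_t len;
            if (pos >= n) return -1;
            len = p[pos++];
            if (len >= 64 || n - pos < len) return -1;
            pos += len;
        }
        if (expire != -1 && expire < now_ms) { st->expired++; continue; }
        st->loaded++;
    }
    return 0;
}
```

The control flow matches the walkthrough: an expire opcode is consumed first and followed by a second type read, EOF ends the loop, SELECTDB switches the target DB and continues, and anything else is treated as a key-value pair whose expiry is checked before it is kept.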

    /* Verify the checksum if RDB version is >= 5 */
    if (rdbver >= 5 && server.rdb_checksum) {
        uint64_t cksum, expected = rdb.cksum;

        if (rioRead(&rdb,&cksum,8) == 0) goto eoferr;
        memrev64ifbe(&cksum);
        if (cksum == 0) {
            redisLog(REDIS_WARNING,"RDB file was saved with checksum disabled: no check performed.");
        } else if (cksum != expected) {
            redisLog(REDIS_WARNING,"Wrong RDB checksum. Aborting now.");
            exit(1);
        }
    }

Verifies the checksum.

    fclose(fp);
    stopLoading();
    return REDIS_OK;

Finally, close the file and set server.loading to 0, indicating that no load is in progress.

eoferr: /* unexpected end of file is handled here with a fatal exit */
    redisLog(REDIS_WARNING,"Short read or OOM loading DB. Unrecoverable error, aborting now.");
    exit(1);
    return REDIS_ERR; /* Just to avoid warning */

Any error during loading jumps to the eoferr label, where the log is printed and the process exits.

