Redis Source Code Analysis (15): AOF (Append Only File)


Continuing with the Redis source code related to data files, today I looked at a file called aof.c. AOF is short for append only file, meaning the file only supports append operations. It records every operation that changes data, so that the data can be recovered from this file after an abnormal shutdown. This is similar to the role of the redo/undo logs I described earlier. As we all know, Redis is an in-memory database: every data change is applied in memory first, and the changes are buffered and later written to the disk file to achieve persistence. AOF works the same way. This brings in the concept of blocks, which are simply chunks of buffer memory (Redis uses them for the AOF rewrite buffer). The block definitions are as follows:

/* AOF uses a simple buffer made of blocks to store the records of data changes;
 * once the buffered data reaches a certain size it is written to the file
 * persistently. Redis only ever appends, so each append would have to resize
 * a single buffer, but a block cannot grow without limit. Redis therefore uses
 * a list of fixed-size blocks: when one block is full, writing continues in a
 * new block. Each block is 10 MB. */
#define AOF_RW_BUF_BLOCK_SIZE (1024*1024*10)    /* 10 MB per block */

/* A standard AOF read/write buffer block */
typedef struct aofrwblock {
    /* How many bytes of this block are used, and how many are still free */
    unsigned long used, free;
    /* The stored content, 10 MB */
    char buf[AOF_RW_BUF_BLOCK_SIZE];
} aofrwblock;

In other words, each block is 10 MB by default, which is neither particularly large nor particularly small. If the incoming data exceeds the space left in a block, Redis dynamically allocates a new buffer block. On the server side the blocks are organized as a linked list:

/* Append data to the AOF rewrite buffer, allocating new blocks if needed. */
/* Append data to the buffer; if the space runs out, allocate a new buffer block */
void aofRewriteBufferAppend(unsigned char *s, unsigned long len) {
    /* Locate the last block of the buffer; writes are appended to it */
    listNode *ln = listLast(server.aof_rewrite_buf_blocks);
    aofrwblock *block = ln ? ln->value : NULL;

    while(len) {
        /* If we already got at least an allocated block, try appending
         * at least some piece into it. */
        if (block) {
            /* If the free space of the current block can hold all len bytes,
             * write them directly; otherwise write as much as fits */
            unsigned long thislen = (block->free < len) ? block->free : len;
            if (thislen) {  /* The current block is not already full. */
                memcpy(block->buf+block->used, s, thislen);
                block->used += thislen;
                block->free -= thislen;
                s += thislen;
                len -= thislen;
            }
        }

        if (len) { /* First block to allocate, or need another block. */
            int numblocks;

            /* Not enough space: allocate a new block and write into it */
            block = zmalloc(sizeof(*block));
            block->free = AOF_RW_BUF_BLOCK_SIZE;
            block->used = 0;
            /* Append the new block to the server's list of buffer blocks */
            listAddNodeTail(server.aof_rewrite_buf_blocks,block);

            /* Log every time we cross 10 or 100 more blocks, respectively
             * as a notice or warning. */
            numblocks = listLength(server.aof_rewrite_buf_blocks);
            if (((numblocks+1) % 10) == 0) {
                int level = ((numblocks+1) % 100) == 0 ? REDIS_WARNING :
                                                         REDIS_NOTICE;
                redisLog(level,"Background AOF buffer size: %lu MB",
                    aofRewriteBufferSize()/(1024*1024));
            }
        }
    }
}
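To make the block-list idea concrete, here is a minimal standalone sketch in C. It is not Redis code: the names (`sketch_block`, `sketch_buffer`, `sketch_append`, `sketch_size`, `sketch_reset`) are hypothetical, a plain singly linked list stands in for Redis's adlist, `malloc` stands in for `zmalloc`, and the block size is shrunk to 8 bytes so the splitting across blocks is easy to see. The append loop follows the same logic as `aofRewriteBufferAppend()`: fill the free space of the last block, then allocate a new block for whatever is left.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tiny block size so the behaviour is easy to observe;
 * Redis uses AOF_RW_BUF_BLOCK_SIZE (10 MB). */
#define SKETCH_BLOCK_SIZE 8

typedef struct sketch_block {
    unsigned long used, free;        /* bytes used / bytes still available */
    char buf[SKETCH_BLOCK_SIZE];
    struct sketch_block *next;       /* simple singly linked list */
} sketch_block;

typedef struct {
    sketch_block *head, *tail;
    unsigned long nblocks;
} sketch_buffer;

/* Append len bytes: fill the last block first, then allocate new blocks.
 * Returns 0 on success, -1 if an allocation fails. */
int sketch_append(sketch_buffer *b, const unsigned char *s, unsigned long len) {
    sketch_block *block = b->tail;
    while (len) {
        if (block) {
            unsigned long thislen = block->free < len ? block->free : len;
            if (thislen) {
                memcpy(block->buf + block->used, s, thislen);
                block->used += thislen;
                block->free -= thislen;
                s += thislen;
                len -= thislen;
            }
            /* invariant: every block always accounts for all its bytes */
            assert(block->used + block->free == SKETCH_BLOCK_SIZE);
        }
        if (len) {
            block = malloc(sizeof(*block));
            if (!block) return -1;
            block->used = 0;
            block->free = SKETCH_BLOCK_SIZE;
            block->next = NULL;
            if (b->tail) b->tail->next = block; else b->head = block;
            b->tail = block;
            b->nblocks++;
        }
    }
    return 0;
}

/* Total number of buffered bytes (compare aofRewriteBufferSize()). */
unsigned long sketch_size(const sketch_buffer *b) {
    unsigned long size = 0;
    for (const sketch_block *blk = b->head; blk; blk = blk->next)
        size += blk->used;
    return size;
}

/* Free every block and empty the buffer (compare aofRewriteBufferReset()). */
void sketch_reset(sketch_buffer *b) {
    sketch_block *blk = b->head;
    while (blk) {
        sketch_block *next = blk->next;
        free(blk);
        blk = next;
    }
    b->head = b->tail = NULL;
    b->nblocks = 0;
}
```

For example, appending the 17-byte string `"hello world, aof!"` to an empty buffer produces three blocks holding 8, 8 and 1 bytes. The comment above the block definitions in Redis's aof.c gives the reason for using a block list instead of one growing buffer: very large `realloc()` calls may end up copying the data rather than just remapping pages.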
The following function is called when the data in the AOF buffer (server.aof_buf) needs to be flushed to the disk file:

/* Write the append only file buffer on disk.
 *
 * Since we are required to write the AOF before replying to the client,
 * and the only way the client socket can get a write is by entering the
 * event loop, we accumulate all the AOF writes in a memory
 * buffer and write it on disk using this function just before entering
 * the event loop again.
 *
 * About the 'force' argument:
 *
 * When the fsync policy is set to 'everysec' we may delay the flush if there
 * is still an fsync() going on in the background thread, since for instance
 * on Linux write(2) will be blocked by the background fsync anyway.
 * When this happens we remember that there is some aof buffer to be
 * flushed ASAP, and will try to do that in the serverCron() function.
 *
 * However if force is set to 1 we'll write regardless of the background
 * fsync. */
#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
/* Flush the content of the buffer to disk */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    if (sdslen(server.aof_buf) == 0) return;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still OK. Postpone again. */
                return;
            }
            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. */
            server.aof_delayed_fsync++;
            redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */

    /* Also monitor the latency of the write operation */
    latencyStartMonitor(latency);
    nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (server.aof_child_pid != -1 || server.rdb_child_pid != -1) {
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);

    /* We performed the write so reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;

    if (nwritten != (signed)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                redisLog(REDIS_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                redisLog(REDIS_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }

            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    redisLog(REDIS_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synched on disk. */
            redisLog(REDIS_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = REDIS_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == REDIS_ERR) {
            redisLog(REDIS_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = REDIS_OK;
        }
    }
    server.aof_current_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        sdsclear(server.aof_buf);
    } else {
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }

    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    if (server.aof_no_fsync_on_rewrite &&
        (server.aof_child_pid != -1 || server.rdb_child_pid != -1))
            return;

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* aof_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        aof_fsync(server.aof_fd); /* Let's try to get this data on the disk */
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        if (!sync_in_progress) aof_background_fsync(server.aof_fd);
        server.aof_last_fsync = server.unixtime;
    }
}
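The postponement branch at the top of `flushAppendOnlyFile()` is the trickiest part of the `everysec` policy. The sketch below isolates just that decision as a pure function so it can be exercised without a running server. The names `flush_state`, `flush_decision` and `decide_flush` are hypothetical; the fields only mirror `server.aof_flush_postponed_start` and `server.aof_delayed_fsync`, and the actual `write(2)` is left to the caller.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical names; this mirrors only the postponement logic of
 * flushAppendOnlyFile() under the 'everysec' policy. */
typedef enum { FLUSH_WRITE_NOW, FLUSH_POSTPONE } flush_decision;

typedef struct {
    time_t postponed_start;   /* like server.aof_flush_postponed_start, 0 = none */
    long long delayed_fsync;  /* like server.aof_delayed_fsync */
} flush_state;

flush_decision decide_flush(flush_state *st, int everysec, int force,
                            int sync_in_progress, time_t now) {
    assert(st != NULL);
    if (everysec && !force && sync_in_progress) {
        if (st->postponed_start == 0) {
            /* First postponement: remember when it started. */
            st->postponed_start = now;
            return FLUSH_POSTPONE;
        } else if (now - st->postponed_start < 2) {
            /* Waited less than two seconds: keep waiting. */
            return FLUSH_POSTPONE;
        }
        /* Waited too long: count it and write anyway. */
        st->delayed_fsync++;
    }
    /* The caller performs the write; like the original, reset the sentinel. */
    st->postponed_start = 0;
    return FLUSH_WRITE_NOW;
}
```

With a background fsync pending, calls at t=100 and t=101 return `FLUSH_POSTPONE`, and a call at t=102 writes anyway and increments `delayed_fsync`. Passing `force = 1` skips the check entirely, matching the `!force` condition in the function above.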

Besides recording individual operations, there is also a function that writes out the entire content of the database, so that this one file can be used for a full recovery:

/* Write a sequence of commands able to fully rebuild the dataset into
 * "filename". Used both by REWRITEAOF and BGREWRITEAOF.
 *
 * In order to minimize the number of commands needed in the rewritten
 * log Redis uses variadic commands when possible, such as RPUSH, SADD
 * and ZADD. However at max REDIS_AOF_REWRITE_ITEMS_PER_CMD items per time
 * are inserted using a single command. */
/* Rewrite the whole content of the database, key by key, into the file */
int rewriteAppendOnlyFile(char *filename) {
    dictIterator *di = NULL;
    dictEntry *de;
    rio aof;
    FILE *fp;
    char tmpfile[256];
    int j;
    long long now = mstime();

    /* Note that we have to use a different temp name here compared to the
     * one used by rewriteAppendOnlyFileBackground() function. */
    snprintf(tmpfile,256,"temp-rewriteaof-%d.aof", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        redisLog(REDIS_WARNING, "Opening the temp file for AOF rewrite in rewriteAppendOnlyFile(): %s", strerror(errno));
        return REDIS_ERR;
    }

    rioInitWithFile(&aof,fp);
    if (server.aof_rewrite_incremental_fsync)
        rioSetAutoSync(&aof,REDIS_AOF_AUTOSYNC_BYTES);
    for (j = 0; j < server.dbnum; j++) {
        char selectcmd[] = "*2\r\n$6\r\nSELECT\r\n";
        redisDb *db = server.db+j;
        dict *d = db->dict;
        if (dictSize(d) == 0) continue;
        di = dictGetSafeIterator(d);
        if (!di) {
            fclose(fp);
            return REDIS_ERR;
        }

        /* SELECT the new DB */
        if (rioWrite(&aof,selectcmd,sizeof(selectcmd)-1) == 0) goto werr;
        if (rioWriteBulkLongLong(&aof,j) == 0) goto werr;

        /* Iterate this DB writing every entry */
        /* Iterate through every record of this database and log it */
        while((de = dictNext(di)) != NULL) {
            sds keystr;
            robj key, *o;
            long long expiretime;

            keystr = dictGetKey(de);
            o = dictGetVal(de);
            initStaticStringObject(key,keystr);

            expiretime = getExpire(db,&key);

            /* If this key is already expired skip it */
            if (expiretime != -1 && expiretime < now) continue;

            /* Save the key and associated value */
            if (o->type == REDIS_STRING) {
                /* Emit a SET command */
                char cmd[]="*3\r\n$3\r\nSET\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                /* Key and value */
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkObject(&aof,o) == 0) goto werr;
            } else if (o->type == REDIS_LIST) {
                if (rewriteListObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_SET) {
                if (rewriteSetObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_ZSET) {
                if (rewriteSortedSetObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_HASH) {
                if (rewriteHashObject(&aof,&key,o) == 0) goto werr;
            } else {
                redisPanic("Unknown object type");
            }
            /* Save the expire time */
            if (expiretime != -1) {
                char cmd[]="*3\r\n$9\r\nPEXPIREAT\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkLongLong(&aof,expiretime) == 0) goto werr;
            }
        }
        dictReleaseIterator(di);
    }

    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generated DB file is OK. */
    if (rename(tmpfile,filename) == -1) {
        redisLog(REDIS_WARNING,"Error moving temp append only file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        return REDIS_ERR;
    }
    redisLog(REDIS_NOTICE,"SYNC append only file rewrite performed");
    return REDIS_OK;

werr:
    fclose(fp);
    unlink(tmpfile);
    redisLog(REDIS_WARNING,"Write error writing append only file on disk: %s", strerror(errno));
    if (di) dictReleaseIterator(di);
    return REDIS_ERR;
}
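The tail of `rewriteAppendOnlyFile()` follows a common pattern for replacing a file safely: write everything to a temporary file, flush the stdio buffers, `fsync()` the descriptor, close it, and only then `rename()` it over the target, so readers see either the old file or the complete new one. Below is a minimal sketch of that pattern on its own, assuming a POSIX system; `atomic_replace` is a hypothetical helper, not a Redis function. Unlike the code above, it does not call `fclose()` a second time after a failed `fclose()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write data to tmpfile, make it durable, then
 * atomically rename it over filename. Returns 0 on success, -1 on error. */
int atomic_replace(const char *filename, const char *tmpfile,
                   const char *data, size_t len) {
    assert(filename != NULL && tmpfile != NULL);
    FILE *fp = fopen(tmpfile, "w");
    if (!fp) return -1;
    if (fwrite(data, 1, len, fp) != len) goto werr;
    /* Make sure data does not stay in stdio or OS buffers. */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    /* fclose() releases the stream even on failure: don't close it twice. */
    if (fclose(fp) == EOF) { fp = NULL; goto werr; }
    /* rename() replaces filename atomically on POSIX filesystems. */
    if (rename(tmpfile, filename) == -1) {
        unlink(tmpfile);
        return -1;
    }
    return 0;

werr:
    if (fp) fclose(fp);
    unlink(tmpfile);
    return -1;
}
```

Writing to the temporary file first means a crash in the middle of the rewrite leaves the old AOF untouched; at worst a stale temp file remains.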

Redis also provides a background version of this operation:

/* This is how rewriting of the append only file in background works:
 *
 * 1) The user calls BGREWRITEAOF
 * 2) Redis calls this function, which forks():
 *    2a) the child rewrites the append only file in a temp file.
 *    2b) the parent accumulates differences in server.aof_rewrite_buf.
 * 3) When the child has finished '2a' it exits.
 * 4) The parent will trap the exit code; if it's OK, it will append the
 *    data accumulated into server.aof_rewrite_buf into the temp file, and
 *    finally will rename(2) the temp file to the actual file name.
 *    The new file is reopened as the new append only file. Profit!
 */
/* Background AOF data file write operation */
int rewriteAppendOnlyFileBackground(void)
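The four steps in that comment can be sketched with plain POSIX calls. This is a simplified, hypothetical sketch (`background_rewrite` and `write_all` are not Redis functions): the child writes a snapshot into the temp file and exits, the parent waits for it, checks the exit status, appends the differences it accumulated (here just a string passed in), and renames the temp file over the real one. One simplification to note: the sketch blocks in `waitpid()`, whereas Redis keeps serving clients and collects the child's status from `serverCron()` without blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write a whole string to path with the given mode. */
static int write_all(const char *path, const char *mode, const char *data) {
    FILE *fp = fopen(path, mode);
    if (!fp) return -1;
    size_t len = strlen(data);
    int ok = fwrite(data, 1, len, fp) == len;
    if (fclose(fp) == EOF) ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical sketch of the BGREWRITEAOF flow. Returns 0 on success. */
int background_rewrite(const char *filename, const char *tmpfile,
                       const char *snapshot, const char *diff) {
    assert(snapshot != NULL && diff != NULL);
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* 2a) child: rewrite the dataset into the temp file, then exit */
        _exit(write_all(tmpfile, "w", snapshot) == 0 ? 0 : 1);
    }
    /* 2b) parent: Redis keeps serving clients here and buffers changes in
     * server.aof_rewrite_buf_blocks; in this sketch the diff is given. */
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    /* 3)/4) the child exited: check its exit code */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        unlink(tmpfile);
        return -1;
    }
    /* 4) append the accumulated differences, then rename atomically */
    if (write_all(tmpfile, "a", diff) == -1) { unlink(tmpfile); return -1; }
    if (rename(tmpfile, filename) == -1) { unlink(tmpfile); return -1; }
    return 0;
}
```

The child uses `_exit()` rather than `exit()` so that it does not flush stdio buffers inherited from the parent a second time.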

The principle is the same as in yesterday's analysis: fork() creates a child process that performs the rewrite. Finally, here is the API of aof.c:

/* aof.c API */
void aofRewriteBufferReset(void) /* Release the old buffer in the server and create a new one */
unsigned long aofRewriteBufferSize(void) /* Return the total size of the current AOF rewrite buffer */
void aofRewriteBufferAppend(unsigned char *s, unsigned long len) /* Append data to the buffer; allocate a new block when the space runs out */
ssize_t aofRewriteBufferWrite(int fd) /* Write the buffered content in memory to a file, block by block */
void aof_background_fsync(int fd) /* Start a background job for the file sync operation */
void stopAppendOnly(void) /* Stop appending data; triggered by a command */
int startAppendOnly(void) /* Turn on append mode */
void flushAppendOnlyFile(int force) /* Flush the content of the buffer to disk */
sds catAppendOnlyGenericCommand(sds dst, int argc, robj **argv) /* Pack the command arguments into protocol format and append them to dst */
sds catAppendOnlyExpireAtCommand(sds buf, struct redisCommand *cmd, robj *key, robj *seconds) /* Convert expire commands into PEXPIREAT, turning the time into an absolute time */
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) /* Convert the command differently depending on cmd */
struct redisClient *createFakeClient(void) /* Commands are always executed by a client, so a fake client is created for loading */
void freeFakeClientArgv(struct redisClient *c) /* Release the fake client's arguments */
void freeFakeClient(struct redisClient *c) /* Release the fake client itself */
int loadAppendOnlyFile(char *filename) /* Load the content of an AOF file */
int rioWriteBulkObject(rio *r, robj *obj) /* Write a bulk object; long long encoded objects and ordinary string objects are handled separately */
int rewriteListObject(rio *r, robj *key, robj *o) /* Write a list object, handling both the ziplist and the linkedlist encodings */
int rewriteSetObject(rio *r, robj *key, robj *o) /* Write set object data */
int rewriteSortedSetObject(rio *r, robj *key, robj *o) /* Write a sorted set object */
static int rioWriteHashIteratorCursor(rio *r, hashTypeIterator *hi, int what) /* Write the object the hash iterator currently points to */
int rewriteHashObject(rio *r, robj *key, robj *o) /* Write a hash object */
int rewriteAppendOnlyFile(char *filename) /* Rewrite the database content, key by key, into the file */
int rewriteAppendOnlyFileBackground(void) /* Perform the AOF data file rewrite in the background */
void bgrewriteaofCommand(redisClient *c) /* Command that writes the AOF file in the background */
void aofRemoveTempFile(pid_t childpid) /* Remove the temp AOF file produced by the child process childpid */
void aofUpdateCurrentSize(void) /* Update the current AOF file size */
void backgroundRewriteDoneHandler(int exitcode, int bysignal) /* Callback invoked after the child process finishes the rewrite */
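Two entries in this list are easy to illustrate. `catAppendOnlyGenericCommand()` writes each command into the AOF in the Redis protocol (RESP) multi-bulk format, the same `*<argc>\r\n$<len>\r\n<arg>\r\n` layout that `rewriteAppendOnlyFile()` emits for `SELECT`, `SET` and `PEXPIREAT`, and `catAppendOnlyExpireAtCommand()` turns relative expire times into absolute ones. The sketch below reproduces both ideas with hypothetical helpers (`encode_command`, `to_pexpireat_ms`) that use a fixed `char` buffer instead of `sds`; it is an illustration, not the Redis implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode argv as a RESP multi-bulk command into dst.
 * Returns the encoded length, or -1 if dst is too small. */
long encode_command(char *dst, size_t cap, int argc, const char **argv) {
    assert(dst != NULL && cap > 0);
    size_t off = 0;
    int n = snprintf(dst, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    off += (size_t)n;
    for (int j = 0; j < argc; j++) {
        size_t len = strlen(argv[j]);
        n = snprintf(dst + off, cap - off, "$%zu\r\n", len);
        if (n < 0 || (size_t)n >= cap - off) return -1;
        off += (size_t)n;
        if (off + len + 2 >= cap) return -1;  /* room for payload, CRLF, NUL */
        memcpy(dst + off, argv[j], len);
        off += len;
        memcpy(dst + off, "\r\n", 2);
        off += 2;
    }
    dst[off] = '\0';
    return (long)off;
}

/* Hypothetical helper: convert a relative EXPIRE (seconds) or PEXPIRE
 * (milliseconds) into the absolute millisecond time PEXPIREAT expects. */
long long to_pexpireat_ms(long long now_ms, long long amount, int in_seconds) {
    return now_ms + (in_seconds ? amount * 1000 : amount);
}
```

For example, encoding `SET foo bar` yields `*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n`. Storing an absolute timestamp matters because a relative TTL replayed when the AOF is loaded later would restart the countdown from the load time.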
