Continuing my code analysis of the data-related files in the Redis source, today I am reading aof.c. AOF is short for append-only file: a file that is only ever appended to. It records every data-changing operation so that the data can be recovered after an abnormal shutdown, much like the redo and undo logs I mentioned earlier. As we all know, Redis is an in-memory database, so every change is applied in memory first; for persistence, the change records are collected in a buffer and then written out to a file on disk. AOF works in this way. Here the code introduces the concept of a block, which is simply a buffer block. The block definitions are as follows:
/* The AOF code below uses a simple buffer to accumulate the change
 * records of data operations until they are appended to a file.
 * Because the buffer only grows by appending, resizing one big
 * allocation on every append would be costly, and a single block cannot
 * grow without bound. Redis therefore keeps a list of blocks: when one
 * block is full, the data continues in the next block. Each block is
 * 10 MB. */
#define AOF_RW_BUF_BLOCK_SIZE (1024*1024*10)    /* 10 MB per block */

/* Standard AOF read/write block */
typedef struct aofrwblock {
    // bytes currently used in this block, bytes still free
    unsigned long used, free;
    // block contents, 10 MB
    char buf[AOF_RW_BUF_BLOCK_SIZE];
} aofrwblock;
That is to say, each block is 10 MB by default, which is neither especially large nor especially small. If the incoming data exceeds the space left in the current block, a new buffer block is allocated dynamically. On the server side, a linked list of blocks organizes the whole buffer:
/* Append data to the AOF rewrite buffer, allocating new blocks if needed. */
/* Append data to the buffer; if the space is exceeded, a new buffer block is allocated */
void aofRewriteBufferAppend(unsigned char *s, unsigned long len) {
    listNode *ln = listLast(server.aof_rewrite_buf_blocks);
    // locate the last block of the buffer; appends always go there
    aofrwblock *block = ln ? ln->value : NULL;

    while(len) {
        /* If we already got at least an allocated block, try appending
         * at least some piece into it. */
        if (block) {
            // write as much as fits into the remaining free space
            unsigned long thislen = (block->free < len) ? block->free : len;
            if (thislen) {  /* The current block is not already full. */
                memcpy(block->buf+block->used, s, thislen);
                block->used += thislen;
                block->free -= thislen;
                s += thislen;
                len -= thislen;
            }
        }

        if (len) { /* First block to allocate, or need another block. */
            int numblocks;

            // not enough room: create a new block and keep writing
            block = zmalloc(sizeof(*block));
            block->free = AOF_RW_BUF_BLOCK_SIZE;
            block->used = 0;
            // add the new block to the tail of the block list
            listAddNodeTail(server.aof_rewrite_buf_blocks,block);

            /* Log every time we cross more 10 or 100 blocks, respectively
             * as a notice or warning. */
            numblocks = listLength(server.aof_rewrite_buf_blocks);
            if (((numblocks+1) % 10) == 0) {
                int level = ((numblocks+1) % 100) == 0 ? REDIS_WARNING :
                                                         REDIS_NOTICE;
                redisLog(level,"Background AOF buffer size: %lu MB",
                    aofRewriteBufferSize()/(1024*1024));
            }
        }
    }
}
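To see the block-list idea in isolation, here is a minimal standalone sketch of the same append loop. It is not the Redis code: the names demo_block, demo_buffer and demo_buffer_append are hypothetical, the list is a hand-rolled singly linked list instead of the Redis list type, plain malloc stands in for zmalloc, and the block size is shrunk to 8 bytes so that block boundaries are easy to observe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BLOCK_SIZE 8   /* tiny on purpose; the Redis block is 10 MB */

/* Hypothetical stand-in for aofrwblock, with its own next pointer. */
typedef struct demo_block {
    unsigned long used, free;
    struct demo_block *next;
    char buf[DEMO_BLOCK_SIZE];
} demo_block;

/* Hypothetical stand-in for server.aof_rewrite_buf_blocks. */
typedef struct demo_buffer {
    demo_block *head, *tail;
    unsigned long nblocks;
} demo_buffer;

/* Append len bytes: fill the tail block first, then allocate new blocks
 * until everything is stored. Returns 0 on success, -1 if malloc fails. */
int demo_buffer_append(demo_buffer *b, const char *s, unsigned long len) {
    demo_block *block = b->tail;

    while (len) {
        if (block) {
            /* Copy as much as still fits into the current block. */
            unsigned long thislen = (block->free < len) ? block->free : len;
            if (thislen) {
                memcpy(block->buf + block->used, s, thislen);
                block->used += thislen;
                block->free -= thislen;
                s += thislen;
                len -= thislen;
            }
        }
        if (len) {
            /* First block, or the current one is full: link a new block. */
            block = malloc(sizeof(*block));
            if (!block) return -1;
            block->used = 0;
            block->free = DEMO_BLOCK_SIZE;
            block->next = NULL;
            if (b->tail) b->tail->next = block; else b->head = block;
            b->tail = block;
            b->nblocks++;
        }
    }
    return 0;
}

/* Total number of bytes stored across all blocks. */
unsigned long demo_buffer_size(const demo_buffer *b) {
    unsigned long total = 0;
    for (const demo_block *blk = b->head; blk; blk = blk->next) {
        assert(blk->used + blk->free == DEMO_BLOCK_SIZE);
        total += blk->used;
    }
    return total;
}

/* Release every block and reset the buffer to empty. */
void demo_buffer_free(demo_buffer *b) {
    demo_block *blk = b->head;
    while (blk) {
        demo_block *next = blk->next;
        free(blk);
        blk = next;
    }
    b->head = b->tail = NULL;
    b->nblocks = 0;
}
```

Starting from a zero-initialized demo_buffer, appending an 18-byte string leaves three blocks holding 8, 8 and 2 bytes. Only the tail block is ever partially filled, which is why the append loop needs to look at nothing but the last node.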
To flush the data in the buffer to the file on disk, Redis calls the following function:
/* Write the append only file buffer on disk.
 *
 * Since we are required to write the AOF before replying to the client,
 * and the only way the client socket can get a write is entering when the
 * event loop, we accumulate all the AOF writes in a memory
 * buffer and write it on disk using this function just before entering
 * the event loop again.
 *
 * About the 'force' argument:
 *
 * When the fsync policy is set to 'everysec' we may delay the flush if there
 * is still an fsync() going on in the background thread, since for instance
 * on Linux write(2) will be blocked by the background fsync anyway.
 * When this happens we remember that there is some aof buffer to be
 * flushed ASAP, and will try to do that in the serverCron() function.
 *
 * However if force is set to 1 we'll write regardless of the background
 * fsync. */
#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
/* Flush the buffer content to disk */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    mstime_t latency;

    if (sdslen(server.aof_buf) == 0) return;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;

    if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
        /* With this append fsync policy we do background fsyncing.
         * If the fsync is still in progress we can try to delay
         * the write for a couple of seconds. */
        if (sync_in_progress) {
            if (server.aof_flush_postponed_start == 0) {
                /* No previous write postponing, remember that we are
                 * postponing the flush and return. */
                server.aof_flush_postponed_start = server.unixtime;
                return;
            } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                /* We were already waiting for fsync to finish, but for less
                 * than two seconds this is still OK. Postpone again. */
                return;
            }
            /* Otherwise fall through, and go write since we can't wait
             * over two seconds. */
            server.aof_delayed_fsync++;
            redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
        }
    }
    /* We want to perform a single write. This should be guaranteed atomic
     * at least if the filesystem we are writing is a real physical one.
     * While this will save us against the server being killed I don't think
     * there is much to do about the whole server stopping for power problems
     * or alike */

    // the write operation itself
    latencyStartMonitor(latency);
    nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
    latencyEndMonitor(latency);
    /* We want to capture different events for delayed writes:
     * when the delay happens with a pending fsync, or with a saving child
     * active, and when the above two conditions are missing.
     * We also use an additional event name to save all samples which is
     * useful for graphing / monitoring purposes. */
    if (sync_in_progress) {
        latencyAddSampleIfNeeded("aof-write-pending-fsync",latency);
    } else if (server.aof_child_pid != -1 || server.rdb_child_pid != -1) {
        latencyAddSampleIfNeeded("aof-write-active-child",latency);
    } else {
        latencyAddSampleIfNeeded("aof-write-alone",latency);
    }
    latencyAddSampleIfNeeded("aof-write",latency);

    /* We performed the write so reset the postponed flush sentinel to zero. */
    server.aof_flush_postponed_start = 0;

    if (nwritten != (signed)sdslen(server.aof_buf)) {
        static time_t last_write_error_log = 0;
        int can_log = 0;

        /* Limit logging rate to 1 line per AOF_WRITE_LOG_ERROR_RATE seconds. */
        if ((server.unixtime - last_write_error_log) > AOF_WRITE_LOG_ERROR_RATE) {
            can_log = 1;
            last_write_error_log = server.unixtime;
        }

        /* Log the AOF write error and record the error code. */
        if (nwritten == -1) {
            if (can_log) {
                redisLog(REDIS_WARNING,"Error writing to the AOF file: %s",
                    strerror(errno));
                server.aof_last_write_errno = errno;
            }
        } else {
            if (can_log) {
                redisLog(REDIS_WARNING,"Short write while writing to "
                                       "the AOF file: (nwritten=%lld, "
                                       "expected=%lld)",
                                       (long long)nwritten,
                                       (long long)sdslen(server.aof_buf));
            }

            if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                if (can_log) {
                    redisLog(REDIS_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            } else {
                /* If the ftruncate() succeeded we can set nwritten to
                 * -1 since there is no longer partial data into the AOF. */
                nwritten = -1;
            }
            server.aof_last_write_errno = ENOSPC;
        }

        /* Handle the AOF write error. */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synced on disk. */
            redisLog(REDIS_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            /* Recover from failed write leaving data into the buffer. However
             * set an error to stop accepting writes as long as the error
             * condition is not cleared. */
            server.aof_last_write_status = REDIS_ERR;

            /* Trim the sds buffer if there was a partial write, and there
             * was no way to undo it with ftruncate(2). */
            if (nwritten > 0) {
                server.aof_current_size += nwritten;
                sdsrange(server.aof_buf,nwritten,-1);
            }
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        if (server.aof_last_write_status == REDIS_ERR) {
            redisLog(REDIS_WARNING,
                "AOF write error looks solved, Redis can write again.");
            server.aof_last_write_status = REDIS_OK;
        }
    }
    server.aof_current_size += nwritten;

    /* Re-use AOF buffer when it is small enough. The maximum comes from the
     * arena size of 4k minus some overhead (but is otherwise arbitrary). */
    if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
        sdsclear(server.aof_buf);
    } else {
        sdsfree(server.aof_buf);
        server.aof_buf = sdsempty();
    }

    /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
     * children doing I/O in the background. */
    if (server.aof_no_fsync_on_rewrite &&
        (server.aof_child_pid != -1 || server.rdb_child_pid != -1))
            return;

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* aof_fsync is defined as fdatasync() for Linux in order to avoid
         * flushing metadata. */
        latencyStartMonitor(latency);
        aof_fsync(server.aof_fd); /* Let's try to get this data on the disk */
        latencyEndMonitor(latency);
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
        server.aof_last_fsync = server.unixtime;
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        if (!sync_in_progress) aof_background_fsync(server.aof_fd);
        server.aof_last_fsync = server.unixtime;
    }
}
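The postponement rule at the top of flushAppendOnlyFile is easy to lose in a function this long, so here is a small sketch of just that decision as a pure function. The names demo_flush_state and demo_decide_flush are hypothetical, and the sketch simplifies one detail: it clears the postponed timestamp as part of the decision, whereas the function above clears it only after the write(2) call.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of the decision: write now, or wait for the background fsync. */
typedef enum { DEMO_FLUSH_WRITE, DEMO_FLUSH_POSTPONE } demo_flush_action;

/* Hypothetical subset of the server state used by the decision. */
typedef struct {
    long postponed_start;   /* 0 means no flush is currently postponed */
    long delayed_count;     /* how often we gave up waiting and wrote */
} demo_flush_state;

/* Decide whether to write the buffer at time now (in seconds).
 * everysec:         non-zero if the fsync policy is 'everysec'
 * force:            non-zero to write regardless of a pending fsync
 * sync_in_progress: non-zero if a background fsync is still running */
demo_flush_action demo_decide_flush(demo_flush_state *st, long now,
                                    int everysec, int force,
                                    int sync_in_progress) {
    assert(st != NULL);
    if (everysec && !force && sync_in_progress) {
        if (st->postponed_start == 0) {
            /* First postponement: remember when the waiting started. */
            st->postponed_start = now;
            return DEMO_FLUSH_POSTPONE;
        }
        if (now - st->postponed_start < 2) {
            /* Waiting for less than two seconds: postpone again. */
            return DEMO_FLUSH_POSTPONE;
        }
        /* Waited two seconds or more: write anyway and count the delay. */
        st->delayed_count++;
    }
    st->postponed_start = 0;
    return DEMO_FLUSH_WRITE;
}
```

With 'everysec' and a background fsync that never finishes, calls at seconds 100 and 101 are postponed and the call at second 102 writes, bumping delayed_count. A call with force set writes immediately without touching the counter.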
Of course, Redis can also write out the entire contents of the database as a fresh sequence of commands, so that a full recovery can be done from this one file at a lower cost:
/* Write a sequence of commands able to fully rebuild the dataset into
 * "filename". Used both by REWRITEAOF and BGREWRITEAOF.
 *
 * In order to minimize the number of commands needed in the rewritten
 * log Redis uses variadic commands when possible, such as RPUSH, SADD
 * and ZADD. However at max REDIS_AOF_REWRITE_ITEMS_PER_CMD items per time
 * are inserted using a single command. */
/* Rewrite the database contents into the file, key by key */
int rewriteAppendOnlyFile(char *filename) {
    dictIterator *di = NULL;
    dictEntry *de;
    rio aof;
    FILE *fp;
    char tmpfile[256];
    int j;
    long long now = mstime();

    /* Note that we have to use a different temp name here compared to the
     * one used by rewriteAppendOnlyFileBackground() function. */
    snprintf(tmpfile,256,"temp-rewriteaof-%d.aof", (int) getpid());
    fp = fopen(tmpfile,"w");
    if (!fp) {
        redisLog(REDIS_WARNING, "Opening the temp file for AOF rewrite in rewriteAppendOnlyFile(): %s", strerror(errno));
        return REDIS_ERR;
    }

    rioInitWithFile(&aof,fp);
    if (server.aof_rewrite_incremental_fsync)
        rioSetAutoSync(&aof,REDIS_AOF_AUTOSYNC_BYTES);
    for (j = 0; j < server.dbnum; j++) {
        char selectcmd[] = "*2\r\n$6\r\nSELECT\r\n";
        redisDb *db = server.db+j;
        dict *d = db->dict;
        if (dictSize(d) == 0) continue;
        di = dictGetSafeIterator(d);
        if (!di) {
            fclose(fp);
            return REDIS_ERR;
        }

        /* SELECT the new DB */
        if (rioWrite(&aof,selectcmd,sizeof(selectcmd)-1) == 0) goto werr;
        if (rioWriteBulkLongLong(&aof,j) == 0) goto werr;

        /* Iterate this DB writing every entry */
        // walk every record in the database and log it
        while((de = dictNext(di)) != NULL) {
            sds keystr;
            robj key, *o;
            long long expiretime;

            keystr = dictGetKey(de);
            o = dictGetVal(de);
            initStaticStringObject(key,keystr);

            expiretime = getExpire(db,&key);

            /* If this key is already expired skip it */
            if (expiretime != -1 && expiretime < now) continue;

            /* Save the key and associated value */
            if (o->type == REDIS_STRING) {
                /* Emit a SET command */
                char cmd[]="*3\r\n$3\r\nSET\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                /* Key and value */
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkObject(&aof,o) == 0) goto werr;
            } else if (o->type == REDIS_LIST) {
                if (rewriteListObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_SET) {
                if (rewriteSetObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_ZSET) {
                if (rewriteSortedSetObject(&aof,&key,o) == 0) goto werr;
            } else if (o->type == REDIS_HASH) {
                if (rewriteHashObject(&aof,&key,o) == 0) goto werr;
            } else {
                redisPanic("Unknown object type");
            }
            /* Save the expire time */
            if (expiretime != -1) {
                char cmd[]="*3\r\n$9\r\nPEXPIREAT\r\n";
                if (rioWrite(&aof,cmd,sizeof(cmd)-1) == 0) goto werr;
                if (rioWriteBulkObject(&aof,&key) == 0) goto werr;
                if (rioWriteBulkLongLong(&aof,expiretime) == 0) goto werr;
            }
        }
        dictReleaseIterator(di);
    }

    /* Make sure data will not remain on the OS's output buffers */
    if (fflush(fp) == EOF) goto werr;
    if (fsync(fileno(fp)) == -1) goto werr;
    if (fclose(fp) == EOF) goto werr;

    /* Use RENAME to make sure the DB file is changed atomically only
     * if the generate DB file is OK. */
    if (rename(tmpfile,filename) == -1) {
        redisLog(REDIS_WARNING,"Error moving temp append only file on the final destination: %s", strerror(errno));
        unlink(tmpfile);
        return REDIS_ERR;
    }
    redisLog(REDIS_NOTICE,"SYNC append only file rewrite performed");
    return REDIS_OK;

werr:
    fclose(fp);
    unlink(tmpfile);
    redisLog(REDIS_WARNING,"Write error writing append only file on disk: %s", strerror(errno));
    if (di) dictReleaseIterator(di);
    return REDIS_ERR;
}
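The string literals in the rewrite function, such as "*3\r\n$3\r\nSET\r\n", are command headers in the Redis protocol format: an argument count line followed by one length-prefixed string per argument. As a rough illustration of how a whole command ends up in the file, the sketch below encodes an argument vector into that format. The function demo_resp_encode is hypothetical and, unlike the rio helpers in Redis, it relies on strlen and is therefore not binary safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode argc NUL-terminated strings as one multi-bulk command:
 *   *<argc>\r\n  then for each argument  $<len>\r\n<arg>\r\n
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the output buffer is too small. */
int demo_resp_encode(char *out, size_t outlen, int argc, const char **argv) {
    size_t pos;
    int n;

    assert(out != NULL && argv != NULL);

    /* Header line with the number of arguments. */
    n = snprintf(out, outlen, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= outlen) return -1;
    pos = (size_t)n;

    for (int i = 0; i < argc; i++) {
        /* Each argument is written as its length, then its bytes. */
        n = snprintf(out + pos, outlen - pos, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= outlen - pos) return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Encoding the three arguments SET, foo and bar yields the 31-byte string "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n", whose first 13 bytes are exactly the header literal used in the REDIS_STRING branch of the rewrite function.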
The same rewrite can also run in the background:
/* This is how rewriting of the append only file in background works:
 *
 * 1) The user calls BGREWRITEAOF
 * 2) Redis calls this function, that forks():
 *    2a) the child rewrite the append only file in a temp file.
 *    2b) the parent accumulates differences in server.aof_rewrite_buf.
 * 3) When the child finished '2a' it exits.
 * 4) The parent will trap the exit code, if it's OK, will append the
 *    data accumulated into server.aof_rewrite_buf into the temp file, and
 *    finally will rename(2) the temp file in the actual file name.
 *    The new file is reopened as the new append only file. Profit!
 */
/* Write the AOF data file in the background */
int rewriteAppendOnlyFileBackground(void)
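The steps in the comment above follow a common pattern: fork a child, let it write a temporary file, and have the parent rename the file into place only if the child exited cleanly. The sketch below shows that pattern on its own. The function demo_background_rewrite and the temp file name are hypothetical, the payload is a fixed string rather than a dataset, and the parent simply blocks in waitpid, which is a simplification: per step 2b above, the real parent keeps running and accumulates changes while the child works.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write payload to final_name via a child process and an atomic rename.
 * Returns 0 on success, -1 on any failure. */
int demo_background_rewrite(const char *final_name, const char *payload) {
    char tmpfile[256];
    int status;
    pid_t child;

    assert(final_name != NULL && payload != NULL);
    snprintf(tmpfile, sizeof(tmpfile), "temp-demo-rewrite-%d.aof",
             (int)getpid());

    child = fork();
    if (child == -1) return -1;

    if (child == 0) {
        /* Child: write the temp file, push it to disk, report via exit code.
         * _exit() avoids flushing stdio buffers inherited from the parent. */
        size_t len = strlen(payload);
        FILE *fp = fopen(tmpfile, "w");
        if (!fp) _exit(1);
        if (fwrite(payload, 1, len, fp) != len) _exit(1);
        if (fflush(fp) == EOF) _exit(1);
        if (fsync(fileno(fp)) == -1) _exit(1);
        if (fclose(fp) == EOF) _exit(1);
        _exit(0);
    }

    /* Parent: trap the exit code of the child. */
    if (waitpid(child, &status, 0) == -1) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        unlink(tmpfile);
        return -1;
    }

    /* rename(2) swaps the new file in atomically. */
    if (rename(tmpfile, final_name) == -1) {
        unlink(tmpfile);
        return -1;
    }
    return 0;
}
```

Because the final name only ever points at a completely written file, a crash in the child leaves the previous file untouched; the worst case is a stale temp file, which is what aofRemoveTempFile cleans up in Redis.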
The principle is the same as in yesterday's analysis: fork() is used to create a child process that does the work. Finally, here is the full list of functions in the file:
/* The API in aof.c */
void aofRewriteBufferReset(void) /* release the old buffer in the server and create a new one */
unsigned long aofRewriteBufferSize(void) /* return the total size of the current AOF rewrite buffer */
void aofRewriteBufferAppend(unsigned char *s, unsigned long len) /* append data to the buffer; if the space is exceeded, a new buffer block is allocated */
ssize_t aofRewriteBufferWrite(int fd) /* write the buffer content held in memory to the file, also block by block */
void aof_background_fsync(int fd) /* have a background thread sync the file */
void stopAppendOnly(void) /* stop the append operation; this is used in command mode */
int startAppendOnly(void) /* enable append mode */
void flushAppendOnlyFile(int force) /* flush the buffer content to disk */
sds catAppendOnlyGenericCommand(sds dst, int argc, robj **argv) /* encode the given arguments as a command and append it to dst */
sds catAppendOnlyExpireAtCommand(sds buf, struct redisCommand *cmd, robj *key, robj *seconds) /* convert all expire commands to PEXPIREAT, turning the time into an absolute time */
void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) /* perform different command conversions depending on cmd */
struct redisClient *createFakeClient(void) /* commands are always executed by a client, so a fake client is introduced */
void freeFakeClientArgv(struct redisClient *c) /* release the arguments of the fake client */
void freeFakeClient(struct redisClient *c) /* release the fake client */
int loadAppendOnlyFile(char *filename) /* load the AOF file content */
int rioWriteBulkObject(rio *r, robj *obj) /* write a bulk object; handles LongLong objects and ordinary String objects */
int rewriteListObject(rio *r, robj *key, robj *o) /* write a list object; handles both the ZIPLIST compressed list and the LINKEDLIST ordinary linked list */
int rewriteSetObject(rio *r, robj *key, robj *o) /* write a set object */
int rewriteSortedSetObject(rio *r, robj *key, robj *o) /* write a sorted set object */
static int rioWriteHashIteratorCursor(rio *r, hashTypeIterator *hi, int what) /* write the object the hash iterator currently points to */
int rewriteHashObject(rio *r, robj *key, robj *o) /* write a hash object */
int rewriteAppendOnlyFile(char *filename) /* rewrite the database contents into the file, key by key */
int rewriteAppendOnlyFileBackground(void) /* write the AOF data file in the background */
void bgrewriteaofCommand(redisClient *c) /* the command form of the background AOF rewrite */
void aofRemoveTempFile(pid_t childpid) /* remove the temp AOF file produced by the child process with ID childpid */
void aofUpdateCurrentSize(void) /* update the size of the current AOF file */
void backgroundRewriteDoneHandler(int exitcode, int bysignal) /* callback run after the child process finishes the rewrite */
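The description of catAppendOnlyExpireAtCommand says that every expire command is logged as PEXPIREAT with an absolute time. The reason is that a relative timeout replayed later from the file would expire the key at the wrong moment. Below is a minimal sketch of that conversion for four commands; the function demo_to_pexpireat is hypothetical and takes the current time as a parameter, while the real function works on robj arguments and builds the command string itself.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Convert the time argument of an expire-style command into the absolute
 * millisecond timestamp that PEXPIREAT expects.
 * cmd:    command name, matched case-insensitively
 * when:   the time argument as given to the command
 * now_ms: current time in milliseconds
 * Returns the absolute time in milliseconds, or -1 for an unknown command. */
long long demo_to_pexpireat(const char *cmd, long long when, long long now_ms) {
    assert(cmd != NULL);
    if (strcasecmp(cmd, "EXPIRE") == 0)      /* relative, in seconds */
        return when * 1000 + now_ms;
    if (strcasecmp(cmd, "PEXPIRE") == 0)     /* relative, in milliseconds */
        return when + now_ms;
    if (strcasecmp(cmd, "EXPIREAT") == 0)    /* absolute, in seconds */
        return when * 1000;
    if (strcasecmp(cmd, "PEXPIREAT") == 0)   /* already absolute, in ms */
        return when;
    return -1;
}
```

For instance, EXPIRE with an argument of 10 issued at now_ms = 1000 becomes the absolute time 11000, so replaying the file an hour later still yields the same expiry instant.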