Today we analyze the implementation of the string-handling code in the Redis source. After several of these analyses, I have come to feel that I need to change my approach. Walking through every API one by one is inefficient when so much functionality is repeated. From here on I will focus on the overall design of each module and pick out a few of its more distinctive functions to take apart, which also lets us see some of the clever code inside. Back to the topic. Strings are among the most frequently used types in any programming language. Creating a string, concat, strcpy, substr, split: we are all familiar with these operations, and high-level languages provide them out of the box. A lower-level language such as C exposes far fewer of these APIs and has no string class; strings are implemented as char[] arrays. The SDS string library we are looking at today is likewise built on char[] operations.
First, here is the sds.h header file:
    /* SDSLib, A C dynamic strings library
     *
     * Copyright (c) 2006-2010, Salvatore Sanfilippo <antirez at gmail dot com>
     * All rights reserved.
     *
     * Redistribution and use in source and binary forms, with or without
     * modification, are permitted provided that the following conditions are met:
     *
     *   * Redistributions of source code must retain the above copyright notice,
     *     this list of conditions and the following disclaimer.
     *   * Redistributions in binary form must reproduce the above copyright
     *     notice, this list of conditions and the following disclaimer in the
     *     documentation and/or other materials provided with the distribution.
     *   * Neither the name of Redis nor the names of its contributors may be used
     *     to endorse or promote products derived from this software without
     *     specific prior written permission.
     *
     * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
     * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
     * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
     * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
     * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
     * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
     * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
     * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
     * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
     * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
     * POSSIBILITY OF SUCH DAMAGE.
     */

    #ifndef __SDS_H
    #define __SDS_H

    /* Maximum preallocation: 1 MB */
    #define SDS_MAX_PREALLOC (1024*1024)

    #include <sys/types.h>
    #include <stdarg.h>

    /* sds is declared as a char pointer */
    typedef char *sds;

    /* String header struct */
    struct sdshdr {
        // number of bytes in use
        unsigned int len;
        // number of bytes currently available
        unsigned int free;
        // the character data
        char buf[];
    };

    /* Compute the length of an sds string; returns a size_t */
    /* size_t is a machine-dependent unsigned type, large enough to hold the size of any object in memory */
    static inline size_t sdslen(const sds s) {
        struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
        return sh->len;
    }

    /* Get the available space from the free field of sdshdr */
    static inline size_t sdsavail(const sds s) {
        struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
        return sh->free;
    }

    sds sdsnewlen(const void *init, size_t initlen);   // create an sds of the given length
    sds sdsnew(const char *init);                      // create an sds from the given C string
    sds sdsempty(void);                                // create an empty sds
    size_t sdslen(const sds s);                        // get the length of an sds
    sds sdsdup(const sds s);                           // duplicate an sds
    void sdsfree(sds s);                               // free an sds
    size_t sdsavail(const sds s);                      // get the available space of an sds
    sds sdsgrowzero(sds s, size_t len);                // grow the string to the given length
    sds sdscatlen(sds s, const void *t, size_t len);
    sds sdscat(sds s, const char *t);                  // append a C string to an sds
    sds sdscatsds(sds s, const sds t);                 // append an sds to an sds
    sds sdscpylen(sds s, const char *t, size_t len);   // string copy
    sds sdscpy(sds s, const char *t);                  // string copy

    // formatted output; relies on the existing sprintf family, so it is slower than sdscatfmt below
    sds sdscatvprintf(sds s, const char *fmt, va_list ap);
    #ifdef __GNUC__
    sds sdscatprintf(sds s, const char *fmt, ...)
        __attribute__((format(printf, 2, 3)));
    #else
    sds sdscatprintf(sds s, const char *fmt, ...);
    #endif

    sds sdscatfmt(sds s, char const *fmt, ...);        // formatted string output
    sds sdstrim(sds s, const char *cset);              // trim characters from both ends
    void sdsrange(sds s, int start, int end);          // truncate the string to a range
    void sdsupdatelen(sds s);                          // refresh the stored string length
    void sdsclear(sds s);                              // clear the string
    int sdscmp(const sds s1, const sds s2);            // compare two sds strings
    sds *sdssplitlen(const char *s, int len, const char *sep, int seplen, int *count); // split into substrings
    void sdsfreesplitres(sds *tokens, int count);      // free the substring array
    void sdstolower(sds s);                            // convert to lowercase
    void sdstoupper(sds s);                            // convert to uppercase
    sds sdsfromlonglong(long long value);              // build a string from a number
    sds sdscatrepr(sds s, const char *p, size_t len);
    sds *sdssplitargs(const char *line, int *argc);    // split an argument line
    sds sdsmapchars(sds s, const char *from, const char *to, size_t setlen); // character mapping: with "ho" and "01", h maps to 0 and o maps to 1
    sds sdsjoin(char **argv, int argc, char *sep);     // join an array of strings with a separator

    /* Low level functions exposed to the user API */
    sds sdsMakeRoomFor(sds s, size_t addlen);
    void sdsIncrLen(sds s, int incr);
    sds sdsRemoveFreeSpace(sds s);
    size_t sdsAllocSize(sds s);

    #endif
The header declares many of the common operations we would hope to see, and it looks quite comprehensive. I admire the authors for implementing all of this in C. Before going through the implementations, let me explain the principle on which SDS is built: the sdshdr struct. Most operations first recover the sdshdr header from the sds pointer, update the fields that record the string's state (effectively its attributes), and then return the sdshdr->buf pointer as the result. You can think of sdshdr as the string object and of sds as just the character data inside it. Many functions follow this pattern.
For example, the clear operation:
    /* Modify an sds string on-place to make it empty (zero length).
     * However all the existing buffer is not discarded but set as free space
     * so that next append operations will not require allocations up to the
     * number of bytes previously available. */
    /* Clear the string */
    void sdsclear(sds s) {
        struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
        // the old length becomes free space
        sh->free += sh->len;
        sh->len = 0;
        // the buffer is not actually released; writing the terminator at
        // index 0 lets the next operation reuse it
        sh->buf[0] = '\0';
    }
Another example is creating a string, where the new string is returned as the buf member of the sdshdr struct:
    /* Create a new sds string with the content specified by the 'init' pointer
     * and 'initlen'.
     * If NULL is used for 'init' the string is initialized with zero bytes.
     *
     * The string is always null-terminated (all the sds strings are, always) so
     * even if you create an sds string with:
     *
     * mystring = sdsnewlen("abc",3);
     *
     * You can print the string with printf() as there is an implicit \0 at the
     * end of the string. However the string is binary safe and can contain
     * \0 characters in the middle, as the length is stored in the sds header. */
    /* Create a new string from an initial buffer and its length */
    sds sdsnewlen(const void *init, size_t initlen) {
        struct sdshdr *sh;

        if (init) {
            sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
        } else {
            // when init is NULL, zcalloc is used so the buffer is zero-filled
            sh = zcalloc(sizeof(struct sdshdr)+initlen+1);
        }
        if (sh == NULL) return NULL;
        sh->len = initlen;
        sh->free = 0;
        if (initlen && init)
            memcpy(sh->buf, init, initlen);
        // add the '\0' terminator at the end
        sh->buf[initlen] = '\0';
        // finally, return the buf member of the struct as the new string
        return (char*)sh->buf;
    }
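The header-before-buffer layout described above can be sketched in a self-contained form. This is only a minimal illustration, assuming plain malloc/free in place of zmalloc/zcalloc; every mini_* name is hypothetical and not part of Redis.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names (mini_*) used for illustration only; not Redis APIs. */
struct mini_hdr {
    unsigned int len;   /* bytes in use */
    unsigned int free;  /* bytes still available after len */
    char buf[];         /* character data, always '\0'-terminated */
};

typedef char *mini_str;

/* Step back from the character data to the header stored just before it. */
static struct mini_hdr *mini_header(const mini_str s) {
    return (struct mini_hdr *)(s - sizeof(struct mini_hdr));
}

/* Allocate header + data + terminator and return a pointer to the data. */
mini_str mini_new(const char *init) {
    size_t n;
    struct mini_hdr *h;

    assert(init != NULL);
    n = strlen(init);
    h = malloc(sizeof(struct mini_hdr) + n + 1);
    if (h == NULL) return NULL;
    h->len = (unsigned int)n;
    h->free = 0;
    memcpy(h->buf, init, n + 1);  /* copies the terminator too */
    return h->buf;
}

size_t mini_len(const mini_str s)   { return mini_header(s)->len; }
size_t mini_avail(const mini_str s) { return mini_header(s)->free; }

/* Empty the string but keep the buffer as free space for later reuse. */
void mini_clear(mini_str s) {
    struct mini_hdr *h = mini_header(s);
    h->free += h->len;
    h->len = 0;
    h->buf[0] = '\0';
}

/* The allocation starts at the header, so that is what must be freed. */
void mini_free(mini_str s) {
    if (s != NULL) free(mini_header(s));
}
```

Because the caller holds a pointer to the character data, the value can be passed to any function expecting a char *, while the length lookup stays O(1) through the header.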
Next come a few of the more distinctive functions. I did not expect to find this one implemented from scratch here. The formatted-output function is implemented in C as follows:
    /* This function is similar to sdscatprintf, but much faster as it does
     * not rely on sprintf() family functions implemented by the libc that
     * are often very slow. Moreover directly handling the sds string as
     * new data is concatenated provides a performance improvement.
     *
     * However this function only handles an incompatible subset of printf-alike
     * format specifiers:
     *
     * %s - C String
     * %S - SDS string
     * %i - signed int
     * %I - 64 bit signed integer (long long, int64_t)
     * %u - unsigned int
     * %U - 64 bit unsigned integer (unsigned long long, uint64_t)
     * %% - Verbatim "%" character.
     */
    /* Formatted output: takes the original string, the format, and the arguments */
    sds sdscatfmt(sds s, char const *fmt, ...) {
        struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
        size_t initlen = sdslen(s);
        const char *f = fmt;
        int i;
        va_list ap;

        va_start(ap,fmt);
        f = fmt;     /* Next format specifier byte to process. */
        i = initlen; /* Position of the next byte to write to dest str. */
        // the key part: walk the format string byte by byte
        while(*f) {
            char next, *str;
            unsigned int l;
            long long num;
            unsigned long long unum;

            /* Make sure there is always space for at least 1 char. */
            if (sh->free == 0) {
                s = sdsMakeRoomFor(s,1);
                sh = (void*) (s-(sizeof(struct sdshdr)));
            }

            switch(*f) {
            case '%':
                /* on '%', read the type character that follows */
                next = *(f+1);
                f++;
                switch(next) {
                case 's':
                case 'S':
                    str = va_arg(ap,char*);
                    // a plain C string and an sds compute their length differently
                    l = (next == 's') ? strlen(str) : sdslen(str);
                    if (sh->free < l) {
                        s = sdsMakeRoomFor(s,l);
                        sh = (void*) (s-(sizeof(struct sdshdr)));
                    }
                    // a string is copied directly to the end
                    memcpy(s+i,str,l);
                    sh->len += l;
                    sh->free -= l;
                    i += l;
                    break;
                case 'i':
                case 'I':
                    if (next == 'i')
                        num = va_arg(ap,int);
                    else
                        num = va_arg(ap,long long);
                    {
                        char buf[SDS_LLSTR_SIZE];
                        // a number is first converted to its string form
                        l = sdsll2str(buf,num);
                        if (sh->free < l) {
                            s = sdsMakeRoomFor(s,l);
                            sh = (void*) (s-(sizeof(struct sdshdr)));
                        }
                        memcpy(s+i,buf,l);
                        sh->len += l;
                        sh->free -= l;
                        i += l;
                    }
                    break;
                case 'u':
                case 'U':
                    // unsigned integers are handled the same way
                    if (next == 'u')
                        unum = va_arg(ap,unsigned int);
                    else
                        unum = va_arg(ap,unsigned long long);
                    {
                        char buf[SDS_LLSTR_SIZE];
                        l = sdsull2str(buf,unum);
                        if (sh->free < l) {
                            s = sdsMakeRoomFor(s,l);
                            sh = (void*) (s-(sizeof(struct sdshdr)));
                        }
                        memcpy(s+i,buf,l);
                        sh->len += l;
                        sh->free -= l;
                        i += l;
                    }
                    break;
                default: /* Handle %% and generally %<unknown>. */
                    s[i++] = next;
                    sh->len += 1;
                    sh->free -= 1;
                    break;
                }
                break;
            default:
                // not a format specifier: append the byte as-is
                s[i++] = *f;
                sh->len += 1;
                sh->free -= 1;
                break;
            }
            f++;
        }
        va_end(ap);

        /* Add null-term */
        s[i] = '\0';
        return s;
    }
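To make the dispatch loop easier to follow, here is a much smaller sketch of the same idea. It assumes a fixed-capacity destination buffer instead of a growable sds, and it supports only %s, %i, and %%. The name mini_catfmt and its overflow behaviour are assumptions made for this illustration, not Redis code. Unlike the listing above, an unknown specifier such as %x is copied through verbatim.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Redis: appends formatted text to the
 * '\0'-terminated string in dst, whose capacity is cap bytes including the
 * terminator. Returns the new length, or -1 if the result does not fit
 * (dst then keeps whatever was appended before the overflow). */
int mini_catfmt(char *dst, size_t cap, const char *fmt, ...) {
    size_t i;
    const char *f;
    va_list ap;

    assert(dst != NULL && fmt != NULL && cap > 0);
    i = strlen(dst);            /* position of the next byte to write */
    va_start(ap, fmt);
    for (f = fmt; *f; f++) {
        if (*f == '%' && f[1] == 's') {
            char *str = va_arg(ap, char *);
            size_t l = strlen(str);
            if (i + l >= cap) goto overflow;
            memcpy(dst + i, str, l);
            i += l;
            f++;                /* skip the type character */
        } else if (*f == '%' && f[1] == 'i') {
            char digits[12];    /* 10 digits + sign fits a 32-bit int */
            int n = va_arg(ap, int);
            unsigned int v = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
            size_t l = 0;
            /* Digits come out least significant first. */
            do { digits[l++] = (char)('0' + v % 10); v /= 10; } while (v);
            if (n < 0) digits[l++] = '-';
            if (i + l >= cap) goto overflow;
            while (l > 0) dst[i++] = digits[--l];   /* copy in reverse */
            f++;
        } else {
            if (*f == '%' && f[1] == '%') f++;      /* "%%" emits one '%' */
            if (i + 1 >= cap) goto overflow;
            dst[i++] = *f;
        }
    }
    va_end(ap);
    dst[i] = '\0';
    return (int)i;

overflow:
    va_end(ap);
    dst[i] = '\0';
    return -1;
}
```

The real function avoids the overflow path by calling sdsMakeRoomFor before each write, which is why it has to reload the header pointer after every possible reallocation.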
After reading this I was impressed. Formatted output is not as simple as it looks, and I suspect the formatting code in Java works along similar lines. Another instructive function is split, which uses goto to release memory when an allocation fails:
    /* Split 's' with separator in 'sep'. An array
     * of sds strings is returned. *count will be set
     * by reference to the number of tokens returned.
     *
     * On out of memory, zero length string, zero length
     * separator, NULL is returned.
     *
     * Note that 'sep' is able to split a string using
     * a multi-character separator. For example
     * sdssplit("foo_-_bar","_-_"); will return two
     * elements "foo" and "bar".
     *
     * This version of the function is binary-safe but
     * requires length arguments. sdssplit() is just the
     * same function but for zero-terminated strings.
     */
    /* Split an sds string, similar to the split method of java.lang.String */
    sds *sdssplitlen(const char *s, int len, const char *sep, int seplen, int *count) {
        int elements = 0, slots = 5, start = 0, j;
        sds *tokens;

        if (seplen < 1 || len < 0) return NULL;

        // the token array starts with room for only five substrings
        tokens = zmalloc(sizeof(sds)*slots);
        // if the allocation fails, return NULL directly
        if (tokens == NULL) return NULL;

        if (len == 0) {
            *count = 0;
            return tokens;
        }
        // scan forward up to the last position where the separator could still match
        for (j = 0; j < (len-(seplen-1)); j++) {
            /* make sure there is room for the next element and the final one */
            // if fewer than elements+2 slots are available, grow the array
            if (slots < elements+2) {
                sds *newtokens;

                slots *= 2;
                newtokens = zrealloc(tokens,sizeof(sds)*slots);
                // if the allocation fails here, goto the cleanup code that
                // frees the memory; a goto finally put to good use
                if (newtokens == NULL) goto cleanup;
                tokens = newtokens;
            }
            /* search the separator */
            // single-character and multi-character separators are compared differently
            if ((seplen == 1 && *(s+j) == sep[0]) || (memcmp(s+j,sep,seplen) == 0)) {
                // store the substring
                tokens[elements] = sdsnewlen(s+start,j-start);
                if (tokens[elements] == NULL) goto cleanup;
                elements++;
                start = j+seplen;
                j = j+seplen-1; /* skip the separator */
            }
        }
        /* Add the final element. We are sure there is room in the tokens array. */
        tokens[elements] = sdsnewlen(s+start,len-start);
        // if the allocation fails, clean up and return NULL
        if (tokens[elements] == NULL) goto cleanup;
        elements++;
        *count = elements;
        return tokens;

    cleanup:
        {
            // free everything allocated so far
            int i;
            for (i = 0; i < elements; i++) sdsfree(tokens[i]);
            zfree(tokens);
            *count = 0;
            return NULL;
        }
    }
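The two ideas worth isolating here are the doubling token array and the single cleanup label. The sketch below shows both on ordinary '\0'-terminated strings, assuming malloc/realloc/free instead of the Redis allocators. mini_split and mini_free_split are hypothetical names, and edge cases such as an empty input are handled differently from sdssplitlen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Redis: splits the '\0'-terminated string
 * s on the separator sep. Returns a malloc'ed array of malloc'ed tokens and
 * stores the token count in *count. Returns NULL on allocation failure or
 * when sep is empty. */
char **mini_split(const char *s, const char *sep, int *count) {
    size_t len, seplen, start = 0, j = 0;
    int elements = 0, slots = 2, i;
    char **tokens;

    assert(s != NULL && sep != NULL && count != NULL);
    len = strlen(s);
    seplen = strlen(sep);
    *count = 0;
    if (seplen == 0) return NULL;
    tokens = malloc(sizeof(char *) * slots);
    if (tokens == NULL) return NULL;

    for (;;) {
        /* A token ends at a separator or where no separator can fit. */
        int at_end = (j + seplen > len);
        if (at_end || memcmp(s + j, sep, seplen) == 0) {
            size_t end = at_end ? len : j;
            if (elements == slots) {            /* grow by doubling */
                char **bigger = realloc(tokens, sizeof(char *) * slots * 2);
                if (bigger == NULL) goto cleanup;
                tokens = bigger;
                slots *= 2;
            }
            tokens[elements] = malloc(end - start + 1);
            if (tokens[elements] == NULL) goto cleanup;
            memcpy(tokens[elements], s + start, end - start);
            tokens[elements][end - start] = '\0';
            elements++;
            if (at_end) break;
            start = j + seplen;                 /* skip the separator */
            j = start;
        } else {
            j++;
        }
    }
    *count = elements;
    return tokens;

cleanup:
    /* One exit path releases every token created so far, then the array. */
    for (i = 0; i < elements; i++) free(tokens[i]);
    free(tokens);
    return NULL;
}

/* Release an array returned by mini_split. */
void mini_free_split(char **tokens, int count) {
    int i;
    if (tokens == NULL) return;
    for (i = 0; i < count; i++) free(tokens[i]);
    free(tokens);
}
```

Assigning the realloc result to a temporary first matters: on failure the original array is still valid and can be freed by the cleanup code.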
This was the first time I had looked inside such a commonly used operation, and it is rare to see goto put to good use. Implementing split is not easy: the code handles out-of-memory conditions and grows the token array dynamically when there are many substrings. You can see how careful the C code is about growing memory. I admire this attention to problems that we rarely think about when working in high-level languages. Next, let's look at converting a number to a string. It uses the classic algorithm: take the remainder digit by digit, which produces the digits in reverse order, and then reverse them in place.
    /* Helper for sdscatlonglong() doing the actual number -> string
     * conversion. 's' must point to a string with room for at least
     * SDS_LLSTR_SIZE bytes.
     *
     * The function returns the length of the null-terminated string
     * representation stored at 's'. */
    /* Convert a number to its string representation */
    #define SDS_LLSTR_SIZE 21
    int sdsll2str(char *s, long long value) {
        char *p, aux;
        unsigned long long v;
        size_t l;

        /* Generate the string representation, this method produces
         * an reversed string. */
        v = (value < 0) ? -value : value;
        p = s;
        // the traditional digit-by-digit remainder method; note that the
        // digits are produced in reverse order
        do {
            *p++ = '0'+(v%10);
            v /= 10;
        } while(v);
        // do not forget the sign
        if (value < 0) *p++ = '-';

        /* Compute length and add null term. */
        l = p-s;
        *p = '\0';

        /* Reverse the string. */
        // swap characters from both ends until the pointers meet
        p--;
        while(s < p) {
            aux = *s;
            *s = *p;
            *p = aux;
            s++;
            p--;
        }
        return l;
    }
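A standalone variant makes the two phases (emit digits in reverse, then swap in place) easy to try out. This is a sketch under one stated change: the magnitude is computed in unsigned arithmetic so the most negative long long does not rely on signed negation. mini_ll2str and MINI_LLSTR_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* 19 digits + sign + '\0' is enough for any 64-bit long long. */
#define MINI_LLSTR_SIZE 21

/* Hypothetical standalone variant of the conversion shown above; not Redis
 * code. Writes the decimal form of value into s (at least MINI_LLSTR_SIZE
 * bytes) and returns the length without the terminator. */
int mini_ll2str(char *s, long long value) {
    char *p = s, aux;
    /* Negating in unsigned arithmetic is well defined for every input. */
    unsigned long long v = (value < 0) ? 0ULL - (unsigned long long)value
                                       : (unsigned long long)value;
    int l;

    assert(s != NULL);
    /* Phase 1: emit digits least significant first, so "120" becomes "021". */
    do {
        *p++ = (char)('0' + v % 10);
        v /= 10;
    } while (v);
    if (value < 0) *p++ = '-';

    l = (int)(p - s);
    *p = '\0';

    /* Phase 2: reverse in place by swapping from both ends. */
    p--;
    while (s < p) {
        aux = *s;
        *s = *p;
        *p = aux;
        s++;
        p--;
    }
    return l;
}
```

For example, -120 is first written as "021-" and then reversed to "-120". The do/while form guarantees that 0 still produces one digit.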
The character-mapping function is similar to a replace method, but the Redis version maps single characters only. The code is as follows:
    /* Modify the string substituting all the occurrences of the set of
     * characters specified in the 'from' string to the corresponding character
     * in the 'to' array.
     *
     * For instance: sdsmapchars(mystring, "ho", "01", 2)
     * will have the effect of turning the string "hello" into "0ell1".
     *
     * The function returns the sds string pointer, that is always the same
     * as the input pointer since no resize is needed. */
    /* Character mapping: with "ho" and "01", h maps to 0 and o maps to 1 */
    sds sdsmapchars(sds s, const char *from, const char *to, size_t setlen) {
        size_t j, i, l = sdslen(s);

        for (j = 0; j < l; j++) {
            for (i = 0; i < setlen; i++) {
                if (s[j] == from[i]) {
                    s[j] = to[i];
                    break;
                }
            }
        }
        return s;
    }
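The same mapping loop can be tried on an ordinary char buffer. The sketch below assumes a '\0'-terminated input and uses the hypothetical name mini_mapchars; it is not the Redis function. The break after a match is what prevents a character that has just been mapped from being mapped a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C variant of the mapping shown above; not Redis code.
 * Replaces every character of s found in from[0..setlen) with the character
 * at the same index in to. Works in place and returns s. */
char *mini_mapchars(char *s, const char *from, const char *to, size_t setlen) {
    size_t j, i, l;

    assert(s != NULL && from != NULL && to != NULL);
    l = strlen(s);
    for (j = 0; j < l; j++) {
        for (i = 0; i < setlen; i++) {
            if (s[j] == from[i]) {
                s[j] = to[i];
                break;  /* stop, so the new character is not mapped again */
            }
        }
    }
    return s;
}
```

With from "ab" and to "ba", the string "abba" becomes "baab". Without the break, each 'a' would turn into 'b' and then straight back into 'a'.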
In general, the string code is sizable, already more than a thousand lines. Redis builds these functions on the most primitive char[] operations, and reading the implementation gave me a better sense of how string APIs in high-level languages work underneath, which I found very rewarding. To me the code also shows a function-oriented style, with parameters such as const void * that carry no specific data type. That is all for today's analysis.
Redis source code analysis (4) -- SDS string