Linux C Development: Memcached Source Code Analysis, Command Parsing
Preface
In the previous chapter, "Linux c development-Memcached source code analysis-Libevent-based network model", we gained a basic understanding of the Memcached network model. This chapter explains Memcached command parsing in detail.
As the previous chapter showed, Memcached runs one main thread and N worker threads. The main thread listens on the socket and accepts client connections, while the worker threads take over the individual client connections.
The main thread and the worker threads communicate mainly through Libevent-based pipe read/write events. When a connection is established, the main thread hands it to a worker thread, and all subsequent reads and writes between client and server happen in that worker thread.
The worker threads are also driven by Libevent events. When a read or write event arrives, the event's callback function is triggered.
So how does Memcached parse the command packets sent by the client? The rest of this chapter explains it in detail.
Memcached Command Parsing source code analysis
Starting with drive_machine
In the previous chapter we saw the callback for read/write events on a client connection: event_handler. This function ends by calling drive_machine.
void event_handler(const int fd, const short which, void *arg) {
    conn *c;

    // recover the conn structure from the callback argument
    c = (conn *)arg;
    assert(c != NULL);

    c->which = which;

    /* sanity */
    if (fd != c->sfd) {
        if (settings.verbose > 0)
            fprintf(stderr, "Catastrophic: event fd doesn't match conn fd!\n");
        conn_close(c);
        return;
    }

    // finally hand over to drive_machine
    drive_machine(c);

    /* wait for next event */
    return;
}
The conn data structure
Each connection has its own conn structure, which stores the basic information about that connection.
The fields that matter most in this chapter:
char *rbuf: the buffer that commands from client packets are read into.
int rsize: the allocated size of rbuf.
char *rcurr: pointer to the start of the data that has not been parsed yet.
int rbytes: the number of unparsed bytes, starting at rcurr.
typedef struct conn conn;
struct conn {
    int    sfd;
    sasl_conn_t *sasl_conn;
    bool authenticated;
    enum conn_states  state;
    enum bin_substates substate;
    rel_time_t last_cmd_time;
    struct event event;
    short  ev_flags;
    short  which;   /** which events were just triggered */

    char   *rbuf;   /** buffer to read commands into */
    char   *rcurr;  /** but if we parsed some already, this is where we stopped */
    int    rsize;   /** total allocated size of rbuf */
    int    rbytes;  /** how much data, starting from rcur, do we have unparsed */

    char   *wbuf;
    char   *wcurr;
    int    wsize;
    int    wbytes;
    /** which state to go into after finishing current write */
    enum conn_states  write_and_go;
    void   *write_and_free; /** free this memory after finishing writing */

    char   *ritem;  /** when we read in an item's value, it goes here */
    int    rlbytes;

    /* data for the nread state */

    /**
     * item is used to hold an item structure created after reading the command
     * line of set/add/replace commands, but before we finished reading the actual
     * data. The data is read into ITEM_data(item) to avoid extra copying.
     */
    void   *item;     /* for commands set/add/replace  */

    /* data for the swallow state */
    int    sbytes;    /* how many bytes to swallow */

    /* data for the mwrite state */
    struct iovec *iov;
    int    iovsize;   /* number of elements allocated in iov[] */
    int    iovused;   /* number of elements used in iov[] */

    struct msghdr *msglist;
    int    msgsize;   /* number of elements allocated in msglist[] */
    int    msgused;   /* number of elements used in msglist[] */
    int    msgcurr;   /* element in msglist[] being transmitted now */
    int    msgbytes;  /* number of bytes in current msg */

    item   **ilist;   /* list of items to write out */
    int    isize;
    item   **icurr;
    int    ileft;

    char   **suffixlist;
    int    suffixsize;
    char   **suffixcurr;
    int    suffixleft;

    enum protocol protocol;   /* which protocol this connection speaks */
    enum network_transport transport; /* what transport is used by this connection */

    /* data for UDP clients */
    int    request_id; /* Incoming UDP request ID, if this is a UDP "connection" */
    struct sockaddr_in6 request_addr; /* udp: Who sent the most recent request */
    socklen_t request_addr_size;
    unsigned char *hdrbuf; /* udp packet headers */
    int    hdrsize;   /* number of headers' worth of space is allocated */

    bool   noreply;   /* True if the reply should not be sent. */
    /* current stats command */
    struct {
        char *buffer;
        size_t size;
        size_t offset;
    } stats;

    /* Binary protocol stuff */
    /* This is where the binary header goes */
    protocol_binary_request_header binary_header;
    uint64_t cas; /* the cas to return */
    short cmd; /* current command being processed */
    int opaque;
    int keylen;
    conn   *next;     /* Used for generating a list of conn structures */
    LIBEVENT_THREAD *thread; /* Pointer to the thread object serving this connection */
};
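How the four read-buffer fields relate to each other can be sketched with a cut-down structure. The names mini_conn, mini_consume, and mini_compact are hypothetical and exist only for this illustration; they are not part of Memcached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the four read-buffer fields of conn. */
typedef struct {
    char *rbuf;   /* start of the read buffer */
    char *rcurr;  /* first byte that has not been parsed yet */
    int   rsize;  /* allocated size of rbuf */
    int   rbytes; /* number of unparsed bytes starting at rcurr */
} mini_conn;

/* Mark n bytes as parsed: advance the cursor and shrink the unparsed count. */
static void mini_consume(mini_conn *c, int n) {
    assert(n >= 0 && n <= c->rbytes);
    c->rcurr  += n;
    c->rbytes -= n;
}

/* Move the unparsed tail back to the start of the buffer, which is what
 * try_read_network does before it reads more data. */
static void mini_compact(mini_conn *c) {
    if (c->rcurr != c->rbuf) {
        if (c->rbytes != 0)
            memmove(c->rbuf, c->rcurr, (size_t)c->rbytes);
        c->rcurr = c->rbuf;
    }
}
```

After one command has been consumed, rcurr points into the middle of rbuf; compacting restores rcurr == rbuf so that the whole remaining capacity is available for the next read.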
drive_machine:
drive_machine uses c->state to decide which logic to run.
conn_listening: listening state
conn_waiting: waiting state
conn_read: read state
conn_parse_cmd: command-line parsing
static void drive_machine(conn *c) {
    bool stop = false;
    int sfd;
    socklen_t addrlen;
    struct sockaddr_storage addr;
    int nreqs = settings.reqs_per_event;
    int res;
    const char *str;
#ifdef HAVE_ACCEPT4
    static int use_accept4 = 1;
#else
    static int use_accept4 = 0;
#endif

    assert(c != NULL);

    while (!stop) {
        switch (c->state) {
        case conn_listening:
            // ...... more code
}
Let's look at the code for the conn_read, conn_waiting, and conn_parse_cmd states.
1. When a data packet arrives from the client, the conn_read case runs.
2. conn_read reads data from the socket. If nothing was read, the state moves to conn_waiting, which waits for the client's next packet.
3. If conn_read hits a read error, the state moves to conn_closing and the client connection is closed.
4. If conn_read read data, or the buffer already holds data, the state moves to conn_parse_cmd, which parses the command.
        // Keep waiting for the next data packet from the client.
        case conn_waiting:
            if (!update_event(c, EV_READ | EV_PERSIST)) {
                if (settings.verbose > 0)
                    fprintf(stderr, "Couldn't update event\n");
                conn_set_state(c, conn_closing);
                break;
            }
            // switch the connection to the read state and set stop to true,
            // which exits the loop until the next event fires
            conn_set_state(c, conn_read);
            stop = true;
            break;

        // Read event. When the client sends data, the libevent read event fires
        // and this case runs.
        case conn_read:
            // try_read_network reads TCP data and returns a try_read_result value
            // that says whether data was read and whether the read failed
            res = IS_UDP(c->transport) ? try_read_udp(c) : try_read_network(c);

            switch (res) {
            // nothing was read: go to the waiting state;
            // the while (!stop) loop continues and runs the conn_waiting case
            case READ_NO_DATA_RECEIVED:
                conn_set_state(c, conn_waiting);
                break;
            // data was read: go to conn_parse_cmd, which parses the command
            case READ_DATA_RECEIVED:
                conn_set_state(c, conn_parse_cmd);
                break;
            // read failed: go to conn_closing to close the client connection
            case READ_ERROR:
                conn_set_state(c, conn_closing);
                break;
            case READ_MEMORY_ERROR: /* Failed to allocate more memory */
                /* State already set by try_read_network */
                break;
            }
            break;

        // Parse a client command, for example: set username zhuli
        case conn_parse_cmd:
            // try_read_command returns 0 when no complete command could be parsed
            // (the command may be incomplete because TCP split the packet, so we
            // have to wait for the rest of the data)
            if (try_read_command(c) == 0) {
                /* wee need more data! */
                // (sic: the upstream comment should read "we need more data!")
                conn_set_state(c, conn_waiting);
            }
            break;
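The transitions among these states can be sketched as a tiny simulation. This is a simplified model, not Memcached's code: mini_drive, the MINI_* names, and the two input parameters are hypothetical, and the socket read and the parser are replaced by values passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the states discussed above. */
enum mini_state { MINI_WAITING, MINI_READ, MINI_PARSE_CMD, MINI_CLOSING };
enum mini_read_result { MINI_NO_DATA, MINI_DATA, MINI_ERROR };

/* Run transitions until the machine has to stop and wait.
 * read_result:       what the simulated socket read reports.
 * have_full_command: whether the buffer holds a complete command line.
 * Returns the state the connection is left in. */
static enum mini_state mini_drive(enum mini_state state,
                                  enum mini_read_result read_result,
                                  bool have_full_command) {
    bool stop = false;
    assert(state >= MINI_WAITING && state <= MINI_CLOSING);
    while (!stop) {
        switch (state) {
        case MINI_WAITING:
            /* Re-arm the read event, then leave the loop until it fires. */
            state = MINI_READ;
            stop = true;
            break;
        case MINI_READ:
            if (read_result == MINI_NO_DATA)
                state = MINI_WAITING;
            else if (read_result == MINI_DATA)
                state = MINI_PARSE_CMD;
            else
                state = MINI_CLOSING;
            break;
        case MINI_PARSE_CMD:
            if (!have_full_command)
                state = MINI_WAITING; /* need more data */
            else
                stop = true;          /* the real code would now run the command */
            break;
        case MINI_CLOSING:
            stop = true;
            break;
        }
    }
    return state;
}
```

Note that a connection that read nothing, or that holds only a partial command, ends up back in the read state with the loop stopped, ready for the next libevent read event.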
try_read_network
This function reads TCP network data. The data it reads is stored in the c->rbuf buffer.
If the buffer has no room for more data, it is reallocated at twice its size. Memcached reallocates at most four times per call, presumably so that a misbehaving client cannot keep the buffer growing through repeated realloc calls within a single read.
// Reads the command data that the client sent over TCP.
static enum try_read_result try_read_network(conn *c) {
    // the function returns a try_read_result value;
    // the default is READ_NO_DATA_RECEIVED
    enum try_read_result gotdata = READ_NO_DATA_RECEIVED;
    int res;
    int num_allocs = 0;
    assert(c != NULL);

    // c->rcurr points at the unparsed data; c->rbytes is how much of it there is
    // c->rbuf is the buffer commands are read into; c->rsize is its size
    // if part of the buffer was already parsed, move the rest to the front
    if (c->rcurr != c->rbuf) {
        if (c->rbytes != 0) /* otherwise there's nothing to copy */
            memmove(c->rbuf, c->rcurr, c->rbytes);
        c->rcurr = c->rbuf;
    }

    // read from the fd in a loop
    while (1) {
        // when the unparsed data fills the buffer (rbytes >= rsize),
        // a larger block has to be allocated
        if (c->rbytes >= c->rsize) {
            // reallocate at most four times
            if (num_allocs == 4) {
                return gotdata;
            }
            ++num_allocs;
            // the new block is twice rsize
            char *new_rbuf = realloc(c->rbuf, c->rsize * 2);
            if (!new_rbuf) {
                STATS_LOCK();
                stats.malloc_fails++;
                STATS_UNLOCK();
                if (settings.verbose > 0) {
                    fprintf(stderr, "Couldn't realloc input buffer\n");
                }
                c->rbytes = 0; /* ignore what we read */
                out_of_memory(c, "SERVER_ERROR out of memory reading request");
                c->write_and_go = conn_closing;
                return READ_MEMORY_ERROR;
            }
            // c->rcurr and c->rbuf both point at the new block
            c->rcurr = c->rbuf = new_rbuf;
            c->rsize *= 2; // rsize doubles
        }

        // avail is the free space left in the buffer
        int avail = c->rsize - c->rbytes;
        // the actual socket read:
        // c->sfd is the socket descriptor
        // c->rbuf + c->rbytes is where the newly read data is stored
        // avail is the maximum number of bytes to read this time
        res = read(c->sfd, c->rbuf + c->rbytes, avail);
        // res > 0 means data was read from the socket;
        // record that by setting gotdata to READ_DATA_RECEIVED
        if (res > 0) {
            pthread_mutex_lock(&c->thread->stats.mutex); // per-thread stats lock
            c->thread->stats.bytes_read += res;
            pthread_mutex_unlock(&c->thread->stats.mutex);
            gotdata = READ_DATA_RECEIVED;
            c->rbytes += res; // unparsed data grows by the bytes just read
            if (res == avail) {
                continue; // the buffer is full, there may be more to read
            } else {
                break;
            }
        }
        // the two failure cases
        if (res == 0) {
            // the peer closed the connection
            return READ_ERROR;
        }
        if (res == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                break; // nothing more to read right now
            }
            return READ_ERROR;
        }
    }
    return gotdata;
}
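The read-and-grow loop can be tried in isolation on any file descriptor. The following is a minimal sketch under stated assumptions: mini_read_network and the MINI_* result names are hypothetical, the statistics and error reply are left out, and the buffer is passed in through pointers instead of a conn.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

enum mini_read_result { MINI_NO_DATA, MINI_DATA, MINI_ERROR, MINI_MEMORY_ERROR };

/* Hypothetical stand-alone version of the loop: read from fd into *buf,
 * doubling the buffer when it is full, at most four times per call.
 * *size is the allocated size, *bytes the number of bytes stored so far. */
static enum mini_read_result mini_read_network(int fd, char **buf,
                                               int *size, int *bytes) {
    enum mini_read_result got = MINI_NO_DATA;
    int num_allocs = 0;
    assert(buf != NULL && *buf != NULL && *size > 0);

    for (;;) {
        if (*bytes >= *size) {
            if (num_allocs == 4)
                return got;          /* growth limit reached for this call */
            ++num_allocs;
            char *bigger = realloc(*buf, (size_t)*size * 2);
            if (bigger == NULL)
                return MINI_MEMORY_ERROR;
            *buf = bigger;
            *size *= 2;
        }

        int avail = *size - *bytes;
        ssize_t res = read(fd, *buf + *bytes, (size_t)avail);
        if (res > 0) {
            got = MINI_DATA;
            *bytes += (int)res;
            if (res == avail)
                continue;            /* buffer filled up, there may be more */
            break;
        }
        if (res == 0)
            return MINI_ERROR;       /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                   /* non-blocking fd with nothing to read */
        return MINI_ERROR;
    }
    return got;
}
```

Starting from a 4-byte buffer, four doublings give 64 bytes, so a single call stores at most 64 bytes; whatever remains in the socket is picked up on a later call.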
try_read_command
This function reads commands out of rbuf.
Example commands: set username zhuli\r\n get username\n
The \n line break separates the commands in the data stream. Because TCP may merge several commands into one packet or split one command across packets (the "sticky packet" and "split packet" problems), a command can be parsed only once its whole line has arrived, that is, once a \n has been found.
// If c->rbuf already holds a complete command line, this function parses it.
static int try_read_command(conn *c) {
    assert(c != NULL);
    assert(c->rcurr <= (c->rbuf + c->rsize));
    assert(c->rbytes > 0);

    if (c->protocol == negotiating_prot || c->transport == udp_transport) {
        if ((unsigned char)c->rbuf[0] == (unsigned char)PROTOCOL_BINARY_REQ) {
            c->protocol = binary_prot;
        } else {
            c->protocol = ascii_prot;
        }

        if (settings.verbose > 1) {
            fprintf(stderr, "%d: Client using the %s protocol\n", c->sfd,
                    prot_text(c->protocol));
        }
    }

    // there are two protocols: binary and ASCII
    if (c->protocol == binary_prot) {
        // more code
    } else {
        // this branch parses ASCII (non-binary) commands
        char *el, *cont;

        // c->rbytes == 0 means the buffer holds nothing to process;
        // return 0 so the caller keeps waiting for new client data
        if (c->rbytes == 0)
            return 0;

        // Memcached commands are separated by \n. When client data arrives,
        // Memcached decides whether a command is complete by looking for a \n
        // in the received data.
        // For example: set username 10234344\n get username\n
        // contains two commands, set and get; el is the address of the first \n
        el = memchr(c->rcurr, '\n', c->rbytes);

        // no \n found: the command is incomplete, return 0 and wait for more data
        if (!el) {
            // c->rbytes is the amount of data received so far.
            // If more than 1024 bytes have arrived without a \n, Memcached checks
            // whether the request is a large multi-get; if it is not, the
            // connection to this client is closed.
            if (c->rbytes > 1024) {
                /*
                 * We didn't have a '\n' in the first k. This _has_ to be a
                 * large multiget, if not we should just nuke the connection.
                 */
                char *ptr = c->rcurr;
                while (*ptr == ' ') { /* ignore leading whitespaces */
                    ++ptr;
                }

                if (ptr - c->rcurr > 100 ||
                    (strncmp(ptr, "get ", 4) && strncmp(ptr, "gets ", 5))) {
                    conn_set_state(c, conn_closing);
                    return 1;
                }
            }
            return 0;
        }

        // a \n was found, so c->rcurr holds a complete command
        cont = el + 1; // start of the next command

        // if the line ends in \r\n, step el back by one so it points at the \r
        if ((el - c->rcurr) > 1 && *(el - 1) == '\r') {
            el--;
        }
        // terminate the command with \0 (the string terminator)
        *el = '\0';

        assert(cont <= (c->rcurr + c->rbytes));

        c->last_cmd_time = current_time; // time of the last command
        // process the command; c->rcurr is the command string
        process_command(c, c->rcurr);

        // the unparsed byte count shrinks by the bytes this command used,
        // which is cont - c->rcurr (cont lies after rcurr, so this is positive)
        c->rbytes -= (cont - c->rcurr);
        c->rcurr = cont; // point c->rcurr at the start of the next command

        assert(c->rcurr <= (c->rbuf + c->rsize));
    }

    return 1;
}
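The line-splitting step can be sketched on its own, without a connection object. The helper name mini_next_line and its signature are hypothetical; the memchr search, the \r\n handling, and the in-place \0 termination mirror the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try to cut one complete command line out of
 * buf[0..len). On success the line is NUL-terminated in place (a trailing
 * \r is dropped), *consumed receives the number of bytes used including
 * the \n, and a pointer to the line is returned. Returns NULL when no \n
 * has arrived yet, which means the caller must wait for more data. */
static char *mini_next_line(char *buf, size_t len, size_t *consumed) {
    assert(buf != NULL && consumed != NULL);
    if (len == 0)
        return NULL;

    char *el = memchr(buf, '\n', len);
    if (el == NULL)
        return NULL;              /* incomplete command */

    char *cont = el + 1;          /* start of the next command */
    if (el - buf > 1 && *(el - 1) == '\r')
        el--;                     /* line ended in \r\n */
    *el = '\0';

    *consumed = (size_t)(cont - buf);
    return buf;
}
```

Calling it repeatedly on a buffer that holds "set username 0 0 5\r\nget username\nget us" yields the first two lines and then NULL for the unfinished third one, which corresponds to try_read_command returning 0 and the connection going back to conn_waiting.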
process_command
This function handles an individual command. It splits the command into tokens and dispatches it to the matching handler.
// Command processing function.
// The caller found the \n in rbuf and replaced the end of the line with \0.
static void process_command(conn *c, char *command) {
    // tokens holds the pieces of the command (c->rcurr) after it has been
    // split on spaces.
    // For example, "set username zhuli" is split into three elements:
    // set, username, and zhuli.
    // MAX_TOKENS is 8, so one pass fills at most eight entries of tokens.
    token_t tokens[MAX_TOKENS];
    size_t ntokens;
    int comm;

    assert(c != NULL);

    MEMCACHED_PROCESS_COMMAND_START(c->sfd, c->rcurr, c->rbytes);

    if (settings.verbose > 1)
        fprintf(stderr, "<%d %s\n", c->sfd, command);

    /*
     * for commands set/add/replace, we build an item and read the data
     * directly into it, then continue in nread_complete().
     */

    c->msgcurr = 0;
    c->msgused = 0;
    c->iovused = 0;
    if (add_msghdr(c) != 0) {
        out_of_memory(c, "SERVER_ERROR out of memory preparing response");
        return;
    }

    // tokenize_command is the key step: it splits the command and stores the
    // pieces in the tokens array
    // parameter: command is the command string
    ntokens = tokenize_command(command, tokens, MAX_TOKENS);

    // tokens[COMMAND_TOKEN], with COMMAND_TOKEN = 0:
    // the first token of the split command is the operation name
    if (ntokens >= 3 &&
        ((strcmp(tokens[COMMAND_TOKEN].value, "get") == 0) ||
         (strcmp(tokens[COMMAND_TOKEN].value, "bget") == 0))) {
        // handle the get command
        process_get_command(c, tokens, ntokens, false);

    } else if ((ntokens == 6 || ntokens == 7) &&
               ((strcmp(tokens[COMMAND_TOKEN].value, "add") == 0 && (comm = NREAD_ADD)) ||
                (strcmp(tokens[COMMAND_TOKEN].value, "set") == 0 && (comm = NREAD_SET)) ||
                (strcmp(tokens[COMMAND_TOKEN].value, "replace") == 0 && (comm = NREAD_REPLACE)) ||
                (strcmp(tokens[COMMAND_TOKEN].value, "prepend") == 0 && (comm = NREAD_PREPEND)) ||
                (strcmp(tokens[COMMAND_TOKEN].value, "append") == 0 && (comm = NREAD_APPEND)))) {
        // handle the update commands
        process_update_command(c, tokens, ntokens, comm, false);
        // more code ....
}
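The dispatch rule in process_command combines the command name with the token count. The sketch below isolates that rule; mini_classify and the MINI_CMD_* names are hypothetical, and ntokens is assumed to include the terminal entry that tokenize_command appends, which is why "get key" needs a count of at least 3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mini_cmd { MINI_CMD_GET, MINI_CMD_UPDATE, MINI_CMD_UNKNOWN };

/* Hypothetical classifier: decide which handler a command would go to,
 * given its first token and the token count (terminal entry included). */
static enum mini_cmd mini_classify(const char *name, size_t ntokens) {
    static const char *const updates[] = {
        "add", "set", "replace", "prepend", "append"
    };
    assert(name != NULL);

    if (ntokens >= 3 &&
        (strcmp(name, "get") == 0 || strcmp(name, "bget") == 0))
        return MINI_CMD_GET;

    if (ntokens == 6 || ntokens == 7) {
        for (size_t i = 0; i < sizeof updates / sizeof updates[0]; i++) {
            if (strcmp(name, updates[i]) == 0)
                return MINI_CMD_UPDATE;
        }
    }
    return MINI_CMD_UNKNOWN;
}
```

A storage command line such as "set username 0 0 5" has five elements plus the terminal entry, giving 6; adding noreply gives 7. A set line with fewer elements falls through to the unknown case.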
tokenize_command:
This function breaks a command down, splitting one command line into multiple elements.
Example: set username zhuli\n
It is split into three elements: set, username, and zhuli. tokenize_command also appends a terminal entry after the last element, so the count it returns is one larger than the number of elements.
// Splits a command into tokens.
static size_t tokenize_command(char *command, token_t *tokens, const size_t max_tokens) {
    char *s, *e;
    size_t ntokens = 0;           // index of the next token to fill
    size_t len = strlen(command); // command length
    unsigned int i = 0;

    assert(command != NULL && tokens != NULL && max_tokens > 1);

    s = e = command;
    for (i = 0; i < len; i++) {
        // e keeps moving forward; at a space, the text between s and e is
        // one element and is stored in the tokens array
        if (*e == ' ') {
            if (s != e) {
                tokens[ntokens].value = s;
                tokens[ntokens].length = e - s;
                ntokens++;
                // Replace the space with \0.
                // Memcached does this neatly: the command is cut in place in
                // the original buffer, nothing is copied.
                *e = '\0';
                // leave room for the terminal entry
                if (ntokens == max_tokens - 1) {
                    e++;
                    s = e; /* so we don't add an extra token */
                    break;
                }
            }
            s = e + 1;
        }
        e++;
    }

    if (s != e) {
        tokens[ntokens].value = s;
        tokens[ntokens].length = e - s;
        ntokens++;
    }

    /*
     * If we scanned the whole string, the terminal value pointer is null,
     * otherwise it is the first unprocessed character.
     */
    tokens[ntokens].value = *e == '\0' ? NULL : e;
    tokens[ntokens].length = 0;
    ntokens++;

    // the return value counts the terminal entry as well: a command with
    // three elements returns 4
    return ntokens;
}
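The tokenizer is small enough to run outside Memcached. The following stand-alone copy follows the same in-place algorithm; mini_token and mini_tokenize are hypothetical names standing in for token_t and tokenize_command.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for token_t. */
typedef struct {
    char  *value;
    size_t length;
} mini_token;

/* Split command on spaces in place. Returns the number of entries written,
 * which includes one terminal entry whose value is NULL when the whole
 * string was scanned, or the first unprocessed character otherwise. */
static size_t mini_tokenize(char *command, mini_token *tokens, size_t max_tokens) {
    assert(command != NULL && tokens != NULL && max_tokens > 1);
    char *s = command, *e = command;
    size_t ntokens = 0;
    size_t len = strlen(command);

    for (size_t i = 0; i < len; i++) {
        if (*e == ' ') {
            if (s != e) {
                tokens[ntokens].value = s;
                tokens[ntokens].length = (size_t)(e - s);
                ntokens++;
                *e = '\0';                 /* cut in place, no copy */
                if (ntokens == max_tokens - 1) {
                    e++;
                    s = e;                 /* keep room for the terminal entry */
                    break;
                }
            }
            s = e + 1;
        }
        e++;
    }

    if (s != e) {
        tokens[ntokens].value = s;
        tokens[ntokens].length = (size_t)(e - s);
        ntokens++;
    }

    tokens[ntokens].value = (*e == '\0') ? NULL : e;
    tokens[ntokens].length = 0;
    ntokens++;
    return ntokens;
}
```

For "set username zhuli" with room for eight entries, the result is set, username, zhuli, and a terminal entry with a NULL value. When the token limit is hit first, the terminal entry instead points at the unprocessed remainder, which is how a long multi-get is continued in process_get_command.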
process_get_command
Handling of the get command:
// Handles GET requests.
static inline void process_get_command(conn *c, token_t *tokens, size_t ntokens, bool return_cas) {
    char *key;
    size_t nkey;
    int i = 0;
    item *it;
    // &tokens[0] is the operation name
    // &tokens[1] is the key; a token_t stores value and length
    token_t *key_token = &tokens[KEY_TOKEN];
    char *suffix;
    assert(c != NULL);

    do {
        // as long as the key length is not 0
        while (key_token->length != 0) {

            key = key_token->value;
            nkey = key_token->length;

            // check the key against the maximum length; a Memcached key can be
            // at most 250 bytes, so watch the byte length of keys in normal use
            if (nkey > KEY_MAX_LENGTH) {
                out_string(c, "CLIENT_ERROR bad command line format");
                while (i-- > 0) {
                    item_remove(*(c->ilist + i));
                }
                return;
            }

            // fetch the item from Memcached's in-memory store
            it = item_get(key, nkey);
            if (settings.detail_enabled) {
                // record per-prefix statistics for this key
                stats_prefix_record_get(key, nkey, NULL != it);
            }
            // if the item was found
            if (it) {
                // c->ilist holds the items that will be written out;
                // if ilist is too small, reallocate it at twice the size
                if (i >= c->isize) {
                    item **new_list = realloc(c->ilist, sizeof(item *) * c->isize * 2);
                    if (new_list) {
                        c->isize *= 2;
                        c->ilist = new_list;
                    } else {
                        STATS_LOCK();
                        stats.malloc_fails++;
                        STATS_UNLOCK();
                        item_remove(it);
                        break;
                    }
                }

                /*
                 * Construct the response. Each hit adds three elements to the
                 * outgoing data list:
                 *   "VALUE "
                 *   key
                 *   " " + flags + " " + data length + "\r\n" + data (with \r\n)
                 */

                // build the response data
                if (return_cas) {
                    // more code ....
                }

        /*
         * If the command string hasn't been fully processed, get the next set
         * of tokens.
         */
        // if the command line still has unprocessed keys, tokenize the rest
        if (key_token->value != NULL) {
            ntokens = tokenize_command(key_token->value, tokens, MAX_TOKENS);
            key_token = tokens;
        }

    } while (key_token->value != NULL);

    c->icurr = c->ilist;
    c->ileft = i;
    if (return_cas) {
        c->suffixcurr = c->suffixlist;
        c->suffixleft = i;
    }

    if (settings.verbose > 1)
        fprintf(stderr, ">%d END\n", c->sfd);

    /*
        If the loop was terminated because of out-of-memory, it is not
        reliable to add END\r\n to the buffer, because it might not end
        in \r\n. So we send SERVER_ERROR instead.
    */
    if (key_token->value != NULL || add_iov(c, "END\r\n", 5) != 0
        || (IS_UDP(c->transport) && build_udp_headers(c) != 0)) {
        out_of_memory(c, "SERVER_ERROR out of memory writing get response");
    } else {
        conn_set_state(c, conn_mwrite);
        c->msgcurr = 0;
    }
}
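The shape of the reply built for each hit can be sketched with snprintf. This is only an illustration of the text format described in the comment above: mini_format_hit is a hypothetical helper, it assumes the value is a NUL-terminated text string, and it writes into one flat buffer, whereas Memcached collects the pieces in an iovec list without copying the data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one hit as
 *   VALUE <key> <flags> <bytes>\r\n<data>\r\n
 * into out. Returns the number of characters written, or -1 if the
 * result does not fit into cap bytes. */
static int mini_format_hit(char *out, size_t cap, const char *key,
                           unsigned int flags, const char *data) {
    assert(out != NULL && key != NULL && data != NULL);
    int n = snprintf(out, cap, "VALUE %s %u %zu\r\n%s\r\n",
                     key, flags, strlen(data), data);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

After all hits have been formatted, the reply is closed with the 5-byte terminator "END\r\n", matching the add_iov(c, "END\r\n", 5) call above. Keys that were not found simply produce no VALUE block.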