Implementation Details of the TCP/IP Stack in libnids (I) -- TCP Session Reassembly

Last Update:2018-12-05 Source: Internet

Author: User


libnids is a network security library that can be used to detect attacks on a network. Its most valuable feature is that it emulates the Layer 3 and Layer 4 protocol stack of the Linux kernel, so it is also a useful reference when studying the kernel's TCP/IP implementation. This article briefly introduces how the library implements the Layer 3 and Layer 4 protocols (before reading on, it is worth reviewing TCP/IP theory, mainly the sliding window protocol). As an appetizer, here is the TCP state transition diagram that can be found all over the Internet:

[Figure: TCP state transition diagram]

In the TCP/IP protocol stack, Layer 3 is the IP layer and Layer 4 is the TCP layer. Moving from Layer 3 to Layer 4 involves two main tasks: IP fragment reassembly and TCP session reassembly. This article analyzes TCP session reassembly first (working from the top down).

OK, let's start with the important data structures. In tcp.h:

    struct skbuff {
        /* The ever-present next and prev pointers show that this is a
         * doubly linked queue. Two skbuff queues are kept for each TCP
         * session (ip:port <-> ip:port), one per direction. Each skbuff
         * corresponds to one IP packet on the network; a TCP stream is a
         * sequence of such packets. */
        struct skbuff *next;
        struct skbuff *prev;
        void *data;
        u_int len;
        u_int truesize;
        u_int urg_ptr;
        char fin;
        char urg;
        u_int seq;
        u_int ack;
    };

This struct emulates the kernel's sk_buff but is much smaller, because it is only used for session reassembly.

In nids.h:

    struct tuple4 {
        u_short source;
        u_short dest;
        u_int saddr;
        u_int daddr;
    };

This identifies a TCP connection by its four-tuple of source port, destination port, source address and destination address.

    struct half_stream {
        char state;
        char collect;
        char collect_urg;
        char *data;
        int offset;
        int count;          /* number of bytes of data stored here */
        int count_new;      /* number of bytes not yet passed to the callbacks */
        int bufsize;
        int rmem_alloc;
        int urg_count;
        u_int acked;
        u_int seq;
        u_int ack_seq;
        u_int first_data_seq;
        u_char urgdata;
        u_char count_new_urg;
        u_char urg_seen;
        u_int urg_ptr;
        u_short window;
        u_char ts_on;       /* whether the TCP timestamp option is enabled */
        u_char wscale_on;   /* whether the window scale option is enabled */
        u_int curr_ts;
        u_int wscale;
        /* queue of buffered IP packets */
        struct skbuff *list;
        struct skbuff *listtail;
    };

This represents "half a TCP session", that is, the TCP stream in one direction.

And

    struct tcp_stream {
        struct tuple4 addr;
        char nids_state;
        struct lurker_node *listeners;
        struct half_stream client;
        struct half_stream server;
        struct tcp_stream *next_node;
        struct tcp_stream *prev_node;
        int hash_index;
        struct tcp_stream *next_time;
        struct tcp_stream *prev_time;
        int read;
        struct tcp_stream *next_free;
        void *user;
    };

This represents a complete TCP session. Finally, static struct tcp_stream **tcp_stream_table; is an array of TCP session pointers that serves as a hash table.
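To make the role of the hash table more concrete, here is a minimal sketch of a chained hash table keyed by the four-tuple, with a lookup that tries both directions and reports whether the packet came from the client. The names struct stream, hash_tuple, insert_stream and find_stream_sketch are hypothetical, and the hash function is only illustrative; libnids uses its own hash and a richer tcp_stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the tuple4 shown above. */
struct tuple4 {
    uint16_t source, dest;   /* ports */
    uint32_t saddr, daddr;   /* IPv4 addresses */
};

/* Hypothetical, much smaller stand-in for struct tcp_stream. */
struct stream {
    struct tuple4 addr;      /* stored from the client's point of view */
    struct stream *next_node;
};

#define TABLE_SIZE 1024
static struct stream *table[TABLE_SIZE];

/* Illustrative hash only (an assumption, not the libnids hash). */
static unsigned hash_tuple(const struct tuple4 *t)
{
    uint32_t h = t->saddr ^ t->daddr ^ (((uint32_t)t->source << 16) | t->dest);
    h ^= h >> 16;
    return h % TABLE_SIZE;
}

static int same_tuple(const struct tuple4 *a, const struct tuple4 *b)
{
    return a->source == b->source && a->dest == b->dest &&
           a->saddr == b->saddr && a->daddr == b->daddr;
}

/* Push a stream onto the front of its bucket's chain. */
static void insert_stream(struct stream *s)
{
    unsigned idx = hash_tuple(&s->addr);
    s->next_node = table[idx];
    table[idx] = s;
}

/* Walk one bucket's chain looking for an exact tuple match. */
static struct stream *lookup_exact(const struct tuple4 *t)
{
    struct stream *s;
    for (s = table[hash_tuple(t)]; s; s = s->next_node)
        if (same_tuple(&s->addr, t))
            return s;
    return NULL;
}

/* Try the packet's tuple as-is (client -> server), then reversed. */
static struct stream *find_stream_sketch(const struct tuple4 *pkt, int *from_client)
{
    struct tuple4 rev = { .source = pkt->dest, .dest = pkt->source,
                          .saddr = pkt->daddr, .daddr = pkt->saddr };
    struct stream *s = lookup_exact(pkt);
    if (s) {
        *from_client = 1;
        return s;
    }
    s = lookup_exact(&rev);
    if (s)
        *from_client = 0;
    return s;
}
```

Because the session is stored only once, under the client's tuple, a packet travelling from the server to the client is found on the second, reversed lookup.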

Next, let's look at how packets are processed, starting with initialization:

    int tcp_init(int size)
    {
        ...
        /* Initialize the global TCP session hash table. */
        tcp_stream_table_size = size;
        tcp_stream_table = calloc(tcp_stream_table_size, sizeof(char *));
        if (!tcp_stream_table) {
            nids_params.no_mem("tcp_init");
            return -1;
        }
        /* Set the maximum number of sessions. To keep the hash table
         * efficient, it may hold at most 3/4 of the table size. */
        max_stream = 3 * tcp_stream_table_size / 4;
        /* Allocate max_stream TCP session structures up front, so that no
         * time is spent on allocation later. */
        streams_pool = (struct tcp_stream *)
            malloc((max_stream + 1) * sizeof(struct tcp_stream));
        if (!streams_pool) {
            nids_params.no_mem("tcp_init");
            return -1;
        }
        /* Turn the array into a linked list of free entries. */
        for (i = 0; i < max_stream; i++)
            streams_pool[i].next_free = &(streams_pool[i + 1]);
        streams_pool[max_stream].next_free = 0;
        free_streams = streams_pool;
        ...
        return 0;
    }

This function does two things: it initializes the TCP session hash table, and it initializes the session pool. It is executed only once, when the library is initialized.
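The session pool is a classic free list threaded through a preallocated array. The following is a minimal sketch of that idea, not the libnids code: struct node, struct pool and the pool_* functions are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct tcp_stream. */
struct node {
    int id;
    struct node *next_free;
};

struct pool {
    struct node *slots;      /* one contiguous allocation */
    struct node *free_list;  /* head of the list of unused slots */
};

/* Allocate n slots up front and chain them together. Returns 0 on success. */
static int pool_init(struct pool *p, size_t n)
{
    size_t i;

    if (n == 0)
        return -1;
    p->slots = malloc(n * sizeof *p->slots);
    if (!p->slots)
        return -1;
    for (i = 0; i + 1 < n; i++)
        p->slots[i].next_free = &p->slots[i + 1];
    p->slots[n - 1].next_free = NULL;
    p->free_list = p->slots;
    return 0;
}

/* Take a slot from the pool; returns NULL when the pool is exhausted. */
static struct node *pool_get(struct pool *p)
{
    struct node *n = p->free_list;
    if (n)
        p->free_list = n->next_free;
    return n;
}

/* Return a slot to the pool by pushing it on the free list. */
static void pool_put(struct pool *p, struct node *n)
{
    n->next_free = p->free_list;
    p->free_list = n;
}

static void pool_destroy(struct pool *p)
{
    free(p->slots);
    p->slots = NULL;
    p->free_list = NULL;
}
```

Getting and returning a slot are both constant-time pointer updates, which is the point of allocating everything once at startup.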

After initialization, the library enters pcap_loop. The callback function registered by nids is nids_pcap_handler. This function performs IP fragment reassembly (covered in the next article); when the resulting packet is a TCP packet, it ends up in the process_tcp function, where TCP session reassembly begins. Let's take a look.

    void process_tcp(u_char *data, int skblen)
    {
        /* Parse the headers: locate the IP header and the TCP header. */
        struct ip *this_iphdr = (struct ip *)data;
        struct tcphdr *this_tcphdr =
            (struct tcphdr *)(data + 4 * this_iphdr->ip_hl);
        ...  /* sanity checks omitted here */

        /* Look the session up in the hash table. If there is none, decide
         * whether to create one. */
        if (!(a_tcp = find_stream(this_tcphdr, this_iphdr, &from_client))) {
            /* Is this the first packet of a TCP session (the SYN sent by
             * the client)? If so, the client is opening a connection, so
             * create a new session. */
            if ((this_tcphdr->th_flags & TH_SYN) &&
                !(this_tcphdr->th_flags & TH_ACK) &&
                !(this_tcphdr->th_flags & TH_RST))
                add_new_tcp(this_tcphdr, this_iphdr);
            /* Otherwise the packet is ignored. */
            return;
        }
        /* A session was found: set the sender (snd) and receiver (rcv). */
        if (from_client) {
            snd = &a_tcp->client;
            rcv = &a_tcp->server;
        } else {
            rcv = &a_tcp->client;
            snd = &a_tcp->server;
        }
        /* A SYN packet. */
        if (this_tcphdr->th_flags & TH_SYN) {
            /* A SYN opens a connection, so it comes either from the client
             * without ACK (handled above) or from the server with ACK.
             * Here it can only come from the server. Check that the states
             * are as expected; if not, ignore the packet. */
            if (from_client || a_tcp->client.state != TCP_SYN_SENT ||
                a_tcp->server.state != TCP_CLOSE ||
                !(this_tcphdr->th_flags & TH_ACK))
                return;
            /* Ignore the packet if the acknowledgment number is wrong. */
            if (a_tcp->client.seq != ntohl(this_tcphdr->th_ack))
                return;
            /* This is the server's packet of the handshake (the second
             * one). Initialize the server half: state, sequence number,
             * window size and so on. */
            a_tcp->server.state = TCP_SYN_RECV;
            a_tcp->server.seq = ntohl(this_tcphdr->th_seq) + 1;
            a_tcp->server.first_data_seq = a_tcp->server.seq;
            a_tcp->server.ack_seq = ntohl(this_tcphdr->th_ack);
            a_tcp->server.window = ntohs(this_tcphdr->th_win);
            /* Handle TCP options. First, the timestamp option. */
            if (a_tcp->client.ts_on) {
                a_tcp->server.ts_on =
                    get_ts(this_tcphdr, &a_tcp->server.curr_ts);
                if (!a_tcp->server.ts_on)
                    a_tcp->client.ts_on = 0;
            } else
                a_tcp->server.ts_on = 0;
            /* Then the window scale option. */
            if (a_tcp->client.wscale_on) {
                a_tcp->server.wscale_on =
                    get_wscale(this_tcphdr, &a_tcp->server.wscale);
                if (!a_tcp->server.wscale_on) {
                    a_tcp->client.wscale_on = 0;
                    a_tcp->client.wscale = 1;
                    a_tcp->server.wscale = 1;
                }
            } else {
                a_tcp->server.wscale_on = 0;
                a_tcp->server.wscale = 1;
            }
            /* The SYN packet has been handled. */
            return;
        }
        if (
            /* unless the packet carries no data and has exactly the
             * expected sequence number ... */
            !(!datalen && ntohl(this_tcphdr->th_seq) == rcv->ack_seq)
            &&
            /* ... drop it if it lies outside the current window: */
            (
                /* it starts at or beyond the right edge of the window, or */
                !before(ntohl(this_tcphdr->th_seq),
                        rcv->ack_seq + rcv->window * rcv->wscale) ||
                /* it ends before the left edge of the window */
                before(ntohl(this_tcphdr->th_seq) + datalen, rcv->ack_seq)
            )
           )
            /* The packet is abnormal: discard it. */
            return;
        /* An RST packet closes the connection: push the existing data to
         * the registered callbacks, then destroy the session. */
        if (this_tcphdr->th_flags & TH_RST) {
            if (a_tcp->nids_state == NIDS_DATA) {
                struct lurker_node *i;
                a_tcp->nids_state = NIDS_RESET;
                /* Call back all the listeners. */
                for (i = a_tcp->listeners; i; i = i->next)
                    (i->item)(a_tcp, &i->data);
            }
            nids_free_tcp_stream(a_tcp);
            return;
        }
        /* PAWS (protection against wrapped sequence numbers): check the
         * timestamp. */
        if (rcv->ts_on && get_ts(this_tcphdr, &tmp_ts) &&
            before(tmp_ts, snd->curr_ts))
            return;
        /* An ACK packet. */
        if (this_tcphdr->th_flags & TH_ACK) {
            /* It comes from the client and both sides are in the state
             * reached after the second handshake packet. */
            if (from_client && a_tcp->client.state == TCP_SYN_SENT &&
                a_tcp->server.state == TCP_SYN_RECV) {
                /* If the acknowledgment number is also correct, this is the
                 * third handshake packet and the connection is
                 * established. */
                if (ntohl(this_tcphdr->th_ack) == a_tcp->server.seq) {
                    a_tcp->client.state = TCP_ESTABLISHED; /* client state */
                    a_tcp->client.ack_seq = ntohl(this_tcphdr->th_ack);
                    {
                        struct proc_node *i;
                        struct lurker_node *j;
                        void *data;
                        a_tcp->server.state = TCP_ESTABLISHED; /* server state */
                        a_tcp->nids_state = NIDS_JUST_EST;
                        /* Call every registered callback to announce the
                         * new connection. */
                        for (i = tcp_procs; i; i = i->next) {
                            char whatto = 0;
                            char cc = a_tcp->client.collect;
                            char sc = a_tcp->server.collect;
                            char ccu = a_tcp->client.collect_urg;
                            char scu = a_tcp->server.collect_urg;
                            (i->item)(a_tcp, &data);   /* callback */
                            if (cc < a_tcp->client.collect)
                                whatto |= COLLECT_cc;
                            if (ccu < a_tcp->client.collect_urg)
                                whatto |= COLLECT_ccu;
                            if (sc < a_tcp->server.collect)
                                whatto |= COLLECT_sc;
                            if (scu < a_tcp->server.collect_urg)
                                whatto |= COLLECT_scu;
                            if (nids_params.one_loop_less) {
                                if (a_tcp->client.collect >= 2) {
                                    a_tcp->client.collect = cc;
                                    whatto &= ~COLLECT_cc;
                                }
                                if (a_tcp->server.collect >= 2) {
                                    a_tcp->server.collect = sc;
                                    whatto &= ~COLLECT_sc;
                                }
                            }
                            if (whatto) {
                                j = mknew(struct lurker_node);
                                j->item = i->item;
                                j->data = data;
                                j->whatto = whatto;
                                j->next = a_tcp->listeners;
                                a_tcp->listeners = j;
                            }
                        }
                        if (!a_tcp->listeners) {
                            nids_free_tcp_stream(a_tcp);
                            return;
                        }
                        a_tcp->nids_state = NIDS_DATA;
                    }
                }
                // return;
            }
        }
        /* From here on the connection is established. */
        if (this_tcphdr->th_flags & TH_ACK) {
            /* Call handle_ack to update the acknowledged position. */
            handle_ack(snd, ntohl(this_tcphdr->th_ack));
            /* Update the state. Once both sides have confirmed FIN, notify
             * the callbacks that the connection is closed and release it. */
            if (rcv->state == FIN_SENT)
                rcv->state = FIN_CONFIRMED;
            if (rcv->state == FIN_CONFIRMED && snd->state == FIN_CONFIRMED) {
                struct lurker_node *i;
                a_tcp->nids_state = NIDS_CLOSE;
                for (i = a_tcp->listeners; i; i = i->next)
                    (i->item)(a_tcp, &i->data);
                nids_free_tcp_stream(a_tcp);
                return;
            }
        }
        /* Handle data packets and the first FIN packet. */
        if (datalen + (this_tcphdr->th_flags & TH_FIN) > 0)
            /* Add the data to the receiver's buffer. */
            tcp_queue(a_tcp, this_tcphdr, snd, rcv,
                      (char *)(this_tcphdr) + 4 * this_tcphdr->th_off,
                      datalen, skblen);
        /* Update the window size. */
        snd->window = ntohs(this_tcphdr->th_win);
        /* If the buffer has grown too large (something is wrong), prune the
         * receiver's queue. */
        if (rcv->rmem_alloc > 65535)
            prune_queue(rcv, this_tcphdr);
        if (!a_tcp->listeners)
            nids_free_tcp_stream(a_tcp);
    }
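The window check in process_tcp relies on before(), which compares 32-bit sequence numbers that may wrap around. As a minimal sketch of how such a comparison and the window test can be written, assuming the usual signed-difference trick (seq_before, seq_after and outside_window are hypothetical names, not libnids functions):

```c
#include <stdint.h>

/* Wraparound-safe comparisons: a is "before" b if the signed 32-bit
 * difference a - b is negative. Valid while the two values are less
 * than 2^31 apart. */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * Returns 1 if the segment [seq, seq + len) should be dropped, given the
 * receiver's next expected byte (ack_seq) and its window size in bytes.
 * This mirrors the shape of the condition in process_tcp.
 */
static int outside_window(uint32_t seq, uint32_t len,
                          uint32_t ack_seq, uint32_t window)
{
    /* A segment without data at exactly the expected position is
     * always accepted. */
    if (len == 0 && seq == ack_seq)
        return 0;
    return !seq_before(seq, ack_seq + window)  /* starts at or past right edge */
        || seq_before(seq + len, ack_seq);     /* ends before left edge */
}
```

For example, with ack_seq = 1000 and a 500-byte window, a segment starting at 1500 is rejected because it begins at the right edge, while one covering 800 to 900 is rejected because it ends before the left edge. The arithmetic stays correct when ack_seq + window wraps past 2^32.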

That is the basic handling of a TCP packet: establishing connections, releasing them, and moving between states. Next, let's look at how a connection's buffer is maintained, mainly how it is updated, in the tcp_queue function:

    static void
    tcp_queue(struct tcp_stream *a_tcp, struct tcphdr *this_tcphdr,
              struct half_stream *snd, struct half_stream *rcv,
              char *data, int datalen, int skblen)
    {
        u_int this_seq = ntohl(this_tcphdr->th_seq);
        struct skbuff *pakiet, *tmp;

        /*
         * Did we get anything new to ack?
         */
        /* EXP_SEQ is the sequence number up to which data has been
         * assembled; the next data is expected to start there. First check
         * whether this segment starts at or before EXP_SEQ. */
        if (!after(this_seq, EXP_SEQ)) {
            /* Does the segment end after EXP_SEQ? If so, it contains new
             * data; otherwise it is a retransmission and is ignored. */
            if (after(this_seq + datalen + (this_tcphdr->th_flags & TH_FIN),
                      EXP_SEQ)) {
                /* the packet straddles our window end */
                get_ts(this_tcphdr, &snd->curr_ts);
                /* Append the data to the stream buffer. Note that
                 * add_from_skb calls notify as soon as it has added a piece
                 * of data, which pushes the data to the callbacks. */
                add_from_skb(a_tcp, rcv, snd, (u_char *)data, datalen,
                             this_seq,
                             (this_tcphdr->th_flags & TH_FIN),
                             (this_tcphdr->th_flags & TH_URG),
                             ntohs(this_tcphdr->th_urp) + this_seq - 1);
                /*
                 * Do we have any old packets to ack that the above
                 * made visible? (Go forward from skb)
                 */
                /* EXP_SEQ has now changed. Check whether packets held in
                 * the queue have become usable; if so, process them in the
                 * same way and then free them. */
                pakiet = rcv->list;
                while (pakiet) {
                    if (after(pakiet->seq, EXP_SEQ))
                        break;
                    if (after(pakiet->seq + pakiet->len + pakiet->fin,
                              EXP_SEQ)) {
                        add_from_skb(a_tcp, rcv, snd, pakiet->data,
                                     pakiet->len, pakiet->seq, pakiet->fin,
                                     pakiet->urg,
                                     pakiet->urg_ptr + pakiet->seq - 1);
                    }
                    rcv->rmem_alloc -= pakiet->truesize;
                    if (pakiet->prev)
                        pakiet->prev->next = pakiet->next;
                    else
                        rcv->list = pakiet->next;
                    if (pakiet->next)
                        pakiet->next->prev = pakiet->prev;
                    else
                        rcv->listtail = pakiet->prev;
                    tmp = pakiet->next;
                    free(pakiet->data);
                    free(pakiet);
                    pakiet = tmp;
                }
            } else
                return;
        }
        /* The packet arrived out of order (it starts beyond EXP_SEQ): put
         * it in the queue to be processed later. */
        else {
            struct skbuff *p = rcv->listtail;

            pakiet = mknew(struct skbuff);
            pakiet->truesize = skblen;
            rcv->rmem_alloc += pakiet->truesize;
            pakiet->len = datalen;
            pakiet->data = malloc(datalen);
            if (!pakiet->data)
                nids_params.no_mem("tcp_queue");
            memcpy(pakiet->data, data, datalen);
            pakiet->fin = (this_tcphdr->th_flags & TH_FIN);
            /* Some Cisco - at least - hardware accept to close a TCP
             * connection even though packets were lost before the first
             * TCP FIN packet and never retransmitted; this violates
             * RFC 793, but since it really happens, it has to be dealt
             * with... The idea is to introduce a 10s timeout after TCP FIN
             * packets were sent by both sides so that corresponding
             * libnids resources can be released instead of waiting for
             * retransmissions which will never happen. -- Sebastien Raveau
             */
            if (pakiet->fin) {
                snd->state = TCP_CLOSING;
                if (rcv->state == FIN_SENT || rcv->state == FIN_CONFIRMED)
                    add_tcp_closing_timeout(a_tcp);
            }
            pakiet->seq = this_seq;
            pakiet->urg = (this_tcphdr->th_flags & TH_URG);
            pakiet->urg_ptr = ntohs(this_tcphdr->th_urp);
            /* Walk back from the tail to find the insertion point that
             * keeps the queue sorted by sequence number. */
            for (;;) {
                if (!p || !after(p->seq, this_seq))
                    break;
                p = p->prev;
            }
            if (!p) {
                pakiet->prev = 0;
                pakiet->next = rcv->list;
                if (rcv->list)
                    rcv->list->prev = pakiet;
                rcv->list = pakiet;
                if (!rcv->listtail)
                    rcv->listtail = pakiet;
            } else {
                pakiet->next = p->next;
                p->next = pakiet;
                pakiet->prev = p;
                if (pakiet->next)
                    pakiet->next->prev = pakiet;
                else
                    rcv->listtail = pakiet;
            }
        }
    }
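The two branches of tcp_queue, deliver in-order data immediately and hold out-of-order data in a sorted queue, can be reduced to a small self-contained sketch. This is a simplification under stated assumptions, not the libnids code: struct reasm, struct seg, deliver and reasm_push are hypothetical names, the output buffer has a fixed size, and a singly linked list replaces the doubly linked skbuff queue. FIN and urgent data are ignored.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct seg {                     /* one held out-of-order segment */
    uint32_t seq, len;
    unsigned char *data;
    struct seg *next;
};

struct reasm {
    uint32_t exp_seq;            /* next byte we expect (like EXP_SEQ) */
    unsigned char out[4096];     /* reassembled stream, fixed size here */
    size_t out_len;
    struct seg *held;            /* held segments, sorted by seq */
};

static int seq_after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

/* Append the part of [seq, seq + len) that lies at or beyond exp_seq.
 * Must only be called when seq is not after exp_seq. */
static void deliver(struct reasm *r, uint32_t seq,
                    const unsigned char *d, uint32_t len)
{
    uint32_t skip = r->exp_seq - seq;      /* bytes already delivered */
    if (skip >= len)
        return;                            /* pure retransmission */
    len -= skip;
    if (len > sizeof r->out - r->out_len)  /* clamp to the sketch's buffer */
        len = (uint32_t)(sizeof r->out - r->out_len);
    memcpy(r->out + r->out_len, d + skip, len);
    r->out_len += len;
    r->exp_seq += len;
}

/* Returns 0 on success, -1 if a held segment could not be allocated. */
static int reasm_push(struct reasm *r, uint32_t seq,
                      const void *data, uint32_t len)
{
    if (!seq_after(seq, r->exp_seq)) {
        struct seg *s;
        deliver(r, seq, data, len);
        /* The new data may have made held segments deliverable. */
        while ((s = r->held) && !seq_after(s->seq, r->exp_seq)) {
            deliver(r, s->seq, s->data, s->len);
            r->held = s->next;
            free(s->data);
            free(s);
        }
        return 0;
    }
    /* Out of order: keep a copy, sorted by sequence number. */
    struct seg *n = malloc(sizeof *n), **pp = &r->held;
    if (!n)
        return -1;
    n->data = malloc(len ? len : 1);
    if (!n->data) {
        free(n);
        return -1;
    }
    memcpy(n->data, data, len);
    n->seq = seq;
    n->len = len;
    while (*pp && !seq_after((*pp)->seq, seq))
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
    return 0;
}
```

With exp_seq starting at 100, pushing "WORLD" at sequence 105 only queues it; pushing "HELLO" at 100 then delivers both, and the stream reads "HELLOWORLD". Overlapping or retransmitted bytes are trimmed by the skip computation in deliver.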

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
