Caching private data in Linux connection tracking (nf_conntrack) to avoid a lookup on every packet

Source: Internet
Author: User

As I have said many times, conntrack is a central connection-tracking mechanism. If it were extensible, that would be very exciting. After reading a good deal of documentation and code you find that it can indeed be extended, but the excitement fades, because you also find that:
1. You can register the accounting (acct) extension, but its counting mechanism is very primitive;
2. Adding a new extension means recompiling the kernel.

What should I do? I once silently and angrily blamed the people who first implemented this: had they made the extension mechanism itself extensible, instead of hard-coding a few specific extensions, everything would have been so easy. This implementation has always bothered me because I think it is too simplistic. My work needs a new extension that is not among the existing extension types. To avoid recompiling the kernel I had to hijack the acct extension, using a typical OO-style encapsulation:

struct my_ext {
	struct orig_ext orig;	/* the hijacked extension comes first */
	char info[0];		/* private data trails it */
};
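The encapsulation trick can be tried out in user space. The following is only a sketch under assumptions: the layout of struct orig_ext and the helper names my_ext_alloc, my_ext_as_orig and my_ext_from_orig are hypothetical and do not come from the kernel or from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an existing extension such as the acct counters (hypothetical layout). */
struct orig_ext {
	unsigned long long packets;
	unsigned long long bytes;
};

/* The wrapper: the original struct sits at offset 0, private bytes trail it. */
struct my_ext {
	struct orig_ext orig;
	char info[];	/* C99 flexible array member; older kernel code writes info[0] */
};

/* Allocate a zeroed wrapper and copy info_len bytes of private payload into it. */
static struct my_ext *my_ext_alloc(const void *info, size_t info_len)
{
	struct my_ext *e;

	assert(info != NULL || info_len == 0);
	e = calloc(1, sizeof(*e) + info_len);
	if (!e)
		return NULL;
	if (info_len)
		memcpy(e->info, info, info_len);
	return e;
}

/* Code that only knows struct orig_ext can still work on the object. */
static struct orig_ext *my_ext_as_orig(struct my_ext *e)
{
	return &e->orig;
}

/* Recover the wrapper from a pointer to the embedded original. */
static struct my_ext *my_ext_from_orig(struct orig_ext *o)
{
	return (struct my_ext *)((char *)o - offsetof(struct my_ext, orig));
}
```

The existing code keeps treating the object as a struct orig_ext, while the new code reaches the private bytes through the wrapper. The cost is that the private data has no type of its own, which is one reason this approach feels like a hack.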
...
It is time to change the situation. On Saturday morning I suddenly decided to finish it over the weekend, for the following reasons:
1. External factors: I rarely catch a cold. I may be weak, but I do not want anyone's sympathy; I just want a quiet weekend, and a cold or fever is the best excuse for one;
2. Internal factors: with the year ending, the work plan has been finalized. The next step is the network-access process, which is about stability and calls for nothing radical, so there are no technically uncontrollable factors. Once mentally prepared, I could get started.

Maybe I should laugh at myself again. Isn't this just writing a simple module? Why make it sound like Zhuge Kongming arranging his battle formation... so emotional and subjective...
In any case, the module looks simple. Once I started, however, two serious problems appeared:
1. Introspection
Suppose conntrack's extend area has 128 slots, each holding one piece of private data. The problem is: how does the program know which slot holds which data? The program is able to store the data, yet it does not itself know what it stored where... this is a strange loop. Either the data must be self-describing, or you must decree that slot n always holds the route entry, slot m always holds the socket, and so on. The existing nf_conntrack module uses the latter method; that is what the nf_ct_ext_id enumeration does.
However, I still want to pick slots freely, which is more flexible. Self-describing data structures do not work well here: ASN.1 is too complicated, and what the kernel stores is less an attribute of identity than a behavior. Google's Protocol Buffers is not a good fit either, because too many callback functions would have to be defined to achieve reflection. In the end I defined an index blueprint that identifies the index of a slot's index, rather than the position of a specific slot.
For this you need to define a new enumeration, the blueprint:
enum idx_idx { ROUTE, SOCKET, AND_SO_ON, IDX_IDX_NUM };
Then define an array that records the real slot index for each blueprint entry:
int idx[IDX_IDX_NUM];
Finally, define a bitmap to indicate which slots are in use. See the code for the details.
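The interplay of blueprint, index array and bitmap can be sketched in user space as follows. This is an illustration under assumptions, not the kernel code: struct ext_area and the ext_* helpers are hypothetical names, there is no locking, and a single unsigned long serves as the bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXT_SLOTS 8

/* The blueprint: names for what is stored, not for where it is stored. */
enum idx_idx { ROUTE, SOCKET, IDX_IDX_NUM };

struct ext_area {
	int idx[IDX_IDX_NUM];		/* blueprint entry -> slot number, -1 if absent */
	unsigned long bits;		/* bit n set means slot n is in use */
	void *slot[MAX_EXT_SLOTS];	/* the private data pointers */
};

static void ext_init(struct ext_area *a)
{
	int i;

	a->bits = 0;
	for (i = 0; i < IDX_IDX_NUM; i++)
		a->idx[i] = -1;
	for (i = 0; i < MAX_EXT_SLOTS; i++)
		a->slot[i] = NULL;
}

/* Store data in the first free slot and remember that slot under `what`.
 * Returns the slot number, or -1 if `what` is already stored or no slot is free. */
static int ext_add(struct ext_area *a, enum idx_idx what, void *data)
{
	int n;

	assert(what < IDX_IDX_NUM);
	if (!data || a->idx[what] != -1)
		return -1;
	for (n = 0; n < MAX_EXT_SLOTS; n++) {
		if (!(a->bits & (1UL << n))) {
			a->bits |= 1UL << n;
			a->slot[n] = data;
			a->idx[what] = n;
			return n;
		}
	}
	return -1;
}

/* Fetch by blueprint entry: two array subscripts, no search. */
static void *ext_get(const struct ext_area *a, enum idx_idx what)
{
	int n = a->idx[what];

	if (n < 0 || !(a->bits & (1UL << n)))
		return NULL;
	return a->slot[n];
}

/* Release the slot recorded under `what`, if any. */
static void ext_remove(struct ext_area *a, enum idx_idx what)
{
	int n = a->idx[what];

	if (n < 0)
		return;
	a->bits &= ~(1UL << n);
	a->slot[n] = NULL;
	a->idx[what] = -1;
}
```

The slot a given kind of data lands in depends only on the order of insertion, yet the caller never has to search for it: the index array answers "where is the route?" with one subscript.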
2. Memory addressing problems
Kernel memory is precious. This is not only a matter of physical memory; the kernel's virtual address space is limited as well, so a 64-bit system is recommended. On a 32-bit system, if you want to keep large data structures in the kernel, split the address space 2G/2G or 1G/3G at compile time. In the former case user space and the kernel get 2 GB each; in the latter the kernel gets 3 GB and user space only 1 GB.
Probably because of this memory concern, Linux's nf_conntrack limits how much memory the extend area may use: its length fields are of type u8. Because I know my system, I changed that. Keep in mind that the extend memory of nf_conntrack is laid out contiguously, and the stock code does not expect you to spend sizeof(char *) bytes on a pointer that refers to a very large, separately allocated region... but why not? The restriction exists because the code has to be general. I understand my own system, so I can store pointers. I also kept the array approach. In short, the division of labor is clear: arrays are used for addressing inside extend, and pointers are used to reach the data.
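To see why storing a pointer sidesteps the u8 limit, consider this small user-space sketch. It rests on assumptions: BIG_DATA_SIZE, the two structs and fits_in_u8_area are hypothetical, and the check only models the fact that an 8-bit length field cannot describe more than 255 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIG_DATA_SIZE 4096	/* hypothetical payload size, chosen for illustration */

/* Inline storage: the payload itself would live in the contiguous area. */
struct inline_ext {
	char payload[BIG_DATA_SIZE];
};

/* Pointer storage: only a pointer lives in the area, the payload is elsewhere. */
struct pointer_ext {
	char *payload;
};

/* Can `size` bytes placed at `offset` be described by 8-bit offset/length fields? */
static int fits_in_u8_area(size_t offset, size_t size)
{
	assert(size <= SIZE_MAX - offset);	/* guard the addition against wrap-around */
	return offset + size <= UINT8_MAX;
}
```

With these definitions, fits_in_u8_area(0, sizeof(struct inline_ext)) is false while fits_in_u8_area(0, sizeof(struct pointer_ext)) is true. The price of the pointer variant is that whoever stores the pointer also owns the lifetime of the memory behind it.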
The code consists of a framework and a test program. The kernel is still 2.6.32 on amd64, and the code is on GitHub: https://github.com/marywangran/extension-of-nf_conntrack-ext
I am also posting a backup copy here, in case GitHub gets blocked one day...

Modify include/net/netfilter/nf_conntrack_extend.h:


--- nf_conntrack_extend.h.orig	12:55:26.000000000 +0800
+++ nf_conntrack_extend.h	17:28:39.000000000 +0800
@@ ... @@
 #include <net/netfilter/nf_conntrack.h>
+#define NFCT_EXT_EXT
+
 enum nf_ct_ext_id
 {
 	NF_CT_EXT_HELPER,
 	NF_CT_EXT_NAT,
 	NF_CT_EXT_ACCT,
 	NF_CT_EXT_ECACHE,
-	NF_CT_EXT_NEW,
+#ifdef NFCT_EXT_EXT
+	NF_CT_EXT_EXT,
+#endif
 	NF_CT_EXT_NUM,
 };
@@ -17,13 +21,21 @@
 #define NF_CT_EXT_NAT_TYPE struct nf_conn_nat
 #define NF_CT_EXT_ACCT_TYPE struct nf_conn_counter
 #define NF_CT_EXT_ECACHE_TYPE struct nf_conntrack_ecache
-#define NF_CT_EXT_NEW_TYPE struct features
+#ifdef NFCT_EXT_EXT
+#define NF_CT_EXT_EXT_TYPE struct nf_conntrack_ext
+#endif
 
 /* Extensions: optional stuff which isn't permanently in struct. */
 struct nf_ct_ext {
 	struct rcu_head rcu;
+#ifdef NFCT_EXT_EXT
+	/* memory is no longer a problem */
+	<wider type> offset[NF_CT_EXT_NUM];
+	<wider type> len;
+#else
 	u8 offset[NF_CT_EXT_NUM];
 	u8 len;
+#endif
 	char data[0];
 };
@@ -80,10 +92,18 @@
 	unsigned int flags;
 
 	/* Length and min alignment. */
+#ifdef NFCT_EXT_EXT
+	/* memory is no longer a problem */
+	<wider type> len;
+	<wider type> align;
+	/* initial size of nf_ct_ext. */
+	<wider type> alloc_size;
+#else
 	u8 len;
 	u8 align;
 	/* initial size of nf_ct_ext. */
 	u8 alloc_size;
+#endif
 };
 
 int nf_ct_extend_register(struct nf_ct_ext_type *type);

Add include/net/netfilter/nf_conntrack_ext.h:


/*
 * (C) 2015 marywangran <marywangran@126.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#ifndef _NF_CONNTRACK_EXT_H
#define _NF_CONNTRACK_EXT_H
#include <net/net_namespace.h>
#include <linux/netfilter/nf_conntrack_common.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_extend.h>

#define MAX_EXT_SLOTS	8
#define BITINT		1

struct nf_conntrack_ext {
	/* there must be an array for introspection or reflection */
	int bits_idx[MAX_EXT_SLOTS];
	/* unsigned long, because the kernel bit operations work on that type */
	unsigned long bits[BITINT];
	char *slot[MAX_EXT_SLOTS];
};

int nf_ct_exts_add(const struct nf_conn *ct, void *ext);
void *nf_ct_exts_get(const struct nf_conn *ct, int idx);
void nf_ct_exts_remove(const struct nf_conn *ct, int idx);
struct nf_conntrack_ext *nf_conn_exts_find(const struct nf_conn *ct);
struct nf_conntrack_ext *nf_conn_exts_add(struct nf_conn *ct, gfp_t gfp);

extern int nf_conntrack_exts_init(void);
extern void nf_conntrack_exts_fini(void);
#endif /* _NF_CONNTRACK_EXT_H */

Modify net/netfilter/nf_conntrack_core.c:


--- nf_conntrack_core.c.orig	13:00:17.000000000 +0800
+++ nf_conntrack_core.c	17:01:28.000000000 +0800
@@ ... @@
 #include <net/netfilter/nf_conntrack_extend.h>
 #include <net/netfilter/nf_conntrack_acct.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
+#ifdef NFCT_EXT_EXT
+/* pull in the extend header file */
+#include <net/netfilter/nf_conntrack_ext.h>
+#endif
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
@@ -644,8 +648,11 @@
 	}
 
 	nf_ct_acct_ext_add(ct, GFP_ATOMIC);
-	compute(ct, GFP_ATOMIC);
+#ifdef NFCT_EXT_EXT
+	/* initialize extend's extend when the conntrack is created */
+	nf_conn_exts_add(ct, GFP_ATOMIC);
+#endif
 
 	spin_lock_bh(&nf_conntrack_lock);
 	exp = nf_ct_find_expectation(net, tuple);
@@ -1130,6 +1137,10 @@
 	nf_ct_free_hashtable(net->ct.hash, net->ct.hash_vmalloc,
 			     net->ct.htable_size);
+#ifdef NFCT_EXT_EXT
+	/* tear down extend's extend */
+	nf_conntrack_exts_fini();
+#endif
 	transform(net);
 	nf_conntrack_expect_fini(net);
@@ -1344,9 +1355,19 @@
 	ret = nf_conntrack_ecache_init(net);
 	if (ret < 0)
 		goto err_ecache;
+#ifdef NFCT_EXT_EXT
+	/* register extend's extend */
+	ret = nf_conntrack_exts_init();
+	if (ret < 0)
+		goto err_exts;
+#endif
 
 	return 0;
 
+#ifdef NFCT_EXT_EXT
+err_exts:
+	nf_conntrack_ecache_fini(net);
+#endif
 err_ecache:
 	nf_conntrack_acct_fini(net);
 err_acct:

Add net/netfilter/nf_conntrack_ext.c:


/*
 * Conntrack extension implementation file.
 * Core techniques:
 * 1. bitmap
 * 2. index array (an externally maintained 'blueprint')
 *
 * (C) 2015 marywangran <marywangran@126.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#include <linux/kernel.h>
#include <net/netfilter/nf_conntrack_extend.h>
#include <net/netfilter/nf_conntrack_ext.h>

/* The spin lock should be bound to each ext instead of being global! */
static DEFINE_SPINLOCK(nfct_ext_lock);

static struct nf_ct_ext_type ext_extend __read_mostly = {
	.len	= sizeof(struct nf_conntrack_ext),
	.align	= __alignof__(struct nf_conntrack_ext),
	.id	= NF_CT_EXT_EXT,
	.flags	= NF_CT_EXT_F_PREALLOC,
};

/*
 * Add data to extend.
 * Note: the caller must maintain an index array externally.
 * Returns the slot index, or -1 on failure.
 */
int nf_ct_exts_add(const struct nf_conn *ct, void *ext)
{
	int ret_idx = -1;
	struct nf_conntrack_ext *exts = NULL;

	if (!ext)
		goto out;
	exts = nf_conn_exts_find(ct);
	if (!exts)
		goto out;

	spin_lock(&nfct_ext_lock);
	ret_idx = find_first_zero_bit(exts->bits, MAX_EXT_SLOTS);
	/* find_first_zero_bit() returns the size when no bit is free */
	if (ret_idx >= MAX_EXT_SLOTS) {
		ret_idx = -1;
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	if (exts->slot[ret_idx]) {
		ret_idx = -1;
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	set_bit(ret_idx, exts->bits);
	exts->slot[ret_idx] = (char *)ext;
	spin_unlock(&nfct_ext_lock);
out:
	return ret_idx;
}
EXPORT_SYMBOL(nf_ct_exts_add);

/*
 * Retrieve the data stored on the conntrack by slot index.
 */
void *nf_ct_exts_get(const struct nf_conn *ct, int idx)
{
	char *ret = NULL;
	struct nf_conntrack_ext *exts;

	if (idx >= MAX_EXT_SLOTS || idx < 0)
		goto out;
	exts = nf_conn_exts_find(ct);
	if (!exts)
		goto out;

	spin_lock(&nfct_ext_lock);
	if (!test_bit(idx, exts->bits)) {
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	ret = exts->slot[idx];
	spin_unlock(&nfct_ext_lock);
out:
	return (void *)ret;
}
EXPORT_SYMBOL(nf_ct_exts_get);

/*
 * Delete the data stored on the conntrack by slot index.
 */
void nf_ct_exts_remove(const struct nf_conn *ct, int idx)
{
	struct nf_conntrack_ext *exts;

	if (idx >= MAX_EXT_SLOTS || idx < 0)
		goto out;
	exts = nf_conn_exts_find(ct);
	if (!exts)
		goto out;

	spin_lock(&nfct_ext_lock);
	if (!test_bit(idx, exts->bits)) {
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	clear_bit(idx, exts->bits);
	exts->slot[idx] = NULL;
	spin_unlock(&nfct_ext_lock);
out:
	return;
}
EXPORT_SYMBOL(nf_ct_exts_remove);

struct nf_conntrack_ext *nf_conn_exts_find(const struct nf_conn *ct)
{
	return nf_ct_ext_find(ct, NF_CT_EXT_EXT);
}
EXPORT_SYMBOL(nf_conn_exts_find);

struct nf_conntrack_ext *nf_conn_exts_add(struct nf_conn *ct, gfp_t gfp)
{
	struct nf_conntrack_ext *exts;
	int i;

	exts = nf_ct_ext_add(ct, NF_CT_EXT_EXT, gfp);
	if (!exts) {
		printk("failed to add extensions area\n");
		return NULL;
	}
	/* initialize: nothing is stored yet */
	for (i = 0; i < MAX_EXT_SLOTS; i++) {
		exts->bits_idx[i] = -1;
		exts->slot[i] = NULL;
	}
	return exts;
}
EXPORT_SYMBOL(nf_conn_exts_add);

int nf_conntrack_exts_init(void)
{
	int ret;

	ret = nf_ct_extend_register(&ext_extend);
	if (ret < 0) {
		printk("nf_conntrack_ext: Unable to register extension\n");
		goto out;
	}
	printk("nf_conntrack_ext: register extension OK\n");
	return 0;
out:
	return ret;
}

void nf_conntrack_exts_fini(void)
{
	nf_ct_extend_unregister(&ext_extend);
}

Test Program nf_conntrack_private_data_auto_save_restore.c:


#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/netfilter_ipv4.h>
#include <net/tcp.h>
#include <net/netfilter/nf_conntrack_ext.h>

MODULE_AUTHOR("marywangran");
MODULE_LICENSE("GPL");

/*
 * An index array must be defined for introspection.
 * Otherwise you fall into the infinite regress of
 * "data - metadata - meta-metadata...".
 * This is also a problem for AI. Self-awareness is fundamental:
 * a being knows something, the being knows that "the being knows something",
 * the being knows that "the being knows that 'the being knows something'"...
 */
enum ext_idx_idx {
	CONN_ORIG_ROUTE,
	CONN_REPLY_ROUTE,
	CONN_SOCK,
	/* ... */
	NUM
};

static inline void nf_ext_put_sock(struct sock *sk)
{
	if ((sk->sk_protocol == IPPROTO_TCP) && (sk->sk_state == TCP_TIME_WAIT))
		inet_twsk_put(inet_twsk(sk));
	else
		sock_put(sk);
}

static void nf_ext_destructor(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	skb->sk = NULL;
	skb->destructor = NULL;
	if (sk)
		nf_ext_put_sock(sk);
}

/* HOOK function that caches the socket */
static unsigned int ipv4_conntrack_save_sock(unsigned int hooknum,
					      struct sk_buff *skb,
					      const struct net_device *in,
					      const struct net_device *out,
					      int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked)
		goto out;
	if ((ip_hdr(skb)->protocol != IPPROTO_UDP) &&
	    (ip_hdr(skb)->protocol != IPPROTO_TCP))
		goto out;

	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* Cache the socket. Note that it can only be restored on INPUT. */
		if (exts->bits_idx[CONN_SOCK] == -1) {
			if (skb->sk == NULL)
				goto out;
			if ((ip_hdr(skb)->protocol == IPPROTO_TCP) &&
			    skb->sk->sk_state != TCP_ESTABLISHED)
				goto out;
			exts->bits_idx[CONN_SOCK] = nf_ct_exts_add(ct, skb->sk);
		}
	}
out:
	return NF_ACCEPT;
}

/* HOOK function that caches the route entry */
static unsigned int ipv4_conntrack_save_dst(unsigned int hooknum,
					     struct sk_buff *skb,
					     const struct net_device *in,
					     const struct net_device *out,
					     int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked)
		goto out;

	exts = nf_conn_exts_find(ct);
	if (exts) {
		/*
		 * Cache the route. There are two directions; IP itself has
		 * no notion of direction, so both routes must be cached.
		 */
		int dir = CTINFO2DIR(ctinfo);
		int idx = (dir == IP_CT_DIR_ORIGINAL) ?
				CONN_ORIG_ROUTE : CONN_REPLY_ROUTE;

		if (exts->bits_idx[idx] == -1) {
			struct dst_entry *dst = skb_dst(skb);

			if (dst) {
				dst_hold(dst);
				exts->bits_idx[idx] = nf_ct_exts_add(ct, dst);
			}
		}
	}
out:
	return NF_ACCEPT;
}

/* HOOK function that restores the cached socket */
static unsigned int ipv4_conntrack_restore_sock(unsigned int hooknum,
						 struct sk_buff *skb,
						 const struct net_device *in,
						 const struct net_device *out,
						 int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked)
		goto out;
	if ((ip_hdr(skb)->protocol != IPPROTO_UDP) &&
	    (ip_hdr(skb)->protocol != IPPROTO_TCP))
		goto out;

	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* fetch the cached socket */
		if (exts->bits_idx[CONN_SOCK] != -1) {
			struct sock *sk = (struct sock *)
				nf_ct_exts_get(ct, exts->bits_idx[CONN_SOCK]);

			if (sk) {
				if ((ip_hdr(skb)->protocol == IPPROTO_TCP) &&
				    sk->sk_state != TCP_ESTABLISHED)
					goto out;
				if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
					goto out;
				skb_orphan(skb);
				skb->sk = sk;
				/*
				 * The reference taken by atomic_inc_not_zero()
				 * above must be put when the skb moves on to
				 * its next owner.
				 */
				skb->destructor = nf_ext_destructor;
			}
		}
	}
out:
	return NF_ACCEPT;
}

/* HOOK function that restores the cached route entry */
static unsigned int ipv4_conntrack_restore_dst(unsigned int hooknum,
						struct sk_buff *skb,
						const struct net_device *in,
						const struct net_device *out,
						int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked)
		goto out;

	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* fetch the cached route */
		int dir = CTINFO2DIR(ctinfo);
		int idx = (dir == IP_CT_DIR_ORIGINAL) ?
				CONN_ORIG_ROUTE : CONN_REPLY_ROUTE;

		if (exts->bits_idx[idx] != -1) {
			struct dst_entry *dst = (struct dst_entry *)
				nf_ct_exts_get(ct, exts->bits_idx[idx]);

			if (dst) {
				dst_hold(dst);
				skb_dst_set(skb, dst);
			}
		}
	}
out:
	return NF_ACCEPT;
}

/*
 * Overall picture:
 * OUTPUT: cache the socket
 * INPUT: restore the socket
 *
 * POSTROUTING | INPUT: cache the route
 * PREROUTING: restore the route
 */
static struct nf_hook_ops ipv4_conn_cache_ops[] __read_mostly = {
	{
		.hook		= ipv4_conntrack_save_dst,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_POST_ROUTING,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
	{
		.hook		= ipv4_conntrack_save_sock,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_OUT,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
	{
		.hook		= ipv4_conntrack_save_dst,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
	{
		.hook		= ipv4_conntrack_restore_sock,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= NF_IP_PRI_CONNTRACK + 2,
	},
	{
		.hook		= ipv4_conntrack_restore_dst,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_PRE_ROUTING,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
};

static int __init cache_dst_and_sock_demo_init(void)
{
	int ret;

	ret = nf_register_hooks(ipv4_conn_cache_ops,
				ARRAY_SIZE(ipv4_conn_cache_ops));
	if (ret)
		goto out;
	return 0;
out:
	return ret;
}

static void __exit cache_dst_and_sock_demo_fini(void)
{
	nf_unregister_hooks(ipv4_conn_cache_ops,
			    ARRAY_SIZE(ipv4_conn_cache_ops));
}

module_init(cache_dst_and_sock_demo_init);
module_exit(cache_dst_and_sock_demo_fini);
In the test program I cache the route entry and the socket for packets destined to the local host. After that, finding the conntrack entry is the only lookup needed; the route and the socket can be taken from it directly. Because the index array exists, fetching a value is plain array-subscript addressing and requires no further query.
