Caching private data in Linux connection tracking (nf_conntrack) to eliminate per-packet lookups


I have often thought that if conntrack, as a connection tracking mechanism, were extensible, that would be a very exciting thing. After reading a good deal of documentation and code, you find that it is indeed extensible, but the excitement fades, because you also find:
1. You can register an accounting (acct) extension, but its counting mechanism is primitive;
2. If you want to add a new extension, you have to recompile the kernel.

What to do? I used to angrily blame the people who implemented it, assuming that the extension mechanism itself ought to be extensible, rather than merely making it easy to write a few specific extensions. I kept putting off fixing it because it seemed too simple, until I really needed a new extension at work. No existing extension type fit, so in order not to recompile the kernel I had to hijack the acct extension, using a typical OO-style encapsulation:
struct my_ext {    struct orig_ext orig;    char info[0];};
...
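The wrapping idea behind the struct above can be sketched in userspace C. This is only an illustration under stated assumptions: struct orig_ext and its fields stand in for the accounting payload, and my_ext_alloc and my_ext_as_orig are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the existing extension payload (hypothetical fields). */
struct orig_ext {
    unsigned long packets;
    unsigned long bytes;
};

/* Wrapper: the original payload first, private bytes appended after it. */
struct my_ext {
    struct orig_ext orig;
    char info[];            /* flexible array member, the C99 form of info[0] */
};

/* Allocate a zeroed wrapper with room for info_len private bytes. */
struct my_ext *my_ext_alloc(const void *info, size_t info_len)
{
    struct my_ext *e;

    assert(info != NULL || info_len == 0);
    e = calloc(1, sizeof(*e) + info_len);
    if (!e)
        return NULL;
    if (info_len)
        memcpy(e->info, info, info_len);
    return e;
}

/* Code that only knows the original type can use the same address,
 * because orig is the first member of the wrapper. */
struct orig_ext *my_ext_as_orig(struct my_ext *e)
{
    assert(e != NULL);
    return &e->orig;
}
```

The cost of this trick is that every new kind of private data still has to ride on one pre-existing extension type, which is the limitation the rest of the article sets out to remove.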
It was time to change things. For several reasons, on Saturday morning I suddenly decided to finish it over the weekend:
1. External factor: I had finally caught a cold. As someone in poor health I do not want sympathy from others, I only want a quiet weekend, and a cold with a fever is the best way to get one;
2. Internal factor: It is the end of the year and the work plan is settled. What remains is a steady process where stability matters and nothing radical is needed, so no technical factor is out of control. Feeling psychologically secure, I could start.

Maybe I will laugh at myself again: isn't this just writing a simple module? Why make it out to be like Zhuge Ming ... How can anyone be so sentimental and subjective ...
Anyway, this module does look simple. Once I started, however, two fairly serious problems came up:
1. The reflection/introspection problem
Suppose the conntrack extend area has 128 slots and one piece of private data is placed in each slot. The question is how the program knows which data is in which slot. The program is able to store the data, but it does not itself know where it put it ... This is a vicious circle: either you make the data self-describing, or you hard-code that the nth slot must hold the route, the mth slot must hold the socket, and so on. The existing nf_conntrack code uses the latter method, which is what the enum nf_ct_ext_id does.
But I still wanted to pick slots arbitrarily, which is more flexible. Self-describing data structures also looked unattractive: ASN.1 is too complex, and kernel data is more than a set of identifying attributes, it also defines behavior, so Google Protocol Buffers are not a good fit either; too many callback functions would be needed to achieve reflection. Then I thought of a way: define an index blueprint that identifies the "index of the slot index" rather than the location of a specific slot.
This requires a new enumeration that defines the blueprint:
enum idx_idx { ROUTE, SOCKET, AND_SO_ON, IDX_IDX_NUM };
Then define an array that holds the real slot indexes:
int idx[IDX_IDX_NUM];
Finally, define a bitmap to record which slots are in use. The details are in the code and are clear at a glance.
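The blueprint, the index array and the bitmap fit together roughly as in the following userspace sketch. It is a simplified model, not the kernel code further down: struct ext_area, ext_init, ext_put and ext_get are hypothetical names, and a plain unsigned int stands in for the kernel bitmap helpers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* The blueprint: names the kind of data, not the slot it lives in. */
enum idx_idx { ROUTE, SOCKET, AND_SO_ON, IDX_IDX_NUM };

struct ext_area {
    int idx[IDX_IDX_NUM];       /* blueprint entry -> slot number, -1 if unset */
    unsigned int used;          /* bitmap: bit n set means slot n is occupied */
    void *slot[MAX_SLOTS];      /* the private data pointers */
};

void ext_init(struct ext_area *a)
{
    a->used = 0;
    for (int i = 0; i < IDX_IDX_NUM; i++)
        a->idx[i] = -1;
    for (int i = 0; i < MAX_SLOTS; i++)
        a->slot[i] = NULL;
}

/* Store data in the first free slot and record which slot was chosen.
 * Returns the slot number, or -1 if nothing was stored. */
int ext_put(struct ext_area *a, enum idx_idx what, void *data)
{
    assert(what < IDX_IDX_NUM);
    if (!data || a->idx[what] != -1)
        return -1;
    for (int n = 0; n < MAX_SLOTS; n++) {
        if (!(a->used & (1u << n))) {
            a->used |= 1u << n;
            a->slot[n] = data;
            a->idx[what] = n;
            return n;
        }
    }
    return -1;                  /* no free slot */
}

/* Two array subscripts, no search: blueprint -> slot -> data. */
void *ext_get(const struct ext_area *a, enum idx_idx what)
{
    int n = a->idx[what];

    return n < 0 ? NULL : a->slot[n];
}
```

The slot a given kind of data ends up in depends only on the order of insertion, yet a reader never has to scan for it, because the blueprint entry remembers the slot number.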
2. The memory addressing problem
Kernel memory is precious. It is not that physical memory is unaffordable, but the kernel's virtual address space is also limited. A 64-bit system is recommended. On a 32-bit system, if you want the kernel to hold larger data structures, split the address space 2G/2G or 1G/3G at compile time: in the former case user space and the kernel get 2G each, in the latter the kernel gets 3G and user space only 1G.
Perhaps because of this memory concern, Linux nf_conntrack limits how much memory extensions may use: the data type of its length field is u8. Since I know my own system, I changed it to u16. You also need to know that nf_conntrack extend memory is laid out contiguously; the stock design does not expect you to keep a pointer in a space of sizeof(char *) bytes and have that pointer refer to a very large region elsewhere ... But why not? That restriction exists for the sake of general-purpose code. I understand my system, so I can store pointers. I have also kept the array approach. In short, the division of labor between arrays and pointers is clear: arrays are used for addressing within the extend area, and pointers are used to fetch the data.
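The effect of the width of the length field can be seen with a little arithmetic. The sketch below models appending one extension to a contiguous area: the offset is the current length rounded up to the extension's alignment, and the new length is that offset plus the extension's size. ext_append and align_up are hypothetical helper names, and the sizes used afterwards are made-up examples, not measurements of real kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Append an extension of `size` bytes to an area that is `len` bytes long.
 * On success, stores the extension's offset in *off and returns the new
 * total length. Returns 0 if the new length exceeds `max`, the largest
 * value the length field can hold. */
size_t ext_append(size_t len, size_t size, size_t align, size_t max, size_t *off)
{
    size_t newoff, newlen;

    assert(align != 0 && (align & (align - 1)) == 0);
    newoff = align_up(len, align);
    newlen = newoff + size;
    if (newlen > max)
        return 0;
    *off = newoff;
    return newlen;
}
```

With max set to UINT8_MAX, an area that already holds 124 bytes cannot take another 200-byte extension, while with UINT16_MAX it can. Storing only a pointer in the area keeps the appended size at sizeof(void *) regardless of how large the pointed-to data is.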
The code consists of a framework and a test program, written for kernel 2.6.32 on AMD64. It is on GitHub: https://github.com/marywangran/extension-of-nf_conntrack-ext
A backup copy follows here as well, in case GitHub gets blocked by the firewall some day ...

Modify include/net/netfilter/nf_conntrack_extend.h:

--- nf_conntrack_extend.h.orig	2014-03-29 12:55:26.000000000 +0800
+++ nf_conntrack_extend.h	2015-01-15 17:28:39.000000000 +0800
@@ -3,13 +3,17 @@
 
 #include <net/netfilter/nf_conntrack.h>
 
+#define NFCT_EXT_EXT
+
 enum nf_ct_ext_id
 {
 	NF_CT_EXT_HELPER,
 	NF_CT_EXT_NAT,
 	NF_CT_EXT_ACCT,
 	NF_CT_EXT_ECACHE,
-	NF_CT_EXT_NEW,
+#ifdef NFCT_EXT_EXT
+	NF_CT_EXT_EXT,
+#endif
 	NF_CT_EXT_NUM,
 };
 
@@ -17,13 +21,21 @@
 #define NF_CT_EXT_NAT_TYPE struct nf_conn_nat
 #define NF_CT_EXT_ACCT_TYPE struct nf_conn_counter
 #define NF_CT_EXT_ECACHE_TYPE struct nf_conntrack_ecache
-#define NF_CT_EXT_NEW_TYPE struct nf_conntrack_new
+#ifdef NFCT_EXT_EXT
+#define NF_CT_EXT_EXT_TYPE struct nf_conntrack_ext
+#endif
 
 /* Extensions: optional stuff which isn't permanently in struct. */
 struct nf_ct_ext {
 	struct rcu_head rcu;
+#ifdef NFCT_EXT_EXT
+/* Memory is no longer a thing */
+	u16 offset[NF_CT_EXT_NUM];
+	u16 len;
+#else
 	u8 offset[NF_CT_EXT_NUM];
 	u8 len;
+#endif
 	char data[0];
 };
 
@@ -80,10 +92,18 @@
 	unsigned int flags;
 
 	/* Length and min alignment. */
+#ifdef NFCT_EXT_EXT
+/* Memory is no longer a thing */
+	u16 len;
+	u16 align;
+/* Initial size of nf_ct_ext. */
+	u16 alloc_size;
+#else
 	u8 len;
 	u8 align;
 	/* Initial size of nf_ct_ext. */
 	u8 alloc_size;
+#endif
 };
 
 int nf_ct_extend_register(struct nf_ct_ext_type *type);

Add include/net/netfilter/nf_conntrack_ext.h:

/*
 * (C) Marywangran <[email protected]>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#ifndef _NF_CONNTRACK_EXT_H
#define _NF_CONNTRACK_EXT_H

#include <net/net_namespace.h>
#include <linux/netfilter/nf_conntrack_common.h>
#include <linux/netfilter/nf_conntrack_tuple_common.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_extend.h>

#define MAX_EXT_SLOTS	8
#define BITINT		1

struct nf_conntrack_ext {
	/* must have an array for introspection or reflection */
	int bits_idx[MAX_EXT_SLOTS];
	int bits[BITINT];
	char *slot[MAX_EXT_SLOTS];
};

int nf_ct_exts_add(const struct nf_conn *ct, void *ext);
void *nf_ct_exts_get(const struct nf_conn *ct, int idx);
void nf_ct_exts_remove(const struct nf_conn *ct, int idx);

struct nf_conntrack_ext *nf_conn_exts_find(const struct nf_conn *ct);
struct nf_conntrack_ext *nf_conn_exts_add(struct nf_conn *ct, gfp_t gfp);

extern int nf_conntrack_exts_init(void);
extern void nf_conntrack_exts_fini(void);

#endif /* _NF_CONNTRACK_EXT_H */

Modify net/netfilter/nf_conntrack_core.c:

--- nf_conntrack_core.c.orig	2014-03-29 13:00:17.000000000 +0800
+++ nf_conntrack_core.c	2015-01-15 17:01:28.000000000 +0800
@@ -42,6 +42,10 @@
 #include <net/netfilter/nf_conntrack_extend.h>
 #include <net/netfilter/nf_conntrack_acct.h>
 #include <net/netfilter/nf_conntrack_ecache.h>
+/* Header file of the extend extension */
+#ifdef NFCT_EXT_EXT
+#include <net/netfilter/nf_conntrack_ext.h>
+#endif
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
 
@@ -644,8 +648,11 @@
 	}
 
 	nf_ct_acct_ext_add(ct, GFP_ATOMIC);
-	nf_ct_ecache_ext_add(ct, GFP_ATOMIC);
+#ifdef NFCT_EXT_EXT
+/* Initialize the extend extension when creating the conntrack */
+	nf_conn_exts_add(ct, GFP_ATOMIC);
+#endif
 
 	spin_lock_bh(&nf_conntrack_lock);
 	exp = nf_ct_find_expectation(net, tuple);
@@ -1130,6 +1137,10 @@
 	nf_ct_free_hashtable(net->ct.hash, net->ct.hash_vmalloc,
 			     net->ct.htable_size);
+#ifdef NFCT_EXT_EXT
+/* Tear down the extend extension */
+	nf_conntrack_exts_fini();
+#endif
 	nf_conntrack_ecache_fini(net);
 	nf_conntrack_acct_fini(net);
 	nf_conntrack_expect_fini(net);
@@ -1344,9 +1355,19 @@
 	ret = nf_conntrack_ecache_init(net);
 	if (ret < 0)
 		goto err_ecache;
+#ifdef NFCT_EXT_EXT
+/* Register the extend extension */
+	ret = nf_conntrack_exts_init();
+	if (ret < 0)
+		goto err_exts;
+#endif
 
 	return 0;
 
+#ifdef NFCT_EXT_EXT
+err_exts:
+	nf_conntrack_ecache_fini(net);
+#endif
 err_ecache:
 	nf_conntrack_acct_fini(net);
 err_acct:

Add net/netfilter/nf_conntrack_ext.c:

/* Extension implementation file for conntrack extension. */
/*
 * Implementation of the extension of the conntrack extension.
 * Technical core:
 *   * a bitmap
 *   * an index array (a 'blueprint' that is maintained externally)
 *
 * (C) Marywangran <[email protected]>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
#include <linux/kernel.h>
#include <net/netfilter/nf_conntrack_extend.h>
#include <net/netfilter/nf_conntrack_ext.h>

/* This spin lock should be tied to every ext instead of being global! */
static DEFINE_SPINLOCK(nfct_ext_lock);

static struct nf_ct_ext_type ext_extend __read_mostly = {
	.len	= sizeof(struct nf_conntrack_ext),
	.align	= __alignof__(struct nf_conntrack_ext),
	.id	= NF_CT_EXT_EXT,
	.flags	= NF_CT_EXT_F_PREALLOC,
};

/*
 * Add a piece of data to the extension's extension.
 * Note: the caller has to maintain the index array itself.
 */
int nf_ct_exts_add(const struct nf_conn *ct, void *ext)
{
	int ret_idx = -1;
	struct nf_conntrack_ext *exts = NULL;

	if (!ext) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (!exts) {
		goto out;
	}
	spin_lock(&nfct_ext_lock);
	ret_idx = find_first_zero_bit(exts->bits, MAX_EXT_SLOTS);
	/* find_first_zero_bit() returns the size when no bit is free,
	 * so the test has to be >= rather than > */
	if (ret_idx >= MAX_EXT_SLOTS) {
		ret_idx = -1;
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	if (exts->slot[ret_idx]) {
		ret_idx = -1;
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	set_bit(ret_idx, exts->bits);
	exts->slot[ret_idx] = (char *)ext;
	spin_unlock(&nfct_ext_lock);
out:
	return ret_idx;
}
EXPORT_SYMBOL(nf_ct_exts_add);

/*
 * Get the data stored on the conntrack, based on its slot index.
 */
void *nf_ct_exts_get(const struct nf_conn *ct, int idx)
{
	char *ret = NULL;
	struct nf_conntrack_ext *exts;

	/* valid slots are 0 .. MAX_EXT_SLOTS - 1 */
	if (idx >= MAX_EXT_SLOTS || idx < 0) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (!exts) {
		goto out;
	}
	spin_lock(&nfct_ext_lock);
	if (!test_bit(idx, exts->bits)) {
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	ret = exts->slot[idx];
	spin_unlock(&nfct_ext_lock);
out:
	return (void *)ret;
}
EXPORT_SYMBOL(nf_ct_exts_get);

/*
 * Delete the data saved on the conntrack, based on its slot index.
 */
void nf_ct_exts_remove(const struct nf_conn *ct, int idx)
{
	struct nf_conntrack_ext *exts;

	/* valid slots are 0 .. MAX_EXT_SLOTS - 1 */
	if (idx >= MAX_EXT_SLOTS || idx < 0) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (!exts) {
		goto out;
	}
	spin_lock(&nfct_ext_lock);
	if (!test_bit(idx, exts->bits)) {
		spin_unlock(&nfct_ext_lock);
		goto out;
	}
	clear_bit(idx, exts->bits);
	exts->slot[idx] = NULL;
	spin_unlock(&nfct_ext_lock);
out:
	return;
}
EXPORT_SYMBOL(nf_ct_exts_remove);

struct nf_conntrack_ext *nf_conn_exts_find(const struct nf_conn *ct)
{
	return nf_ct_ext_find(ct, NF_CT_EXT_EXT);
}
EXPORT_SYMBOL(nf_conn_exts_find);

struct nf_conntrack_ext *nf_conn_exts_add(struct nf_conn *ct, gfp_t gfp)
{
	struct nf_conntrack_ext *exts;

	exts = nf_ct_ext_add(ct, NF_CT_EXT_EXT, gfp);
	if (!exts) {
		printk("Failed to add extensions area");
		return NULL;
	}
	/* Initialize */
	{
		int i;
		for (i = 0; i < MAX_EXT_SLOTS; i++) {
			exts->bits_idx[i] = -1;
			exts->slot[i] = NULL;
		}
	}
	return exts;
}
EXPORT_SYMBOL(nf_conn_exts_add);

int nf_conntrack_exts_init(void)
{
	int ret;

	ret = nf_ct_extend_register(&ext_extend);
	if (ret < 0) {
		printk("nf_conntrack_ext: Unable to register extension\n");
		goto out;
	}
	printk("nf_conntrack_ext: register extension OK\n");
	return 0;
out:
	return ret;
}

void nf_conntrack_exts_fini(void)
{
	nf_ct_extend_unregister(&ext_extend);
}

Test program nf_conntrack_private_data_auto_save_restore.c:

#include <linux/module.h>
#include <linux/skbuff.h>
#include <net/tcp.h>
#include <net/netfilter/nf_conntrack_ext.h>

MODULE_AUTHOR("marywangran");
MODULE_LICENSE("GPL");

/*
 * An index array must be defined for introspection.
 * Otherwise you fall into the infinite self-reference loop of
 * "data - metadata - meta-metadata - meta-meta-metadata ...".
 * This is also the problem AI faces. Self-awareness is fundamental:
 * knowing something, knowing that you "know something",
 * and knowing that you "know that you 'know something'" ...
 */
enum ext_idx_idx {
	CONN_ORIG_ROUTE,
	CONN_REPLY_ROUTE,
	CONN_SOCK,
	CONN_AND_SO_ON,
	NUM
};

static inline void
nf_ext_put_sock(struct sock *sk)
{
	if ((sk->sk_protocol == IPPROTO_TCP) && (sk->sk_state == TCP_TIME_WAIT)) {
		inet_twsk_put(inet_twsk(sk));
	} else {
		sock_put(sk);
	}
}

static void
nf_ext_destructor(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	skb->sk = NULL;
	skb->destructor = NULL;
	if (sk) {
		nf_ext_put_sock(sk);
	}
}

/* Hook function that caches the socket */
static unsigned int ipv4_conntrack_save_sock(unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked) {
		goto out;
	}
	if ((ip_hdr(skb)->protocol != IPPROTO_UDP) &&
	    (ip_hdr(skb)->protocol != IPPROTO_TCP)) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* Cache the socket. Note that restoring a cached socket
		 * is mainly meaningful on INPUT */
		if (exts->bits_idx[CONN_SOCK] == -1) {
			if (skb->sk == NULL) {
				goto out;
			}
			if ((ip_hdr(skb)->protocol == IPPROTO_TCP) &&
			    skb->sk->sk_state != TCP_ESTABLISHED) {
				goto out;
			}
			exts->bits_idx[CONN_SOCK] = nf_ct_exts_add(ct, skb->sk);
		}
	}
out:
	return NF_ACCEPT;
}

/* Hook function that caches the route entry */
static unsigned int ipv4_conntrack_save_dst(unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* Cache the route. Note that a connection has two directions.
		 * IP itself has no direction, so the routes for both
		 * directions have to be cached */
		int dir = CTINFO2DIR(ctinfo);
		int idx = (dir == IP_CT_DIR_ORIGINAL) ? CONN_ORIG_ROUTE : CONN_REPLY_ROUTE;

		/* -1 means nothing has been cached for this direction yet */
		if (exts->bits_idx[idx] == -1) {
			struct dst_entry *dst = skb_dst(skb);
			if (dst) {
				dst_hold(dst);
				exts->bits_idx[idx] = nf_ct_exts_add(ct, dst);
			}
		}
	}
out:
	return NF_ACCEPT;
}

/* Hook function that restores the cached socket */
static unsigned int ipv4_conntrack_restore_sock(unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked) {
		goto out;
	}
	if ((ip_hdr(skb)->protocol != IPPROTO_UDP) &&
	    (ip_hdr(skb)->protocol != IPPROTO_TCP)) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* Get the cached socket */
		if (exts->bits_idx[CONN_SOCK] != -1) {
			struct sock *sk = (struct sock *)nf_ct_exts_get(ct, exts->bits_idx[CONN_SOCK]);
			if (sk) {
				if ((ip_hdr(skb)->protocol == IPPROTO_TCP) &&
				    sk->sk_state != TCP_ESTABLISHED) {
					goto out;
				}
				if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) {
					goto out;
				}
				skb_orphan(skb);
				skb->sk = sk;
				/* The reference taken by the atomic inc above must be
				 * put when the skb moves on to its next owner */
				skb->destructor = nf_ext_destructor;
			}
		}
	}
out:
	return NF_ACCEPT;
}

/* Hook function that restores the cached route entry */
static unsigned int ipv4_conntrack_restore_dst(unsigned int hooknum,
		struct sk_buff *skb,
		const struct net_device *in,
		const struct net_device *out,
		int (*okfn)(struct sk_buff *))
{
	struct nf_conn *ct;
	enum ip_conntrack_info ctinfo;
	struct nf_conntrack_ext *exts;

	ct = nf_ct_get(skb, &ctinfo);
	if (!ct || ct == &nf_conntrack_untracked) {
		goto out;
	}
	exts = nf_conn_exts_find(ct);
	if (exts) {
		/* Get the cached route */
		int dir = CTINFO2DIR(ctinfo);
		int idx = (dir == IP_CT_DIR_ORIGINAL) ? CONN_ORIG_ROUTE : CONN_REPLY_ROUTE;

		if (exts->bits_idx[idx] != -1) {
			struct dst_entry *dst = (struct dst_entry *)nf_ct_exts_get(ct, exts->bits_idx[idx]);
			if (dst) {
				dst_hold(dst);
				skb_dst_set(skb, dst);
			}
		}
	}
out:
	return NF_ACCEPT;
}

/*
 * Overall picture:
 * OUTPUT:              cache the socket
 * INPUT:               restore the socket
 *
 * POSTROUTING | INPUT: cache the route
 * PREROUTING:          restore the route
 */
static struct nf_hook_ops ipv4_conn_cache_ops[] __read_mostly = {
	{
		.hook		= ipv4_conntrack_save_dst,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_POST_ROUTING,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
	{
		.hook		= ipv4_conntrack_save_sock,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_OUT,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
	{
		.hook		= ipv4_conntrack_save_dst,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
	{
		.hook		= ipv4_conntrack_restore_sock,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= NF_IP_PRI_CONNTRACK + 2,
	},
	{
		.hook		= ipv4_conntrack_restore_dst,
		.owner		= THIS_MODULE,
		.pf		= NFPROTO_IPV4,
		.hooknum	= NF_INET_PRE_ROUTING,
		.priority	= NF_IP_PRI_CONNTRACK + 1,
	},
};

static int __init cache_dst_and_sock_demo_init(void)
{
	int ret;

	ret = nf_register_hooks(ipv4_conn_cache_ops, ARRAY_SIZE(ipv4_conn_cache_ops));
	if (ret) {
		goto out;
	}
	return 0;
out:
	return ret;
}

static void __exit cache_dst_and_sock_demo_fini(void)
{
	nf_unregister_hooks(ipv4_conn_cache_ops, ARRAY_SIZE(ipv4_conn_cache_ops));
}

module_init(cache_dst_and_sock_demo_init);
module_exit(cache_dst_and_sock_demo_fini);
In the test program I cache the route entry and, for packets destined for the local host, the socket. After that, a single conntrack lookup is enough to take the route and the socket out directly. Fetching a value goes through the index array and then the slot array, which is plain array-subscript addressing, so no further lookup is needed.
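One detail of the restore path is worth isolating: before a cached socket is handed to a packet, the test program takes a reference only if the reference count has not already dropped to zero (atomic_inc_not_zero in the kernel). A userspace model of that check, using C11 atomics and the hypothetical names struct obj and obj_get_unless_zero, might look like this:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal reference-counted object (stand-in for a cached socket). */
struct obj {
    atomic_int refcnt;
};

/* Take a reference unless the count has already reached zero.
 * Returns 1 if a reference was taken, 0 if the object is already dead. */
int obj_get_unless_zero(struct obj *o)
{
    int old;

    assert(o != NULL);
    old = atomic_load(&o->refcnt);
    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return 1;
    }
    return 0;
}
```

A caller that gets 1 back owns one reference and has to drop it later, which is the role the skb destructor plays in the test program.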
