IPVS Load-Balancing Algorithm Based on the Hash Value of a Field at an Arbitrary Offset in the Application Layer

Years ago I wrote an IPVS scheduling-algorithm module that computes a value from a fixed-length piece of data starting at an arbitrary offset in the application-layer payload, then hashes that value to choose among the real servers. At the time I thought it was useless and shelved it, until some recent reflection and a pre-study a few days ago changed my mind. So I am writing it down; it may be useful in the future.
1. UDP load balancing. UDP services used to be rare. Although HTTP does not strictly imply TCP, HTTP over UDP was practically nonexistent. But as networks have grown more reliable, and centralized control mechanisms and distributed optimization techniques have matured, more and more scenarios are adopting UDP.
2. Mobile network problems. When a phone or pad accesses a service, the terminal is constantly moving, so its IP address keeps changing (set LISP aside; for now it is only an ideal). If TCP is the bearer protocol of the service, this means endless disconnection and reconnection, because a TCP connection is bound to the IP addresses. With UDP the problem disappears; the cost is that connection state must be recorded at the application layer. This is the price of the missing session layer. Some people disagree, but arguing with armchair critics who never build such a mechanism is pointless. With this in mind, I made just such a modification to OpenVPN.
3. Load balancing on a sessionID in the UDP application layer. Step by step we have arrived at the real question: what is the sessionID? It is not part of any standard protocol. First, you must ensure the field is actually present in the packets; that is generally guaranteed, since I know what I am deploying. Second, where in the packet does the sessionID sit? That cannot be mandated either. In practice, the so-called sessionID is simply whatever part of the packet stays constant for the lifetime of a connection. The best approach, then, is to let whoever configures the service decide where it starts and how long it is. The module below implements exactly that.
net/netfilter/ipvs/ip_vs_offh.c:

/*
 * IPVS: Layer-7 payload hashing scheduling module
 *
 * Authors: ZHAOYA
 * Adapted from ip_vs_sh/dh; for detailed comments see:
 * net/netfilter/ipvs/ip_vs_sh.c
 * net/netfilter/ipvs/ip_vs_dh.c
 */
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/ctype.h>
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <net/ip.h>
#include <net/ip_vs.h>

struct ip_vs_offh_bucket {
        struct ip_vs_dest *dest;
};

struct ip_vs_offh_data {
        struct ip_vs_offh_bucket *tbl;
        u32 offset;
        u32 offlen;
};

#define IP_VS_OFFH_TAB_BITS     8
#define IP_VS_OFFH_TAB_SIZE     (1 << IP_VS_OFFH_TAB_BITS)
#define IP_VS_OFFH_TAB_MASK     (IP_VS_OFFH_TAB_SIZE - 1)

/*
 * Global parameters
 * offset: offset (relative to the Layer-7 header) of the payload field hashed
 * offlen: length of the payload field hashed
 */
static u32 offset, offlen;

static int skip_atoi(char **s)
{
        int i = 0;

        while (isdigit(**s))
                i = i * 10 + *((*s)++) - '0';
        return i;
}

static inline struct ip_vs_dest *
ip_vs_offh_get(struct ip_vs_offh_bucket *tbl, const char *payload, u32 length)
{
        __be32 v_fold;

        /* Fold still to be optimized. Note the last in-range byte is
         * payload[length - 1]; the original read payload[length], one
         * past the configured window. offlen is assumed positive. */
        v_fold = (payload[0] ^ payload[length >> 2] ^ payload[length - 1]) *
                 2654435761UL;
        return (tbl[v_fold & IP_VS_OFFH_TAB_MASK]).dest;
}

static int
ip_vs_offh_assign(struct ip_vs_offh_bucket *tbl, struct ip_vs_service *svc)
{
        int i;
        struct ip_vs_offh_bucket *b;
        struct list_head *p;
        struct ip_vs_dest *dest;

        b = tbl;
        p = &svc->destinations;
        for (i = 0; i < IP_VS_OFFH_TAB_SIZE; i++) {
                if (list_empty(p)) {
                        b->dest = NULL;
                } else {
                        if (p == &svc->destinations)
                                p = p->next;

                        dest = list_entry(p, struct ip_vs_dest, n_list);
                        atomic_inc(&dest->refcnt);
                        b->dest = dest;

                        p = p->next;
                }
                b++;
        }
        return 0;
}

static void ip_vs_offh_flush(struct ip_vs_offh_bucket *tbl)
{
        int i;
        struct ip_vs_offh_bucket *b;

        b = tbl;
        for (i = 0; i < IP_VS_OFFH_TAB_SIZE; i++) {
                if (b->dest) {
                        atomic_dec(&b->dest->refcnt);
                        b->dest = NULL;
                }
                b++;
        }
}

static int ip_vs_offh_init_svc(struct ip_vs_service *svc)
{
        struct ip_vs_offh_data *pdata;
        struct ip_vs_offh_bucket *tbl;

        pdata = kmalloc(sizeof(struct ip_vs_offh_data), GFP_ATOMIC);
        if (pdata == NULL) {
                pr_err("%s(): no memory\n", __func__);
                return -ENOMEM;
        }
        tbl = kmalloc(sizeof(struct ip_vs_offh_bucket) * IP_VS_OFFH_TAB_SIZE,
                      GFP_ATOMIC);
        if (tbl == NULL) {
                kfree(pdata);
                pr_err("%s(): no memory\n", __func__);
                return -ENOMEM;
        }
        pdata->tbl = tbl;
        pdata->offset = 0;
        pdata->offlen = 0;
        svc->sched_data = pdata;

        ip_vs_offh_assign(tbl, svc);

        return 0;
}

static int ip_vs_offh_done_svc(struct ip_vs_service *svc)
{
        struct ip_vs_offh_data *pdata = svc->sched_data;
        struct ip_vs_offh_bucket *tbl = pdata->tbl;

        ip_vs_offh_flush(tbl);
        kfree(pdata);

        return 0;
}

static int ip_vs_offh_update_svc(struct ip_vs_service *svc)
{
        /* sched_data holds the ip_vs_offh_data wrapper, not the bare
         * bucket table (the original cast it to the table directly). */
        struct ip_vs_offh_data *pdata = svc->sched_data;

        ip_vs_offh_flush(pdata->tbl);
        ip_vs_offh_assign(pdata->tbl, svc);

        return 0;
}

static inline int is_overloaded(struct ip_vs_dest *dest)
{
        return dest->flags & IP_VS_DEST_F_OVERLOAD;
}

static struct ip_vs_dest *
ip_vs_offh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
{
        struct ip_vs_dest *dest;
        struct ip_vs_offh_data *pdata;
        struct ip_vs_offh_bucket *tbl;
        struct iphdr *iph;
        void *transport_hdr;
        char *payload;
        u32 hdrlen = 0;
        u32 _offset = 0;
        u32 _offlen = 0;

        iph = ip_hdr(skb);
        hdrlen = iph->ihl * 4;
        if (hdrlen > skb->len)
                return NULL;
        transport_hdr = (void *)iph + hdrlen;

        switch (iph->protocol) {
        case IPPROTO_TCP:
                hdrlen += ((struct tcphdr *)transport_hdr)->doff * 4;
                break;
        case IPPROTO_UDP:
                hdrlen += sizeof(struct udphdr);
                break;
        default:
                return NULL;
        }
#if 0
        /* Debug: dump the bytes that would be hashed */
        {
                int i;

                _offset = offset;
                _offlen = offlen;
                payload = (char *)iph + hdrlen + _offset;
                printk("begin: iplen:%d\n", hdrlen);
                for (i = 0; i < _offlen; i++)
                        printk("%02X ", payload[i]);
                printk("\nend\n");
                return NULL;
        }
#endif
        pdata = (struct ip_vs_offh_data *)svc->sched_data;
        tbl = pdata->tbl;
        _offset = offset;       /* or pdata->offset */
        _offlen = offlen;       /* or pdata->offlen */
        if (_offlen + _offset > skb->len - hdrlen) {
                IP_VS_ERR_RL("OFFH: offset+length exceeds packet\n");
                return NULL;
        }
        payload = (char *)iph + hdrlen + _offset;

        dest = ip_vs_offh_get(tbl, payload, _offlen);
        if (!dest
            || !(dest->flags & IP_VS_DEST_F_AVAILABLE)
            || atomic_read(&dest->weight) <= 0
            || is_overloaded(dest)) {
                IP_VS_ERR_RL("OFFH: no destination available\n");
                return NULL;
        }

        return dest;
}

static struct ip_vs_scheduler ip_vs_offh_scheduler = {
        .name           = "offh",
        .refcnt         = ATOMIC_INIT(0),
        .module         = THIS_MODULE,
        .n_list         = LIST_HEAD_INIT(ip_vs_offh_scheduler.n_list),
        .init_service   = ip_vs_offh_init_svc,
        .done_service   = ip_vs_offh_done_svc,
        .update_service = ip_vs_offh_update_svc,
        .schedule       = ip_vs_offh_schedule,
};

static ssize_t ipvs_sch_offset_read(struct file *file, char __user *buf,
                                    size_t count, loff_t *ppos)
{
        char kbuf[64];
        int len;

        /* Format into a kernel buffer first; the original sprintf()ed
         * straight into the user pointer. */
        len = snprintf(kbuf, sizeof(kbuf), "offset:%u;offlen:%u\n",
                       offset, offlen);
        return simple_read_from_buffer(buf, count, ppos, kbuf, len);
}

/*
 * Set offset/offset length:
 * echo offset:$value1 offlen:$value2 > /proc/net/ipvs_sch_offset
 */
static ssize_t ipvs_sch_offset_write(struct file *file, const char __user *buf,
                                     size_t count, loff_t *ppos)
{
        ssize_t ret = count;
        char kbuf[64];
        char *p, *pstart;

        /* Copy the user data in before parsing (the original parsed,
         * and NUL-terminated, the user pointer in place). */
        if (count >= sizeof(kbuf))
                return -EINVAL;
        if (copy_from_user(kbuf, buf, count))
                return -EFAULT;
        kbuf[count] = '\0';

        p = kbuf;
        if ((p = strstr(p, "offset:")) == NULL) {
                ret = -EINVAL;
                goto out;
        }
        p += strlen("offset:");
        pstart = p;
        if ((p = strstr(p, " ")) == NULL) {
                ret = -EINVAL;
                goto out;
        }
        p[0] = '\0';
        if (!isdigit(*pstart)) {        /* replaces the post-hoc "0" check */
                ret = -EINVAL;
                goto out;
        }
        offset = skip_atoi(&pstart);
        p += 1;
        if ((p = strstr(p, "offlen:")) == NULL) {
                ret = -EINVAL;
                goto out;
        }
        p += strlen("offlen:");
        pstart = p;
        if (!isdigit(*pstart)) {
                ret = -EINVAL;
                goto out;
        }
        offlen = skip_atoi(&pstart);
out:
        return ret;
}

/*
 * procfs is used for configuration so that the userspace tools
 * need no modification; procfs felt more universal here.
 */
static const struct file_operations ipvs_sch_offset_file_ops = {
        .owner = THIS_MODULE,
        .read  = ipvs_sch_offset_read,
        .write = ipvs_sch_offset_write,
};

static struct net *net = &init_net;

static int __init ip_vs_offh_init(void)
{
        if (!proc_create("ipvs_sch_offset", 0644, net->proc_net,
                         &ipvs_sch_offset_file_ops)) {
                printk("OFFH: create proc entry failed\n");
                return -ENOMEM;
        }
        return register_ip_vs_scheduler(&ip_vs_offh_scheduler);
}

static void __exit ip_vs_offh_cleanup(void)
{
        remove_proc_entry("ipvs_sch_offset", net->proc_net);
        unregister_ip_vs_scheduler(&ip_vs_offh_scheduler);
}

module_init(ip_vs_offh_init);
module_exit(ip_vs_offh_cleanup);
MODULE_LICENSE("GPL");
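Once the module is built and loaded, everything is driven from userspace. A minimal usage sketch, with made-up addresses and an example field position (the sessionID assumed to sit 4 bytes into the UDP payload, 8 bytes long):

# load the module, then tell it where the sessionID lives
insmod ip_vs_offh.ko
echo "offset:4 offlen:8" > /proc/net/ipvs_sch_offset

# create a UDP virtual service with the "offh" scheduler and
# attach two real servers (all addresses are examples)
ipvsadm -A -u 192.168.1.100:1194 -s offh
ipvsadm -a -u 192.168.1.100:1194 -r 10.0.0.1:1194 -m
ipvsadm -a -u 192.168.1.100:1194 -r 10.0.0.2:1194 -m

From then on, any packet whose 8 bytes at payload offset 4 hash into the same bucket is pinned to the same real server, no matter what source IP and port the client currently uses.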

4. Where is the problem? Just as I think nf_conntrack needs changing, I think the IPVS conn-cache mechanism needs changing too.
Client A's 5-tuple, tuple1, carrying sessionID1, is matched to real server1, and a conn cache entry conn1 is installed. Some time later client A changes its IP address; its packets no longer match conn1, so the cache misses, and conn_schedule, relying on the unchanged sessionID1, picks the same real server1 and installs a new entry conn2. conn1 is now a zombie, waiting to be deleted on timeout. Much later, client 2 takes over client A's old IP address and UDP port and accesses the same UDP service, so client 2's 5-tuple is also tuple1, but it carries sessionID2. The hash of sessionID2 should select real server2, yet the packet hits the zombie conn1 and is sent to real server1. If client 2 then changes its IP address, its 5-tuple becomes tuple2', the cache misses, conn_schedule runs, and it is matched to real server2; since real server1 had been serving it, the connection is switched mid-stream. All of this stems from the absence of a deletion-notification mechanism.
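To make the failure sequence concrete, here is a minimal userspace model of the conn-cache behaviour described above. Everything in it (the conn structure, conn_lookup(), schedule_by_session(), dispatch()) is a hypothetical stand-in for the IPVS conn cache and conn_schedule, not kernel code; it exists only to show how a stale 5-tuple entry shadows the sessionID hash:

/* model.c -- illustrative sketch only, not IPVS code */
#include <stdio.h>
#include <string.h>

struct conn {
        char tuple[32];   /* stands in for the 5-tuple key */
        int  real_server; /* server chosen when the conn was created */
        int  in_use;
};

static struct conn cache[8];

/* 5-tuple lookup: consulted before the scheduler is ever asked */
static struct conn *conn_lookup(const char *tuple)
{
        int i;

        for (i = 0; i < 8; i++)
                if (cache[i].in_use && !strcmp(cache[i].tuple, tuple))
                        return &cache[i];
        return NULL;
}

/* sessionID hash: what the scheduler computes on a cache miss */
static int schedule_by_session(int session_id)
{
        return session_id % 2; /* two real servers: rs0 and rs1 */
}

static int dispatch(const char *tuple, int session_id)
{
        int i;
        struct conn *cp = conn_lookup(tuple);

        if (cp) /* cache hit: the scheduler never runs */
                return cp->real_server;

        /* cache miss: schedule by sessionID, leave a conn entry behind */
        for (i = 0; i < 8; i++) {
                if (!cache[i].in_use) {
                        cache[i].in_use = 1;
                        strcpy(cache[i].tuple, tuple);
                        cache[i].real_server = schedule_by_session(session_id);
                        return cache[i].real_server;
                }
        }
        return -1;
}

int main(void)
{
        /* client A: tuple1 + sessionID1 -> rs1, conn1 installed */
        printf("A @tuple1  -> rs%d\n", dispatch("tuple1", 1));
        /* A moves: new tuple, same sessionID -> still rs1,
         * but conn1 (keyed on tuple1) is now a zombie */
        printf("A @tuple1' -> rs%d\n", dispatch("tuple1'", 1));
        /* client 2 inherits tuple1 with sessionID2: the hash wants rs0,
         * but the zombie conn1 sends it to rs1 */
        printf("2 @tuple1  -> rs%d (hash wanted rs%d)\n",
               dispatch("tuple1", 2), schedule_by_session(2));
        return 0;
}

Running the model prints rs1 on the last line even though the sessionID hash wanted rs0: the zombie entry wins. A deletion notification (for example, flushing conns bound to a sessionID as soon as that sessionID shows up with a different 5-tuple) would prevent exactly this.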
