Source code analysis of the ipvs (LVS) persistent parameter


I recently used LVS for load balancing and found that one client kept losing its session. I was running a common configuration with wlc as the balancing policy. After some reading I learned that under wlc, requests from the same client may be dispatched to different backend machines; since sessions are not replicated across the backend servers, the client ends up on different servers and loses its session.

The fix is to adjust the balancing policy so that the same client is always mapped to the same server. Among the available scheduling policies, only source hashing scheduling seems able to achieve this, but that policy is generally not recommended.

Looking through the ipvsadm parameters, we find -p:

-p, --persistent [timeout]: enable persistent connections; multiple requests from the same client are sent to the same real server.

The -p parameter looks a bit like the source hashing scheduling policy, so let's check the code to see how it actually works.

ipvs is also called LVS

LVS is a kernel module, so its code can be found directly in the kernel tree, where the project goes by the name ipvs.

The ipvs code lives under net/netfilter/ipvs. The path alone tells us that ipvs is a kernel module built on the netfilter framework. In Linux, the netfilter architecture places detection points (hooks) at several locations along the network packet flow, and processing functions (packet filtering, NAT, or even user-defined handlers) are registered at each hook point.

Netfilter implementation

Netfilter hook flow diagram:


ipvs registers hook functions at several of these netfilter hook points:

static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
	/* After packet filtering, forward packet through VS/DR, VS/TUN,
	 * or VS/NAT(change destination), so that filtering rules can be
	 * applied to IPVS. */
	{
		.hook		= ip_vs_in,
		.owner		= THIS_MODULE,
		.pf		= PF_INET,
		.hooknum	= NF_INET_LOCAL_IN,
		.priority	= 100,
	},
	/* After packet filtering, change source only for VS/NAT */
	{
		.hook		= ip_vs_out,
		.owner		= THIS_MODULE,
		.pf		= PF_INET,
		.hooknum	= NF_INET_FORWARD,
		.priority	= 100,
	},
	/* After packet filtering (but before ip_vs_out_icmp), catch icmp
	 * destined for 0.0.0.0/0, which is for incoming IPVS connections */
	{
		.hook		= ip_vs_forward_icmp,
		.owner		= THIS_MODULE,
		.pf		= PF_INET,
		.hooknum	= NF_INET_FORWARD,
		.priority	= 99,
	},
	/* Before the netfilter connection tracking, exit from POST_ROUTING */
	{
		.hook		= ip_vs_post_routing,
		.owner		= THIS_MODULE,
		.pf		= PF_INET,
		.hooknum	= NF_INET_POST_ROUTING,
		.priority	= NF_IP_PRI_NAT_SRC-1,
	},
};

The balancing policy is mainly implemented in ip_vs_in, the hook function registered at NF_INET_LOCAL_IN.

Two structs in ipvs

1. ip_vs_conn records the client IP address, the virtual address established by ipvs, and the corresponding real server address.

2. ip_vs_protocol records the per-protocol (TCP, UDP) handler functions, such as which scheduling hook to use and which functions process incoming data.

struct ip_vs_protocol ip_vs_protocol_tcp = {
	.name =			"TCP",
	.protocol =		IPPROTO_TCP,
	.num_states =		IP_VS_TCP_S_LAST,
	.dont_defrag =		0,
	.appcnt =		ATOMIC_INIT(0),
	.init =			ip_vs_tcp_init,
	.exit =			ip_vs_tcp_exit,
	.register_app =		tcp_register_app,
	.unregister_app =	tcp_unregister_app,
	.conn_schedule =	tcp_conn_schedule,
	.conn_in_get =		tcp_conn_in_get,
	.conn_out_get =		tcp_conn_out_get,
	.snat_handler =		tcp_snat_handler,
	.dnat_handler =		tcp_dnat_handler,
	.csum_check =		tcp_csum_check,
	.state_name =		tcp_state_name,
	.state_transition =	tcp_state_transition,
	.app_conn_bind =	tcp_app_conn_bind,
	.debug_packet =		ip_vs_tcpudp_debug_packet,
	.timeout_change =	tcp_timeout_change,
	.set_state_timeout =	tcp_set_state_timeout,
};

This is the ip_vs_protocol struct for TCP. Its conn_in_get function is called from the hook function ip_vs_in:
static unsigned int
ip_vs_in(unsigned int hooknum, struct sk_buff *skb,
	 const struct net_device *in, const struct net_device *out,
	 int (*okfn)(struct sk_buff *))
{
	...
	pp = ip_vs_proto_get(iph.protocol);
	if (unlikely(!pp))
		return NF_ACCEPT;

	/*
	 * Check if the packet belongs to an existing connection entry
	 */
	cp = pp->conn_in_get(af, skb, pp, &iph, iph.len, 0);
	...
}
Taking the common TCP case as the example, this calls tcp_conn_in_get.
The global ip_vs_conn table ip_vs_conn_tab and the c_list chain

ip_vs_conn_tab is a global array that holds all connections as ip_vs_conn entries. The client IP address and port are hashed to compute the array slot in which an ip_vs_conn is stored.

ip_vs_conn itself also contains a list head, c_list; entries that hash to the same slot are chained together through it.

Initialization

During ipvs initialization the array is allocated with a fixed, unchangeable size of 1 << 12 = 4096 buckets.

When two clients hash to the same slot, the c_list chain is traversed to find the matching client (same address and same port).

If no matching ip_vs_conn is found, a new connection has to be scheduled:

if (!pp->conn_schedule(af, skb, pp, &v, &cp))
	return v;
For TCP, conn_schedule points to tcp_conn_schedule, which is the entry point of the scheduling algorithm mentioned earlier.

static int
tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
		  int *verdict, struct ip_vs_conn **cpp)
{
	struct ip_vs_service *svc;
	struct tcphdr _tcph, *th;
	struct ip_vs_iphdr iph;

	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);

	th = skb_header_pointer(skb, iph.len, sizeof(_tcph), &_tcph);
	if (th == NULL) {
		*verdict = NF_DROP;
		return 0;
	}

	if (th->syn &&
	    (svc = ip_vs_service_get(af, skb->mark, iph.protocol, &iph.daddr,
				     th->dest))) {
		if (ip_vs_todrop()) {
			/*
			 * It seems that we are very loaded.
			 * We have to drop this packet :(
			 */
			ip_vs_service_put(svc);
			*verdict = NF_DROP;
			return 0;
		}

		/*
		 * Let the virtual server select a real server for the
		 * incoming connection, and create a connection entry.
		 */
		*cpp = ip_vs_schedule(svc, skb);
		if (!*cpp) {
			*verdict = ip_vs_leave(svc, skb, pp);
			return 0;
		}
		ip_vs_service_put(svc);
	}
	return 1;
}

Function ip_vs_schedule

struct ip_vs_conn *
ip_vs_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
{
	...
	/*
	 *    Persistent service
	 */
	if (svc->flags & IP_VS_SVC_F_PERSISTENT)
		return ip_vs_sched_persist(svc, skb, pptr);

	/*
	 *    Non-persistent service
	 */
	if (!svc->fwmark && pptr[1] != svc->port) {
		if (!svc->port)
			pr_err("Schedule: port zero only supported "
			       "in persistent services, "
			       "check your ipvs configuration\n");
		return NULL;
	}
	...
	return cp;
}
Here we meet IP_VS_SVC_F_PERSISTENT, the flag set by the persistent parameter.

How the persistent parameter is implemented: the ip_vs_conn template

To ensure that the same client IP keeps reaching the original server within a certain period, ipvs must remember which real server the client was connected to last time.

ipvs does not introduce another array to hold this state. Instead it creates an ip_vs_conn template, which is stored in the same global table, ip_vs_conn_tab, described above.

Since templates are stored in the same table, how do they differ from ordinary ip_vs_conn entries? Simple: a template's client port is set to 0, and the template records the real server address the client was last connected to.

For the concrete implementation, see the function ip_vs_sched_persist.

When is the ip_vs_conn template cleared?

The -p parameter specifies a timeout. The ip_vs_conn struct contains a timer and a timeout value; when a connection is created, the timer's expiry function is set to ip_vs_conn_expire:

struct ip_vs_conn *
ip_vs_conn_new(int af, int proto,
	       const union nf_inet_addr *caddr, __be16 cport,
	       const union nf_inet_addr *vaddr, __be16 vport,
	       const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
	       struct ip_vs_dest *dest)
{
	......
	setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
	......
	return cp;
}

The ip_vs_conn_expire function removes the timed-out template from ip_vs_conn_tab.

In ip_vs_sched_persist, whenever a new connection is created through a template, the template's timer is also pushed forward (current time + the -p timeout); this refresh happens in ip_vs_conn_put.


Complete Flowchart




Debug logs of ipvs

To get debug logs, recompile the kernel with the following option set in the config:

CONFIG_IP_VS_DEBUG=y

After booting the new kernel, adjust the debug level in:

/proc/sys/net/ipv4/vs/debug_level

Set it to 12 and the logs will be printed to dmesg.


Entries in the ipvs ip_vs_conn_tab

The ip_vs_conn entries can be viewed through /proc/net/ip_vs_conn.




