Understanding Linux Network Internals: IPv4 packet reception (forwarding and local delivery)

Source: Internet
Author: User
Tags htons

We know that once the driver has handled a received packet, netif_receive_skb passes it to the handler registered for its protocol. For IPv4 that handler is ip_rcv. After some sanity checks and related bookkeeping, ip_rcv calls ip_rcv_finish to process the packet. This is where IPv4 receive processing really begins.
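To make "sanity checks" concrete, here is a minimal user-space sketch of the kind of header validation ip_rcv performs before handing over to ip_rcv_finish. The helper names (fold_header_sum, ipv4_header_ok) and the raw byte-array representation are assumptions made for this example; the real ip_rcv works on an sk_buff and also updates SNMP counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the header as big-endian 16-bit words and fold the carries
 * (one's-complement arithmetic, RFC 791). A valid header, checksum
 * field included, folds to 0xFFFF. */
static uint16_t fold_header_sum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: returns 1 if pkt starts with a plausible IPv4
 * header, 0 otherwise. */
static int ipv4_header_ok(const uint8_t *pkt, size_t pkt_len)
{
    assert(pkt != NULL);

    if (pkt_len < 20)                       /* shorter than a minimal header */
        return 0;

    unsigned version = pkt[0] >> 4;
    size_t hdr_len = (size_t)(pkt[0] & 0x0F) * 4;     /* ihl is in 32-bit words */
    size_t tot_len = (size_t)((pkt[2] << 8) | pkt[3]);

    if (version != 4 || hdr_len < 20)       /* wrong version or ihl < 5 */
        return 0;
    if (pkt_len < hdr_len)                  /* header itself is truncated */
        return 0;
    if (fold_header_sum(pkt, hdr_len) != 0xFFFF)      /* bad header checksum */
        return 0;
    if (tot_len < hdr_len || pkt_len < tot_len)       /* inconsistent lengths */
        return 0;
    return 1;
}
```

A caller would pass the bytes starting at the IP header together with the number of bytes actually received; any failed check corresponds to a packet that the kernel would drop before routing is even consulted.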

Let's look at the source of ip_rcv_finish:

ip_rcv_finish

// The main processing routine for IP datagrams (ip_rcv only performs sanity checks).
// ip_rcv_finish is essentially a routing table lookup: it determines whether, after
// IP-layer processing, the packet is passed up, forwarded, or dropped.
// 1. Decide whether the packet is delivered locally or forwarded; if forwarded,
//    find the egress device and the next hop.
// 2. Parse and process some of the IP options.
static int ip_rcv_finish(struct sk_buff *skb)
{
    const struct iphdr *iph = ip_hdr(skb);
    struct rtable *rt;

    /*
     *  Initialise the virtual path cache for the packet. It describes
     *  how the packet travels inside Linux networking.
     *  No routing lookup has been done yet, so there is no route entry:
     *  skb_dst(skb) == NULL. Look the route up with ip_route_input().
     */
    if (skb_dst(skb) == NULL) {
        /* This lookup matters: it decides where the IP packet goes next. */
        int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
                                 skb->dev);
        if (unlikely(err)) {
            if (err == -EHOSTUNREACH)
                IP_INC_STATS_BH(dev_net(skb->dev),
                                IPSTATS_MIB_INADDRERRORS);
            else if (err == -ENETUNREACH)
                IP_INC_STATS_BH(dev_net(skb->dev),
                                IPSTATS_MIB_INNOROUTES);
            goto drop;
        }
    }

#ifdef CONFIG_NET_CLS_ROUTE
    /* Update the statistics used by traffic control (the QoS layer). */
    if (unlikely(skb_dst(skb)->tclassid)) {
        struct ip_rt_acct *st = per_cpu_ptr(ip_rt_acct, smp_processor_id());
        u32 idx = skb_dst(skb)->tclassid;
        st[idx&0xFF].o_packets++;
        st[idx&0xFF].o_bytes += skb->len;
        st[(idx>>16)&0xFF].i_packets++;
        st[(idx>>16)&0xFF].i_bytes += skb->len;
    }
#endif

    if (iph->ihl > 5 && ip_rcv_options(skb))
        goto drop;

    rt = skb_rtable(skb);
    /* The skb's dst holds the routing information.
     * Update SNMP statistics based on the route type. */
    if (rt->rt_type == RTN_MULTICAST) {
        IP_UPD_PO_STATS_BH(dev_net(rt->u.dst.dev), IPSTATS_MIB_INMCAST,
                           skb->len);
    } else if (rt->rt_type == RTN_BROADCAST)
        IP_UPD_PO_STATS_BH(dev_net(rt->u.dst.dev), IPSTATS_MIB_INBCAST,
                           skb->len);

    /*
     * dst_input() actually calls skb_dst(skb)->input(skb). The input
     * function pointer is set according to the routing information:
     * ip_local_deliver for local delivery, ip_forward for forwarding.
     */
    return dst_input(skb);

drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}


ip_route_input performs the routing table lookup that determines, directly or indirectly, where the packet goes next: local delivery or forwarding.

As the code shows, a packet that is not dropped ends up in dst_input(skb), which simply calls skb_dst(skb)->input(skb). Which function input points to is decided by ip_route_input.

For packets to be delivered locally, input points to ip_local_deliver. For packets to be forwarded, input points to ip_forward.
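The function-pointer dispatch described above can be sketched in plain C. This is a toy model, not kernel code: struct packet and struct dst stand in for struct sk_buff and struct dst_entry, and route_input is a hypothetical one-rule "routing table" that treats a single address as local.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct packet;                              /* stand-in for struct sk_buff */
typedef int (*input_fn)(struct packet *);

struct dst { input_fn input; };             /* stand-in for struct dst_entry */
struct packet { uint32_t daddr; struct dst *dst; };

enum { DELIVERED_LOCALLY = 1, FORWARDED = 2 };

static int local_deliver(struct packet *p)  { (void)p; return DELIVERED_LOCALLY; }
static int forward_packet(struct packet *p) { (void)p; return FORWARDED; }

static struct dst local_dst   = { local_deliver };
static struct dst forward_dst = { forward_packet };

/* Toy route lookup (assumption): one local address, everything else
 * is forwarded. The lookup's only job here is to pick the input hook. */
static void route_input(struct packet *p, uint32_t local_addr)
{
    p->dst = (p->daddr == local_addr) ? &local_dst : &forward_dst;
}

/* Mirrors the shape of dst_input(): call whatever the route selected. */
static int dst_input_sketch(struct packet *p)
{
    assert(p->dst != NULL);                 /* the route must be set first */
    return p->dst->input(p);
}
```

The point of the design is that the caller never tests "local or forward" itself; the lookup attaches the decision to the packet, and later stages just follow the pointer.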

Local Delivery
/*
 *  Deliver IP packets to the higher protocol layers.
 */
int ip_local_deliver(struct sk_buff *skb)
{
    /*
     *  Reassemble IP fragments.
     */
    if (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) {
        if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER))
            return 0;
    }

    return NF_HOOK(PF_INET, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
                   ip_local_deliver_finish);
}

Before IPv4 hands a packet to the upper-layer protocol (local delivery), fragmented packets have to be reassembled; ip_defrag does the reassembly. While the datagram is still incomplete, ip_defrag returns nonzero and ip_local_deliver returns without delivering anything.
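The fragment test in ip_local_deliver boils down to two header fields: the "more fragments" flag and the 13-bit fragment offset. The sketch below shows the same test on a raw header in user space; the helper names are made up for this example, and the field is read byte by byte so that no byte-order conversion is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_DF     0x4000    /* don't fragment */
#define IP_MF     0x2000    /* more fragments follow */
#define IP_OFFSET 0x1FFF    /* fragment offset, in units of 8 bytes */

/* Read the flags/fragment-offset field (header bytes 6 and 7, big-endian). */
static uint16_t frag_field(const uint8_t *hdr)
{
    assert(hdr != NULL);
    return (uint16_t)((hdr[6] << 8) | hdr[7]);
}

/* A packet is a fragment if MF is set (first or middle fragment) or the
 * offset is nonzero (middle or last fragment). DF alone does not count. */
static int needs_reassembly(const uint8_t *hdr)
{
    return (frag_field(hdr) & (IP_MF | IP_OFFSET)) != 0;
}

/* Position of this fragment's payload within the original datagram. */
static unsigned fragment_offset_bytes(const uint8_t *hdr)
{
    return (unsigned)(frag_field(hdr) & IP_OFFSET) * 8u;
}
```

Note that the last fragment has MF clear, which is why the offset bits must be part of the mask: testing MF alone would let the final fragment bypass reassembly.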
Netfilter (the NF_INET_LOCAL_IN hook) then decides whether ip_local_deliver_finish is called.

ip_local_deliver_finish
static int ip_local_deliver_finish(struct sk_buff *skb)
{
    struct net *net = dev_net(skb->dev);

    __skb_pull(skb, ip_hdrlen(skb));    /* Skip the IP header */

    /* Point into the IP datagram, just past the header. */
    /* Set the transport-layer header position. */
    skb_reset_transport_header(skb);

    rcu_read_lock();
    {
        int protocol = ip_hdr(skb)->protocol;   /* protocol from the IP header */
        int hash, raw;
        const struct net_protocol *ipprot;

    resubmit:
        /* If raw sockets match, each gets a clone of the packet;
         * returns 0 when no raw socket matched. */
        raw = raw_local_deliver(skb, protocol);

        /* Index of the transport protocol's handler in the inet_protos table. */
        hash = protocol & (MAX_INET_PROTOS - 1);
        /* Get the transport-layer protocol handler. */
        ipprot = rcu_dereference(inet_protos[hash]);
        if (ipprot != NULL) {   /* a transport-layer handler is registered */
            int ret;

            /* Check that the handler may be used in this network namespace. */
            if (!net_eq(net, &init_net) && !ipprot->netns_ok) {
                if (net_ratelimit())
                    printk("%s: proto %d isn't netns-ready\n",
                           __func__, protocol);
                kfree_skb(skb);
                goto out;
            }

            /* IPsec policy check and related processing. */
            if (!ipprot->no_policy) {
                if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
                    kfree_skb(skb);
                    goto out;
                }
                nf_reset(skb);
            }
            /* Call the handler to enter the layer-4 protocol. */
            ret = ipprot->handler(skb);
            if (ret < 0) {
                /* A negative return asks for the packet to be
                 * resubmitted as protocol -ret. */
                protocol = -ret;
                goto resubmit;
            }
            IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);   /* delivery statistics */
        } else {    /* no transport-layer handler found */
            if (!raw) {
                if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
                    IP_INC_STATS_BH(net, IPSTATS_MIB_INUNKNOWNPROTOS);
                    icmp_send(skb, ICMP_DEST_UNREACH,
                              ICMP_PROT_UNREACH, 0);
                }
            } else
                IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
            kfree_skb(skb);
        }
    }
 out:
    rcu_read_unlock();

    return 0;
}
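The table lookup and the resubmit loop in ip_local_deliver_finish can be modelled in a few lines. Everything below is illustrative: the table, the two handlers and the integer "payload" are assumptions for this sketch, chosen only to show how a negative return value redirects the packet to another protocol handler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROTOS  256         /* plays the role of MAX_INET_PROTOS */
#define PROTO_UDP   17
#define PROTO_ENCAP 50          /* hypothetical encapsulating protocol */

typedef int (*proto_handler)(int *payload);

static proto_handler protos[MAX_PROTOS];    /* plays the role of inet_protos */

/* Hypothetical handler: strips its encapsulation, then asks the caller
 * to resubmit the inner packet as UDP by returning -PROTO_UDP. */
static int encap_rcv(int *payload) { *payload += 1;  return -PROTO_UDP; }

/* Hypothetical handler: consumes the packet. */
static int udp_rcv(int *payload)   { *payload += 10; return 0; }

/* Returns 0 when some handler consumed the packet, -1 when no handler is
 * registered (the kernel would answer with ICMP protocol unreachable). */
static int deliver(int protocol, int *payload)
{
    assert(payload != NULL);

    for (;;) {
        proto_handler h = protos[protocol & (MAX_PROTOS - 1)];
        int ret;

        if (h == NULL)
            return -1;
        ret = h(payload);
        if (ret >= 0)
            return 0;
        protocol = -ret;        /* resubmit under the new protocol number */
    }
}
```

With encap_rcv registered at index PROTO_ENCAP and udp_rcv at PROTO_UDP, a call to deliver(PROTO_ENCAP, &x) runs both handlers in turn, which is the same control flow as the goto resubmit in the kernel function.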



Forward

Forwarding a packet involves the following steps:

1. Process the IP options

2. Make sure the packet can be forwarded

3. Decrement the TTL field in the IP header; if the TTL would reach 0, discard the packet

4. Based on the MTU of the route, handle fragmentation if necessary

5. Send the packet out through the egress device
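Step 3 has a subtlety: changing the TTL invalidates the header checksum. The kernel's ip_decrease_ttl avoids recomputing it by patching the checksum incrementally. The following user-space sketch shows that idea on a raw header; get16, put16, ipv4_checksum and decrease_ttl are names invented for this example, and the header is handled as bytes so the code does not depend on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write a 16-bit big-endian (network order) field. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Full header checksum: one's-complement sum of all 16-bit words,
 * skipping the checksum field itself (offset 10), then inverted. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hdr_len; i += 2) {
        if (i == 10)
            continue;
        sum += get16(hdr + i);
    }
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and patch the checksum in place.
 * The TTL is the high byte of the word at offset 8, so that word drops
 * by 0x0100 and the inverted sum rises by 0x0100, with the carry
 * wrapped around. Returns the new TTL. */
static int decrease_ttl(uint8_t *hdr)
{
    uint32_t check = get16(hdr + 10);

    assert(hdr[8] > 0);         /* callers reject TTL <= 1 beforehand */
    check += 0x0100;
    put16(hdr + 10, (uint16_t)(check + (check >= 0xFFFF)));
    return --hdr[8];
}
```

After decrease_ttl runs, the stored checksum should equal what ipv4_checksum computes from scratch over the modified header, which is a convenient way to check the sketch.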

In IPv4, forwarding starts in ip_forward:

ip_forward

int ip_forward(struct sk_buff *skb)
{
    struct iphdr *iph;      /* Our header */
    struct rtable *rt;      /* Route we use */
    struct ip_options *opt = &(IPCB(skb)->opt);

    if (skb_warn_if_lro(skb))
        goto drop;

    if (!xfrm4_policy_check(NULL, XFRM_POLICY_FWD, skb))
        goto drop;

    /* Handle the Router Alert option (important). */
    if (IPCB(skb)->opt.router_alert && ip_call_ra_chain(skb))
        return NET_RX_SUCCESS;

    /* Check that the frame was addressed to this host at layer 2. The check
     * is largely redundant, because frames whose layer-2 address is not ours
     * have already been discarded on reception. For frames addressed to this
     * host, skb->pkt_type is set to PACKET_HOST. */
    if (skb->pkt_type != PACKET_HOST)
        goto drop;

    /* The packet is only being forwarded, so the L4 checksum is not our concern. */
    skb_forward_csum(skb);

    /*
     *  According to the RFC, we must first decrease the TTL field. If
     *  that reaches zero, we must reply with an ICMP control message
     *  telling that the packet's lifetime expired.
     */
    if (ip_hdr(skb)->ttl <= 1)
        goto too_many_hops;

    if (!xfrm4_route_forward(skb))
        goto drop;

    rt = skb_rtable(skb);

    /* If the header carries the strict source route option and the next hop
     * named in the option differs from the gateway chosen by the routing
     * subsystem, the option cannot be honoured and the packet is dropped. */
    if (opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
        goto sr_failed;

    if (unlikely(skb->len > dst_mtu(&rt->u.dst) && !skb_is_gso(skb) &&
                 (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
        IP_INC_STATS(dev_net(rt->u.dst.dev), IPSTATS_MIB_FRAGFAILS);
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                  htonl(dst_mtu(&rt->u.dst)));
        goto drop;
    }

    /* We are about to mangle the packet. Copy it! */
    if (skb_cow(skb, LL_RESERVED_SPACE(rt->u.dst.dev) + rt->u.dst.header_len))
        goto drop;
    iph = ip_hdr(skb);

    /* Decrease TTL after skb cow done */
    ip_decrease_ttl(iph);

    /*
     *  We now generate an ICMP HOST REDIRECT giving the route
     *  we calculated.
     */
    if (rt->rt_flags & RTCF_DOREDIRECT && !opt->srr && !skb_sec_path(skb))
        ip_rt_send_redirect(skb);

    skb->priority = rt_tos2priority(iph->tos);

    return NF_HOOK(PF_INET, NF_INET_FORWARD, skb, skb->dev, rt->u.dst.dev,
                   ip_forward_finish);

sr_failed:
    /*
     *  Strict routing permits no gatewaying
     */
    icmp_send(skb, ICMP_DEST_UNREACH, ICMP_SR_FAILED, 0);
    goto drop;

too_many_hops:
    /* Tell the sender its packet died... */
    IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_INHDRERRORS);
    icmp_send(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0);
drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}
Once the checks in ip_forward have passed and the packet reaches ip_forward_finish, it really can be sent on to another system. The actual forwarding work is done in ip_forward_finish.
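The checks that ip_forward makes before calling the Netfilter hook can be condensed into a small decision function. This is a simplified sketch under stated assumptions: struct fwd_pkt and forward_check are invented for illustration, and the IPsec policy checks, LRO/GSO handling and the local_df flag from the real function are left out.

```c
#include <assert.h>
#include <stddef.h>

enum fwd_verdict {
    FWD_OK,             /* continue to the forward hook */
    FWD_TTL_EXCEEDED,   /* kernel sends ICMP time exceeded */
    FWD_SR_FAILED,      /* kernel sends ICMP source route failed */
    FWD_FRAG_NEEDED     /* kernel sends ICMP fragmentation needed */
};

/* Hypothetical flattened view of what ip_forward inspects. */
struct fwd_pkt {
    unsigned ttl;
    unsigned len;           /* total packet length in bytes */
    int dont_fragment;      /* DF bit set in the header */
    int strict_route;       /* strict source route option present */
    unsigned route_dst;     /* destination of the chosen route */
    unsigned route_gateway; /* next hop of the chosen route */
};

/* Same order as in ip_forward: TTL first, then strict source routing,
 * then the MTU/DF test. */
static enum fwd_verdict forward_check(const struct fwd_pkt *p, unsigned mtu)
{
    assert(p != NULL);

    if (p->ttl <= 1)
        return FWD_TTL_EXCEEDED;
    if (p->strict_route && p->route_dst != p->route_gateway)
        return FWD_SR_FAILED;
    if (p->len > mtu && p->dont_fragment)
        return FWD_FRAG_NEEDED;
    return FWD_OK;
}
```

Each non-OK verdict corresponds to one of the error labels in ip_forward, where the kernel emits the matching ICMP message and then drops the packet.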


ip_forward_finish

static int ip_forward_finish(struct sk_buff *skb)
{
    struct ip_options *opt = &(IPCB(skb)->opt);

    IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);

    /* ip_forward has already handled two possible options, Router Alert and
     * Strict Source Routing. The remaining options are handled here (opt was
     * initialized by ip_options_compile during ip_rcv_finish). */
    if (unlikely(opt->optlen))
        ip_forward_options(skb);

    return dst_output(skb);
}


Finally, dst_output hands the packet on toward the device driver and it is forwarded out. From this point the packet follows the same path as a packet that the local transport layer passes down for transmission to another system.

/* Output packet to network from transport.  */
static inline int dst_output(struct sk_buff *skb)
{
    return skb_dst(skb)->output(skb);
}

output is a function pointer: for unicast packets it is initialized to ip_output, and for multicast packets to ip_mc_output. Both finish by calling ip_finish_output, where fragmentation is handled.

ip_output

int ip_output(struct sk_buff *skb)
{
    struct net_device *dev = skb_dst(skb)->dev;

    IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUT, skb->len);

    skb->dev = dev;
    skb->protocol = htons(ETH_P_IP);

    return NF_HOOK_COND(PF_INET, NF_INET_POST_ROUTING, skb, NULL, dev,
                        ip_finish_output,
                        !(IPCB(skb)->flags & IPSKB_REROUTED));
}
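The NF_HOOK and NF_HOOK_COND calls that appear in these functions share one pattern: run the Netfilter hooks registered at a given point, and call the continuation function only if the packet survives them. With NF_HOOK_COND the hooks are skipped when the condition is false and the continuation is called directly. The sketch below models just that control flow; the types and names are assumptions for this example, and real Netfilter has more verdicts than accept and drop.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { V_DROP = 0, V_ACCEPT = 1 };

typedef enum verdict (*hook_fn)(int *pkt);
typedef int (*ok_fn)(int *pkt);

/* Run the hook chain when cond is true; if every hook accepts (or the
 * chain is skipped), hand the packet to okfn. Returns -1 on drop. */
static int run_hooks(const hook_fn *hooks, size_t n, int *pkt,
                     ok_fn okfn, int cond)
{
    assert(okfn != NULL);

    if (cond) {
        for (size_t i = 0; i < n; i++)
            if (hooks[i](pkt) == V_DROP)
                return -1;      /* dropped: okfn never runs */
    }
    return okfn(pkt);
}

/* Two toy hooks and a toy continuation for trying the sketch out. */
static enum verdict accept_all(int *pkt) { (void)pkt; return V_ACCEPT; }
static enum verdict drop_odd(int *pkt)   { return (*pkt % 2) ? V_DROP : V_ACCEPT; }
static int finish(int *pkt)              { (void)pkt; return 0; }
```

In ip_output the condition is that the packet has not been marked IPSKB_REROUTED, so a packet that re-enters the output path after rerouting does not traverse the post-routing hooks a second time.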
ip_mc_output

int ip_mc_output(struct sk_buff *skb)
{
    struct sock *sk = skb->sk;
    struct rtable *rt = skb_rtable(skb);
    struct net_device *dev = rt->u.dst.dev;

    /*
     *  If the indicated interface is up and running, send the packet.
     */
    IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUT, skb->len);

    skb->dev = dev;
    skb->protocol = htons(ETH_P_IP);

    /*
     *  Multicasts are looped back for other local users
     */
    if (rt->rt_flags & RTCF_MULTICAST) {
        if ((!sk || inet_sk(sk)->mc_loop)
#ifdef CONFIG_IP_MROUTE
        /* Small optimization: do not loop back non-local frames
           that returned after forwarding; they will be dropped
           by ip_mr_input in any case.
           Note that local frames are looped back to be delivered
           to local recipients.

           This check is duplicated in ip_mr_input at the moment.
         */
            && ((rt->rt_flags & RTCF_LOCAL) ||
                !(IPCB(skb)->flags & IPSKB_FORWARDED))
#endif
        ) {
            struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
            if (newskb)
                NF_HOOK(PF_INET, NF_INET_POST_ROUTING, newskb,
                        NULL, newskb->dev,
                        ip_dev_loopback_xmit);
        }

        /* Multicasts with ttl 0 must not go beyond the host */
        if (ip_hdr(skb)->ttl == 0) {
            kfree_skb(skb);
            return 0;
        }
    }

    if (rt->rt_flags & RTCF_BROADCAST) {
        struct sk_buff *newskb = skb_clone(skb, GFP_ATOMIC);
        if (newskb)
            NF_HOOK(PF_INET, NF_INET_POST_ROUTING, newskb, NULL,
                    newskb->dev, ip_dev_loopback_xmit);
    }

    return NF_HOOK_COND(PF_INET, NF_INET_POST_ROUTING, skb, NULL, skb->dev,
                        ip_finish_output,
                        !(IPCB(skb)->flags & IPSKB_REROUTED));
}


ip_finish_output

static int ip_finish_output(struct sk_buff *skb)
{
#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
    /* Policy lookup after SNAT yielded a new policy */
    if (skb_dst(skb)->xfrm != NULL) {
        IPCB(skb)->flags |= IPSKB_REROUTED;
        return dst_output(skb);
    }
#endif
    if (skb->len > ip_skb_dst_mtu(skb) && !skb_is_gso(skb))
        return ip_fragment(skb, ip_finish_output2);    /* fragment */
    else
        return ip_finish_output2(skb);
}

ip_finish_output then hands the packet to the neighbor subsystem. For the details, see the blog post on IPv4 transmission.
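To illustrate what the MTU test in ip_finish_output leads to, the sketch below plans how a datagram would be cut into fragments. It is a simplified model, not ip_fragment itself: plan_fragments and struct frag are hypothetical, every fragment is assumed to carry a header of the same length, and the only rule applied is that each fragment's payload except the last must be a multiple of 8 bytes, because the header stores the offset in 8-byte units.

```c
#include <assert.h>
#include <stddef.h>

struct frag {
    unsigned offset_units;  /* value for the fragment offset field (8-byte units) */
    unsigned payload_len;   /* payload bytes carried by this fragment */
    int more_fragments;     /* value of the MF flag */
};

/* Fill out[] with the fragment layout for a datagram of tot_len bytes
 * (header included) sent over a link with the given MTU. Returns the
 * number of pieces, or 0 if the plan cannot be made. */
static size_t plan_fragments(unsigned tot_len, unsigned hdr_len, unsigned mtu,
                             struct frag *out, size_t max_out)
{
    unsigned payload, per_frag;
    size_t n = 0;

    assert(out != NULL && tot_len >= hdr_len);
    payload = tot_len - hdr_len;

    if (tot_len <= mtu) {               /* fits: a single unfragmented packet */
        if (max_out < 1)
            return 0;
        out[0] = (struct frag){ 0, payload, 0 };
        return 1;
    }
    if (mtu < hdr_len + 8)              /* no room for even 8 payload bytes */
        return 0;

    per_frag = (mtu - hdr_len) & ~7u;   /* round down to a multiple of 8 */
    for (unsigned off = 0; off < payload; off += per_frag) {
        unsigned left = payload - off;

        if (n == max_out)
            return 0;
        out[n++] = (struct frag){ off / 8,
                                  left > per_frag ? per_frag : left,
                                  left > per_frag };
    }
    return n;
}
```

For example, a 4000-byte datagram with a 20-byte header over a 1500-byte MTU is planned as three fragments with payloads of 1480, 1480 and 1020 bytes, and only the last one has the MF flag cleared.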























