IP Fragmentation and Assembly learning notes

Source: Internet
Author: User

IP fragmentation occurs when the length of an IP datagram to be sent exceeds the maximum transmission unit (MTU) of the outgoing link and fragmentation is allowed. Datagrams sent over UDP can easily trigger IP fragmentation, whereas TCP is stream-based and usually does not produce fragmentation.

After an IP datagram is fragmented, each fragment becomes a packet with its own IP header and is routed independently. When the fragments arrive at the destination host, the IP layer there reassembles them into a single IP datagram before passing it to the transport layer. In this terminology, an IP datagram is the end-to-end transmission unit of the IP layer (before fragmentation and after reassembly), while a packet is the unit of data passed between the IP layer and the link layer. A packet can be a complete IP datagram or a fragment of one. Fragmentation is transparent to the transport layer.

Fragmenting outgoing IP datagrams and reassembling incoming fragments involve the following files:

net/ipv4/ip_output.c: output of IP datagrams

net/ipv4/ip_fragment.c: reassembly of IP datagram fragments

System parameters

ipfrag_high_thresh: the upper limit on the memory that may be used to reassemble IP datagrams. If the memory occupied by queued fragments exceeds this value during reassembly, ip_evictor() is invoked to perform garbage collection.

ipfrag_low_thresh: the lower memory threshold for reassembly. Once ip_evictor() has been called, garbage collection stops as soon as the memory used by queued fragments drops to this value.

ipfrag_time: the time that fragments awaiting reassembly may be kept in memory. The default value is 30 s.

ipfrag_secret_interval: the interval at which the ipq hash table is rehashed with a new secret. The default value is 600 s.
Fragmentation

Before a datagram is sent out, whether it was generated locally or is being forwarded, ip_finish_output() checks the length of the skb buffer. If the length exceeds the MTU of the output device and the buffer is not a GSO packet, it calls ip_fragment() to fragment the datagram; otherwise it calls ip_finish_output2() directly to hand the datagram to the data link layer.

static int ip_finish_output(struct sk_buff *skb)
{
#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
	/* Policy lookup after SNAT yielded a new policy */
	if (skb_dst(skb)->xfrm != NULL) {
		IPCB(skb)->flags |= IPSKB_REROUTED;
		return dst_output(skb);
	}
#endif
	if (skb->len > ip_skb_dst_mtu(skb) && !skb_is_gso(skb))
		return ip_fragment(skb, ip_finish_output2);
	else
		return ip_finish_output2(skb);
}

There are currently two ways of fragmenting: a fast path and a slow path. Fragmentation as a whole involves more than dividing the layer-3 payload into pieces according to the MTU: each fragment also needs its own IP header, an updated checksum, and so on. On the fast path, the transport layer has already split the data into suitably sized pieces (the frag_list of the skb), so layer 3 only has to turn each piece into an IP fragment by adding a header. On the slow path, layer 3 does all of the work itself, cutting a complete IP datagram into MTU-sized pieces in a loop until no data is left.

/*
 *	This IP datagram is too large to be sent in one piece.  Break it up into
 *	smaller pieces (each of size equal to IP header plus
 *	a block of the data of the original IP data part) that will yet fit in a
 *	single device frame, and queue such a frame for sending.
 */
int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
{
	struct iphdr *iph;
	int raw = 0;
	int ptr;
	struct net_device *dev;
	struct sk_buff *skb2;
	unsigned int mtu, hlen, left, len, ll_rs, pad;
	int offset;
	__be16 not_last_frag;
	struct rtable *rt = skb_rtable(skb);
	int err = 0;

	dev = rt->u.dst.dev;

	/* Point into the IP datagram header. */
	iph = ip_hdr(skb);

	if (unlikely((iph->frag_off & htons(IP_DF)) && !skb->local_df)) {
		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(ip_skb_dst_mtu(skb)));
		kfree_skb(skb);
		return -EMSGSIZE;
	}

	/* Setup starting values. */
	hlen = iph->ihl * 4;
	mtu = dst_mtu(&rt->u.dst) - hlen;	/* Size of data space */
	IPCB(skb)->flags |= IPSKB_FRAG_COMPLETE;

	/* When frag_list is given, use it. First, check its validity:
	 * some transformers could create wrong frag_list or break existing
	 * one, it is not prohibited. In this case fall back to copying.
	 *
	 * LATER: this step can be merged to real generation of fragments,
	 * we can switch to copy when see the first bad fragment.
	 */
	if (skb_has_frags(skb)) {
		struct sk_buff *frag, *frag2;
		int first_len = skb_pagelen(skb);

		if (first_len - hlen > mtu ||
		    ((first_len - hlen) & 7) ||
		    (iph->frag_off & htons(IP_MF|IP_OFFSET)) ||
		    skb_cloned(skb))
			goto slow_path;

		skb_walk_frags(skb, frag) {
			/* Correct geometry. */
			if (frag->len > mtu ||
			    ((frag->len & 7) && frag->next) ||
			    skb_headroom(frag) < hlen)
				goto slow_path_clean;

			/* Partially cloned skb? */
			if (skb_shared(frag))
				goto slow_path_clean;

			BUG_ON(frag->sk);
			if (skb->sk) {
				frag->sk = skb->sk;
				frag->destructor = sock_wfree;
			}
			skb->truesize -= frag->truesize;
		}

		/* Everything is OK. Generate! */
		err = 0;
		offset = 0;
		frag = skb_shinfo(skb)->frag_list;
		skb_frag_list_init(skb);
		skb->data_len = first_len - skb_headlen(skb);
		skb->len = first_len;
		iph->tot_len = htons(first_len);
		iph->frag_off = htons(IP_MF);
		ip_send_check(iph);

		for (;;) {
			/* Prepare header of the next frame,
			 * before previous one went down. */
			if (frag) {
				frag->ip_summed = CHECKSUM_NONE;
				skb_reset_transport_header(frag);
				__skb_push(frag, hlen);
				skb_reset_network_header(frag);
				memcpy(skb_network_header(frag), iph, hlen);
				iph = ip_hdr(frag);
				iph->tot_len = htons(frag->len);
				ip_copy_metadata(frag, skb);
				if (offset == 0)
					ip_options_fragment(frag);
				offset += skb->len - hlen;
				iph->frag_off = htons(offset>>3);
				if (frag->next != NULL)
					iph->frag_off |= htons(IP_MF);
				/* Ready, complete checksum */
				ip_send_check(iph);
			}

			err = output(skb);

			if (!err)
				IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
			if (err || !frag)
				break;

			skb = frag;
			frag = skb->next;
			skb->next = NULL;
		}

		if (err == 0) {
			IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
			return 0;
		}

		while (frag) {
			skb = frag->next;
			kfree_skb(frag);
			frag = skb;
		}
		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
		return err;

slow_path_clean:
		skb_walk_frags(skb, frag2) {
			if (frag2 == frag)
				break;
			frag2->sk = NULL;
			frag2->destructor = NULL;
			skb->truesize += frag2->truesize;
		}
	}

slow_path:
	left = skb->len - hlen;		/* Space per frame */
	ptr = raw + hlen;		/* Where to start from */

	/* for bridged IP traffic encapsulated inside f.e. a vlan header,
	 * we need to make room for the encapsulating header
	 */
	pad = nf_bridge_pad(skb);
	ll_rs = LL_RESERVED_SPACE_EXTRA(rt->u.dst.dev, pad);
	mtu -= pad;

	/* Fragment the datagram. */
	offset = (ntohs(iph->frag_off) & IP_OFFSET) << 3;
	not_last_frag = iph->frag_off & htons(IP_MF);

	/* Keep copying data until we run out. */
	while (left > 0) {
		len = left;
		/* IF: it doesn't fit, use 'mtu' - the data space left */
		if (len > mtu)
			len = mtu;
		/* IF: we are not sending upto and including the packet end
		   then align the next start on an eight byte boundary */
		if (len < left)	{
			len &= ~7;
		}

		/* Allocate buffer. */
		if ((skb2 = alloc_skb(len+hlen+ll_rs, GFP_ATOMIC)) == NULL) {
			NETDEBUG(KERN_INFO "IP: frag: no memory for new fragment!\n");
			err = -ENOMEM;
			goto fail;
		}

		/* Set up data on packet */
		ip_copy_metadata(skb2, skb);
		skb_reserve(skb2, ll_rs);
		skb_put(skb2, len + hlen);
		skb_reset_network_header(skb2);
		skb2->transport_header = skb2->network_header + hlen;

		/* Charge the memory for the fragment to any owner
		 * it might possess
		 */
		if (skb->sk)
			skb_set_owner_w(skb2, skb->sk);

		/* Copy the packet header into the new buffer. */
		skb_copy_from_linear_data(skb, skb_network_header(skb2), hlen);

		/* Copy a block of the IP datagram. */
		if (skb_copy_bits(skb, ptr, skb_transport_header(skb2), len))
			BUG();
		left -= len;

		/* Fill in the new header fields. */
		iph = ip_hdr(skb2);
		iph->frag_off = htons((offset >> 3));

		/* ANK: dirty, but effective trick. Upgrade options only if
		 * the segment to be fragmented was THE FIRST (otherwise,
		 * options are already fixed) and make it ONCE
		 * on the initial skb, so that all the following fragments
		 * will inherit fixed options.
		 */
		if (offset == 0)
			ip_options_fragment(skb);

		/* Added AC : If we are fragmenting a fragment that's not the
		 *	      last fragment then keep MF on each bit
		 */
		if (left > 0 || not_last_frag)
			iph->frag_off |= htons(IP_MF);
		ptr += len;
		offset += len;

		/* Put this fragment into the sending queue. */
		iph->tot_len = htons(len + hlen);

		ip_send_check(iph);

		err = output(skb2);
		if (err)
			goto fail;

		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
	}
	kfree_skb(skb);
	IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
	return err;

fail:
	kfree_skb(skb);
	IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
	return err;
}

Reassembly

On the receiving side, all fragments of an original IP datagram issued by the sender must be reassembled before the datagram is delivered to the upper-layer protocol. Each IP datagram being reassembled is represented by an instance of the ipq structure, so let's take a look at this very important structure.

To reassemble fragments efficiently, the data structures that hold them must be able to do the following:

1. Quickly locate the set of fragments that belong to a given datagram

2. Quickly insert a new fragment among the fragments of a datagram

3. Efficiently determine whether all the fragments of a datagram have been received

4. Support a reassembly timeout: if reassembly has not completed before the timer expires, all fragments of the datagram are discarded

struct inet_frag_queue {
	struct hlist_node	list;
	struct netns_frags	*net;
	struct list_head	lru_list;   /* lru list member */
	spinlock_t		lock;
	atomic_t		refcnt;
	struct timer_list	timer;      /* when will this queue expire? */
	struct sk_buff		*fragments; /* list of received fragments */
	ktime_t			stamp;
	int			len;        /* total length of orig datagram */
	int			meat;
	__u8			last_in;    /* first/last segment arrived? */

#define INET_FRAG_COMPLETE	4
#define INET_FRAG_FIRST_IN	2
#define INET_FRAG_LAST_IN	1
};

/* Describe an entry in the "incomplete datagrams" queue. */
struct ipq {
	struct inet_frag_queue q;

	u32		user;
	__be32		saddr;
	__be32		daddr;
	__be16		id;
	u8		protocol;
	int             iif;
	unsigned int    rid;
	struct inet_peer *peer;
};

The saddr, daddr, id and protocol fields are taken from the IP header and together uniquely determine which IP datagram a fragment belongs to.

Call chain for delivery to the transport layer and fragment processing:

ip_local_deliver
    ip_defrag
        ip_find
            inet_frag_find
        ip_frag_queue
    ip_local_deliver_finish

