TCP/IP principles, basics, and implementation on Linux


Introduction: This article lays the theoretical groundwork by describing the basic principles of TCP/IP and its important protocol details, and then introduces how TCP/IP is implemented on Linux.

OSI reference model and TCP/IP Reference Model
The Open Systems Interconnection (OSI) reference model was developed based on the recommendations of the International Organization for Standardization (ISO). It is divided into seven layers, as shown in Figure 3-1. After satellite and wireless networks emerged, the existing protocols ran into problems when interconnecting with these networks. A new reference architecture was therefore needed to connect multiple networks seamlessly. This architecture is the TCP/IP reference model.


TCP protocol
The Internet has two main protocols at the transport layer: one connection-oriented and one connectionless. The Transmission Control Protocol (TCP) is the connection-oriented protocol. It provides reliable, end-to-end byte-stream communication over an unreliable internetwork. To obtain TCP service, the sender and the receiver each create a communication endpoint called a socket. All TCP connections are full-duplex and point-to-point.

The sending and receiving TCP entities exchange data in the form of datagrams (more commonly called segments). A datagram consists of a fixed 20-byte header, an optional part, and zero or more bytes of data. Two limits restrict the size of a datagram. First, each datagram, including the TCP header, must fit in the payload of an IP datagram, whose total length cannot exceed 65535 bytes. Second, each network has a maximum transmission unit (MTU), and each datagram must fit within it. If a datagram enters a network whose MTU is smaller than the datagram, the router at the network boundary splits it into several smaller datagrams.

The basic protocol used by TCP entities is the sliding window protocol. When the sender transmits a datagram, it starts a timer. When the datagram arrives at the destination, the receiving TCP entity sends back a datagram carrying an acknowledgment number equal to the next sequence number it expects to receive. If the sender's timer expires before the acknowledgment arrives, the sender retransmits the datagram.

2.1 TCP Data Header
Figure 2 shows the format of the TCP Data header.


Source port and destination port: 16 bits each. They identify the local and remote endpoints of the connection.

Sequence number: 32 bits. It gives the position of the first data byte of this datagram in the byte stream, which lets the receiver put the data in order.

Acknowledgment number: 32 bits. The sequence number of the next byte the receiver expects.

TCP header length: 4 bits. The number of 32-bit words in the TCP header.

The next six bits are not used.
ACK: When ACK is set to 1, the acknowledgment number is valid. When ACK is 0, the datagram carries no acknowledgment and the acknowledgment number field is ignored.

PSH: Marks pushed data. The receiver is asked to deliver the data to the application as soon as it arrives, instead of waiting until its buffer is full.

RST: Used to reset a connection that has become confused because of a host crash or some other cause. It is also used to reject an invalid datagram or to refuse a connection request.

SYN: Used to establish a connection.

FIN: Used to release a connection.

Window size: 16 bits. It tells the sender how many bytes it may send, starting from the byte that has been acknowledged.

Checksum: 16 bits. It is provided for extra reliability and covers the header, the data, and a TCP pseudo-header.

Options: zero or more 32-bit words. The options include the maximum TCP payload, the window scale, and selective retransmission.

  1. Maximum TCP payload: each host may specify the largest TCP payload it is willing to accept. During connection establishment, both sides announce their maximum, and the smaller of the two is used. If a host does not use this option, the payload defaults to 536 bytes.
  2. Window scale: lets the sender and receiver negotiate a window scale factor. The 16-bit window field can be shifted left by at most 14 bits, which allows the sliding window to reach roughly 2^30 bytes.
  3. Selective retransmission: lets the receiver request retransmission of one or more specific datagrams.
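
The header layout described above can be illustrated with a short decoder. This is only a sketch in Python: the function name `parse_tcp_header` and the example values are made up for illustration, and it also unpacks the URG flag and the 16-bit urgent pointer that complete the fixed 20-byte header.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (hypothetical helper)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4      # 4-bit length, counted in 32-bit words
    flags = offset_flags & 0x3F                # the low six bits hold the flags
    return {
        "src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
        "header_len": header_len,
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10), "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04), "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "window": window, "checksum": checksum,
        "options": segment[20:header_len],     # empty when the header is exactly 20 bytes
    }

# Made-up connection request: port 40000 -> 80, header length 5 words, only SYN set.
example = struct.pack("!HHIIHHHH", 40000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
info = parse_tcp_header(example)
print(info["src_port"], info["dst_port"], info["header_len"], info["SYN"], info["ACK"])
```

The checksum is not verified here, because doing so also needs the pseudo-header built from the IP addresses.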

2.2 connection management
TCP uses a three-way handshake to establish a connection. One side, such as the server, passively waits for a connection request to arrive by executing the listen and accept primitives.

The other side, such as the client, executes the connect primitive, specifying the IP address and port number it wants to connect to, the maximum TCP datagram size it is willing to accept, and optionally some user data. The connect primitive sends a datagram with SYN = 1 and ACK = 0 to the destination and waits for a response.

When the datagram arrives at the destination, the TCP entity there checks whether a process is listening on the port given in the destination port field. If not, it replies with a datagram with RST = 1 to refuse the connection.

If a process is listening on the port, the datagram is handed to that process, which can accept or reject the connection. If it accepts, an acknowledgment datagram is sent back. The normal TCP connection establishment process is shown in Figure 3.
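
The listen, accept, and connect primitives map directly onto the socket API. The following Python sketch runs both ends on the loopback interface. The SYN and ACK datagrams themselves are exchanged by the kernel and are not visible to the program, and the helper name `run_server` is chosen only for this example.

```python
import socket
import threading

def run_server(listener: socket.socket, received: list) -> None:
    conn, _peer = listener.accept()          # returns once a handshake has completed
    with conn:
        received.append(conn.recv(1024))

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))              # port 0: let the kernel pick a free port
listener.listen(1)                           # passive open: wait for connection requests
port = listener.getsockname()[1]

received = []
server = threading.Thread(target=run_server, args=(listener, received))
server.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))          # active open: the kernel sends the SYN
client.sendall(b"hello")
client.close()                               # starts the release of the connection

server.join(timeout=5)
listener.close()
print(received)
```

Connecting to a port on which no process is listening would instead raise `ConnectionRefusedError`, which is how the RST reply surfaces in Python.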

 


 

 

To release a connection, either side can send a TCP datagram with FIN = 1, indicating that it has no more data to send. Once the FIN is acknowledged, that direction of the connection is closed. When both directions are closed, the connection is fully released. Releasing a connection therefore normally takes four TCP datagrams: one FIN and one ACK in each direction.

2.3 Transmission Policy
TCP uses a sliding window for transmission control. The window size announced by the receiver indicates how much buffer space it has available for incoming data, and the sender uses it to decide how many bytes it may send. When the window is 0, the sender may not send normal datagrams, with two exceptions. First, urgent data may still be sent, for example to let the user terminate a process running on the remote machine. Second, the sender may send a 1-byte datagram to make the receiver re-announce the next byte it expects and its current window size.
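
The sender's bookkeeping can be sketched in a few lines of Python. The function name `usable_window` and the numbers are illustrative only, and the sketch ignores sequence number wraparound.

```python
def usable_window(last_acked: int, next_to_send: int, advertised_window: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = next_to_send - last_acked        # sent but not yet acknowledged
    return max(0, advertised_window - in_flight)

# The receiver advertised 4096 bytes and 1000 bytes are still unacknowledged.
print(usable_window(last_acked=5000, next_to_send=6000, advertised_window=4096))  # 3096
# A zero window stops normal transmission until the receiver reopens it.
print(usable_window(last_acked=6000, next_to_send=6000, advertised_window=0))     # 0
```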

2.4 Congestion Control
Congestion can occur when the load offered to a network exceeds what it can handle. The Internet faces two potential problems, network capacity and receiver capacity, and they must be handled separately. To do so, the sender maintains two windows: the window granted by the receiver and the congestion window. The number of bytes the sender may transmit is the smaller of the two.

When a connection is established, the sender initializes the congestion window to the maximum datagram size in use on the connection and then sends one maximum-size datagram. If that datagram is acknowledged before its timer expires, the sender adds one datagram's worth of bytes to the congestion window, making it two maximum-size datagrams, and sends two datagrams. Each time one of these datagrams is acknowledged, the congestion window grows by one maximum datagram size. In general, when the congestion window is n datagrams and all n are acknowledged in time, the congestion window grows by the byte count of those n datagrams, so it doubles. The congestion window keeps growing exponentially until a timeout occurs or the receiver's window is reached. It then settles at a size that neither causes timeouts nor exceeds the receiver's window.
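
The exponential growth can be seen in a small simulation. This Python sketch assumes that no datagram is ever lost and that one full window is acknowledged per round trip. The function name `slow_start` and its parameters are invented for the example.

```python
def slow_start(mss: int, receiver_window: int, rounds: int) -> list:
    """Congestion window in bytes at the start of each round trip, with no loss."""
    cwnd = mss                                   # start with one maximum-size datagram
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        sendable = min(cwnd, receiver_window)    # the smaller window limits the sender
        acked_datagrams = sendable // mss
        # Each acknowledged datagram adds one maximum datagram size to the window.
        cwnd = min(cwnd + acked_datagrams * mss, receiver_window)
    return history

print(slow_start(mss=1000, receiver_window=16000, rounds=6))
# [1000, 2000, 4000, 8000, 16000, 16000]
```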

2.5 timer management
TCP uses several timers, including the retransmission timer, the persistence timer, and the keepalive timer. The most important is the retransmission timer. When a datagram is sent, a retransmission timer is started. If the datagram is acknowledged before the timer expires, the timer is stopped. If the timer expires before the acknowledgment arrives, the datagram is retransmitted.

The persistence timer is used to prevent deadlock. If the receiver announces a window of 0 and the later update that reopens the window is lost, both sides would wait forever. When the persistence timer expires, the sender probes the receiver. When a connection has been idle for a long time, the keepalive timer expires and causes one side to check whether the other side is still there. If no response is received, the connection is terminated.
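
On Linux an application can turn on the keepalive timer for a socket through socket options. The Python sketch below uses a hypothetical helper, `enable_keepalive`, with arbitrary example timings. The three tuning options are not available on every platform, so each one is guarded.

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, probes: int = 5) -> None:
    """Ask the kernel to probe an idle connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):      # seconds of idleness before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):     # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):       # unanswered probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_on)
```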

UDP protocol
The Internet protocol suite also supports the User Datagram Protocol (UDP). UDP uses the underlying Internet Protocol to carry messages, offering the same unreliable, connectionless datagram delivery as IP itself. It does not use acknowledgments to confirm that messages arrive, does not order incoming datagrams, and provides no feedback to control the rate at which information flows between machines. Handling the reliability problems of UDP communication, including loss, duplication, and reordering, is left to the application that uses UDP.

A UDP datagram consists of an 8-byte header and a data section. The header format is shown in Figure 4. It contains four 16-bit fields. The source port and destination port identify the source and destination endpoints, just as in TCP. The UDP length field gives the length of the datagram, including the 8-byte header and the data. The UDP checksum field is optional. It holds a checksum over the UDP header, the UDP pseudo-header, and the user data.
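
Because the header is just four 16-bit fields, it can be packed and unpacked with a single format string. The Python sketch below uses made-up helper names and port numbers. It leaves the optional checksum at 0, which over IPv4 means that no checksum was computed.

```python
import struct

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)                 # the length covers header plus data
    checksum = 0                              # 0: checksum not computed (allowed over IPv4)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram: bytes) -> dict:
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src_port, "dst_port": dst_port, "length": length,
            "checksum": checksum, "data": datagram[8:length]}

udp = parse_udp_datagram(build_udp_datagram(5353, 53, b"query"))
print(udp)
```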

 


 

 

IP protocol
The IP protocol provides an unreliable, connectionless datagram delivery mechanism. TCP/IP is designed to accommodate a wide variety of physical networks, and this adaptability comes mainly from the IP layer. Because physical networks are so diverse, their frame formats and address formats differ greatly. To hide these low-level details and let networks built on different physical technologies communicate, TCP/IP uses the IP datagram and the IP address as a uniform description of physical frames and physical addresses. The IP layer thus presents a uniform IP datagram and a uniform IP address to the upper layers, so the differences between physical frames and physical addresses are invisible to upper-layer protocols.

4.1 IP data Header
An IP datagram consists of a header and a data part. The header has a fixed 20-byte part and an optional variable-length part. The header format is shown in Figure 5.

 


 

 

Version: 4 bits. Records the protocol version the datagram belongs to. The IP protocol currently has two versions: IPv4 and IPv6.

IHL: 4 bits. The total length of the header, in 32-bit words.

Service type: 8 bits. Allows the host to tell the subnet what kind of service it wants. As shown in the accompanying figure, the service type field is divided into five parts: a priority field that indicates the precedence, three flag bits that indicate delay, throughput, and reliability, and an unused part.

 


 

 

Total length: 16 bits. The combined length of the header and data. The maximum is 65535 bytes.

ID: 16 bits. It lets the destination host determine which datagram a newly arrived fragment belongs to. All fragments of the same datagram carry the same ID value.

DF: Don't Fragment. It orders routers not to fragment the datagram, for example because the destination cannot reassemble the fragments.

MF: More Fragments. It is used to tell whether all fragments of a datagram have arrived. Every fragment except the last one has this bit set.

Fragment offset: 13 bits. The position of the fragment within the original datagram.

Time to live: 8 bits. A counter used to limit the lifetime of a datagram. It is decremented at each hop and may be decremented several times while the datagram is queued in a router.

Protocol: 8 bits. It indicates which transport process the datagram should be handed to, for example TCP or UDP.

Header checksum: 16 bits. It is only used to verify the header.

Source Address: 32 bits. The source host IP address that generates an IP datagram.

Destination Address: 32-bit. The IP address of the target host.

Options: variable length. Each option begins with a one-byte code that identifies it. Some options are followed by a one-byte option length field and then one or more data bytes. Five options are defined: security, strict source routing, loose source routing, record route, and timestamp. Not all routers support all five.

The security option specifies how secret the information is.

The strict source routing option gives the complete path from source to destination as a sequence of IP addresses. The datagram must follow exactly that path. This option is useful when the routing tables are corrupted, when the system administrator needs to send emergency datagrams, or for timing measurements.

The loose source routing option requires the datagram to traverse the listed routers in the order given, but it may pass through other routers along the way.

The record route option tells the routers along the path to append their IP addresses to the option field. This lets the system administrator track down errors in the routing algorithm.

The timestamp option is like the record route option, except that each router records a 32-bit timestamp in addition to its 32-bit IP address. This option can likewise be used to find errors in the routing algorithm.
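
As an illustration of the fixed part of the header, the following Python sketch decodes the 20 bytes and verifies the header checksum. The helper names and the example header are invented, and the addresses come from the ranges reserved for documentation. Options are not decoded.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used for the IP header checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                                   # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def parse_ipv4_header(packet: bytes) -> dict:
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4                    # IHL counts 32-bit words
    return {
        "version": ver_ihl >> 4, "header_len": header_len, "tos": tos,
        "total_length": total_length, "id": ident,
        "DF": bool(flags_frag & 0x4000), "MF": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,    # stored in 8-byte units
        "ttl": ttl, "protocol": protocol,
        # A header with a correct checksum sums to zero when checked as a whole.
        "checksum_ok": internet_checksum(packet[:header_len]) == 0,
        "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
    }

# Made-up header: 20 bytes, DF set, time to live 64, protocol 6 (TCP).
fields = [0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
          socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7")]
header = struct.pack("!BBHHHBBH4s4s", *fields)
fields[7] = internet_checksum(header)                    # fill in the checksum field
header = struct.pack("!BBHHHBBH4s4s", *fields)
ip = parse_ipv4_header(header)
print(ip)
```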

4.2 Fragmentation and reassembly of IP datagrams
An IP datagram is transmitted by encapsulating it in a physical frame. Because the Internet is built from many different physical network technologies, the maximum frame size (the maximum transmission unit, MTU) can differ from one part of the Internet to another. To make the most of the physical network, the IP module chooses the IP datagram size based on the MTU of that network. When a datagram travels between two networks with different MTUs, it may have to be fragmented and later reassembled.

Three fields in the IP header control fragmentation and reassembly: the ID field, the flags field, and the fragment offset field. The ID is the identifier the source host assigns to the IP datagram. The destination host uses the ID field to determine which datagram a received fragment belongs to, so that it can reassemble the datagram. The DF bit in the flags field indicates whether the datagram may be fragmented. If a datagram needs to be fragmented but its DF bit is set to 1, the gateway discards it and sends an error message to the source host. The MF bit in the flags field indicates whether a fragment is the last one. The fragment offset field records the offset of the fragment within the original datagram, in units of 8 bytes. It is used to put the fragments in the right order during reassembly.

While a datagram is in transit, each fragment travels as an independent IP datagram and may itself be fragmented again, possibly several times, before reaching the destination host. Reassembly, however, is performed only at the destination host.
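
The offset arithmetic can be made concrete with a small Python sketch. The function `fragment` is a hypothetical helper that only computes the offset, the MF bit, and the data of each fragment. It assumes a 20-byte header without options and does not build real headers.

```python
def fragment(payload: bytes, mtu: int, header_len: int = 20) -> list:
    """Split an IP payload into (offset_in_8_byte_units, more_fragments, data) tuples."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)   # MF is set on all but the last fragment
        fragments.append((start // 8, more, chunk))
    return fragments

frags = fragment(bytes(4000), mtu=1500)
print([(offset, more, len(data)) for offset, more, data in frags])
# [(0, True, 1480), (185, True, 1480), (370, False, 1040)]
```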

4.3 IP processing of incoming datagrams
IP handles incoming datagrams in two ways, depending on whether it runs on a host or on a gateway.

When an IP datagram arrives at a host, IP accepts it and passes it to the higher-level protocol software if the destination address matches the host's address. Otherwise the datagram is discarded.

A gateway behaves differently. When an IP datagram reaches the IP layer of a gateway, the gateway first checks whether it is itself the destination of the datagram. If so, it passes the datagram up to the higher-level protocol software. If not, it looks up a route for the datagram and forwards it.
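
The two cases reduce to one small decision, sketched here in Python. The function name, the return strings, and the addresses are invented for illustration.

```python
def handle_incoming(dst: str, local_addresses: set, is_gateway: bool) -> str:
    """Return what the IP layer does with an arriving datagram."""
    if dst in local_addresses:
        return "deliver"      # hand the datagram to the higher-level protocol
    if is_gateway:
        return "forward"      # look up a route and send the datagram on
    return "discard"          # an ordinary host drops datagrams not addressed to it

print(handle_incoming("192.0.2.10", {"192.0.2.10"}, is_gateway=False))    # deliver
print(handle_incoming("198.51.100.5", {"192.0.2.10"}, is_gateway=False))  # discard
print(handle_incoming("198.51.100.5", {"192.0.2.1"}, is_gateway=True))    # forward
```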

4.4 IP processing of outgoing datagrams
IP likewise handles outgoing datagrams in two ways: one for hosts and one for gateways.

On a gateway, after IP receives a datagram it performs a route lookup to find the path the datagram should take. The result is actually the IP address of the next gateway on the full path. The gateway then hands the datagram and the next gateway's address to the network interface software. The network interface software first calls ARP to map the next gateway's IP address to a physical address, then encapsulates the datagram in a frame, and finally the subnet carries out the physical transmission.
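
A route lookup of this kind can be sketched with Python's `ipaddress` module. The routing table contents are invented, and the choice of the most specific matching route (longest prefix match) is an assumption of this sketch rather than something the text above specifies.

```python
import ipaddress

# Hypothetical routing table: (destination network, next gateway, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1", "eth0"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.254", "eth0"),
    (ipaddress.ip_network("203.0.113.0/25"), "10.0.0.2", "eth1"),
]

def next_hop(destination: str):
    """Pick the most specific route that contains the destination address."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if address in route[0]]
    network, gateway, interface = max(matches, key=lambda route: route[0].prefixlen)
    return gateway, interface

print(next_hop("198.51.100.9"))   # ('192.0.2.254', 'eth0')
print(next_hop("8.8.8.8"))        # ('192.0.2.1', 'eth0')
```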

ICMP protocol
ICMP is the Internet Control Message Protocol. It is mainly used to build error and control messages and to obtain certain network information. ICMP sits at the same layer as IP, yet ICMP messages are encapsulated in IP datagrams before being sent. ICMP is not treated as a separate protocol layer because no upper-layer protocol is built on it, so conceptually it cannot form a layer of its own.

ICMP message types include destination unreachable, time exceeded, parameter problem, source quench, redirect, echo request, echo reply, timestamp request, and timestamp reply.

The destination unreachable message reports that the subnet or a router cannot locate the destination, or that a datagram with the DF bit set cannot be delivered because a network with a small MTU lies on the path.

The time exceeded message reports that a datagram was discarded because its counter reached zero.

The parameter problem message indicates that an invalid value was found in a header field.

The source quench message is used to throttle hosts that send too many datagrams. A host that receives this message should slow down its sending rate.

The redirect message is sent when a router notices that a datagram appears to have been routed incorrectly.

The echo request and echo reply messages are used to test whether a destination is reachable and working. On receiving an echo request, the destination is expected to send back an echo reply. The timestamp request and timestamp reply are similar, except that the reply also records the arrival time of the request and the departure time of the reply. This makes them useful for measuring network performance.
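
To show what an echo request looks like on the wire, the Python sketch below builds one (type 8, code 0, followed by a 16-bit checksum, identifier, and sequence number) and checks its checksum. The helper names and the field values are invented, and nothing is actually sent, since that would need a raw socket and the privileges that go with it.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, the checksum ICMP uses."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                               # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP type 8 (echo request), code 0, with the checksum filled in."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)   # computed with the field set to 0
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

message = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
icmp_type, code, checksum, ident, seq = struct.unpack("!BBHHH", message[:8])
print(icmp_type, code, ident, seq, internet_checksum(message) == 0)
```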

Implementation of IP in Linux
As shown in Figure 6, Linux implements the TCP/IP protocols as a layered software structure. BSD sockets are supported by generic socket management software, the INET socket layer. INET sockets manage the communication endpoints of the IP-based protocols TCP and UDP. When transmitting UDP datagrams, Linux does not need to care whether they reach the destination safely. For TCP datagrams, however, Linux must number them, and the source and destination must cooperate to ensure that datagrams are not lost or delivered out of order. The code in the IP layer processes datagram headers and must pass incoming datagrams up to the correct layer, TCP or UDP. Below the IP layer are the Linux network devices, such as Ethernet devices or PPP devices. Unlike other Linux devices, network devices do not always represent physical hardware. The loopback device, for example, is purely a software device. ARP provides address resolution, so it sits between the IP layer and the network device layer.

 


Figure 6 Linux Network Hierarchy Structure

 

6.1 Socket buffer
Linux uses socket buffers to pass data between the protocol layers and the network devices. An sk_buff contains pointer and length information that lets the protocol layers handle application data with standard functions or methods. As shown in Figure 7, each sk_buff has an associated data block, four data pointers, and two length fields. The four data pointers let each protocol layer manipulate and manage the data in the socket buffer. They are used as follows.

head: the start address of the data area in memory. Once the sk_buff and its data block have been allocated, this pointer never changes.

data: the current start address of the protocol data. Its value varies depending on which protocol layer currently owns the sk_buff.

tail: the current end address of the protocol data. Like the data pointer, its value varies depending on which protocol layer currently owns the sk_buff.

end: the end of the data area in memory. Like the head pointer, it stays fixed once the sk_buff has been allocated.

The two length fields of the sk_buff, len and truesize, describe the length of the current protocol datagram and the total size of the data buffer, respectively.
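
The way the four pointers move can be shown with a toy model. The Python class below is not kernel code: it represents the pointers as integer offsets into one buffer, and its methods are loosely modeled on the kernel helpers skb_reserve, skb_put, skb_push, and skb_pull. The class name `ToySkb` and the `length` property (standing in for the len field) are made up for this sketch.

```python
class ToySkb:
    """Toy model of the four sk_buff pointers as offsets into one buffer."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.head = 0             # fixed start of the allocated area
        self.end = size           # fixed end of the allocated area
        self.data = 0             # start of the current protocol data
        self.tail = 0             # end of the current protocol data

    @property
    def length(self) -> int:
        return self.tail - self.data

    def reserve(self, n: int) -> None:
        """Leave headroom for headers that lower layers will prepend."""
        self.data += n
        self.tail += n

    def put(self, payload: bytes) -> None:
        """Append data at the tail."""
        if self.tail + len(payload) > self.end:
            raise ValueError("not enough tailroom")
        self.buf[self.tail:self.tail + len(payload)] = payload
        self.tail += len(payload)

    def push(self, header: bytes) -> None:
        """Prepend a header in front of the current data."""
        if self.data - len(header) < self.head:
            raise ValueError("not enough headroom")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header

    def pull(self, n: int) -> bytes:
        """Strip n bytes from the front, as a receiving layer does with its header."""
        removed = bytes(self.buf[self.data:self.data + n])
        self.data += n
        return removed

skb = ToySkb(128)
skb.reserve(40)               # room for a 20-byte IP header and a 20-byte TCP header
skb.put(b"payload")           # application data
skb.push(b"T" * 20)           # stand-in for the TCP header
skb.push(b"I" * 20)           # stand-in for the IP header
print(skb.data, skb.tail, skb.length)   # 0 47 47
```

On the receive path the same buffer is walked in the other direction: each layer calls the equivalent of `pull` to step past its own header before handing the buffer upward.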

 


 

 

6.2 receive IP Datagram
When a network device receives datagrams from the network, it must convert the received data into sk_buff structures and add them to the backlog queue. If the backlog queue grows too large, newly received sk_buffs are discarded. Whenever a new sk_buff is added to the backlog queue, the network bottom half handler (the low-level network processing routine) is marked as ready to run so that the scheduler can schedule it.

The scheduler eventually runs the network bottom half handler. The handler processes any datagrams waiting to be transmitted and works through the backlog queue of sk_buff structures, determining which protocol layer each received datagram should be passed to.

When the Linux network layer is initialized, each protocol registers itself by adding a packet_type data structure to either the ptype_all list or the ptype_base hash table. The packet_type structure contains the protocol type, a pointer to a network device, and a pointer to the protocol's receive routine. ptype_base is a hash table whose hash function takes the protocol identifier as its argument. The kernel normally uses this hash table to decide which protocol should receive an incoming datagram. By examining the ptype_all list and the ptype_base hash table, the bottom half handler finds every matching protocol and makes a copy of the new sk_buff where more than one protocol matches. Finally, the sk_buff is passed to one or more target protocol receive routines.
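
The registration and dispatch scheme can be modeled in a few lines. This Python sketch is a toy, not the kernel's implementation: a dictionary plays the role of the ptype_base hash table, a list plays the role of ptype_all, and the `register` and `deliver` functions and the handler strings are invented. The two protocol type values are the standard Ethernet types for IP and ARP.

```python
from collections import defaultdict

ETH_P_IP = 0x0800                    # Ethernet protocol type for IP
ETH_P_ARP = 0x0806                   # Ethernet protocol type for ARP

ptype_all = []                       # handlers that see every frame, such as sniffers
ptype_base = defaultdict(list)       # protocol type -> handlers for that protocol

def register(protocol, handler):
    """Toy counterpart of registering a packet_type; None means all protocols."""
    if protocol is None:
        ptype_all.append(handler)
    else:
        ptype_base[protocol].append(handler)

def deliver(protocol: int, frame: bytes) -> list:
    """Pass a received frame to every matching handler and collect the results."""
    handlers = ptype_all + ptype_base.get(protocol, [])
    return [handler(frame) for handler in handlers]

register(ETH_P_IP, lambda frame: "IP handler got %d bytes" % len(frame))
register(ETH_P_ARP, lambda frame: "ARP handler got %d bytes" % len(frame))
register(None, lambda frame: "sniffer got %d bytes" % len(frame))

print(deliver(ETH_P_IP, bytes(20)))
# ['sniffer got 20 bytes', 'IP handler got 20 bytes']
```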

6.3 send IP Datagram
The network processing code must build an sk_buff to hold the data to be transmitted. As the data passes down through the protocol layers, each layer adds its own protocol header and trailer.

First, IP has to decide which network device to use. The choice depends on the best route for the datagram. For a computer connected only through a modem and PPP, choosing a route is easy. For a computer connected to an Ethernet, it is more complicated.

For each IP datagram to be transmitted, IP uses the routing table to resolve the route to the destination IP address. For each destination found in the routing table, the lookup returns an rtable data structure describing the route to use. It includes the source address to use, the address of the device data structure of the network device, and a prebuilt hardware header. The hardware header is specific to the network device and contains the source and destination physical addresses and other media-specific information.

6.4 Fragmentation and reassembly of datagrams
When an IP datagram is sent, IP finds the network device that will send it from the IP routing table. The device data structure for that network device contains an mtu field that describes the maximum transmission unit. If the device's MTU is smaller than the IP datagram to be sent, the datagram must be split into smaller fragments. Each fragment is represented by an sk_buff whose IP header is marked to show that it is a fragment and records the fragment's offset within the IP datagram. The last fragment is marked as the final IP fragment. If IP cannot allocate an sk_buff during fragmentation, the transmission fails.

Receiving IP fragments is more complicated than sending them, because fragments can arrive in any order and all of them must be received before the datagram can be reassembled. Each time an IP datagram is received, IP checks whether it is a fragment. When the first fragment of a datagram arrives, IP creates a new ipq data structure and links it into the ipqueue list of datagrams awaiting reassembly. As further fragments arrive, IP locates the correct ipq structure and creates a new ipfrag structure to describe each fragment. Each ipq structure holds the source and destination IP addresses, the upper-layer protocol identifier, and the identifier of the IP frame, which together uniquely describe a fragmented incoming IP frame. When all fragments have been received, they are combined into a single sk_buff and passed up to the next protocol layer. If the timer expires before all fragments arrive, the ipq structure and its ipfrag structures are discarded and the datagram is assumed to have been lost in transit. It is then up to the higher-level protocol to ask the source host to retransmit the lost data.
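
The bookkeeping on the receiving side can be sketched as follows. This Python class is a toy counterpart of the ipq list, keyed by the same four values the text lists. The class and method names are invented, and the sketch leaves out the reassembly timer and does not handle overlapping fragments.

```python
class ReassemblyQueue:
    """Toy counterpart of the ipq list: one entry per datagram being reassembled."""

    def __init__(self):
        self.queues = {}   # (src, dst, protocol, ident) -> {offset_in_bytes: data}
        self.totals = {}   # same key -> total length, known once the last fragment arrives

    def add(self, src, dst, protocol, ident, offset_units, more_fragments, data):
        """Store one fragment; return the whole payload once it is complete, else None."""
        key = (src, dst, protocol, ident)
        offset = offset_units * 8                    # the header stores 8-byte units
        self.queues.setdefault(key, {})[offset] = data
        if not more_fragments:                       # MF clear: this is the last fragment
            self.totals[key] = offset + len(data)
        return self._try_complete(key)

    def _try_complete(self, key):
        total = self.totals.get(key)
        if total is None:
            return None                              # last fragment not seen yet
        pieces = self.queues[key]
        expected, out = 0, bytearray()
        for offset in sorted(pieces):
            if offset != expected:
                return None                          # a hole remains
            out += pieces[offset]
            expected += len(pieces[offset])
        if expected != total:
            return None
        del self.queues[key], self.totals[key]
        return bytes(out)

queue = ReassemblyQueue()
key = ("192.0.2.1", "198.51.100.7", 17, 0x1234)
first = queue.add(*key, 2, False, b"C" * 4)          # the last fragment arrives first
second = queue.add(*key, 0, True, b"A" * 8)
third = queue.add(*key, 1, True, b"B" * 8)
print(first, second, third)
```

Only the third call returns data, because only then are all 20 bytes present with no hole between them.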
