"TCP/IP Illustrated, Volume 1": notes on the TCP chapters


TCP/IP protocol

Danbo 2015-7-2
This article draws on TCP/IP Illustrated, Volume 1, with some points supplemented by the author's own understanding. If you find any errors, please correct me; you can contact me on Weibo.

The TCP segment format and the IP datagram format are as follows: [figures not reproduced]

Normal TCP connection establishment and termination

Establish a connection

TCP provides a reliable, connection-oriented service. A connection is established with a three-way handshake.
First handshake: the client sends a SYN segment carrying its initial sequence number (seq=J) to the server and enters the SYN_SENT state, waiting for the server's acknowledgment.
Second handshake: the server receives the SYN and replies with a single SYN+ACK segment, which acknowledges the client's SYN (ack=J+1) and carries the server's own initial sequence number (seq=K). The server then enters the SYN_RCVD state.
Third handshake: the client receives the server's SYN+ACK and sends an ACK (ack=K+1) to the server. The client enters the ESTABLISHED state when it sends this ACK, and the server enters ESTABLISHED when it receives it.
Once the three-way handshake is complete, both ends are in the ESTABLISHED state and can begin transferring data.
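
As a minimal sketch of the sequence-number arithmetic in the handshake (the function name and the dictionary layout are illustrative assumptions, not part of any real TCP stack), the three segments can be modeled like this:

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries (hypothetical layout)."""
    # 1) Client -> server: SYN carrying the client's initial sequence number J.
    syn = {"flags": "SYN", "seq": client_isn % SEQ_MOD}
    # 2) Server -> client: SYN+ACK with ack = J + 1 and the server's own ISN K.
    syn_ack = {
        "flags": "SYN+ACK",
        "seq": server_isn % SEQ_MOD,
        "ack": (syn["seq"] + 1) % SEQ_MOD,
    }
    # 3) Client -> server: ACK with ack = K + 1. The SYN consumed one sequence number.
    ack = {
        "flags": "ACK",
        "seq": (syn["seq"] + 1) % SEQ_MOD,
        "ack": (syn_ack["seq"] + 1) % SEQ_MOD,
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

The modulo is there only because a SYN sent with the largest possible sequence number is acknowledged with 0.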

Terminating a connection

A TCP connection is full-duplex, so each direction is closed separately. Terminating both directions takes four segments.
(1) The client sends a FIN to close the client-to-server direction of data transfer.
(2) The server receives the FIN and sends back an ACK whose acknowledgment number is the received sequence number plus 1. Like a SYN, a FIN consumes one sequence number.
(3) The server closes its side of the connection and sends a FIN to the client.
(4) The client sends back an ACK whose acknowledgment number is the received sequence number plus 1.

TCP state transition diagram

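The client's state transitions (for a client that performs the active open and the active close) can be written as a flow:

CLOSED -> SYN_SENT -> ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED

The server's state transitions (passive open and passive close) can be written as a flow:

CLOSED -> LISTEN -> SYN_RCVD -> ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED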
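
The two flows can be written down as a small transition table. This is only a sketch of the normal paths, not the full state diagram; the event names such as "active_open" and "recv_syn_ack" are made up for illustration.

```python
# (current state, event) -> next state, covering only the normal paths.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",        # client sends SYN
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # client sends ACK
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # active close: send FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # send the final ACK
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",           # server sends SYN+ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # passive close: send ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # send FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def walk(start, events):
    """Return the list of states visited when the events are applied in order."""
    states = [start]
    for event in events:
        states.append(TRANSITIONS[(states[-1], event)])
    return states


CLIENT_EVENTS = ["active_open", "recv_syn_ack", "close",
                 "recv_ack", "recv_fin", "timeout_2msl"]
SERVER_EVENTS = ["passive_open", "recv_syn", "recv_ack",
                 "recv_fin", "close", "recv_ack"]

if __name__ == "__main__":
    print(" -> ".join(walk("CLOSED", CLIENT_EVENTS)))
    print(" -> ".join(walk("CLOSED", SERVER_EVENTS)))
```

An event that is not in the table raises KeyError, which is a simple way to notice a sequence that leaves the normal path.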

2MSL wait state

The TIME_WAIT state is also called the 2MSL wait state. Every TCP implementation must choose a value for the maximum segment lifetime (MSL), the longest time a segment can exist in the network before being discarded. The rule is: when TCP performs an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the MSL. This lets TCP resend the final ACK in case it is lost (in which case the other end times out and retransmits its final FIN).

Any delayed segments that arrive for the connection while it is in the 2MSL wait are discarded. During the 2MSL wait, the socket pair that defines the connection (client IP address, client port, server IP address, server port) cannot be reused. This is usually harmless for a client program, but a server such as httpd always serves on the same port 80. If httpd is restarted within the 2MSL period, binding the port fails with an "address already in use" error unless the server sets the SO_REUSEADDR socket option.

A related concept is the quiet time: according to RFC 793, TCP should not create any connections for MSL seconds after a host reboots, so that delayed segments from connections that existed before the reboot cannot be mistaken for part of new ones.
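
One common way for a server to avoid the "address already in use" error when it is restarted during the 2MSL wait is the SO_REUSEADDR socket option. The sketch below uses Python's standard socket module; the helper name make_listener and the choice of 127.0.0.1 with port 0 (let the system pick a free port) are assumptions for illustration only.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket that may bind while old connections sit in TIME_WAIT."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); it has no effect on an already bound socket.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv


if __name__ == "__main__":
    listener = make_listener()
    print("listening on", listener.getsockname())
    listener.close()
```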

Half-open state

If one end has closed or aborted the connection without the other end knowing, the TCP connection is said to be half-open. This state can be detected with the keepalive option.

When the end that crashed restarts, it has lost all memory of the connections that existed before the reboot, so it does not recognize the connection that an arriving data segment refers to. It replies with an RST, and a new connection must then be established.
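
As a small illustration of the keepalive option mentioned above, the standard SO_KEEPALIVE socket option can be turned on for a TCP socket. This is a sketch using Python's socket module; the helper name enable_keepalive is made up, and the probe timing is left at whatever the operating system uses by default.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keepalive probes for sock and report whether the option is set."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("keepalive enabled:", enable_keepalive(s))
    s.close()
```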

Half-close state

A connection is half-closed when only one direction has been shut down. That is, one end of a TCP connection can still receive data from the other end after it has finished sending. To use this feature a program calls shutdown rather than close; most programs, however, simply call close, which terminates both directions.
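
The difference between shutdown and close can be tried out locally. The sketch below is an assumption-laden illustration: it uses socket.socketpair(), which gives two connected stream sockets in one process (a local socket pair rather than a real TCP connection), but the shutdown call behaves the same way from the program's point of view.

```python
import socket


def read_until_eof(sock):
    """Read from sock until the peer stops sending (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def half_close_demo():
    a, b = socket.socketpair()
    a.sendall(b"request")
    a.shutdown(socket.SHUT_WR)      # half-close: a will send nothing more
    received = read_until_eof(b)    # b sees end-of-file after "request"
    b.sendall(b"reply:" + received)  # b can still send in the other direction
    b.close()
    reply = read_until_eof(a)       # a can still receive after its shutdown
    a.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo())
```

Had a called close instead of shutdown, it would have had no socket left on which to read the reply.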

Maximum segment size (MSS)

The maximum segment size is the largest block of data that TCP will send to the other end. When a connection is established, each end can announce its MSS (the MSS option can appear only in a SYN segment). If one end does not receive an MSS option from the other, it assumes the default of 536 bytes.
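
A short sketch of the arithmetic behind these numbers, assuming 20-byte IP and TCP headers with no options (the function names are illustrative): the 536-byte default corresponds to a 576-byte IP datagram, and a 1500-byte Ethernet MTU gives an MSS of 1460.

```python
IP_HEADER_BYTES = 20   # IPv4 header without options
TCP_HEADER_BYTES = 20  # TCP header without options
DEFAULT_MSS = 536      # assumed when the peer's SYN carries no MSS option


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one datagram of the given MTU."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def peer_mss(mss_option):
    """MSS to assume for the peer; mss_option is None if no option was received."""
    return DEFAULT_MSS if mss_option is None else mss_option


if __name__ == "__main__":
    print(mss_for_mtu(1500), mss_for_mtu(576), peer_mss(None))
```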

Nagle algorithm

Some network applications constantly send small units of data, often only 1 byte at a time. Each such segment carries 40 bytes of headers (a 20-byte TCP header plus a 20-byte IP header), so a 41-byte packet carries only 1 byte of data. This wastes significant resources and, worse, can lead to congestion collapse on a slow network.

The Nagle algorithm works as follows:
1. The sending TCP sends the first piece of data it receives from the sending application immediately, even if it is only 1 byte.
2. After sending the first segment, the sending TCP accumulates further data in its output buffer. It sends the next segment when an ACK arrives from the receiver or when the buffered data fills a maximum-size segment.

The advantage of the Nagle algorithm is its simplicity, and it adapts to both the rate at which the application produces data and the rate at which the network delivers it. If the application is faster than the network, segments are larger (up to the maximum segment size). If the application is slower than the network, segments are smaller (less than the maximum segment size).

Delayed acknowledgment

Typically, TCP does not send an ACK the instant it receives data. Instead, it delays the ACK so that it can be sent along with any data going in the same direction (this is sometimes called piggybacking the ACK on the data).

Silly window syndrome

When the sending application produces data slowly, or the receiving application drains the receive buffer slowly, very small segments are exchanged across the connection. In the extreme case a segment carries only 1 byte of payload in a 41-byte packet. This phenomenon is called silly window syndrome.
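
To make the Nagle rule from earlier in this section concrete, here is a minimal sketch of the send decision. The function and parameter names are assumptions chosen for illustration; a real TCP makes this decision inside its output routine together with the window checks.

```python
def nagle_can_send(pending_bytes, mss, unacked_bytes):
    """Decide whether buffered data may be sent now under the Nagle rule.

    pending_bytes: data waiting in the output buffer
    mss:           maximum segment size for the connection
    unacked_bytes: data already sent but not yet acknowledged
    """
    if pending_bytes <= 0:
        return False   # nothing to send
    if pending_bytes >= mss:
        return True    # a full-size segment is ready
    # A small segment goes out only when nothing is in flight, that is,
    # it is the first data or the ACK for the previous data has arrived.
    return unacked_bytes == 0


if __name__ == "__main__":
    print(nagle_can_send(1, 1460, 0))     # first byte: sent at once
    print(nagle_can_send(1, 1460, 1))     # waits for the outstanding ACK
    print(nagle_can_send(1460, 1460, 1))  # full segment: sent
```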
Either the sender or the receiver can act to avoid silly window syndrome.

Avoidance measures at the receiving end

The receiver does not advertise small windows. The usual algorithm is that the receiver does not advertise a window larger than the one it is currently advertising unless the window can grow by either one full-size segment (the MSS being received) or one half of the receiver's buffer space, whichever is smaller.

Avoidance measures at the sending end

The sender does not transmit until one of the following conditions is met:
1. A full-size segment can be sent.
2. A segment at least half the size of the largest window the receiver has ever advertised can be sent.
3. Everything it has can be sent, and either no ACK is outstanding or the Nagle algorithm is disabled on the connection.

Slow start

If the sender begins by injecting multiple segments into the network, up to the window size advertised by the receiver, this works fine when the two hosts are on the same LAN. But problems can arise if there are routers and slower links between the sender and the receiver. Intermediate routers must queue the packets and may run out of buffer space. TCP therefore supports an algorithm called slow start. Its core idea is that the rate at which new packets are injected into the network should match the rate at which acknowledgments are returned by the other end.

Slow start adds another window to the sender's TCP: the congestion window (cwnd). When a new connection is established with a host on another network, the congestion window is initialized to 1 segment. Each time an ACK is received, the congestion window is increased by one segment, which results in exponential growth. The sender can transmit up to the minimum of the congestion window and the advertised window.
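
The "one segment per ACK" rule of slow start can be traced round by round. This sketch counts in whole segments and assumes that every segment is acknowledged individually within one round trip (no delayed ACKs and no loss); the function name and parameters are illustrative.

```python
def slow_start_rounds(rwnd_segments, rounds):
    """Segments sent in each round trip while slow start is in effect."""
    cwnd = 1  # congestion window starts at 1 segment
    sent_per_round = []
    for _ in range(rounds):
        # The sender is limited by the smaller of the two windows.
        window = min(cwnd, rwnd_segments)
        sent_per_round.append(window)
        # Each of the `window` ACKs that come back adds one segment to cwnd.
        cwnd += window
    return sent_per_round


if __name__ == "__main__":
    print(slow_start_rounds(rwnd_segments=20, rounds=6))
```

With an advertised window of 20 segments the trace is 1, 2, 4, 8, 16 and then 20: the doubling stops as soon as the receiver's window becomes the smaller limit.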
The congestion window is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver.

TCP timeout and retransmission

TCP's timeout-based retransmission uses exponential backoff: the timeout value is doubled for each successive retransmission, up to a limit. Rounded to whole seconds, the intervals between successive retransmissions are 1, 3, 6, 12, 24, 48, and then 64 seconds, which is the maximum.

Congestion avoidance algorithm

The congestion avoidance algorithm is a way of dealing with lost packets. There are two indications of packet loss: a timeout occurring, and the receipt of duplicate ACKs (3 or more).

Congestion avoidance and slow start are two different, independent algorithms. But when congestion occurs we want to slow the rate at which packets are transmitted into the network, and slow start can be invoked to do this. In practice the two algorithms are implemented together.

Congestion avoidance and slow start require two variables to be maintained per connection: a congestion window (cwnd) and a slow start threshold (ssthresh). The combined algorithm works as follows:
1) For a given connection, initialize cwnd to 1 segment and ssthresh to 65,535 bytes.
2) TCP never sends more than the minimum of cwnd and the receiver's advertised window. Congestion avoidance is flow control imposed by the sender, while the advertised window is flow control imposed by the receiver. The former is based on the sender's assessment of network congestion; the latter is related to the amount of buffer space available at the receiver for this connection.
3) When congestion occurs (indicated by a timeout or by the receipt of duplicate ACKs), ssthresh is set to one half of the current window size, but at least 2 segments.
   If the congestion was indicated by a timeout, cwnd is additionally set to 1 segment (that is, slow start).
4) When new data is acknowledged, cwnd is increased, but the way it increases depends on whether we are performing slow start or congestion avoidance. If cwnd ≤ ssthresh, we are doing slow start; otherwise we are doing congestion avoidance. Slow start continues until we are halfway to where we were when congestion occurred (that is, the new ssthresh), and then congestion avoidance takes over.

Slow start begins with cwnd at 1 segment and increases it by 1 segment for every ACK received (note that TCP acknowledgments are cumulative), so the window grows exponentially. Congestion avoidance dictates that cwnd be incremented by 1/cwnd each time an ACK is received, which is linear growth. The aim is to increase cwnd by at most 1 segment per round-trip time, regardless of how many ACKs arrive in that RTT, whereas slow start increases cwnd by the number of ACKs received in a round-trip time.

The following is a worked description of slow start and congestion avoidance [figure not reproduced]. In this scenario, assume that congestion occurred when cwnd was 32 segments. ssthresh is then set to 16 segments and cwnd to 1 segment. One segment is sent at time 0, and its ACK is assumed to arrive at time 1, at which point cwnd is incremented to 2. Two segments are then sent, and assuming their ACKs arrive by time 2, cwnd is incremented to 4 (once for each ACK). This exponential increase continues until cwnd equals ssthresh, after 8 ACKs are received between times 3 and 4. From that point on cwnd increases linearly, by at most 1 segment per round-trip time. As this example shows, the term "slow start" is not entirely accurate.
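
The scenario above (congestion at 32 segments, so ssthresh becomes 16 and cwnd restarts at 1) can be reproduced with a short per-round-trip simulation. This is a simplified sketch in whole segments, not how a real stack counts bytes; the function names are made up, and the switch to linear growth is placed at the moment cwnd reaches ssthresh, as in the scenario.

```python
def on_timeout(cwnd, rwnd):
    """New (ssthresh, cwnd) after a retransmission timeout, in segments."""
    current_window = min(cwnd, rwnd)
    ssthresh = max(2, current_window // 2)  # half the window, at least 2 segments
    return ssthresh, 1                      # a timeout restarts slow start


def cwnd_trace(ssthresh, rounds, start_cwnd=1):
    """cwnd at the start of each round trip, assuming no further loss."""
    cwnd = start_cwnd
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: one segment per ACK doubles cwnd each round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: about one segment per round trip.
            cwnd += 1
    return trace


if __name__ == "__main__":
    ssthresh, cwnd = on_timeout(cwnd=32, rwnd=64)
    print(ssthresh, cwnd)
    print(cwnd_trace(ssthresh, rounds=8, start_cwnd=cwnd))
```

The trace doubles up to 16 and then grows by one segment per round trip, which is the knee in the curve that the description refers to.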
Slow start merely begins at a sending rate lower than the one that caused congestion; the rate at which packets enter the network still increases throughout slow start. The rate of increase slows only when cwnd reaches ssthresh and congestion avoidance takes over.

Fast retransmit and fast recovery algorithms

These algorithms are typically implemented as follows:
1) When the third duplicate ACK is received, set ssthresh to one half of the current cwnd (but at least 2 segments) and retransmit the missing segment. Then set cwnd to ssthresh plus 3 times the segment size. In code:

    /* step 1: runs once, on the third duplicate ACK.
       cwnd and ssthresh are in bytes; SMSS is the sender's segment size. */
    if (dupacks == 3) {
        ssthresh = max(2 * SMSS, cwnd / 2);
        cwnd = ssthresh + 3 * SMSS;
    }

2) Each time another duplicate ACK arrives, increase cwnd by one segment size and send 1 packet. Note (the author's reading): the packet is first sent according to the previous cwnd, and cwnd is then increased by one segment size.
3) When the next ACK arrives that acknowledges new data, set cwnd to ssthresh. This ACK should be the acknowledgment of the retransmission, one round-trip time after it was sent, and it should also acknowledge all the intermediate segments. This step is congestion avoidance, because we are slowing down to one half of the rate we were at when the packet was lost.

The following is an example of congestion avoidance [figure not reproduced]. Note that when cwnd is 512 the connection is in slow start, because congestion avoidance applies only when cwnd is greater than ssthresh; when cwnd is 768, that step is still slow start. Note also that cwnd is increased after the packet is sent. In code this is a post-increment (cwnd++): send first, then increment.

TCP persist timer

If the ACK that carries a window update is lost, the two ends can deadlock, each waiting for the other: the receiver waits to receive data (because it has advertised a nonzero window to the sender), and the sender waits for a window update that allows it to continue sending.
To prevent this deadlock, the sender uses a persist timer that causes it to query the receiver periodically to find out whether the window has been enlarged. The segments the sender transmits for this purpose are called window probes. The persist timer also uses exponential backoff, but TCP never gives up sending window probes. Once the backoff reaches its limit, the probes are sent every 60 seconds, and this continues until either the window opens or the application using the connection is terminated.

TCP keepalive timer

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.

