TCP/IP is shorthand for "Transmission Control Protocol/Internet Protocol". TCP/IP is not a single protocol but a general term for a protocol suite. It is the foundation of today's Internet, and it is important for programmers to understand it.
Any discussion of TCP/IP has to mention OSI (Open Systems Interconnection). TCP/IP is generally regarded as a simplified version of the OSI model. First, TCP/IP was developed earlier than OSI; second, OSI was defined in so much detail that it never became commercially viable, whereas TCP/IP was widely adopted because of its simple definition (only four layers). Still, understanding OSI helps us understand TCP/IP better.
The OSI reference model has 7 layers, while the TCP/IP model has 4. The session, presentation, and application layers of the OSI model correspond to the application layer of the TCP/IP model, and the physical and data link layers of the OSI model correspond to the network interface layer of the TCP/IP model. To make the principles easier to understand, a compromise five-layer architecture is often used: physical layer, link layer, network layer, transport layer, and application layer.
- Relationship between layers
Each layer calls on the layer below it, and the same layer on two machines communicates with its peer. The lower a layer is, the closer it is to the hardware; the higher it is, the closer it is to the user.
Each layer exists to perform a function. To perform these functions, everyone needs to abide by common rules, and these rules are called protocols. Each layer defines a number of protocols, such as TCP (Transmission Control Protocol), UDP (User Datagram Protocol), Telnet (remote terminal protocol), SMTP (Simple Mail Transfer Protocol), FTP (File Transfer Protocol), HTTP (Hypertext Transfer Protocol), DNS (Domain Name System), and so on.
- The three-way handshake and four waves of TCP communication
First of all, the TCP flag bits:
SYN (synchronize) Synchronization flag
The synchronization flag. It is valid only while the three-way handshake is establishing a TCP connection. It prompts the server side of the connection to check the sequence number, which is the initial sequence number of the side opening the connection (typically the client). The sequence number can be thought of as a 32-bit counter ranging from 0 to 4,294,967,295. Every byte of data exchanged over a TCP connection is numbered in sequence. The Sequence Number field in the TCP header holds the sequence number of the first byte in the TCP segment.
ACK (acknowledgement) Acknowledgment flag
This flag is set in most segments. The acknowledgment number (w+1, Figure-1) contained in the Acknowledgment Number field of the TCP header is the next expected sequence number, and it indicates that the remote side has successfully received all the data before it.
PSH (push) Push flag
This flag tells the receiving side not to queue the data but to pass it to the application as quickly as possible. It is usually set on interactive connections such as Telnet or rlogin.
FIN (finish) Finish flag
Used to end a TCP session. The corresponding port remains open, however, and can still receive subsequent data.
RST (reset) Reset flag
Resets the corresponding TCP connection.
URG (urgent) Urgent flag
Set when the segment carries urgent data, meaning the urgent pointer field in the header is valid.
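To make the flag bits concrete, here is a minimal sketch that decodes the flags byte of a TCP header. The bit values follow the standard TCP header layout; the function name decode_flags is just an illustrative choice.

```python
# Bit values of the six flags described above, as they sit in the
# flags byte of the TCP header.
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

def decode_flags(flags_byte):
    """Return the names of the flags set in flags_byte, lowest bit first."""
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]

# 0x12 has both the SYN bit (0x02) and the ACK bit (0x10) set,
# which is what the second handshake packet carries.
print(decode_flags(0x12))
```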
Three-way handshake:
First handshake: The client sends a SYN packet (seq=x) to the server and enters the SYN_SENT state, waiting for the server to confirm.
Second handshake: The server receives the SYN packet and must acknowledge the client's SYN (ack=x+1). It also sends a SYN of its own (seq=y). Together these form the SYN+ACK packet, and the server enters the SYN_RECV state.
Third handshake: The client receives the server's SYN+ACK packet and sends an acknowledgment packet (ack=y+1) to the server. Once this packet is sent, the client and the server enter the ESTABLISHED state, and the three-way handshake is complete.
The packets exchanged during the handshake carry no data. Only after the three-way handshake do the client and server actually start transmitting data. Ideally, once a TCP connection is established, it is kept open until one of the two sides actively closes it.
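The numbering in the three steps can be traced with a small simulation. This is only a sketch of the bookkeeping, not a real TCP implementation: the function name and the tuple layout are made up for illustration, and x and y stand for the initial sequence numbers chosen by the client and the server.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments as (sender, flags, seq, ack).

    Sequence numbers are 32-bit counters, so x + 1 and y + 1 wrap
    around to 0 after 4,294,967,295.
    """
    wrap = 2 ** 32
    return [
        # 1. client sends SYN and enters SYN_SENT
        ("client", "SYN", x, None),
        # 2. server acknowledges x and sends its own SYN, enters SYN_RECV
        ("server", "SYN+ACK", y, (x + 1) % wrap),
        # 3. client acknowledges y; both sides are now ESTABLISHED
        ("client", "ACK", (x + 1) % wrap, (y + 1) % wrap),
    ]

for segment in three_way_handshake(100, 300):
    print(segment)
```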
Just as establishing a connection takes a "three-way handshake", closing a TCP connection takes "four waves".
First wave: The actively closing side sends a FIN to shut down data transfer from itself to the passively closing side. In other words, it tells the other side: I will not send you any more data. (Data sent before the FIN that has not yet been acknowledged will still be retransmitted.) At this point the actively closing side can still receive data.
Second wave: The passively closing side receives the FIN and sends back an ACK whose acknowledgment number is the received sequence number + 1 (like a SYN, a FIN occupies one sequence number).
Third wave: The passively closing side sends a FIN to shut down data transfer from itself to the actively closing side. In other words, it tells the other side: I have finished sending my data and will not send you any more.
Fourth wave: The actively closing side receives the FIN and sends an ACK to the passively closing side, with an acknowledgment number equal to the received sequence number + 1. This completes the four waves.
Now let's go through the role and protocols of each layer, from the bottom up.
1. Physical Layer
What do we do when we want to get online? First we have broadband installed, then we run the cable to the computer or router and configure it, and then we can get online. This is the "physical layer": the means of connecting computers together, using optical fiber, cables, twisted-pair wire, radio waves, and so on. It mainly specifies the electrical characteristics of the network and is responsible for transmitting the electrical signals for 0 and 1.
2. Link Layer
2.1 Definitions
Bare 0s and 1s are meaningless on their own, so there has to be a rule for interpreting them: how many electrical signals make up a group? What does each signal bit mean? This is the job of the link layer, which sits above the physical layer and determines how the 0s and 1s are grouped.
2.2 Ethernet Protocol
Ethernet is the most common communication protocol standard used by LANs today.
Ethernet specifies that a group of electrical signals forms a packet called a "frame". Each frame is divided into two parts: the header (head) and the data.
The "header" contains descriptive information about the packet, such as the sender, the recipient, and the data type. The "data" is the actual content of the packet.
The "header" has a fixed length of 18 bytes. The "data" is at least 46 bytes and at most 1500 bytes long. The entire frame is therefore at least 64 bytes and at most 1518 bytes. If the data is longer than that, it must be split across multiple frames.
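The frame arithmetic above can be checked with a few lines of code. This is a sketch using only the numbers given in the text; the function names are illustrative, and the padding of short data up to 46 bytes is standard Ethernet behavior that the text implies through the 46-byte minimum but does not spell out.

```python
import math

HEADER_BYTES = 18            # fixed per-frame overhead, as stated in the text
MIN_DATA, MAX_DATA = 46, 1500

def frame_size(data_len):
    """Size in bytes of one frame carrying data_len bytes of data."""
    if data_len > MAX_DATA:
        raise ValueError("data must be split across several frames")
    # Data shorter than the minimum is padded up to 46 bytes.
    return HEADER_BYTES + max(data_len, MIN_DATA)

def frames_needed(data_len):
    """Number of frames needed to carry data_len bytes of data."""
    return max(1, math.ceil(data_len / MAX_DATA))

print(frame_size(46), frame_size(1500))  # smallest and largest frame
print(frames_needed(4000))               # 4000 bytes do not fit in one frame
```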
2.3 MAC Address
As mentioned above, the "header" contains information about the sender and the recipient. So how are the sender and recipient identified?
Ethernet specifies that every device connected to the network must have a network interface card (NIC). Packets travel from one NIC to another, and the address of the NIC is the sending and receiving address of the packet. This is called the MAC address. Every NIC leaves the factory with a MAC address that is unique in the world. It is 48 bits long and is usually written as 12 hexadecimal digits. The first 6 hexadecimal digits are the vendor number, and the last 6 are that vendor's serial number for the NIC.
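As a quick illustration of the 6 + 6 split, the sketch below takes a MAC address written in the usual colon or hyphen notation and separates the vendor number from the serial number. The function name and the sample address are made up for the example.

```python
def split_mac(mac):
    """Split a MAC address string into (vendor part, serial part)."""
    digits = mac.replace(":", "").replace("-", "").upper()
    # 48 bits = 12 hexadecimal digits
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6], digits[6:]

vendor, serial = split_mac("00:1a:2b:3c:4d:5e")  # made-up address
print(vendor, serial)
```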
2.4 Broadcast
With MAC addresses, we can identify NICs and the path a packet should take. But how does one NIC learn the MAC address of another? The answer is the ARP protocol, which is a "network layer" protocol and is explained in detail later. And even with the MAC address in hand, how does the system deliver a packet to exactly the right receiver? The answer is that Ethernet uses a very "primitive" method: it does not deliver the packet precisely to the receiver but sends it to every computer on the network and lets each machine decide whether it is the intended receiver. If a machine's own MAC address matches the receiver's MAC address, it accepts the packet for further processing; otherwise it discards the packet. This way of sending is called "broadcast".
With the definition of a packet, the MAC address of the NIC, and broadcast as the means of sending, the link layer can transfer data between multiple computers.
3. Network layer
3.1 The origin of the network layer
The Ethernet protocol relies on MAC addresses to send data. In theory, MAC addresses alone would be enough for a NIC to reach any computer in the world, and this is technically achievable.
But this approach has a major drawback. Ethernet sends packets by broadcast, so every member of the network gets a copy of every packet. That is not only inefficient but also confined to the sender's subnet: if two computers are not on the same subnet, a broadcast cannot pass between them. This design is reasonable, though; otherwise every computer on the Internet would receive every packet, which would be a disaster.
The Internet is a huge network made up of countless subnets. We therefore need a way to tell which MAC addresses belong to the same subnet and which do not. If two machines are on the same subnet, packets are sent by broadcast; otherwise they are sent by "routing". Unfortunately, the MAC address itself cannot tell us this. It depends only on the vendor and has nothing to do with the network the machine is on.
This led to the birth of the "network layer". Its role is to introduce a new set of addresses that let us tell whether different computers belong to the same subnet. This set of addresses is called the "network address". So once the network layer exists, each computer has two kinds of address: a MAC address and a network address. The two are unrelated. The MAC address is bound to the NIC, the network address is assigned by an administrator, and they are paired up more or less arbitrarily.
The network address tells us which subnet a computer is on, and the MAC address delivers the packet to the destination NIC within that subnet. Logically, therefore, the network address must be processed first and the MAC address second.
3.2 IP protocol
The protocol that defines network addresses is called the IP protocol. The addresses it defines are called IP addresses.
The most widely adopted version of the IP protocol today is version 4, known as IPv4. Its successor is IPv6, which mainly addresses the shortage of IP addresses. Here we mainly discuss IPv4.
A network address under IPv4 consists of 32 binary digits. We usually write IP addresses as four decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255.
Each computer on the Internet is assigned an IP address. The address has two parts: the first part identifies the network and the second part identifies the host. Take the IP address 192.168.1.100, which is a 32-bit address. Assuming its network part is the first 24 bits (192.168.1), the host part is the last 8 bits (the final 100). Computers on the same subnet must have the same network part in their IP addresses, so 192.168.1.100 and 192.168.1.102 would be on the same subnet. But this is only an assumption: the network part cannot be determined from the IP address alone. It might be the first 24 bits, the first 16 bits, or some other length.
So how can we tell from their IP addresses whether two computers belong to the same subnet? This requires another parameter, the "subnet mask".
The subnet mask is a parameter that describes a subnet. It has the same form as an IP address: it is also a 32-bit binary number, with the network part set to all 1s and the host part set to all 0s. For example, if the network part of the IP address 192.168.1.100 is the first 24 bits and the host part is the last 8 bits, then the subnet mask is 11111111.11111111.11111111.00000000, which is 255.255.255.0 in decimal.
Knowing the subnet mask, we can determine whether any two IP addresses are on the same subnet. The method is to AND each IP address with the subnet mask (a result bit is 1 only if both input bits are 1, otherwise 0) and then compare the results. If they are the same, the two addresses are on the same subnet; otherwise they are not. For example, suppose 192.168.1.100 and 192.168.1.101 both have the subnet mask 255.255.255.0. To find out whether they are on the same subnet, AND each of them with the subnet mask. The result is 192.168.1.0 in both cases, so they are on the same subnet.
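The AND comparison described above is easy to try out. The following is a minimal sketch using plain integer arithmetic; the helper names are made up for the example, and no validation of the dotted-decimal input is attempted.

```python
def to_int(dotted):
    """Convert dotted-decimal notation to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_dotted(n):
    """Convert a 32-bit integer back to dotted-decimal notation."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def same_subnet(ip1, ip2, mask):
    """AND both addresses with the mask and compare the results."""
    m = to_int(mask)
    return (to_int(ip1) & m) == (to_int(ip2) & m)

mask = "255.255.255.0"
print(to_dotted(to_int("192.168.1.100") & to_int(mask)))
print(same_subnet("192.168.1.100", "192.168.1.101", mask))
```

Python's standard ipaddress module offers the same kind of check in a more robust form; the manual version is shown here only because it mirrors the bitwise operation in the text.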
To sum up, the IP protocol has two main functions: assigning an IP address to each computer, and determining which addresses are on the same subnet.
3.3 IP Packets
Data sent according to the IP protocol is called an IP packet. It naturally includes the IP address information.
As mentioned earlier, an Ethernet packet contains only MAC addresses and has no field for an IP address. Do we need to change the Ethernet definition and add a field?
The answer is no. We can put the IP packet directly into the "data" part of the Ethernet packet, so there is no need to modify the Ethernet specification at all. This is the benefit of the Internet's layered structure: changes in an upper layer do not touch the structure of the layers below.
An IP packet is also divided into two parts, "header" and "data".
The "header" mainly contains the version, length, IP addresses, and similar information. The "data" is the actual content of the IP packet. Once it is placed inside an Ethernet packet, the Ethernet packet is laid out as: Ethernet header, IP header, IP data.
The "header" of an IP packet is 20 to 60 bytes long, and the total length of the packet is at most 65,535 bytes. In theory, therefore, the "data" part of an IP packet is at most 65,515 bytes long. As mentioned earlier, the "data" part of an Ethernet packet is at most 1500 bytes long. So if an IP packet exceeds 1500 bytes, it has to be split into several Ethernet packets that are sent separately.
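To get a feel for the splitting, here is a rough sketch that computes the sizes of the pieces an oversized IP packet is cut into. It assumes a 20-byte header and two details of standard IPv4 fragmentation that the text does not go into: every fragment carries its own copy of the header, and the data in every fragment except the last must be a multiple of 8 bytes. The function name is illustrative.

```python
def fragment_sizes(total_len, header_len=20, mtu=1500):
    """Return the sizes of the fragments needed for an IP packet.

    total_len is the full packet length (header + data); mtu is the
    largest IP packet that fits in one Ethernet frame.
    """
    if total_len <= mtu:
        return [total_len]            # fits in a single frame
    data_len = total_len - header_len
    # Largest multiple of 8 bytes of data that fits beside the header.
    per_fragment = (mtu - header_len) // 8 * 8
    sizes = []
    while data_len > 0:
        chunk = min(per_fragment, data_len)
        sizes.append(header_len + chunk)
        data_len -= chunk
    return sizes

print(fragment_sizes(4000))           # a 4000-byte packet needs three frames
print(len(fragment_sizes(65535)))     # the largest possible IP packet
```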
3.4 ARP Protocol
The ARP protocol was mentioned earlier; here it is in detail. Because IP packets are sent inside Ethernet packets, we must know two addresses: the other party's MAC address and its IP address. Normally the other party's IP address is known, but its MAC address is not. So we need a mechanism for obtaining the MAC address from the IP address.
There are two cases. In the first case, the two hosts are not on the same subnet. Then there is in fact no way to obtain the other host's MAC address. All we can do is hand the packet to the gateway that connects the two subnets and let the gateway deal with it.
In the second case, the two hosts are on the same subnet, and we can use the ARP protocol to obtain the other host's MAC address. ARP also sends out a packet (carried inside an Ethernet packet). It contains the IP address of the host being queried, and the destination MAC address field is filled with FF:FF:FF:FF:FF:FF, which marks it as a "broadcast" address. Every host on the subnet receives this packet, extracts the IP address from it, and compares it with its own. If the two match, the host replies and reports its MAC address to the sender; otherwise it discards the packet. In this way we can obtain the MAC address of any host on the same subnet and send packets to it.
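The second case can be mimicked with a toy model of one subnet. This is only a simulation of the idea, not real ARP traffic: the dictionary of hosts, the field names, and the addresses are all made up for the example.

```python
BROADCAST = "FF:FF:FF:FF:FF:FF"

def arp_resolve(target_ip, hosts):
    """Return the MAC address that owns target_ip, or None.

    hosts maps each host's MAC address to its IP address on one subnet.
    """
    # The request is addressed to the broadcast MAC, so every host sees it.
    request = {"dst_mac": BROADCAST, "query_ip": target_ip}
    for mac, ip in hosts.items():
        # Each host compares the queried IP with its own; only the owner
        # replies with its MAC address, everyone else discards the packet.
        if ip == request["query_ip"]:
            return mac
    return None  # nobody on this subnet owns the address

subnet = {  # hypothetical hosts
    "00:1A:2B:00:00:01": "192.168.1.100",
    "00:1A:2B:00:00:02": "192.168.1.101",
}
print(arp_resolve("192.168.1.101", subnet))
```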
4. Transport Layer
4.1 Origin of the transport layer
With MAC addresses and IP addresses, we can already establish communication between any two hosts on the Internet. The next problem is that many programs on the same host need to use the network; for example, you might be browsing the web and chatting with friends at the same time. When a packet arrives from the Internet, how do we know whether it carries web page content or chat content? In other words, we need one more parameter that indicates which program (process) the packet is meant for. This parameter is called the "port", and it is in effect a number assigned to each program that uses the NIC. Each packet is sent to a specific port on the host, so every program gets the data it needs.
A "port" is an integer between 0 and 65535, which is exactly 16 bits. Ports 0 to 1023 are reserved by the system, and users can only choose ports above 1023. Whether you are browsing the web or chatting, the application picks a port and then contacts the corresponding port on the server.
The function of the "transport layer" is to establish "port-to-port" communication. In contrast, the function of the "network layer" is to establish "host-to-host" communication. Once the host and port are determined, programs can communicate with each other. Unix systems therefore call the combination of host and port a "socket".
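The host + port pairing is exactly what the socket API works with. Below is a small sketch, using Python's standard socket module, that sends one message between two UDP sockets on the local machine. The function name is illustrative, and binding to port 0 is a convenience that asks the operating system to pick a free port.

```python
import socket

def loopback_demo(message=b"hello"):
    """Send message from one local socket to another and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
        receiver.settimeout(2.0)              # do not wait forever if nothing arrives
        host, port = receiver.getsockname()   # the (host, port) pair identifies the socket
        sender.sendto(message, (host, port))  # address the packet to that host and port
        data, (src_host, src_port) = receiver.recvfrom(1024)
        return data, port, src_port
    finally:
        sender.close()
        receiver.close()

data, port, src_port = loopback_demo()
print(data, "arrived on port", port, "from port", src_port)
```

Note that the sender also gets a port of its own, chosen by the operating system, so a reply could find its way back to the right program.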
4.2 UDP protocol
To include the port information in the packet, we need a new protocol. The simplest implementation is called the UDP protocol.
UDP packets are also made up of a "header" and "data".
The "header" mainly defines the sending port and the receiving port, and the "data" is the actual content. The UDP packet is placed in the "data" part of the IP packet, and, as mentioned earlier, the IP packet is placed in the "data" part of the Ethernet packet, so the entire Ethernet packet is now laid out as: Ethernet header, IP header, UDP header, UDP data.
UDP packets are very simple: the header is only 8 bytes in total, and the total length is no more than 65,535 bytes, so a UDP packet fits exactly into one IP packet.
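The 8-byte header is simple enough to build by hand. The sketch below packs and unpacks it with Python's struct module, following the standard UDP layout of four 16-bit fields (source port, destination port, length, checksum). The function names are illustrative, and the checksum is left at 0 rather than computed, which UDP over IPv4 treats as "no checksum".

```python
import struct

# Four unsigned 16-bit fields in network byte order: 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp(src_port, dst_port, payload):
    """Build a UDP packet from two port numbers and the data."""
    length = UDP_HEADER.size + len(payload)   # header + data
    if length > 65535:
        raise ValueError("UDP packet too long")
    # Checksum is left at 0 here (not computed) to keep the sketch short.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload

def parse_udp(packet):
    """Return (source port, destination port, data) from a UDP packet."""
    src, dst, length, _checksum = UDP_HEADER.unpack(packet[:UDP_HEADER.size])
    return src, dst, packet[UDP_HEADER.size:length]

packet = build_udp(50000, 53, b"abc")   # made-up ports and data
print(len(packet), parse_udp(packet))
```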
4.3 TCP protocol
The advantage of the UDP protocol is that it is relatively simple and easy to implement. Its disadvantage is poor reliability: once a packet has been sent, there is no way to know whether the other party received it.
To solve this problem and improve the reliability of the network, the TCP protocol was born. You can think of it roughly as UDP with an acknowledgment mechanism: every packet sent must be acknowledged. If a packet is lost, no acknowledgment arrives, and the sender knows it has to resend that packet. The advantage of the TCP protocol is that it ensures data is not lost. Its disadvantages are that the process is complex, it is hard to implement, and it consumes more resources.
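The "send, wait for acknowledgment, resend if none arrives" idea can be sketched as follows. This is a deliberately simplified stop-and-wait model, not how real TCP works in detail (real TCP keeps many packets in flight and uses timers). The function names are made up, and lossy_send is a hypothetical stand-in for the network that returns True when an acknowledgment came back.

```python
def send_reliably(packets, lossy_send, max_tries=5):
    """Send each packet, resending until it is acknowledged.

    lossy_send(seq, payload) returns True if the acknowledgment arrived
    and False if the packet or its acknowledgment was lost.
    Returns the total number of transmissions made.
    """
    transmissions = 0
    for seq, payload in enumerate(packets):
        for _ in range(max_tries):
            transmissions += 1
            if lossy_send(seq, payload):
                break                 # acknowledged, move on to the next packet
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return transmissions

# A made-up channel that loses the first copy of every packet.
seen = set()
def drop_first_copy(seq, payload):
    if seq in seen:
        return True
    seen.add(seq)
    return False

print(send_reliably([b"a", b"b", b"c"], drop_first_copy))
```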
TCP packets, like UDP packets, are embedded in the "data" part of an IP packet. A TCP packet has no length limit and could in theory be infinitely long, but to keep the network efficient, a TCP packet is usually no longer than an IP packet, so that a single TCP packet does not have to be split again.
5. Application Layer
The application receives data from the "transport layer" and then has to interpret it. Because the data comes from all kinds of sources, its format must be agreed on in advance, or it cannot be interpreted at all. The role of the "application layer" is to specify the data formats of applications.
For example, the TCP protocol can carry data for many kinds of programs, such as email, WWW, and FTP. There must therefore be different protocols defining the formats of email, web pages, and FTP data, and these application protocols make up the "application layer".
This is the highest layer, and it faces the user directly. Its data is placed in the "data" part of the TCP packet. As a result, the Ethernet packet is now laid out as: Ethernet header, IP header, TCP header, application data.
At this point, we have covered the principles of TCP/IP from a system-wide perspective. This concludes the article.
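The nesting of layers can be pictured as each layer putting its own header in front of whatever the layer above handed down. The sketch below uses short text labels as placeholder headers instead of real binary ones, purely to show the order of wrapping and unwrapping; the names are illustrative.

```python
# Placeholder headers, innermost layer first (real headers are binary).
LAYERS = [b"[TCP]", b"[IP]", b"[ETH]"]

def encapsulate(app_data):
    """Wrap application data in a TCP, then IP, then Ethernet header."""
    packet = app_data
    for header in LAYERS:
        packet = header + packet
    return packet

def decapsulate(frame):
    """Strip the headers in the opposite order, outermost first."""
    packet = frame
    for header in reversed(LAYERS):
        if not packet.startswith(header):
            raise ValueError(f"expected header {header!r}")
        packet = packet[len(header):]
    return packet

frame = encapsulate(b"GET / HTTP/1.1\r\n\r\n")
print(frame)
print(decapsulate(frame))
```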
Reference:
Http://www.ruanyifeng.com/blog/2012/05/internet_protocol_suite_part_i.html
Understandable TCP/IP