We use the Internet every day, but have you ever wondered how it actually works?
Billions of computers around the world are connected together and communicate with one another. A network card in Beijing sends out a signal, and another network card in Shenzhen receives it, even though neither knows the other's physical location. Isn't that remarkable?
To let all kinds of computers interconnect, ARPANET specified a set of communication protocols, the TCP/IP protocol family, which lays down in detail how computers are connected and networked. Once you understand these protocols, you understand how the network works.
Because these protocols are large and complex, this article only offers a concise framework to help you grasp them as a whole.
I. Overview
1.1 Models
To reduce the complexity of protocol design, most network models are organized in layers. Each layer has its own function and, like the floors of a building, is supported by the layer below it. Each layer uses the services provided by the layer below to serve the layer above, and hides the implementation details of those services from the upper layer.
The user only touches the top layer and is not aware of the layers underneath. To understand the Internet, you have to start from the bottom and work out what each layer does on the way up.
Models differ in how they divide the layers: some have seven layers (rarely used in practice), and some have four (the one in use today).
For ease of understanding, we divide it into five layers: the physical layer, the link layer, the network layer, the transport layer, and the application layer.
The lower a layer is, the closer it is to the hardware; the higher it is, the closer it is to the user. What each layer is called does not really matter (although an interviewer may well ask you for the names). Just remember that the Internet is divided into several layers.
1.2 Layers and Protocols
Each layer exists to perform a function. To carry out that function, everyone has to follow a common set of rules. Such a set of rules is called a "protocol".
Each layer of the network defines a number of protocols. Collectively they are called the "TCP/IP protocols", the most basic protocols and the foundation of the Internet, named after the IP protocol of the network layer and the TCP protocol of the transport layer. Note that TCP/IP is a large family: it includes not only TCP and IP but also other protocols, such as UDP and ARP, which are introduced below.
II. Physical Layer
We start with the bottom layer.
What is the first thing to do when networking computers? Connect them, of course, using optical fiber, cable, twisted pair, radio waves, or some other medium.
This is the "physical layer": the physical means of connecting computers. It mainly specifies the electrical characteristics of the network and is responsible for transmitting the electrical signals that represent 0 and 1.
III. Link Layer
3.1 Definitions
Bare 0s and 1s carry no meaning until they are interpreted: how many signals form a group, and what does each bit mean? This is the job of the link layer, which sits above the physical layer and determines how the 0s and 1s are grouped.
3.2 Ethernet Protocol
In the early days, each company had its own way of grouping electrical signals. Gradually, a protocol called "Ethernet" came to dominate.
Ethernet specifies that a group of electrical signals forms a packet called a "frame". Each frame has two parts: the header and the data.
The "header" contains descriptive fields such as the sender, the recipient, and the data type; the "data" is the actual content of the packet.
The "header" has a fixed length of 18 bytes (strictly speaking, 14 bytes at the front of the frame plus a 4-byte checksum at the end). The "data" is at least 46 bytes and at most 1500 bytes long. The entire frame is therefore at least 64 bytes and at most 1518 bytes. If the data is longer than that, it must be split across multiple frames.
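The size arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the 18 bytes are treated simply as fixed per-frame overhead, as the text describes.

```python
import math

HEADER_BYTES = 18      # per-frame overhead quoted in the text
MIN_DATA_BYTES = 46    # shortest allowed "data" part
MAX_DATA_BYTES = 1500  # longest allowed "data" part

def frame_size(data_len):
    """Total frame size in bytes; short data is padded up to the minimum."""
    if data_len > MAX_DATA_BYTES:
        raise ValueError("data too long for a single frame")
    return HEADER_BYTES + max(data_len, MIN_DATA_BYTES)

def frames_needed(data_len):
    """Number of frames required to carry data_len bytes of data."""
    return max(1, math.ceil(data_len / MAX_DATA_BYTES))

print(frame_size(46))       # 64
print(frame_size(1500))     # 1518
print(frames_needed(4000))  # 3
```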
3.3 MAC Address
As mentioned above, the "header" of an Ethernet frame contains information about the sender and the recipient. So how are the sender and recipient identified?
Ethernet specifies that every device connected to the network must have a network interface card (NIC). Packets travel from one NIC to another, and this is how different computers are connected and exchange data. The address of the NIC is the sending and receiving address of the packet, and it is called the MAC address.
A MAC address identifies a network device, much like an ID number identifies a person. Every NIC leaves the factory with a MAC address that is, in theory, unique worldwide. It is 48 bits long and is usually written as 12 hexadecimal digits.
With the MAC address, you can locate a NIC and determine where a packet should go.
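To make the "48 bits, 12 hexadecimal digits" format concrete, here is a small sketch that converts between a 48-bit integer and the usual colon-separated notation. The function names and the sample address are illustrative assumptions, not part of any standard API.

```python
def format_mac(value):
    """Render a 48-bit integer as 12 hex digits in colon-separated pairs."""
    if not 0 <= value < 2**48:
        raise ValueError("a MAC address must fit in 48 bits")
    digits = f"{value:012X}"
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

def parse_mac(text):
    """Convert 'AA:BB:CC:DD:EE:FF' back into a 48-bit integer."""
    return int(text.replace(":", ""), 16)

# Sample address chosen only for illustration.
print(format_mac(0x00B0D086BBF7))  # 00:B0:D0:86:BB:F7
```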
3.4 Broadcast
Defining an address is only the first step; more is needed.
First, how does one NIC learn the MAC address of another?
The answer is the ARP protocol, which we come back to later. For now, it is enough to know that an Ethernet packet cannot be sent until the receiver's MAC address is known.
Second, even with the MAC address, how does the system deliver the packet to exactly the right receiver?
The answer is that Ethernet uses a rather "primitive" approach: instead of delivering the packet precisely to the receiver, it sends the packet to every computer on the network and lets each one decide whether it is the intended receiver.
For example, when computer 1 sends a packet to computer 2, computers 3, 4, and 5 on the same subnet receive it as well. Each of them reads the "header" of the packet, finds the receiver's MAC address, and compares it with its own. If the two match, it accepts the packet for further processing; otherwise it discards the packet. This way of sending is called "broadcasting".
With the definition of the packet, the MAC address of the NIC, and broadcast delivery, the link layer can transfer data among multiple computers.
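The broadcast delivery described in this section can be sketched as a toy simulation. Everything here (the dictionary-based frame, the host names, the MAC values) is a hypothetical model for illustration, not how a real NIC is programmed.

```python
def broadcast(frame, hosts):
    """Deliver a frame to every host; return the names of hosts that accept it."""
    accepted = []
    for name, mac in hosts.items():
        # Every host reads the header and compares the destination with its own MAC.
        if frame["dst"] == mac:
            accepted.append(name)
        # Any other host simply discards the frame.
    return accepted

# Five made-up hosts on one subnet.
hosts = {f"computer{i}": f"00:00:00:00:00:0{i}" for i in range(1, 6)}
frame = {"src": hosts["computer1"], "dst": hosts["computer2"], "data": b"hello"}
print(broadcast(frame, hosts))  # ['computer2']
```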
IV. Network Layer
4.1 The origin of the network layer
The Ethernet protocol relies on MAC addresses to send data. In theory, MAC addresses alone would be enough for a NIC in Beijing to find a NIC in Shenzhen; technically it could be done.
However, this approach has a major drawback. Ethernet sends packets by broadcast, so every member of the network gets a copy. That is not only inefficient but also confined to the sender's subnet: if two computers are not on the same subnet, a broadcast does not reach from one to the other. This design is reasonable, because otherwise every computer on the Internet would receive every packet, which would be a disaster (a broadcast storm).
The Internet is a huge network made up of countless subnets. It is hard to imagine computers in Beijing and Shenzhen being on the same subnet; that is almost impossible.
Therefore, we need a way to tell which MAC addresses belong to the same subnet and which do not. If the destination is on the same subnet, the packet is sent by broadcast; otherwise it is sent by "routing". ("Routing" means deciding how to distribute packets to different subnets. It is a very large topic that this article does not cover.) Unfortunately, the MAC address itself cannot do this: it depends only on the manufacturer and has nothing to do with the network the device is on.
This led to the birth of the "network layer". Its role is to introduce a new set of addresses that let us tell whether two computers belong to the same subnet. These addresses are called "network addresses".
So once the "network layer" exists, each computer has two kinds of address: a MAC address and a network address. The two are unrelated: the MAC address is bound to the NIC, while the network address is assigned by the administrator, and they are paired only by chance.
The network address tells us which subnet a computer is on, and the MAC address then delivers the packet to the destination NIC within that subnet. It follows logically that the network address must be processed first, and the MAC address second.
4.2 IP Protocol
The protocol that defines network addresses is called the IP protocol, and the addresses it defines are called IP addresses.
The version in wide use today is version 4 of the IP protocol, known as IPv4. It specifies that a network address consists of 32 bits.
By convention, we write an IP address as four decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255.
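The relationship between the dotted decimal form and the underlying 32-bit number can be shown with a short sketch. The helper names `ip_to_int` and `int_to_ip` are invented for this example.

```python
def ip_to_int(address):
    """Convert dotted decimal notation into a 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # each segment contributes 8 bits
    return value

def int_to_ip(value):
    """Convert a 32-bit integer back into dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(f"{ip_to_int('172.16.254.1'):032b}")  # 10101100000100001111111000000001
print(int_to_ip(0xFFFFFFFF))                # 255.255.255.255
```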
Each computer on the Internet is assigned an IP address. The address has two parts: the first part identifies the network, and the second part identifies the host.
Take the IP address 172.16.254.1, a 32-bit address. Assuming its network part is the first 24 bits (172.16.254), the host part is the last 8 bits (the final 1). Computers on the same subnet must share the same network part, so 172.16.254.2 is on the same subnet as 172.16.254.1.
The problem is that the network part cannot be determined from the IP address alone. Taking 172.16.254.1 again, the address itself does not reveal whether its network part is the first 24 bits, the first 16 bits, or even the first 28 bits.
So how can we tell from IP addresses whether two computers belong to the same subnet? This requires another parameter, the "subnet mask".
The "subnet mask" is a parameter that describes a subnet. It has the same form as an IP address: a 32-bit binary number in which the network part is all 1s and the host part is all 0s, with the 1s and the 0s each forming one contiguous run.
For example, if the network part of the IP address 172.16.254.1 is known to be the first 24 bits and the host part the last 8 bits, the subnet mask is 11111111.11111111.11111111.00000000, which is 255.255.255.0 in decimal.
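As a rough illustration of how a mask follows from the length of the network part, the sketch below builds the mask from a bit count. The function name `prefix_to_mask` is an assumption made for this example.

```python
def prefix_to_mask(prefix_len):
    """Build a dotted decimal subnet mask from the number of network bits."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("the network part must be 0 to 32 bits long")
    # prefix_len ones followed by zeros, kept within 32 bits
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(prefix_to_mask(24))  # 255.255.255.0
print(prefix_to_mask(16))  # 255.255.0.0
print(prefix_to_mask(28))  # 255.255.255.240
```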
The subnet mask tells us which part of an address is the subnet ID and which part is the host ID. A bitwise AND ("&") of the IP address and the subnet mask yields the subnet ID; the bits of the IP address that line up with the 0s of the mask yield the host ID.
Knowing the subnet mask, we can determine whether any two IP addresses are on the same subnet. AND each IP address with the subnet mask (a result bit is 1 only when both input bits are 1, otherwise 0) and compare the results. If they are equal, the two addresses are on the same subnet; otherwise they are not.
For example, given that the IP addresses 172.16.254.1 and 172.16.254.233 both have the subnet mask 255.255.255.0, are they on the same subnet? ANDing each with the subnet mask gives 172.16.254.0 in both cases, so they are.
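The AND comparison can be tried out with Python's standard `ipaddress` module. This is a minimal sketch; the helper names `subnet_id`, `host_id`, and `same_subnet` are made up here and are not part of that module.

```python
import ipaddress

def subnet_id(ip, mask):
    """AND the address with the mask to get the subnet ID."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return ipaddress.IPv4Address(ip_bits & mask_bits)

def host_id(ip, mask):
    """Keep only the address bits that line up with the 0s of the mask."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    inverted_mask = ~int(ipaddress.IPv4Address(mask)) & 0xFFFFFFFF
    return ip_bits & inverted_mask

def same_subnet(ip_a, ip_b, mask):
    """Two addresses share a subnet when their subnet IDs are equal."""
    return subnet_id(ip_a, mask) == subnet_id(ip_b, mask)

print(subnet_id("172.16.254.233", "255.255.255.0"))                    # 172.16.254.0
print(host_id("172.16.254.233", "255.255.255.0"))                      # 233
print(same_subnet("172.16.254.1", "172.16.254.233", "255.255.255.0"))  # True
```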
To summarize, the IP protocol has two main functions: assigning an IP address to each computer, and determining which addresses are on the same subnet.
4.3 IP Packets
Data sent according to the IP protocol is called an IP packet. As you would expect, it must contain IP address information.
But as mentioned earlier, an Ethernet packet contains only MAC addresses and has no field for an IP address. Do we need to change the Ethernet definition and add a field?
No. We can put the IP packet directly into the "data" part of the Ethernet packet, so the Ethernet specification does not need to change at all. This is the benefit of the Internet's layered structure: changes in an upper layer do not touch the layers below.
Specifically, an IP packet also has two parts, a "header" and "data".
The "header" mainly contains the version, length, IP addresses, and similar information; the "data" is the actual content of the IP packet. Once it is placed in an Ethernet packet, the Ethernet packet consists of the Ethernet header, followed by the IP header, followed by the IP data.
The "header" of an IP packet is 20 to 60 bytes long, and the total length of the packet is at most 65,535 bytes. In theory, then, the "data" part of an IP packet can be up to 65,515 bytes long. As mentioned earlier, the "data" part of an Ethernet packet is at most 1500 bytes long. So if an IP packet exceeds 1500 bytes, it has to be split across several Ethernet packets and sent in pieces.
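The splitting described above can be approximated with a short calculation. The sketch rests on assumptions that go beyond the text: a minimal 20-byte IP header with no options, a copy of that header in every piece, and the IPv4 rule that every piece except the last carries a multiple of 8 data bytes. The function name is invented for illustration.

```python
ETHERNET_DATA_MAX = 1500  # largest "data" part of an Ethernet packet (from the text)
IP_HEADER = 20            # assumed: minimal IP header without options
IP_TOTAL_MAX = 65535      # largest IP packet (from the text)

def fragment_sizes(total_len):
    """Return the number of data bytes carried by each piece of an IP packet."""
    if not IP_HEADER <= total_len <= IP_TOTAL_MAX:
        raise ValueError("total length out of range")
    data_len = total_len - IP_HEADER
    if total_len <= ETHERNET_DATA_MAX:
        return [data_len]  # fits in one Ethernet packet, no splitting needed
    # Each piece repeats the IP header; round the payload down to a multiple of 8.
    per_piece = (ETHERNET_DATA_MAX - IP_HEADER) // 8 * 8
    sizes = []
    while data_len > 0:
        chunk = min(per_piece, data_len)
        sizes.append(chunk)
        data_len -= chunk
    return sizes

print(IP_TOTAL_MAX - IP_HEADER)  # 65515
print(fragment_sizes(4000))      # [1480, 1480, 1020]
```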
4.4 ARP Protocol
One last point about the network layer needs explaining.
Because an IP packet is sent inside an Ethernet packet, we must know two addresses: the other party's MAC address and the other party's IP address. Usually the other party's IP address is known (explained later), but its MAC address is not.
So we need a mechanism for obtaining a MAC address from an IP address.
There are two cases. In the first, the two hosts are not on the same subnet. Then there is in fact no way to obtain the other host's MAC address; the packet can only be handed to the "gateway" that connects the two subnets, and the gateway takes it from there.
In the second case, the two hosts are on the same subnet, and we can use the ARP protocol to obtain the other host's MAC address. ARP also sends out a packet (carried inside an Ethernet packet) that contains the IP address of the host being queried, with the destination MAC address field set to FF:FF:FF:FF:FF:FF to mark it as a "broadcast" address. Every host on the subnet receives this packet, extracts the IP address, and compares it with its own. If the two match, the host replies with its MAC address; otherwise it discards the packet.
In short, with the ARP protocol we can obtain the MAC address of any host on the same subnet and send packets to it.
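The ARP exchange on a single subnet can be modeled as a toy lookup. This is a simplified sketch, not the real packet format: the dictionaries, host list, and function name are all hypothetical.

```python
BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"

def arp_lookup(target_ip, subnet_hosts):
    """Broadcast a query for target_ip and return the MAC of the host that answers."""
    request = {"dst_mac": BROADCAST_MAC, "target_ip": target_ip}
    for host in subnet_hosts:
        # Every host on the subnet receives the broadcast and checks the queried IP.
        if host["ip"] == request["target_ip"]:
            return host["mac"]  # the owner replies with its MAC address
        # Any other host discards the request.
    return None  # no host on this subnet owns the address

# Made-up hosts for illustration.
subnet_hosts = [
    {"ip": "172.16.254.1", "mac": "00:00:00:00:00:01"},
    {"ip": "172.16.254.233", "mac": "00:00:00:00:00:02"},
]
print(arp_lookup("172.16.254.233", subnet_hosts))  # 00:00:00:00:00:02
print(arp_lookup("172.16.254.50", subnet_hosts))   # None
```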
V. Transport Layer
5.1 The origin of the transport layer
With MAC addresses and IP addresses, we can already establish communication between any two hosts on the Internet.
The next problem is that many programs on the same host need to use the network. For example, you may be browsing the web while chatting with friends online. When a packet arrives from the Internet, how do you know whether it carries the content of a web page or of the online chat?
In other words, we need one more parameter that indicates which program (process) the packet is for. This parameter is called the "port", and it is in effect a number assigned to each program that uses the NIC. Every packet is sent to a specific port on the host, so different programs each get the data they need.
A "port" is an integer between 0 and 65535, which is exactly 16 bits. Ports 0 to 1023 are reserved by the system, so users can only choose ports greater than 1023. Whether you are browsing the web or chatting online, the application picks a random port and then contacts the corresponding port on the server.
The function of the "transport layer" is to establish "port-to-port" communication. By comparison, the function of the "network layer" is to establish "host-to-host" communication. Once the host and the port are determined, programs can communicate with each other. For this reason, Unix systems call the combination of host and port a "socket". With sockets, you can develop network applications.
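A socket as "host + port" can be seen directly with Python's standard `socket` module. The sketch below sends a few bytes between two sockets on the same machine, using the datagram protocol (UDP) described in the next section; the function name and the loopback setup are assumptions chosen to keep the example self-contained.

```python
import socket

def send_over_loopback(message):
    """Send bytes between two sockets on this machine; return what arrives and the ports used."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick a free port
        receiver.settimeout(2.0)         # do not wait forever if nothing arrives
        host, port = receiver.getsockname()
        sender.sendto(message, (host, port))  # address the packet to host + port
        data, (src_host, src_port) = receiver.recvfrom(4096)
        return data, port, src_port
    finally:
        sender.close()
        receiver.close()

data, receiver_port, sender_port = send_over_loopback(b"hello")
print(data, receiver_port, sender_port)
```

The sender never chooses its own port here; the operating system assigns one automatically, which mirrors the "application picks a random port" behaviour described above.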
5.2 UDP Protocol
Now we have to put the port information into the packet, which requires a new protocol. The simplest one is the UDP protocol, whose format is little more than the data with the port numbers added in front.
A UDP packet also consists of a "header" and "data".
The "header" mainly defines the source port and the destination port, and the "data" is the actual content. The entire UDP packet is then placed in the "data" part of an IP packet, and, as described earlier, the IP packet is placed in an Ethernet packet. The whole Ethernet packet now consists of the Ethernet header, the IP header, the UDP header, and the UDP data, in that order.
UDP packets are very simple: the "header" is only 8 bytes, and the total length is at most 65,535 bytes, sized so that it can be carried in a single IP packet.
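The 8-byte header can be built by hand with the standard `struct` module. Beyond the two ports mentioned in the text, the real UDP header also has a 16-bit length field and a 16-bit checksum field; the sketch below leaves the checksum at 0 (not computed), and the function names are illustrative only.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum
UDP_HEADER_FORMAT = "!HHHH"
UDP_HEADER_SIZE = 8

def build_udp_packet(src_port, dst_port, data):
    """Prepend an 8-byte UDP header to the data (checksum left at 0 in this sketch)."""
    length = UDP_HEADER_SIZE + len(data)
    if length > 65535:
        raise ValueError("UDP packet too long")
    header = struct.pack(UDP_HEADER_FORMAT, src_port, dst_port, length, 0)
    return header + data

def parse_udp_packet(packet):
    """Split a UDP packet back into source port, destination port, and data."""
    src_port, dst_port, length, _checksum = struct.unpack(
        UDP_HEADER_FORMAT, packet[:UDP_HEADER_SIZE]
    )
    return src_port, dst_port, packet[UDP_HEADER_SIZE:length]

packet = build_udp_packet(50000, 53, b"hello")
print(len(packet))               # 13
print(parse_udp_packet(packet))  # (50000, 53, b'hello')
```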
5.3 TCP Protocol
The advantage of the UDP protocol is that it is simple and easy to implement. Its disadvantage is poor reliability: once a packet has been sent, there is no way to know whether the other party received it.
The TCP protocol was created to solve this problem and make the network more reliable. The protocol is very complex, but it can be roughly thought of as UDP with an acknowledgement mechanism: every packet sent must be acknowledged. If a packet is lost, no acknowledgement arrives, and the sender knows that it must send the packet again.
The TCP protocol therefore ensures that data is not lost. Its drawbacks are that the process is complex, it is hard to implement, and it consumes more resources.
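The "send, wait for acknowledgement, resend if lost" idea can be sketched as a toy simulation. This is a deliberately simplified model and not how TCP is actually implemented; the channel function, the drop pattern, and all names are hypothetical.

```python
def send_reliably(packets, channel, max_attempts=5):
    """Send each packet, resending until the channel reports an acknowledgement.

    Returns the number of attempts each packet needed.
    """
    attempts_log = []
    for packet in packets:
        for attempt in range(1, max_attempts + 1):
            if channel(packet):  # True means the acknowledgement came back
                attempts_log.append(attempt)
                break
        else:
            raise RuntimeError("no acknowledgement after repeated attempts")
    return attempts_log

def lossy_channel(drop_pattern):
    """Build a fake channel; drop_pattern lists True (lost) or False (delivered) per send."""
    outcomes = iter(drop_pattern)
    def channel(packet):
        lost = next(outcomes, False)  # once the pattern runs out, nothing is lost
        return not lost
    return channel

channel = lossy_channel([False, True, True, False, False])
print(send_reliably([b"a", b"b", b"c"], channel))  # [1, 3, 1]
```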
Like UDP packets, TCP packets are embedded in the "data" part of an IP packet. A TCP packet has no length limit and could in theory be arbitrarily long, but for the sake of network efficiency a TCP packet is usually kept no longer than an IP packet, so that a single TCP packet does not have to be split again.
VI. Application Layer
The application receives data from the "transport layer" and then has to interpret it. Because the Internet is an open architecture with all kinds of data sources, the data format must be agreed in advance; otherwise the data cannot be interpreted at all.
The role of the "application layer" is to specify the data formats that applications use.
For example, the TCP protocol can carry data for many kinds of programs, such as email, WWW, and FTP. There must therefore be separate protocols that define the formats of email, web pages, and FTP data, and these application protocols make up the "application layer".
This is the highest layer and faces the user directly. Its data is placed in the "data" part of the TCP packet. The Ethernet packet now consists of the Ethernet header, the IP header, the TCP header, and the application-layer data, in that order.
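The nesting of the layers can be illustrated by wrapping some application data step by step. The sketch uses placeholder bytes instead of real headers, assumes minimal 20-byte TCP and IP headers, and models the 18 bytes of Ethernet overhead as a single prefix for simplicity; the sample request line is only an example of application-layer data.

```python
ETH_OVERHEAD = 18  # Ethernet "header" length quoted earlier in the article
IP_HEADER = 20     # assumed: minimal IP header
TCP_HEADER = 20    # assumed: minimal TCP header

def encapsulate(app_data):
    """Wrap application data in placeholder TCP, IP, and Ethernet headers."""
    tcp_packet = b"T" * TCP_HEADER + app_data   # application data goes in the TCP "data"
    ip_packet = b"I" * IP_HEADER + tcp_packet   # TCP packet goes in the IP "data"
    return b"E" * ETH_OVERHEAD + ip_packet      # IP packet goes in the Ethernet "data"

def decapsulate(frame):
    """Strip the headers layer by layer to recover the application data."""
    ip_packet = frame[ETH_OVERHEAD:]
    tcp_packet = ip_packet[IP_HEADER:]
    return tcp_packet[TCP_HEADER:]

request = b"GET / HTTP/1.1\r\n\r\n"  # sample application-layer data
frame = encapsulate(request)
print(len(frame))                     # 76
print(decapsulate(frame) == request)  # True
```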
Linux Network Programming: Introduction to Network Protocols