This article is excerpted from: http://www.kuqin.com/shuoit/20141008/342510.html
Objective
We use the Internet every day, but have you ever wondered how it is implemented?
Billions of computers around the world are connected together and communicate with one another in pairs. A network card in Shanghai sends out a signal and another network card in Los Angeles receives it, even though neither knows the other's physical location. Don't you find that rather magical?
The core of the Internet is a series of protocols, collectively known as the "Internet Protocol Suite". They specify in detail how computers are connected and networked. Once we understand these protocols, we understand the principles of the Internet.
Here are my study notes. Because these protocols are so complex and so extensive, I want to put together a concise framework that helps me grasp them as a whole. To keep things simple and easy to understand, I have simplified a great deal; some parts are neither comprehensive nor precise, but they should be enough to explain the principles of the Internet.
I. Overview
1.1 Five-layer model
The Internet is implemented in several layers. Each layer has its own function and, like the floors of a building, each layer is supported by the one below it.
Users only touch the top layer and are completely unaware of the layers underneath. To understand the Internet, you have to start from the bottom and work upward, understanding the function of each layer.
There are different ways of dividing the layers: some models have seven layers and some have four. I find it easiest to explain the Internet with five layers.
The bottom layer is called the "physical layer", the topmost layer is called the "application layer", and the three layers in between are (from the bottom up) the "link layer", the "network layer", and the "transport layer". The lower a layer is, the closer it is to the hardware; the higher a layer is, the closer it is to the user.
Their names don't really matter. Just know that the Internet is divided into several layers.
1.2 Layers and protocols
Each layer exists to perform a function. To perform that function, everyone has to follow common rules.
The rules that everyone follows are called a "protocol".
Every layer of the Internet defines many protocols. Collectively, these protocols are called the "Internet Protocol Suite". They are the core of the Internet. The following sections describe the function of each layer, mainly by introducing each layer's principal protocols.
II. Physical layer
We start with the bottom layer.
What is the first thing to do to network computers? The first step, of course, is to connect them, which can be done with optical fiber, cable, twisted pair, radio waves, and so on.
This is called the "physical layer": the physical means of connecting computers. It mainly specifies the electrical characteristics of the network and is responsible for transmitting the electrical signals for 0 and 1.
III. Link layer
3.1 Definition
Bare 0s and 1s mean nothing on their own; they must be given an interpretation: how many electrical signals make up a group? What does each signal bit mean?
This is the function of the "link layer", which sits above the physical layer and determines how the 0s and 1s are grouped.
3.2 Ethernet Protocol
In the early days, each company had its own way of grouping electrical signals. Gradually, a protocol called "Ethernet" came to dominate.
Ethernet specifies that a group of electrical signals forms a packet called a "frame". Each frame is divided into two parts: the header (head) and the data.
The "header" contains descriptive fields for the packet, such as the sender, the recipient, and the data type; the "data" is the actual content of the packet.
The "header" has a fixed length of 18 bytes (this figure counts the 14 bytes at the start of the frame together with the 4-byte checksum at its end). The "data" is at least 46 bytes and at most 1500 bytes long. The whole "frame" is therefore at least 64 bytes and at most 1518 bytes. If the data is longer, it must be split across multiple frames.
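The frame-size arithmetic above can be sketched in a few lines. This is only an illustration of the numbers in this section; the function names are my own, and the padding of short data up to 46 bytes is how the minimum is normally met.

```python
import math

HEADER_BYTES = 18      # header length as counted in this section
MIN_DATA_BYTES = 46    # shortest allowed "data" part
MAX_DATA_BYTES = 1500  # longest allowed "data" part


def frame_size(data_len: int) -> int:
    """Size of one frame; data shorter than 46 bytes is padded up to 46."""
    if data_len > MAX_DATA_BYTES:
        raise ValueError("data does not fit in a single frame")
    return HEADER_BYTES + max(data_len, MIN_DATA_BYTES)


def frames_needed(data_len: int) -> int:
    """Number of frames required to carry data_len bytes of data."""
    if data_len <= 0:
        return 0
    return math.ceil(data_len / MAX_DATA_BYTES)


print(frame_size(10))       # 64, the minimum frame
print(frame_size(1500))     # 1518, the maximum frame
print(frames_needed(4000))  # 3 frames: 1500 + 1500 + 1000
```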
3.3 MAC Address
As mentioned above, the "header" of an Ethernet packet contains information about the sender and the recipient. So how are the sender and the recipient identified?
Ethernet requires every device connected to the network to have a "network card" (NIC). Packets are sent from one network card to another. The address of the network card is the sending and receiving address of the packet, and it is called the MAC address.
Each network card leaves the factory with a MAC address that is unique in the world. It is 48 bits long and is usually written as 12 hexadecimal digits.
The first 6 hexadecimal digits are the manufacturer's number, and the last 6 are that manufacturer's serial number for the card. With the MAC address, you can locate the network card and determine the path of the packet.
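As a small sketch of that split, the function below separates a MAC address into its two halves. The function name and the sample address are made up for illustration.

```python
def split_mac(mac: str):
    """Split a MAC address into (manufacturer number, serial number)."""
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError("a MAC address needs exactly 12 hexadecimal digits")
    return digits[:6], digits[6:]


# Hypothetical address, used only as an example.
vendor, serial = split_mac("00:1A:2B:3C:4D:5E")
print(vendor)  # 001A2B
print(serial)  # 3C4D5E
```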
3.4 Broadcast
Defining addresses is only the first step; there is more to come.
First, how does one network card know the MAC address of another network card?
The answer is the ARP protocol, which solves this problem. It is covered later; for now it is enough to know that the receiver's MAC address must be known before an Ethernet packet can be sent.
Second, even with the MAC address, how does the system deliver the packet to exactly the right receiver?
The answer is that Ethernet uses a very "primitive" approach: it does not deliver the packet precisely to the receiver, but sends it to every computer on the network and lets each computer decide for itself whether it is the receiver.
For example, computer 1 sends a packet to computer 2, and computers 3, 4, and 5 on the same subnet receive the packet as well. Each of them reads the "header" of the packet, finds the receiver's MAC address, and compares it with its own MAC address. If the two match, it accepts the packet for further processing; otherwise it discards the packet. This way of sending is called "broadcasting".
With the definition of the packet, the MAC address of the network card, and broadcast transmission, the link layer can transfer data among multiple computers.
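The broadcast behaviour can be imitated with a toy simulation. This is not real network code: the host names, the MAC addresses, and the dictionary used as a "frame" are all invented for the example.

```python
def broadcast(frame: dict, subnet: dict) -> list:
    """Give the frame to every host on the subnet and return who accepted it.

    Each host compares the destination MAC in the header with its own MAC
    and discards the frame when they differ.
    """
    accepted = []
    for host_name, host_mac in subnet.items():
        if frame["dst_mac"] == host_mac:
            accepted.append(host_name)
        # otherwise the host simply drops the frame
    return accepted


# Hypothetical subnet with made-up addresses.
subnet = {
    "computer2": "AA:AA:AA:00:00:02",
    "computer3": "AA:AA:AA:00:00:03",
    "computer4": "AA:AA:AA:00:00:04",
    "computer5": "AA:AA:AA:00:00:05",
}
frame = {"src_mac": "AA:AA:AA:00:00:01", "dst_mac": "AA:AA:AA:00:00:02", "data": b"hi"}
print(broadcast(frame, subnet))  # ['computer2']
```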
IV. Network layer
4.1 The origin of the network layer
The Ethernet protocol relies on MAC addresses to send data. In theory, relying on MAC addresses alone, a network card in Shanghai could find a network card in Los Angeles; technically it can be done.
However, doing so has a major drawback. Ethernet sends packets by broadcast, so every member of the network gets a copy. This is not only inefficient but also confined to the sender's own subnet. In other words, if two computers are not on the same subnet, a broadcast cannot reach from one to the other. This design is reasonable; otherwise every computer on the Internet would receive every packet, which would be a disaster.
The Internet is a giant network made up of countless subnets. It is almost inconceivable that a computer in Shanghai and a computer in Los Angeles would be on the same subnet.
Therefore, we must find a way to tell which MAC addresses belong to the same subnet and which do not. If two computers are on the same subnet, packets are sent by broadcast; otherwise they are sent by "routing". ("Routing" means forwarding packets to different subnets. It is a large topic that this article does not cover.) Unfortunately, the MAC address alone cannot do this. It is tied only to the manufacturer and has nothing to do with the network the card is on.
This led to the birth of the "network layer". Its role is to introduce a new set of addresses that let us tell whether two computers belong to the same subnet. This set of addresses is called the "network address".
So once the "network layer" exists, each computer has two kinds of address: a MAC address and a network address. The two have no connection with each other. The MAC address is bound to the network card, the network address is assigned by an administrator, and they are paired together only arbitrarily.
The network address tells us which subnet a computer is on, and the MAC address delivers the packet to the target network card within that subnet. It follows logically that the network address must be handled first and the MAC address second.
4.2 IP protocol
The protocol that specifies the network address is called the IP protocol. The address that it defines is called an IP address.
The most widely used version at present is version 4 of the IP protocol, known as IPv4. This version specifies that a network address consists of 32 binary bits.
By convention, we write an IP address as four decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255.
Every computer on the Internet is assigned an IP address. The address has two parts: the leading part identifies the network and the trailing part identifies the host. Take the IP address 172.16.254.1, a 32-bit address: if its network part is the first 24 bits (172.16.254), then its host part is the last 8 bits (the final 1). Computers on the same subnet must have the same network part in their IP addresses, so 172.16.254.2 should be on the same subnet as 172.16.254.1.
The problem, however, is that the network part cannot be determined from the IP address alone. Taking 172.16.254.1 as an example again, whether its network part is the first 24 bits, the first 16 bits, or even the first 28 bits cannot be seen from the IP address itself.
So how can we tell from the IP addresses whether two computers belong to the same subnet? This requires another parameter, the "subnet mask".
The "subnet mask" is a parameter that describes a subnet. It has the same form as an IP address: it is also a 32-bit binary number, with every bit of the network part set to 1 and every bit of the host part set to 0. For example, for the IP address 172.16.254.1, if the network part is known to be the first 24 bits and the host part the last 8 bits, then the subnet mask is 11111111.11111111.11111111.00000000, which is 255.255.255.0 in decimal.
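To make the construction concrete, here is a minimal sketch that builds the mask from the number of bits in the network part. The function name is my own choice for the example.

```python
def mask_from_prefix(prefix_len: int) -> str:
    """Return the dotted-decimal subnet mask whose first prefix_len bits are 1."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("the network part must be 0 to 32 bits long")
    # Shift 32 one-bits left so the host part becomes zeros, then keep 32 bits.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(mask_from_prefix(24))  # 255.255.255.0
print(mask_from_prefix(16))  # 255.255.0.0
print(mask_from_prefix(28))  # 255.255.255.240
```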
Knowing the "subnet mask", we can determine whether any two IP addresses are on the same subnet. The method is to perform a bitwise AND of each IP address with the subnet mask (a result bit is 1 only if both input bits are 1, otherwise 0) and then compare the results. If they are the same, the two addresses are on the same subnet; otherwise they are not.
For example, the IP addresses 172.16.254.1 and 172.16.254.233 both have the subnet mask 255.255.255.0. Are they on the same subnet? ANDing each of them with the subnet mask gives 172.16.254.0 in both cases, so they are on the same subnet.
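The AND comparison can be written out directly. This is a sketch using plain integer arithmetic, with helper names chosen for the example; Python's standard `ipaddress` module offers the same kind of check ready-made.

```python
def to_int(address: str) -> int:
    """Convert a dotted-decimal address such as 172.16.254.1 to a 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected four numbers between 0 and 255")
    return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]


def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def network_part(ip: str, mask: str) -> str:
    """AND the IP address with the subnet mask."""
    return to_dotted(to_int(ip) & to_int(mask))


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    return network_part(ip_a, mask) == network_part(ip_b, mask)


print(network_part("172.16.254.1", "255.255.255.0"))    # 172.16.254.0
print(network_part("172.16.254.233", "255.255.255.0"))  # 172.16.254.0
print(same_subnet("172.16.254.1", "172.16.254.233", "255.255.255.0"))  # True
```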
To summarize, the IP protocol has two main functions: one is to assign an IP address to every computer, and the other is to determine which addresses are on the same subnet.
4.3 IP packets
The data sent according to the IP protocol is called an IP packet. As you would expect, it must include IP address information.
But as mentioned earlier, an Ethernet packet contains only MAC addresses and has no field for an IP address. Do we need to change the packet definition and add a field?
The answer is no. We can put the IP packet directly into the "data" part of the Ethernet packet, so the Ethernet specification does not need to change at all. This is the benefit of the Internet's layered structure: changes in an upper layer do not touch the structure of the lower layers.
Specifically, an IP packet is also divided into two parts, the "header" and the "data".
The "header" mainly contains the version, length, IP addresses, and similar information; the "data" is the actual content of the IP packet. When it is placed inside an Ethernet packet, the Ethernet packet consists of the Ethernet header followed by the IP header and the IP data.
The "header" of an IP packet is 20 to 60 bytes long, and the total length of the packet is at most 65,535 bytes. In theory, therefore, the "data" part of an IP packet can be at most 65,515 bytes long. As mentioned earlier, the "data" part of an Ethernet packet is at most 1500 bytes long. So if an IP packet exceeds 1500 bytes, it has to be split across several Ethernet packets and sent separately.
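The numbers above can be checked with a short calculation. The sketch below goes one step beyond the simplified description and assumes that each piece carries its own copy of the IP header and that the data in each piece is a multiple of 8 bytes, which is how IPv4 fragmentation works; the function names are my own.

```python
import math

IP_MAX_TOTAL = 65535      # largest IP packet, header included
IP_MIN_HEADER = 20        # shortest IP header
ETHERNET_DATA_MAX = 1500  # largest "data" part of an Ethernet packet


def max_ip_data(header_len: int = IP_MIN_HEADER) -> int:
    """Largest possible "data" part of an IP packet for a given header length."""
    return IP_MAX_TOTAL - header_len


def ethernet_packets_for(ip_data_len: int, header_len: int = IP_MIN_HEADER) -> int:
    """Number of Ethernet packets needed to carry ip_data_len bytes of IP data.

    Assumption: every piece repeats the IP header, and the data carried by
    each piece is rounded down to a multiple of 8 bytes.
    """
    data_per_piece = (ETHERNET_DATA_MAX - header_len) // 8 * 8
    return max(1, math.ceil(ip_data_len / data_per_piece))


print(max_ip_data())                # 65515
print(ethernet_packets_for(1480))   # 1
print(ethernet_packets_for(1481))   # 2
print(ethernet_packets_for(65515))  # 45
```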
4.4 ARP Protocol
There is one last point to explain about the network layer.
Because an IP packet is sent inside an Ethernet packet, we must know two addresses at once: the other party's MAC address and the other party's IP address. Normally the other party's IP address is known (explained later), but its MAC address is not.
So we need a mechanism for obtaining the MAC address from the IP address.
There are two cases. In the first case, the two hosts are not on the same subnet. Then there is in fact no way to obtain the other party's MAC address; the packet can only be handed to the "gateway" that connects the two subnets, and the gateway deals with it.
In the second case, the two hosts are on the same subnet, and we can use the ARP protocol to obtain the other party's MAC address. ARP also sends out a packet (carried inside an Ethernet packet) that contains the IP address of the host being queried, with the destination MAC address field set to FF:FF:FF:FF:FF:FF, which marks it as a "broadcast" address. Every host on the subnet receives this packet, extracts the IP address from it, and compares it with its own IP address. If the two match, the host replies and reports its own MAC address to the sender; otherwise it discards the packet.
In short, with the ARP protocol we can obtain the MAC address of any host on the same subnet and so send packets to any host.
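The query-and-reply exchange can be imitated with a toy model. This is a simulation in plain Python, not a real ARP implementation; the host table, the addresses, and the function name are invented for the example.

```python
BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def arp_lookup(target_ip: str, subnet_hosts: dict):
    """Ask every host on the subnet who owns target_ip.

    subnet_hosts maps each host's IP address to its MAC address.
    Returns the MAC address from the reply, or None if nobody answers.
    """
    request = {"dst_mac": BROADCAST_MAC, "target_ip": target_ip}
    for host_ip, host_mac in subnet_hosts.items():
        # Every host receives the broadcast and compares the queried IP
        # with its own; only the owner replies, the rest drop the packet.
        if host_ip == request["target_ip"]:
            return host_mac
    return None


# Hypothetical subnet with made-up addresses.
hosts = {
    "172.16.254.2": "AA:AA:AA:00:00:02",
    "172.16.254.3": "AA:AA:AA:00:00:03",
}
print(arp_lookup("172.16.254.3", hosts))   # AA:AA:AA:00:00:03
print(arp_lookup("172.16.254.99", hosts))  # None
```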
V. Transport layer
5.1 The origin of the transport layer
With MAC addresses and IP addresses, we can already establish communication between any two hosts on the Internet.
The next problem is that many programs on the same host need to use the network; for example, you may be browsing the web and chatting with friends online at the same time. When a packet arrives from the Internet, how do you know whether it carries the content of a web page or of an online chat?
In other words, we need one more parameter that indicates which program (process) the packet is meant for. This parameter is called the "port", and it is in effect a number assigned to each program that uses the network card. Every packet is sent to a specific port on the host, so different programs can each pick up the data they need.
A "port" is an integer between 0 and 65535, which is exactly 16 binary bits. Ports 0 to 1023 are reserved by the system; users can only choose ports above 1023. Whether you are browsing the web or chatting online, the application picks a port at random and then contacts the corresponding port on the server.
The function of the "transport layer" is to establish "port-to-port" communication. By comparison, the function of the "network layer" is to establish "host-to-host" communication. Once the host and the port are determined, programs can communicate with each other. For this reason, Unix systems call the combination of host and port a "socket". With it, you can develop network applications.
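A socket can be tried out with Python's standard `socket` module. The sketch below keeps everything on the local machine (127.0.0.1) and lets the operating system pick a free port; the message and variable names are arbitrary.

```python
import socket

# Receiver: a socket is a host address plus a port.
# Port 0 asks the operating system to choose any free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # do not wait forever if nothing arrives
host, port = receiver.getsockname()

# Sender: addresses the packet to the receiver's host and port.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", (host, port))

# The receiver sees the data together with the sender's own host and port.
data, (peer_host, peer_port) = receiver.recvfrom(1024)
print(data, peer_host, peer_port)

sender.close()
receiver.close()
```

The port printed for the sender differs from run to run, because the operating system assigns it on the fly.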
5.2 UDP protocol
Now we have to add the port information to the packet, which requires a new protocol. The simplest implementation is called the UDP protocol, and its format is little more than the data with the port numbers placed in front.
A UDP packet also consists of a "header" and "data".
The "header" mainly defines the source port and the destination port, and the "data" is the actual content. The whole UDP packet is then placed in the "data" part of an IP packet, and, as described earlier, the IP packet is in turn placed in an Ethernet packet. The complete Ethernet packet therefore now consists of the Ethernet header, the IP header, the UDP header, and the UDP data, in that order.
UDP packets are very simple: the "header" is only 8 bytes in total, and the total length is at most 65,535 bytes, which matches the maximum size of a single IP packet.
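The 8-byte header can be built by hand with Python's `struct` module. Besides the two ports, the UDP header also holds a length field and a checksum field, each 2 bytes. The sketch leaves the checksum at 0 for simplicity, and the port numbers and function names are only examples.

```python
import struct

UDP_HEADER_FORMAT = "!HHHH"  # four 16-bit fields in network byte order
UDP_HEADER_LEN = 8


def build_udp_packet(src_port: int, dst_port: int, data: bytes) -> bytes:
    """Prefix the data with source port, destination port, length, checksum."""
    total_len = UDP_HEADER_LEN + len(data)
    if total_len > 65535:
        raise ValueError("a UDP packet cannot exceed 65,535 bytes")
    checksum = 0  # simplification: checksum not computed in this sketch
    header = struct.pack(UDP_HEADER_FORMAT, src_port, dst_port, total_len, checksum)
    return header + data


def parse_udp_packet(packet: bytes):
    """Return (source port, destination port, data) from a UDP packet."""
    src_port, dst_port, total_len, _checksum = struct.unpack(
        UDP_HEADER_FORMAT, packet[:UDP_HEADER_LEN]
    )
    return src_port, dst_port, packet[UDP_HEADER_LEN:total_len]


packet = build_udp_packet(50000, 53, b"hello")
print(len(packet))               # 13 = 8-byte header + 5 bytes of data
print(parse_udp_packet(packet))  # (50000, 53, b'hello')
```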
5.3 TCP protocol
The advantage of the UDP protocol is that it is simple and easy to implement. Its disadvantage is poor reliability: once a packet has been sent, there is no way to know whether the other party received it.
To solve this problem and improve the reliability of the network, the TCP protocol was created. The protocol is very complex, but it can be thought of, approximately, as UDP with a confirmation mechanism: every packet sent must be acknowledged. If a packet is lost, no acknowledgement arrives, and the sender knows that it must send the packet again.
The TCP protocol can therefore ensure that data is not lost. Its disadvantages are that the process is complex, it is hard to implement, and it consumes more resources.
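The confirmation idea can be shown with a toy simulation. This is a deliberately simplified stop-and-wait scheme, not how real TCP works internally; the set of losses is fixed in advance so that the run is repeatable, and all names are invented for the example.

```python
def send_reliably(packets: list, lost: set, max_tries: int = 5):
    """Send packets one at a time, resending until each is acknowledged.

    lost is a set of (sequence number, attempt number) pairs that the
    simulated network drops. Returns (data received, transmissions made).
    """
    received = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            transmissions += 1
            if (seq, attempt) in lost:
                continue  # no acknowledgement arrives, so send it again
            received.append(payload)  # receiver got it and acknowledges
            break
        else:
            raise RuntimeError(f"packet {seq} was never acknowledged")
    return received, transmissions


# Packet 1 is lost on its first two attempts and gets through on the third.
data, sent = send_reliably(["a", "b", "c"], lost={(1, 1), (1, 2)})
print(data)  # ['a', 'b', 'c']
print(sent)  # 5
```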
TCP packets, like UDP packets, are embedded in the "data" part of an IP packet. A TCP packet has no length limit and could in theory be infinitely long, but to keep the network efficient, the length of a TCP packet normally does not exceed the length of an IP packet, so that a single TCP packet does not have to be split again.
VI. Application Layer
The application receives the data handed up by the "transport layer" and then has to interpret it. Because the Internet is an open architecture and data comes from all kinds of sources, the format must be agreed in advance; otherwise the data cannot be interpreted at all.
The role of the "application layer" is to specify the data formats used by applications.
For example, the TCP protocol can carry data for all kinds of programs, such as Email, WWW, and FTP. There must therefore be different protocols that specify the formats of e-mail, web pages, and FTP data, and these application protocols make up the "application layer".
This is the highest layer, the one that directly faces the user. Its data is placed in the "data" part of the TCP packet. The Ethernet packet at this point therefore consists of the Ethernet header, the IP header, the TCP header, and the application data.
At this point, all five layers of the Internet have been covered from the bottom up. This has been an explanation, from the system's point of view, of how the Internet is put together.
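The nesting of the layers described in this article can be summed up in a short sketch. The headers here are placeholders filled with dummy bytes: the 18-byte Ethernet header and the 20-byte minimum IP header are the figures used earlier, while the 20-byte TCP header is an assumed minimum that the text does not state.

```python
ETHERNET_HEADER = b"E" * 18  # placeholder, length as counted in section 3.2
IP_HEADER = b"I" * 20        # placeholder, minimum length from section 4.3
TCP_HEADER = b"T" * 20       # placeholder, assumed minimum TCP header length


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data in TCP, then IP, then Ethernet."""
    tcp_packet = TCP_HEADER + app_data
    ip_packet = IP_HEADER + tcp_packet
    return ETHERNET_HEADER + ip_packet


def decapsulate(frame: bytes) -> bytes:
    """Peel the headers off again, one layer at a time."""
    ip_packet = frame[len(ETHERNET_HEADER):]
    tcp_packet = ip_packet[len(IP_HEADER):]
    return tcp_packet[len(TCP_HEADER):]


request = b"GET / HTTP/1.1\r\n\r\n"
frame = encapsulate(request)
print(len(frame))                     # 76 = 18 + 20 + 20 + 18
print(decapsulate(frame) == request)  # True
```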
Internet Protocol (i)