We use the Internet every day, but have you ever wondered how it actually works?
Billions of computers around the world are connected and communicate with one another. A network card in Shanghai sends out a signal, and a network card in Los Angeles receives it, even though neither knows the other's physical location. Doesn't that seem almost magical?
The core of the Internet is a set of protocols, collectively known as the Internet Protocol Suite. They specify in detail how computers connect to one another and form networks. Once we understand these protocols, we understand how the Internet works.
These are my study notes. The protocols are too large and too complex to cover in full, so I want to put together a concise framework that helps me grasp them as a whole. To keep things simple and easy to understand, I have simplified a great deal; some statements are neither complete nor precise, but they should be enough to explain the principles of the Internet.
=================================================
I. Overview
1.1 Five-layer model
The Internet is implemented in several layers. Each layer has its own function, and, as in a building, each layer rests on the one below it.
Users only ever touch the top layer and are entirely unaware of the layers beneath. To understand the Internet, you have to start at the bottom and work upward, learning what each layer does.
There are different ways to divide the layers: some models use seven layers, others four. I find the Internet easiest to explain with five.
The bottom layer is the physical layer and the top layer is the application layer. The three layers in between are, from the bottom up, the link layer, the network layer, and the transport layer. The lower a layer is, the closer it is to the hardware; the higher it is, the closer it is to the user.
The names do not really matter. Just know that the Internet is divided into several layers.
1.2 Layers and protocols
Each layer exists to perform a function. To perform it, everyone has to follow the same rules.
These commonly observed rules are called "protocols".
Every layer of the Internet defines many protocols. Collectively they are called the Internet Protocol Suite, and they are the core of the Internet. The sections below describe the function of each layer, mainly by introducing its principal protocols.
II. Physical Layer
We start with the bottom layer.
What is the first thing to do when networking computers? Connect them physically, of course, using optical fiber, electrical cable, twisted pair, radio waves, or some other medium.
This is the physical layer: the physical means of connecting computers. It mainly specifies the electrical characteristics of the network and is responsible for carrying the electrical signals that represent 0s and 1s.
III. Link Layer
3.1 Definitions
Raw 0s and 1s mean nothing by themselves. Their interpretation has to be specified: how many electrical signals make up a group, and what does each bit mean?
That is the job of the link layer, which sits above the physical layer and defines how the 0s and 1s are grouped.
3.2 Ethernet Protocol
In the early days, every company had its own way of grouping electrical signals. Gradually, a protocol called Ethernet came to dominate.
Ethernet specifies that a group of electrical signals forms a packet called a "frame". Each frame has two parts: a header and data.
The header contains descriptive fields such as the sender, the recipient, and the data type; the data is the actual content of the packet.
The length of the header is fixed at 18 bytes. (Strictly speaking, this is a 14-byte header plus a 4-byte checksum at the end of the frame; this note counts them together.) The data is at least 46 bytes and at most 1500 bytes long. The whole frame is therefore at least 64 bytes and at most 1518 bytes. If the data is longer than that, it must be split across multiple frames.
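The frame-size arithmetic above can be checked with a small sketch. It uses only the simplified numbers from this note; the helper name `frame_sizes` is made up for illustration, and padding short data up to the 46-byte minimum is an assumption of the sketch.

```python
HEADER_BYTES = 18      # per-frame overhead, as counted in this note
MIN_DATA_BYTES = 46    # shortest allowed data part
MAX_DATA_BYTES = 1500  # longest allowed data part


def frame_sizes(payload_len: int) -> list:
    """Return the size in bytes of each frame needed to carry payload_len bytes."""
    if payload_len < 0:
        raise ValueError("payload length cannot be negative")
    sizes = []
    remaining = payload_len
    while True:
        chunk = min(remaining, MAX_DATA_BYTES)
        # Assumption: data shorter than the minimum is padded up to 46 bytes.
        data_len = max(chunk, MIN_DATA_BYTES)
        sizes.append(HEADER_BYTES + data_len)
        remaining -= chunk
        if remaining <= 0:
            break
    return sizes


print(frame_sizes(10))    # [64]
print(frame_sizes(1500))  # [1518]
print(frame_sizes(3200))  # [1518, 1518, 218]
```

The first two calls reproduce the 64-byte minimum and 1518-byte maximum; the third shows longer data being split across several frames.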
3.3 MAC Address
As mentioned above, the header of an Ethernet frame contains information about the sender and the recipient. So how are the sender and recipient identified?
Ethernet requires every device connected to the network to have a network interface card (NIC). Packets travel from one NIC to another, so the address of the NIC is the sending and receiving address of the packet. This address is called the MAC address.
Every NIC leaves the factory with a globally unique MAC address. It is 48 bits long and is usually written as 12 hexadecimal digits.
The first 6 hexadecimal digits are the manufacturer's number and the last 6 are the manufacturer's serial number for that NIC. With the MAC address, you can identify a network card and determine where a packet should go.
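As a minimal sketch of this layout, the helper below splits a MAC address into its two halves. The function name `split_mac` and the sample address are invented for illustration.

```python
def split_mac(mac: str) -> tuple:
    """Split a MAC address into (manufacturer part, serial part)."""
    digits = mac.replace(":", "").replace("-", "").upper()
    # 12 hexadecimal digits x 4 bits each = 48 bits
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6], digits[6:]


vendor, serial = split_mac("00-B0-D0-86-BB-F7")  # sample address, chosen arbitrarily
print(vendor, serial)  # 00B0D0 86BBF7
```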
3.4 Broadcast
Defining an address is only the first step; more steps follow.
First, how does one NIC learn the MAC address of another?
The answer is the ARP protocol, which is covered later. For now, it is enough to know that an Ethernet packet cannot be sent until the receiver's MAC address is known.
Second, even with a MAC address, how does the system deliver the packet to exactly the right receiver?
The answer is that Ethernet uses a rather "primitive" method. It does not deliver the packet precisely to the receiver; instead it sends the packet to every computer on the network and lets each computer decide whether it is the intended receiver.
For example, when computer 1 sends a packet to computer 2, computers 3, 4, and 5 on the same subnet receive it as well. Each of them reads the header of the packet, finds the receiver's MAC address, and compares it with its own. If the two match, the computer accepts the packet and processes it further; otherwise it discards the packet. This way of sending is called "broadcasting".
With the definition of a packet, the MAC address of the NIC, and broadcast delivery, the link layer can transfer data between multiple computers.
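The broadcast scheme can be imitated in a few lines. This is a toy model, not real Ethernet: the `Computer` class, the dictionary used as a frame, and the MAC addresses are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Computer:
    name: str
    mac: str
    accepted: list = field(default_factory=list)

    def on_frame(self, frame: dict) -> None:
        # Every computer reads the header and compares the receiver's MAC with its own.
        if frame["receiver_mac"] == self.mac:
            self.accepted.append(frame["data"])
        # Otherwise the frame is simply discarded.


def broadcast(subnet: list, frame: dict) -> None:
    """Hand the frame to every computer on the subnet except the sender."""
    for computer in subnet:
        if computer.mac != frame["sender_mac"]:
            computer.on_frame(frame)


subnet = [Computer(f"computer {i}", f"00:00:00:00:00:0{i}") for i in range(1, 6)]
frame = {"sender_mac": subnet[0].mac, "receiver_mac": subnet[1].mac, "data": b"hello"}
broadcast(subnet, frame)
print([c.name for c in subnet if c.accepted])  # ['computer 2']
```

All four other computers see the frame, but only computer 2 keeps it.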
IV. Network Layer
4.1 The origin of the network layer
The Ethernet protocol relies on MAC addresses to send data. In theory, MAC addresses alone would be enough for a network card in Shanghai to find a network card in Los Angeles; technically, it could be done.
However, this approach has a major drawback. Ethernet sends packets by broadcast, so every member of the network receives a copy. That is not only inefficient but also confined to the sender's subnet: if two computers are not on the same subnet, a broadcast cannot reach from one to the other. This design is reasonable, because otherwise every computer on the Internet would receive every packet, which would be a disaster.
The Internet is one giant network made up of countless subnets. It is almost inconceivable that computers in Shanghai and Los Angeles would be on the same subnet.
We therefore need a way to tell which MAC addresses belong to the same subnet and which do not. If two computers are on the same subnet, the packet is sent by broadcast; otherwise it is sent by "routing". ("Routing" means forwarding packets between different subnets. It is a large topic that this article does not cover.) Unfortunately, the MAC address cannot do this by itself. It depends only on the manufacturer and has nothing to do with the network the card is on.
This led to the birth of the network layer. Its role is to introduce a new set of addresses that let us tell whether different computers belong to the same subnet. This kind of address is called a "network address".
So once the network layer exists, every computer has two kinds of address: a MAC address and a network address. The two are unrelated. The MAC address is bound to the network card, the network address is assigned by an administrator, and they are paired only by circumstance.
The network address tells us which subnet a computer is on, and the MAC address then delivers the packet to the target network card within that subnet. It follows logically that the network address must be processed first, and the MAC address second.
4.2 IP protocol
The protocol that defines network addresses is called the IP protocol, and the addresses it defines are called IP addresses.
The version in wide use today is version 4 of the IP protocol, IPv4 for short. It specifies that a network address consists of 32 bits.
By convention, we write an IP address as four decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255.
Every computer on the Internet is assigned an IP address. The address has two parts: the first part identifies the network and the second part identifies the host. Take the IP address 172.16.254.1, a 32-bit address. If its network part is the first 24 bits (172.16.254), then its host part is the last 8 bits (the final 1). Computers on the same subnet must have the same network part in their IP addresses, so 172.16.254.2 should be on the same subnet as 172.16.254.1.
The problem is that the network part cannot be determined from the IP address alone. Taking 172.16.254.1 again, the address itself does not reveal whether its network part is the first 24 bits, the first 16 bits, or even the first 28 bits.
So how can we tell from IP addresses whether two computers belong to the same subnet? This requires another parameter, the subnet mask.
The subnet mask is a parameter that describes a subnet. It has the same form as an IP address, a 32-bit binary number, with every bit of the network part set to 1 and every bit of the host part set to 0. For example, if the network part of 172.16.254.1 is known to be the first 24 bits and the host part the last 8 bits, the subnet mask is 11111111.11111111.11111111.00000000, or 255.255.255.0 in decimal.
Given the subnet mask, we can determine whether any two IP addresses are on the same subnet. Apply a bitwise AND between each IP address and the subnet mask (a result bit is 1 only if both input bits are 1, otherwise 0), then compare the two results. If they are equal, the two addresses are on the same subnet; otherwise they are not.
For example, suppose the IP addresses 172.16.254.1 and 172.16.254.233 both have the subnet mask 255.255.255.0. Are they on the same subnet? ANDing each address with the mask gives 172.16.254.0 in both cases, so they are on the same subnet.
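The AND operation can be spelled out by hand. The sketch below converts dotted addresses to 32-bit integers and back; the helper names are my own and exist only for this illustration.

```python
def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address into a 32-bit integer."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part
    return value


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back into dotted form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def network_of(ip: str, mask: str) -> str:
    """Bitwise AND of the address and the subnet mask."""
    return int_to_ip(ip_to_int(ip) & ip_to_int(mask))


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    return network_of(ip_a, mask) == network_of(ip_b, mask)


print(network_of("172.16.254.1", "255.255.255.0"))                     # 172.16.254.0
print(same_subnet("172.16.254.1", "172.16.254.233", "255.255.255.0"))  # True
```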
To summarize, the IP protocol has two main functions, one is to assign an IP address to each computer, and the other is to determine which addresses are in the same subnet.
4.3 IP packets
Data sent according to the IP protocol is called an IP packet. As you would expect, it must include IP address information.
But as mentioned earlier, an Ethernet packet contains only MAC addresses and has no field for an IP address. Do we need to change the packet definition and add a field?
The answer is no. We can put the IP packet directly into the data part of the Ethernet packet, so the Ethernet specification does not need to change at all. This is the benefit of the Internet's layered structure: changes in an upper layer do not affect the layers below.
Specifically, an IP packet is also divided into a header and data.
The header mainly contains the version, the length, the IP addresses, and similar information; the data is the actual content of the IP packet. Once the IP packet is placed inside an Ethernet packet, the Ethernet packet consists of the Ethernet header, followed by the IP header, followed by the IP data.
The header of an IP packet is 20 to 60 bytes long, and the total length of the packet is at most 65,535 bytes. In theory, therefore, the data part of an IP packet can be up to 65,515 bytes long. As mentioned earlier, the data part of an Ethernet packet is at most 1500 bytes long. So if an IP packet exceeds 1500 bytes, it has to be split into several Ethernet packets that are sent separately.
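To make the header fields concrete, here is a rough sketch that packs a minimal 20-byte IPv4 header with Python's `struct` module. It is not a complete implementation: the checksum is left at 0, the time-to-live is an arbitrary choice, and the function name `build_ip_header` is invented for this note.

```python
import socket
import struct


def build_ip_header(src_ip: str, dst_ip: str, data_len: int, protocol: int = 6) -> bytes:
    """Pack a minimal 20-byte IPv4 header (no options, checksum not computed)."""
    version_and_header_len = (4 << 4) | 5  # version 4, header length 5 x 4 = 20 bytes
    total_length = 20 + data_len
    if total_length > 65535:
        raise ValueError("an IP packet cannot exceed 65,535 bytes")
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_and_header_len,
        0,             # type of service
        total_length,  # header plus data
        0,             # identification
        0,             # flags and fragment offset
        64,            # time to live (arbitrary choice for this sketch)
        protocol,      # what the data part carries (6 = TCP, 17 = UDP)
        0,             # header checksum, left at 0 here
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )


header = build_ip_header("172.16.254.1", "172.16.254.233", data_len=100)
print(len(header))  # 20
```

Passing `data_len=65515` is the largest value the function accepts, which matches the limit worked out above.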
4.4 ARP Protocol
There is one last point to explain about the network layer.
Because an IP packet is sent inside an Ethernet packet, we need to know two addresses at once: the other party's MAC address and its IP address. The IP address is normally known (explained later), but the MAC address is not.
So we need a mechanism for obtaining a MAC address from an IP address.
There are two cases. In the first case, the two hosts are not on the same subnet. Then there is in fact no way to obtain the other party's MAC address; the packet can only be handed to the "gateway" that connects the two subnets, and the gateway takes care of it.
In the second case, the two hosts are on the same subnet, and we can use the ARP protocol to obtain the other party's MAC address. ARP also sends a packet (carried inside an Ethernet packet). It contains the IP address of the host being queried, and the destination MAC address field is filled with FF:FF:FF:FF:FF:FF, which marks it as a broadcast address. Every host on the subnet receives the packet, extracts the IP address, and compares it with its own. If the two match, the host replies with its MAC address; otherwise it discards the packet.
In short, with the ARP protocol we can obtain the MAC address of any host on the same subnet and so send packets to it.
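The ARP exchange can be modelled as a lookup over the hosts of one subnet. This is only a toy: the subnet is a plain dictionary from IP address to MAC address, and the function name and sample addresses are made up.

```python
from typing import Optional

BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"


def arp_lookup(query_ip: str, subnet: dict) -> Optional[str]:
    """Return the MAC address of the host that owns query_ip, or None if nobody answers."""
    # The request is addressed to the broadcast MAC, so every host on the subnet sees it.
    request = {"receiver_mac": BROADCAST_MAC, "query_ip": query_ip}
    for host_ip, host_mac in subnet.items():
        # Each host compares the queried IP address with its own.
        if request["query_ip"] == host_ip:
            return host_mac  # the matching host replies with its MAC address
        # Every other host discards the request.
    return None


subnet = {
    "172.16.254.1": "00:00:00:00:00:01",
    "172.16.254.233": "00:00:00:00:00:E9",
}
print(arp_lookup("172.16.254.233", subnet))  # 00:00:00:00:00:E9
print(arp_lookup("172.16.254.50", subnet))   # None
```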
V. Transport Layer
5.1 The origin of the transport layer
With MAC addresses and IP addresses, we can already establish communication between any two hosts on the Internet.
The next problem is that many programs on the same host need the network. For example, you might be browsing the web while chatting with friends online. When a packet arrives from the Internet, how do you know whether it carries web page content or chat content?
In other words, we need one more parameter that indicates which program (process) a packet is meant for. This parameter is called the "port", and it is in effect a number assigned to each program that uses the NIC. Each packet is sent to a specific port on the host, so different programs each get the data they need.
A port is an integer between 0 and 65535, which is exactly 16 bits. Ports 0 to 1023 are reserved for the system; users can only pick ports greater than 1023. Whether you are browsing the web or chatting online, the application picks a port at random and then contacts the corresponding port on the server.
The function of the transport layer is to establish port-to-port communication. By contrast, the function of the network layer is to establish host-to-host communication. Once the host and the port are determined, programs can communicate with each other. For this reason, Unix systems call the combination of host and port a "socket". With sockets, you can develop network applications.
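Here is a small sketch of the host + port idea using Python's standard `socket` module. It sends one datagram between two sockets on the local machine (127.0.0.1) using UDP, the protocol described in the next section. The function name `loopback_roundtrip` is invented, and the port numbers are whatever the operating system happens to pick.

```python
import socket


def loopback_roundtrip(message: bytes) -> tuple:
    """Send message from one local socket to another; return (data, sender, receiver)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS to choose a free port
        receiver.settimeout(2.0)
        receiver_addr = receiver.getsockname()  # (host, port) of the receiving program
        sender.sendto(message, receiver_addr)
        data, sender_addr = receiver.recvfrom(4096)  # sender_addr is also (host, port)
        return data, sender_addr, receiver_addr


data, sender_addr, receiver_addr = loopback_roundtrip(b"hello")
print(data, sender_addr, receiver_addr)
```

Both addresses printed are (host, port) pairs: the same host, but two different ports, one for each program.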
5.2 UDP protocol
Now we have to add the port information to the packet, which requires a new protocol. The simplest implementation is the UDP protocol, whose format is little more than the data with the port numbers placed in front.
A UDP packet also consists of a header and data.
The header mainly defines the sending port and the receiving port, and the data is the actual content. The whole UDP packet is then placed in the data part of an IP packet, which, as described earlier, is itself placed in an Ethernet packet. The entire Ethernet packet now consists of the Ethernet header, the IP header, the UDP header, and the UDP data, in that order.
UDP packets are very simple. The header is only 8 bytes long and the total length is at most 65,535 bytes, so a UDP packet is meant to travel inside a single IP packet.
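The 8-byte header is simple enough to pack by hand. The sketch below uses four 16-bit fields (sending port, receiving port, total length, checksum) and leaves the checksum at 0 rather than computing it. The function names and sample ports are chosen only for illustration.

```python
import struct

UDP_HEADER_LEN = 8  # four 16-bit fields


def build_udp_packet(src_port: int, dst_port: int, data: bytes) -> bytes:
    """Prepend an 8-byte UDP header to data (checksum left at 0 in this sketch)."""
    length = UDP_HEADER_LEN + len(data)
    if length > 65535:
        raise ValueError("a UDP packet cannot exceed 65,535 bytes")
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return header + data


def parse_udp_packet(packet: bytes) -> tuple:
    """Return (sending port, receiving port, data)."""
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", packet[:UDP_HEADER_LEN])
    return src_port, dst_port, packet[UDP_HEADER_LEN:length]


packet = build_udp_packet(51775, 53, b"hello")
print(len(packet))               # 13
print(parse_udp_packet(packet))  # (51775, 53, b'hello')
```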
5.3 TCP protocol
The advantage of UDP is that it is simple and easy to implement. The disadvantage is poor reliability: once a packet has been sent, there is no way to know whether the other party received it.
The TCP protocol was created to solve this problem and make the network more reliable. It is a very complex protocol, but it can be thought of, roughly, as UDP with an acknowledgment mechanism: every packet sent must be acknowledged. If a packet is lost, no acknowledgment arrives, and the sender knows it must send the packet again.
TCP therefore ensures that data is not lost. Its disadvantages are a complex process, a difficult implementation, and higher resource consumption.
Like UDP packets, TCP packets are embedded in the data part of an IP packet. TCP packets have no length limit and could in theory be infinitely long. In practice, to keep the network efficient, a TCP packet is usually no longer than an IP packet, so that a single TCP packet does not have to be split again.
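The acknowledgment idea can be shown with a toy simulation. This is not how real TCP works internally; it is a deliberately simple "send, wait for confirmation, re-send if none arrives" loop. The function name and the `lost` set describing which attempts the imaginary network drops are both invented.

```python
def send_with_ack(packets: list, lost: set, max_tries: int = 5) -> tuple:
    """Deliver packets in order, re-sending any attempt that gets no acknowledgment.

    lost holds (packet index, attempt number) pairs that the imaginary network drops.
    Returns (delivered packets, log of every send attempt).
    """
    delivered = []
    log = []
    for index, packet in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            log.append((index, attempt))
            if (index, attempt) in lost:
                continue  # no acknowledgment arrives, so the sender tries again
            delivered.append(packet)  # acknowledged, move on to the next packet
            break
        else:
            raise RuntimeError(f"packet {index} was never acknowledged")
    return delivered, log


delivered, log = send_with_ack(["a", "b", "c"], lost={(1, 1), (1, 2)})
print(delivered)  # ['a', 'b', 'c']
print(log)        # [(0, 1), (1, 1), (1, 2), (1, 3), (2, 1)]
```

The second packet is dropped twice and gets through on the third attempt, so nothing is lost in the end, at the cost of extra sends.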
VI. Application Layer
The application layer receives data from the transport layer and then has to interpret it. Because the Internet is an open architecture, data comes from all kinds of sources, and its format must be agreed on in advance; otherwise it could not be interpreted at all.
The role of the application layer is to specify the data formats used by applications.
For example, TCP can carry data for many kinds of programs, such as email, the WWW, and FTP. Each needs its own protocol to specify the format of email, web pages, or FTP data, and these application protocols make up the application layer.
This is the highest layer, the one that faces the user directly. Its data is placed in the data part of the TCP packet, so the Ethernet packet now consists of the Ethernet header, the IP header, the TCP header, and the application-layer data.
That completes the five-layer structure of the Internet, from the bottom up. It explains how the Internet is put together from the system's point of view. Next, I go the other way and look from the top down, from the user's point of view, at how this structure carries out a single exchange of data over the network.
VII. A Summary
First, a summary of what has been covered so far.
We already know that network communication is an exchange of packets. Computer A sends a packet to computer B, which receives it and replies with a packet of its own; that is communication between two computers. The structure of a packet is basically as follows: an Ethernet header, an IP header, a TCP header, and the application-layer data.
To send this packet, you need to know two addresses:
* The other party's MAC address
* The other party's IP address
With these two addresses, the packet can be delivered accurately to the receiver. However, as mentioned earlier, the MAC address has a limitation: if the two computers are not on the same subnet, the sender cannot learn the other party's MAC address, and the packet must be forwarded through a gateway.
For example, suppose computer 1 wants to send a packet to computer 4. It first determines whether computer 4 is on the same subnet (the method is described later) and finds that it is not, so it sends the packet to gateway A. Using a routing protocol, gateway A discovers that computer 4 is on subnet B and sends the packet to gateway B, which forwards it to computer 4.
To send the packet to gateway A, computer 1 must know gateway A's MAC address. The destination addresses of a packet therefore fall into two cases:
Scenario | Packet addresses
Same subnet | The other party's MAC address, the other party's IP address
Different subnets | The gateway's MAC address, the other party's IP address
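The two rows of the table amount to one decision, sketched below with Python's standard `ipaddress` module. The function name `frame_destination` is made up, and the addresses are sample values. It returns the IP address whose MAC address should go into the Ethernet header.

```python
import ipaddress


def frame_destination(local_ip: str, mask: str, gateway_ip: str, target_ip: str) -> str:
    """Return the IP address whose MAC address the Ethernet packet should be sent to."""
    # strict=False lets us pass a host address together with its mask.
    subnet = ipaddress.ip_network(f"{local_ip}/{mask}", strict=False)
    if ipaddress.ip_address(target_ip) in subnet:
        return target_ip  # same subnet: use the other party's own MAC address
    return gateway_ip     # different subnet: use the gateway's MAC address


print(frame_destination("192.168.1.100", "255.255.255.0", "192.168.1.1", "192.168.1.7"))
# 192.168.1.7
print(frame_destination("192.168.1.100", "255.255.255.0", "192.168.1.1", "172.194.72.105"))
# 192.168.1.1
```

In both cases the IP address written into the IP header stays the other party's; only the MAC address in the Ethernet header changes.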
Before sending a packet, a computer must determine whether the other party is on the same subnet and then choose the appropriate MAC address. Next, let's see how this process works in practice.
VIII. The User's Internet Settings
8.1 Static IP Address
You buy a new computer, plug in a network cable, and power it on. Can it go online right away?
Usually some setup is required. Sometimes the administrator (or ISP) gives you the following four parameters, and once you enter them in the operating system, the computer can connect to the Internet:
* The IP address of the machine
* Subnet Mask
* IP address of the gateway
* IP address of DNS
(Figure: the network settings window in Windows.)
All four parameters are indispensable, and later sections explain why each is needed to get online. Because the values are fixed, the computer has the same IP address every time it starts, which is why this setup is called a "static IP address" connection.
However, these settings look technical enough to put off ordinary users. And if one computer keeps an IP address permanently, no other computer can use it, which is inflexible. For these two reasons, most users go online with a "dynamic IP address".
8.2 Dynamic IP Address
With a "dynamic IP address", the computer is assigned an IP address automatically when it starts, with no manual setup. The protocol used for this is called DHCP.
The protocol specifies that in each subnet, one computer is responsible for managing all of the network's IP addresses; it is called the "DHCP server". When a new computer joins the network, it must send a "DHCP request" packet to the DHCP server to ask for an IP address and the related network parameters.
As mentioned earlier, to send a packet to a computer on the same subnet, you must know its MAC address and IP address. The newly joined computer knows neither of these, so how can it send the packet?
DHCP solves this with some clever rules.
8.3 DHCP protocol
First, DHCP is an application-layer protocol built on top of UDP, so the whole packet consists of an Ethernet header, an IP header, a UDP header, and the DHCP data.
(1) The "Ethernet header" at the front sets the MAC address of the sender (the local machine) and of the receiver (the DHCP server). The former is the MAC address of the local network card. The latter is unknown, so the broadcast address FF-FF-FF-FF-FF-FF is filled in.
(2) The "IP header" that follows sets the IP address of the sender and of the receiver. At this point the local machine knows neither. The sender's IP address is therefore set to 0.0.0.0 and the receiver's to 255.255.255.255.
(3) The "UDP header" at the end sets the sender's port and the receiver's port. These are fixed by the DHCP protocol: the sender uses port 68 and the receiver uses port 67.
Once the packet has been constructed, it can be sent. Ethernet delivers it by broadcast, so every computer on the same subnet receives it. Because the receiver's MAC address is FF-FF-FF-FF-FF-FF, the Ethernet header does not show who the packet is for, and every computer that receives it must also examine the IP addresses to decide whether the packet is meant for it. On seeing a sender address of 0.0.0.0 and a receiver address of 255.255.255.255, the DHCP server knows "this packet is for me", and the other computers can discard it.
Next, the DHCP server reads the contents of the packet, allocates an IP address, and sends back a "DHCP response" packet. The response has a similar structure: the MAC addresses in the Ethernet header are the network card addresses of the two parties, the IP addresses in the IP header are the DHCP server's IP address (sender) and 255.255.255.255 (receiver), and the ports in the UDP header are 67 (sender) and 68 (receiver). The IP address assigned to the requester and the specific parameters of the network are carried in the data part.
When the newly joined computer receives the response packet, it learns its own IP address, subnet mask, gateway address, DNS server, and so on.
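The addressing rules of the request and the response can be written down as plain data. The sketch below uses nested dictionaries instead of real packet bytes; the function names, MAC addresses, and offered parameters are all made-up sample values.

```python
BROADCAST_MAC = "FF-FF-FF-FF-FF-FF"


def build_dhcp_request(client_mac: str) -> dict:
    """Headers of the request sent by a computer that knows no addresses yet."""
    return {
        "ethernet": {"sender_mac": client_mac, "receiver_mac": BROADCAST_MAC},
        "ip": {"sender_ip": "0.0.0.0", "receiver_ip": "255.255.255.255"},
        "udp": {"sender_port": 68, "receiver_port": 67},
        "data": "DHCP request",
    }


def build_dhcp_response(request: dict, server_mac: str, server_ip: str, offered: dict) -> dict:
    """Headers of the reply; the assigned parameters travel in the data part."""
    return {
        "ethernet": {
            "sender_mac": server_mac,
            "receiver_mac": request["ethernet"]["sender_mac"],
        },
        "ip": {"sender_ip": server_ip, "receiver_ip": "255.255.255.255"},
        "udp": {"sender_port": 67, "receiver_port": 68},
        "data": offered,
    }


request = build_dhcp_request("00-00-00-00-00-0A")
response = build_dhcp_response(
    request,
    server_mac="00-00-00-00-00-01",
    server_ip="192.168.1.1",
    offered={"ip": "192.168.1.100", "mask": "255.255.255.0",
             "gateway": "192.168.1.1", "dns": "8.8.8.8"},
)
print(response["ethernet"]["receiver_mac"])  # 00-00-00-00-00-0A
```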
8.4 Internet Settings: summary
One thing to remember from this section: whether you use a "static IP address" or a "dynamic IP address", the first step in getting a computer online is to determine four parameters. These four values are important enough to repeat:
* The IP address of the machine
* Subnet Mask
* IP address of the gateway
* IP address of DNS
With these values, the computer can go online. Next, let's look at an example of how the Internet protocols work together when a user visits a web page.
IX. An Example: Accessing a Web Page
9.1 Local parameters
We assume that, following the steps in the previous section, the user has set their own network parameters:
* The IP address of the machine: 192.168.1.100
* Subnet Mask: 255.255.255.0
* IP Address of the gateway: 192.168.1.1
* DNS IP address: 8.8.8.8
Then the user opens a browser to visit Google and enters the address www.google.com in the address bar.
This means the browser needs to send a web request packet to Google.
9.2 DNS Protocol
We know that to send a packet we must know the other party's IP address. Right now, however, we know only the domain name www.google.com, not its IP address.
The DNS protocol converts this domain name into an IP address for us. The DNS server is known to be 8.8.8.8, so we send a DNS packet (port 53) to that address.
The DNS server replies that Google's IP address is 172.194.72.105. Now we know the other party's IP address.
9.3 Subnet Mask
Next, we determine whether this IP address is on the same subnet as the local machine, which is where the subnet mask comes in.
The subnet mask is known to be 255.255.255.0. The local machine ANDs it bitwise with its own IP address, 192.168.1.100 (a result bit is 1 only if both input bits are 1, otherwise 0), which gives 192.168.1.0. It then ANDs the mask with Google's IP address, 172.194.72.105, which gives 172.194.72.0. The two results are not equal, so Google is not on the same subnet as the local machine.
Therefore, a packet for Google must be forwarded through the gateway 192.168.1.1; that is, the receiver's MAC address will be the gateway's MAC address.
9.4 Application Layer Protocol
Web browsing uses the HTTP protocol, and the whole packet is built from an Ethernet header, an IP header, a TCP header, and the HTTP data.
The HTTP part looks something like this:
GET / HTTP/1.1
Host: www.google.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1) ...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
Cookie: ...
We assume this part is 4960 bytes long. It is embedded in a TCP packet.
9.5 TCP protocol
A TCP packet has to set the ports. The receiver's (Google's) HTTP port is 80 by default, and the sender's (local machine's) port is a randomly generated integer between 1024 and 65535, say 51775.
The header of the TCP packet is 20 bytes long; with the embedded HTTP data, the total length becomes 4980 bytes.
9.6 IP protocol
The TCP packet is then embedded in an IP packet. The IP packet has to set the IP addresses of both parties, which are known: the sender is 192.168.1.100 (the local machine) and the receiver is 172.194.72.105 (Google).
The header of the IP packet is 20 bytes long; with the embedded TCP packet, the total length becomes 5000 bytes.
9.7 Ethernet Protocol
Finally, the IP packet is embedded in an Ethernet packet. The Ethernet packet has to set the MAC addresses of both parties: the sender is the MAC address of the local network card, and the receiver is the MAC address of the gateway 192.168.1.1 (obtained through the ARP protocol).
The data part of an Ethernet packet is at most 1500 bytes long, and the IP packet is now 5000 bytes long. The IP packet must therefore be split into four packets. Because each of them carries its own 20-byte IP header, it holds at most 1480 bytes of the original 4980 bytes of data, and the four IP packets are 1500, 1500, 1500, and 560 bytes long.
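The lengths in sections 9.4 to 9.7 can be re-derived with a few lines of arithmetic. The helper `fragment_lengths` is invented for this check. It also applies one detail that the text does not mention: the data carried by every fragment except the last is kept to a multiple of 8 bytes, which 1480 already is.

```python
def fragment_lengths(ip_packet_len: int, max_frame_data: int = 1500,
                     ip_header_len: int = 20) -> list:
    """Return the length of each IP packet after splitting to fit Ethernet."""
    data_left = ip_packet_len - ip_header_len
    # Data carried by every fragment except the last must be a multiple of 8 bytes.
    data_per_fragment = (max_frame_data - ip_header_len) // 8 * 8
    lengths = []
    while data_left > 0:
        chunk = min(data_left, data_per_fragment)
        lengths.append(ip_header_len + chunk)  # each piece gets its own IP header
        data_left -= chunk
    return lengths


http_len = 4960          # assumed size of the HTTP part
tcp_len = 20 + http_len  # plus the TCP header
ip_len = 20 + tcp_len    # plus the IP header
print(tcp_len, ip_len, fragment_lengths(ip_len))
# 4980 5000 [1500, 1500, 1500, 560]
```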
9.8 Server-side response
After being forwarded through multiple gateways, the four Ethernet packets reach Google's server at 172.194.72.105.
Using the sequence information in the IP headers, Google's server puts the four packets back together, extracts the complete TCP packet, and reads the "HTTP request" inside it. It then produces an "HTTP response" and sends it back over TCP.
When the local machine receives the HTTP response, it can display the web page, which completes one round of network communication.
The example ends here. It has been simplified, but it broadly reflects the whole communication process of the Internet protocols.
Getting Started with Internet protocols