Don't understand TCP/IP and HTTP? After reading this article, the networking basics will make sense
Preface
Before starting, let me set a clear scope for this article. After reading it, I hope you will:
Have a preliminary understanding of computer networks and of some classic terminology, such as the three-way handshake, the four-way wave (connection teardown), and DNS resolution.
Learn about application-layer protocols such as the traditional HTTP and HTTPS protocols, as well as HTTP/2 and QUIC, which the industry has been gradually adopting in recent years.
Understand, through examples from a real production environment, how network optimization is applied in a project and what effect it actually has.
Pre-class preparation
To get the most out of this article, the following preparation is recommended:
Understand the basic concepts of TCP/IP. Chapters 3 and 17 of "TCP/IP Illustrated" are recommended reading; here is a link to a free download (http://download.csdn.net/download/u012155923/10046395).
Second, to see the effects for yourself while learning, please download a packet capture tool. Wireshark (https://www.wireshark.org/download.html) and Charles (https://www.charlesproxy.com/download/) are recommended here. Capturing packets while following along with the article makes the whole network forwarding process easier to understand.
Network Model
Before learning the specifics, it is important to understand the overall knowledge system and model, and networking is no exception. There are two network models: the OSI seven-layer model and the TCP/IP five-layer model. See the following diagram:
As you can see, the OSI seven-layer model and the TCP/IP five-layer model correspond to each other, and the transport layer and the layers below it line up (the network interface layer in the TCP/IP model is the combination of the data link layer and the physical layer). So once the session, presentation, and application layers of the OSI model are merged into the application layer of the TCP/IP model, the two models are basically the same.
Both are general-purpose network models. The TCP/IP model is the more common of the two, so this article mainly uses it as the network model, which is also why TCP/IP appears in the title.
So which protocols does each layer correspond to? Take a look at the following picture:
You can see some protocols we already know: IP is at the network layer, TCP is at the transport layer, and HTTP is at the application layer. Other familiar protocols, such as DNS and FTP, each have their own layer as well.
We can verify this with a Wireshark capture by grabbing an HTTP message at random:
From top to bottom are the frame header, the Ethernet frame header, the IP header, the TCP header, and the HTTP header. The last line is the data of this request, in JSON format.
Three-way handshake and four-way wave
The so-called "three handshakes, four waves" refers to the three-way handshake and the four-way wave, that is, how TCP establishes and tears down a connection. It is called a three-way handshake because the two sides must exchange three messages to establish the connection; likewise, the four-way wave means that closing the connection takes four messages. The interaction is shown in the following diagram:
As a simple example, suppose two people, Little S and Little C, are on a phone call. Their three-way handshake to establish the connection goes like this:
Little S: Hello, is that Little C?
Little C: Yes, it is. You must be Little S.
Little S: Yes, I am. Let's start chatting.
The four-way wave goes like this:
Little S: Hey, Little C, I'm a bit tired. Let's stop here for today.
Little C: OK, you take a break. I still have a couple more things to say.
Little C: Oh, I'm tired too. Let's stop here for today.
Little S: OK, that's it then. Bye-bye.
Then Little S and Little C hang up. Notice that in the four-way wave, Little S was the first to propose disconnecting, but the conversation did not actually end there. After Little C acknowledged the message, the call was not cut off immediately; Little C kept talking. This is because TCP is full-duplex. Put simply, one connection has two lines: Little C to Little S, and Little S to Little C. The first exchange only closes the Little S to Little C line, so Little C can keep sending messages to Little S. Only when Little C also decides to close and Little S acknowledges it is the whole connection fully closed.
You may ask why TCP is designed this way. The reason is that TCP is a full-duplex protocol. Full duplex is a communications term meaning that data can be transmitted in both directions at the same time. As mentioned in the example above, a TCP connection has to maintain two lines, and both must be brought into the correct state during setup as well as during teardown.
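The half-close behaviour described above can be tried on a single machine. The following is a minimal sketch, assuming a loopback connection on 127.0.0.1; the function names run_server and half_close_demo and the message texts are made up for illustration. The client plays Little S: it calls shutdown(SHUT_WR) to say it has finished talking, yet it can still receive what the server sends afterwards.

```python
import socket
import threading


def run_server(listener, received):
    # accept() returns once the kernel has completed the three-way handshake.
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # empty read: the client has closed its sending side
                break
            chunks.append(data)
        received.append(b"".join(chunks))
        # The client-to-server line is closed, but server-to-client still works.
        conn.sendall(b"I still have a couple more things to say")
    # Leaving the with-block closes the socket, which closes the second line.


def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    received = []
    worker = threading.Thread(target=run_server, args=(listener, received))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)  # "I'm tired": close only our sending line
    reply = b""
    while True:
        data = client.recv(1024)  # the other line is still open
        if not data:
            break
        reply += data
    client.close()
    worker.join()
    listener.close()
    return received[0], reply
```

Running a packet capture on the loopback interface while calling half_close_demo() should show the handshake and the teardown exchange, although the exact segments can vary by operating system.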
The explanation so far is purely theoretical. To consolidate it, let's use Wireshark to capture packets in a real production environment:
Client IP: 10.2.203.93
Server IP: 10.108.21.2
When the client connects to the server, the captured packets look like this:
You can see that the client first sends a SYN segment, the server receives it and responds with a SYN-ACK segment, and the client finally returns an ACK segment, at which point the connection is established.
Now let's look at what happens when the connection is closed:
Unlike connection setup, the disconnection here is initiated by the server. You can see the server send a FIN segment, after which the client sends an ACK; from then on the server no longer transmits data to the client. Once the client has finished transmitting its own data, it also sends a FIN to the server, and after it receives the server's ACK the connection is formally closed.
That covers the three-way handshake and four-way wave for establishing and closing a TCP connection. To finish, here is a TCP state transition diagram, which helps in understanding the TCP protocol as a whole:
DNS Resolution
DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other. It lets users access the Internet more conveniently, without having to remember the IP address strings that machines read directly. The process of obtaining the IP address that corresponds to a host name is called domain name resolution (or host name resolution).
This description is taken from Baidu Encyclopedia. Put simply, DNS is a system that converts easy-to-remember domain names into IP addresses. Below we use Wireshark to see how it works.
When we enter www.baidu.com in the browser, a DNS request message is sent to the DNS server, and once the server has processed the request it sends back a DNS response message containing the IP address we care about. We can see two messages here, which we call the DNS request message and the DNS response message. Note the filter condition: filtering by UDP port is more convenient:
First look at the DNS request message:
You can see that DNS here runs over UDP rather than TCP, and the server port number is 53. Next is the transaction ID (2 bytes), which identifies one DNS exchange: the ID is the same in a request and its response, so you can use it to find the response message that corresponds to a request message.
The Flags field is also 2 bytes long. Its 16 bits are divided into the following parts, in order:
Response (1 bit): 0 means a DNS request message, 1 means a DNS response message.
Opcode (4 bits): defines the type of query (0 is a standard query, 1 is an inverse query, 2 is a server status request).
AA (1 bit): authoritative answer flag. Valid only in response messages; 1 means the name server is an authoritative server for the domain.
TC (1 bit): truncation flag. 1 means the response exceeded 512 bytes and was truncated.
RD (1 bit): recursion desired. 1 means the client wants a recursive answer.
RA (1 bit): recursion available. Can only be set to 1 in a response message, meaning the server supports recursive queries.
Zero (3 bits): reserved field; as the name suggests, it is 0.
Rcode (4 bits): return code, indicating the error state of the response. It is usually 0 or 3. The values are as follows:
0 No error
1 Format error
2 Server failure (a problem on the name server)
3 Name error (the queried domain name does not exist)
4 Query type not supported
5 Refused (for policy reasons)
6-15 Reserved
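The bit layout listed above can be decoded with a few shifts and masks. This is a minimal sketch; the function name decode_dns_flags and the dictionary keys are chosen for illustration, and the sample values 0x0100 and 0x8180 are typical values for a standard query and its response, assumed here rather than read from the capture in this article.

```python
def decode_dns_flags(flags):
    """Split the 16-bit DNS Flags field into its named parts."""
    return {
        "response": (flags >> 15) & 0x1,  # 0 = request, 1 = response
        "opcode": (flags >> 11) & 0xF,    # 0 = standard query
        "aa": (flags >> 10) & 0x1,        # authoritative answer
        "tc": (flags >> 9) & 0x1,         # truncated
        "rd": (flags >> 8) & 0x1,         # recursion desired
        "ra": (flags >> 7) & 0x1,         # recursion available
        "zero": (flags >> 4) & 0x7,       # reserved bits
        "rcode": flags & 0xF,             # return code
    }


query_flags = decode_dns_flags(0x0100)     # assumed sample: standard query, RD set
response_flags = decode_dns_flags(0x8180)  # assumed sample: response, RD and RA set
```

With these sample values, only rd is set in the request, while the response has response, rd, and ra set and an rcode of 0.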
The flags are followed by four fields: queries, answers, Auth_r, and ADD_RR. They hold the number of questions, the number of answer resource records, the number of authority resource records, and the number of additional resource records. Each is 2 bytes long. In a request, queries is generally 1 and the remaining fields are 0.
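Putting the transaction ID, the flags, and the four counters together gives a 12-byte header of six 2-byte fields in network byte order. The sketch below unpacks such a header with the standard struct module; the function name parse_dns_header, the dictionary keys, and the sample bytes are illustrative assumptions, not data from the capture.

```python
import struct


def parse_dns_header(message):
    """Unpack the fixed 12-byte DNS header into a dictionary."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # "!6H": network byte order, six unsigned 16-bit integers.
    txid, flags, queries, answers, auth_rr, add_rr = struct.unpack("!6H", message[:12])
    return {
        "transaction_id": txid,
        "flags": flags,
        "queries": queries,
        "answers": answers,
        "auth_rr": auth_rr,
        "add_rr": add_rr,
    }


# Assumed sample header: ID 0x1a2b, flags 0x0100, one question, no records.
sample = bytes.fromhex("1a2b01000001000000000000")
header = parse_dns_header(sample)
```

Comparing transaction_id between a parsed request and a parsed response is one way to pair them up, as described earlier.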
Next is the body of the message, which contains the domain name to query, the query type, and the query class. The format of the domain name is a bit special. The domain name here is www.baidu.com, and the part highlighted in blue is how it is represented in the message. You can see 03, which means a length of 3 bytes, followed by three 77s; 0x77 is the ASCII code for "w". So www.baidu.com is first split on "." into 3 parts, and each part is written as its length followed by the ASCII codes of its characters. These segments together make up the complete domain name.
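The length-prefixed encoding described above is short enough to write out. This is a minimal sketch with a made-up function name, encode_qname. Two details come from the DNS specification rather than from this article: each label is limited to 63 bytes, and the name ends with a single zero byte.

```python
def encode_qname(domain):
    """Encode a domain name as length-prefixed labels, ending with a zero byte."""
    out = bytearray()
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 bytes long")
        out.append(len(raw))  # one length byte, e.g. 0x03 for "www"
        out += raw            # then the ASCII bytes, e.g. 0x77 0x77 0x77
    out.append(0)             # zero-length label marks the end of the name
    return bytes(out)


encoded = encode_qname("www.baidu.com")
```

Here encoded.hex() begins with 03777777, matching the 03 followed by three 77s seen in the capture.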
The next two fields are Type and Class, both of which are 1 here. Type 1 is A, meaning the request asks for the IPv4 address of the domain name; this is the most common kind of DNS request. Class 1 means the data being queried is Internet data, which is also the most common case.
That covers the request message. Now let's read the response message:
The parts that the response shares with the request are not repeated here. You can see that the Response bit in the flags is 1, indicating a response message, and that the transaction ID matches the ID in the request message, indicating that this is the response to the request above.
The main part of the response body is the answers section, which contains the IP address we want. Note, however, that for the domain name www.baidu.com there are 3 answer records. Which one counts? Let's look at them one by one.
The first answer has type CNAME, which indicates that the domain name queried in the request is an alias; the answer gives the name it points to, here www.a.shifen.com. It is followed by two answers of type A, whose value is an IPv4 address. Other common types include AAAA (IPv6 address), PTR (IP address to domain name), and NS (name server).
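Reading the three answers amounts to following the CNAME and then collecting the A records for the name it points to. The sketch below does this over a plain list of tuples; the function name resolve_from_answers and the tuple layout are illustrative, and the owner name of the two A records (www.a.shifen.com) is an assumption, since the article only shows that they follow the CNAME.

```python
def resolve_from_answers(name, answers):
    """Follow CNAME records starting at name, then return the matching A values.

    answers is a list of (owner_name, record_type, value) tuples.
    """
    current = name
    seen = set()
    while current not in seen:  # guard against a CNAME loop
        seen.add(current)
        target = next(
            (value for owner, rtype, value in answers
             if owner == current and rtype == "CNAME"),
            None,
        )
        if target is None:
            break
        current = target
    return [value for owner, rtype, value in answers
            if owner == current and rtype == "A"]


answers = [
    ("www.baidu.com", "CNAME", "www.a.shifen.com"),
    ("www.a.shifen.com", "A", "61.135.169.125"),  # owner name assumed
    ("www.a.shifen.com", "A", "61.135.169.121"),  # owner name assumed
]
addresses = resolve_from_answers("www.baidu.com", answers)
```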
As you can see, multiple IP addresses can be returned for the same domain name. The response above returns 2 IP addresses, 61.135.169.125 and 61.135.169.121, which are what we ultimately want. To guard against one IP address failing, a domain name usually has two or more IP addresses, which provides basic disaster recovery: when one IP address cannot be reached, you can switch to another. Entering 61.135.169.125 or 61.135.169.121 in the browser also loads the page normally:
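The switch-over idea can be sketched as a loop that tries each returned address in turn. This is only an illustration of the principle, not how any particular browser does it; connect_with_failover is a made-up name, and the connect argument is a caller-supplied function so that the sketch can be exercised without touching the network.

```python
def connect_with_failover(addresses, connect):
    """Try each address in order and return (address, connection) for the first success."""
    errors = []
    for address in addresses:
        try:
            return address, connect(address)
        except OSError as exc:  # socket errors are subclasses of OSError
            errors.append((address, exc))
    raise ConnectionError(f"all addresses failed: {errors}")


def fake_connect(address):
    # Stand-in for a real connection attempt: pretend the first IP is down.
    if address == "61.135.169.125":
        raise OSError("unreachable")
    return f"connected to {address}"


used, conn = connect_with_failover(["61.135.169.125", "61.135.169.121"], fake_connect)
```

In real use, connect could be something like lambda ip: socket.create_connection((ip, 80), timeout=3).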
For Chrome users, you can enter chrome://net-internals/#dns to view the browser's list of DNS resolutions:
Here you can see the sites you have visited and the corresponding resolution records. There is a TTL column, which indicates how long a domain name resolution result remains valid
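To make the role of the TTL concrete, here is a minimal sketch of a cache that drops a resolution result once its TTL has elapsed. It illustrates the general idea only and is not a description of how Chrome implements its cache; the class name DnsCache, its methods, and the 60-second TTL are assumptions for the example.

```python
import time


class DnsCache:
    """Keep resolution results until their TTL (in seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}

    def put(self, name, addresses, ttl):
        self._entries[name] = (list(addresses), self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the name must be resolved again
            return None
        return addresses


cache = DnsCache()
cache.put("www.baidu.com", ["61.135.169.125", "61.135.169.121"], ttl=60)
```

While the entry is valid, cache.get("www.baidu.com") returns the stored addresses; after the TTL has passed it returns None, and a new DNS request would be needed.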