"Reprint" HTTP protocol with a simple understanding of TCP protocol follow-up

Source: Internet
Author: User
Tags ftp file

After writing code for this long, I found that my understanding of TCP/IP is not very thorough. I can use C#'s HttpClient class for network programming, I can use Chrome's developer tools to inspect the headers and body of each HTTP request, and I know how cookies are stored, but how this data actually travels across the network was still vague to me. How is data converted from a file or string on the client into binary and routed to the server? To understand these questions I recently read through TCP/IP Illustrated (Volumes 1, 2, and 3), and things are clearer than before. Below are some of the points I noted while reading.

First, we need to understand the concept of layering in computer networks. The classic layered diagram is the one to keep in mind here; I remember the picture in my college textbook being much the same.

I suspect most people picture the layers vertically, stacked from top to bottom. I think it is more accurate to picture them horizontally, from left to right, like a production line in a workshop: a large batch of raw material goes in, passes through one workstation after another where it is cut and packaged layer by layer, and comes out the far end as many neat small products.

About the network layer.

The network layer has different protocols, such as IP and ICMP. They differ in the format used to cut up the data handed down from the upper layer and in the rules followed when wrapping it.

ICMP is the protocol that the ping command uses. The ping command is nothing mysterious: it is an ordinary executable that some programmer wrote. You can run it from any console on your computer because the program is installed and its directory is on the PATH. ICMP stands for Internet Control Message Protocol. Ping is an application-layer tool, but it skips the transport layer and calls ICMP at the network layer directly. An ICMP echo reply can only come back from the destination host, so ping can be used to check remotely whether a host is present on the network. The ping program is a basic tool for testing connectivity between two systems. It uses only the ICMP echo request and echo reply messages and does not go through the transport layer (TCP/UDP). The ping server side is usually implemented in the kernel as part of ICMP.

Whether a host is reachable depends not only on whether it can be reached at the IP layer, but also on which protocol and port number are used. Note that ICMP itself has no port numbers, so ping only tests the IP layer. For example, suppose a host really is on the Internet and answers ping, and a client then sends TCP packets to port 6666 on it. The packets reach the host accurately, and the host strips the wrappers layer by layer from the link layer up through the network layer. When the operating system reaches the transport layer and finds that the destination port is 6666 (assume the host has closed that port and its firewall drops such packets), it does not respond and silently swallows the data. From the client's point of view the packets were lost and the service cannot be reached, even though ping succeeds.

To summarize, ping can fail for several reasons. The host may not be online, for example because it is shut down or its network cable is unplugged. A network firewall or IP policy along the way may filter ICMP packets, so the ping gets no response. Or a policy on the host itself may filter out ICMP packets.

(My personal impression of how the operating system and the network card work together: all network data comes in through one entry point. The operating system and the network-card components parse the binary packets from the bottom up, unpacking and reassembling layer by layer. At the IP layer the IP packet is parsed, then the TCP layer parses its part, and this is where the port number appears. Depending on the port number, the data is placed in a different buffer, and each buffer belongs to a particular application, with the port number as its identity. Finally the application reads the network data from its own buffer.)

TCP's communication mechanism.

When TCP sends a segment, it starts a timer and waits for the destination to acknowledge receipt of the segment. If the acknowledgment does not arrive in time, the segment is retransmitted. TCP maintains a checksum on its header and data. This is an end-to-end checksum whose purpose is to detect any change to the data in transit. If a segment arrives with an invalid checksum, TCP discards it and does not acknowledge it (expecting the sender to time out and retransmit). Since TCP segments are transmitted as IP datagrams, and IP datagrams can arrive out of order, TCP segments can also arrive out of order. If necessary, TCP resequences the received data and passes it to the application layer in the correct order.
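
To make the checksum idea concrete, here is a minimal sketch of the 16-bit one's-complement checksum in the style of RFC 1071. The class and method names (InternetChecksum, Compute) are made up for illustration, and the sketch covers only the bytes passed in; a real TCP checksum also covers a pseudo-header built from the IP addresses, which is left out here.

```csharp
using System;

public static class InternetChecksum
{
    // 16-bit one's-complement checksum in the style of RFC 1071.
    // Assumes data is at most one segment long (<= 65535 bytes), so the
    // 32-bit accumulator cannot overflow before the carries are folded.
    public static ushort Compute(byte[] data)
    {
        uint sum = 0;
        int i = 0;

        // Add the data as big-endian 16-bit words.
        for (; i + 1 < data.Length; i += 2)
        {
            sum += (uint)((data[i] << 8) | data[i + 1]);
        }

        // An odd trailing byte is treated as if padded with a zero byte.
        if (i < data.Length)
        {
            sum += (uint)(data[i] << 8);
        }

        // Fold the carries above bit 15 back into the low 16 bits.
        while ((sum >> 16) != 0)
        {
            sum = (sum & 0xFFFFu) + (sum >> 16);
        }

        return (ushort)(~sum & 0xFFFFu);
    }
}

public static class ChecksumDemo
{
    public static void Main()
    {
        byte[] segment = { 0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7 };
        ushort sent = InternetChecksum.Compute(segment);
        Console.WriteLine($"checksum = 0x{sent:X4}"); // prints "checksum = 0x220D"

        segment[2] ^= 0x01; // simulate one bit flipped in transit
        bool ok = InternetChecksum.Compute(segment) == sent;
        Console.WriteLine(ok ? "segment accepted" : "segment discarded"); // prints "segment discarded"
    }
}
```

The receiver recomputes the checksum over what arrived; when it does not match the value the sender computed, the segment is dropped and no acknowledgment goes back, so the sender's timer eventually fires.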

In addition, TCP does not interpret the contents of the byte stream. TCP has no idea whether the bytes being exchanged are binary data, ASCII characters, EBCDIC characters, or anything else. The interpretation of the byte stream is left to the applications on each end of the connection. This treatment of the byte stream is similar to the way the Unix operating system treats files. The Unix kernel does not interpret what an application reads or writes; that is left to the application. To the Unix kernel there is no difference between a binary file and a text file.

(An aside about ASCII and binary files. In the end, everything stored on a computer's hard drive is binary data, so the question is where that binary data comes from. For a plain .txt file, the text is converted into numbers according to the ASCII code (or another character encoding), and those numbers are stored in binary form. Files such as Word documents are more complex: specialized software such as Office produces the bytes according to its own format. That is why Word files have to be opened with Office software. Notepad ships with the operating system; if you open a Word file with Notepad, Notepad tries to parse the bytes as character codes and ends up with characters it cannot decode or with garbled text.)

Each TCP segment contains the source and destination port numbers, which identify the sending and receiving application processes. These two values, together with the source and destination IP addresses in the IP header, uniquely identify a TCP connection. The combination of an IP address and a port number is also called a socket.
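
The four values that identify a connection can be observed from code. The following is a small sketch, assuming a loopback connection inside a single process; the class name FourTupleDemo and the method ConnectOnce are hypothetical names chosen for this example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class FourTupleDemo
{
    // Opens one TCP connection over loopback and returns the endpoints
    // (IP address + port) as seen from the client and from the server.
    public static (IPEndPoint ClientLocal, IPEndPoint ClientRemote,
                   IPEndPoint ServerLocal, IPEndPoint ServerRemote) ConnectOnce()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: the OS picks a free port
        listener.Start();
        try
        {
            int serverPort = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient(AddressFamily.InterNetwork))
            {
                client.Connect(IPAddress.Loopback, serverPort);
                using (TcpClient accepted = listener.AcceptTcpClient())
                {
                    return ((IPEndPoint)client.Client.LocalEndPoint,
                            (IPEndPoint)client.Client.RemoteEndPoint,
                            (IPEndPoint)accepted.Client.LocalEndPoint,
                            (IPEndPoint)accepted.Client.RemoteEndPoint);
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        var t = ConnectOnce();
        Console.WriteLine($"client side: {t.ClientLocal} -> {t.ClientRemote}");
        Console.WriteLine($"server side: {t.ServerLocal} <- {t.ServerRemote}");
    }
}
```

The two printed lines show the same pair of sockets from opposite ends: the client's local endpoint is the server's remote endpoint, and vice versa.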

Since a TCP connection is full-duplex (data can flow in both directions at the same time), each direction must be shut down independently. The rule is that either end can send a FIN when it has finished sending data. When one end receives a FIN, TCP must notify the application layer on that end that the other side has terminated the flow of data in that direction. Sending a FIN is normally the result of the application issuing a close.
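
Closing one direction at a time can be tried out with Socket.Shutdown. This is only a sketch over loopback in a single thread, with hypothetical names (HalfCloseDemo, ReadUntilFin, Run); it relies on the messages being small enough to fit in the operating system's socket buffers.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class HalfCloseDemo
{
    // Reads from the socket until Receive returns 0, which is how the
    // application learns that the peer has sent its FIN.
    private static string ReadUntilFin(Socket socket)
    {
        var received = new MemoryStream();
        var chunk = new byte[64];
        int n;
        while ((n = socket.Receive(chunk)) > 0)
        {
            received.Write(chunk, 0, n);
        }
        return Encoding.ASCII.GetString(received.ToArray());
    }

    public static (string ServerGot, string ClientGot) Run()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                client.Connect(IPAddress.Loopback, port);
                using (Socket server = listener.AcceptSocket())
                {
                    client.Send(Encoding.ASCII.GetBytes("request"));
                    client.Shutdown(SocketShutdown.Send);    // FIN: client -> server direction is finished

                    string serverGot = ReadUntilFin(server); // returns once the FIN has arrived

                    server.Send(Encoding.ASCII.GetBytes("reply")); // the other direction still works
                    server.Shutdown(SocketShutdown.Send);    // FIN: server -> client direction is finished

                    string clientGot = ReadUntilFin(client);
                    return (serverGot, clientGot);
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"server received: {result.ServerGot}");
        Console.WriteLine($"client received: {result.ClientGot}");
    }
}
```

After the client shuts down its sending side, the server can still send its reply; the connection is fully closed only once both directions have been shut down.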

Like Telnet, FTP was designed from the start to work between different hosts, which may run different operating systems, use different file structures, and possibly different character sets. The difference is that Telnet achieves heterogeneity by forcing both ends to use a single standard: the NVT with 7-bit ASCII. FTP handles the differences between systems another way: it supports a limited number of file types (ASCII, binary, and so on) and file structures (byte stream or record oriented).

What is the difference between form data and uploaded file data in an HTTP request?

Form data is text that is converted into bytes according to a character encoding such as ASCII, while an uploaded file is read directly as binary data from the computer's hard disk. For example, when a Word file is uploaded, the server receives one large block of binary data. The file was already a large block of binary data when it was stored on the client, so how was that binary data produced? For that you have to ask the Microsoft Office client, which generates the bytes in its own way and writes them to the disk. This is why a file produced by one program often cannot be opened by another: the decoding method is different, so the other program does not know how to parse this large pile of binary data and turn it into the text shown to the user.

A port number is not a physical entity; there is no such thing as a port on the NIC. A port number is simply a numeric ID used to tell applications apart, somewhat like an application's ID: when network data arrives at a host, the port number is what determines which application the data is for. TCP and UDP identify applications with 16-bit port numbers. So how are these port numbers chosen? Servers are normally known by their well-known port numbers. For example, every TCP/IP implementation that provides an FTP server provides it on TCP port 21, every Telnet server is on TCP port 23, and every TFTP (Trivial File Transfer Protocol) server is on UDP port 69.

A client usually does not care which port number it uses; it only needs the port number to be unique on its host. Client port numbers are called ephemeral ports (that is, short-lived). This is because a client typically exists only as long as the user running it needs the service, while a server typically runs as long as the host is up.
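
The contrast between well-known and ephemeral ports can be shown in a few lines. This is a minimal sketch with made-up names (PortDemo, GetEphemeralPort); it binds a socket to port 0, which asks the operating system to pick a free port, much as it does for a client socket. The range the port comes from depends on the operating system.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class PortDemo
{
    // Well-known server ports mentioned in the text.
    public const int Ftp = 21;    // TCP
    public const int Telnet = 23; // TCP
    public const int Tftp = 69;   // UDP

    // Binds to port 0, which asks the OS to assign any free port,
    // the same way a client socket normally gets its ephemeral port.
    public static int GetEphemeralPort()
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            socket.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            return ((IPEndPoint)socket.LocalEndPoint).Port;
        }
    }

    public static void Main()
    {
        Console.WriteLine($"well-known: FTP={Ftp}, Telnet={Telnet}, TFTP={Tftp}");
        Console.WriteLine($"OS-assigned port: {GetEphemeralPort()}");
        Console.WriteLine($"largest 16-bit port number: {ushort.MaxValue}"); // prints 65535
    }
}
```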

The network layer (IP) provides a point-to-point (hop-by-hop) service, while the transport layer (TCP and UDP) provides an end-to-end service.

In the TCP/IP protocol suite, the network layer, IP, provides an unreliable service. That is, it does its best to deliver a packet from the source to the destination, but provides no guarantee of reliability. TCP, on the other hand, provides a reliable transport layer on top of the unreliable IP layer. To provide this reliable service, TCP uses mechanisms such as timeout and retransmission and end-to-end acknowledgments sent and received by the two ends. The transport layer and the network layer therefore have distinct responsibilities.

I never used to understand how IP can be unreliable while TCP, which is built on IP, is reliable. The answer is that TCP does extra, redundant work (checksums, acknowledgments, retransmission) to ensure reliability. Different applications also want different things from the network: two interactive applications, Telnet and rlogin, need minimal delay because they mainly carry small amounts of interactive data, whereas FTP file transfers need maximum throughput.

Take an HTML page sent from the server to the client browser. First, following the HTTP protocol, the server assembles a response as a string that includes the headers, the body, and so on. The string is converted into binary data and handed to the TCP layer, which splits it into segments; these are handed to the IP layer and sent as multiple IP packets. The packets are unordered at this point, and there is no telling which one arrives first; TCP on the receiving side puts the bytes back in order. Resources such as img, css, and js files are transferred the same way, each in its own response, and they do not necessarily finish arriving in the order they were requested. This is part of why pictures on a page can render in a different order.
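
The split-and-reorder step can be imitated in memory. The sketch below is a simplified model, not how a real TCP stack is written: the names Reassembly, Split, and Reassemble are hypothetical, the sequence number is simply the byte offset starting at 0 (real TCP starts from a negotiated initial sequence number), and lost or duplicated segments are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class Reassembly
{
    // Splits a byte stream into (sequence number, payload) pieces of at most
    // maxSize bytes. The sequence number is the offset of the first payload byte.
    public static List<(int Seq, byte[] Payload)> Split(byte[] stream, int maxSize)
    {
        var segments = new List<(int Seq, byte[] Payload)>();
        for (int offset = 0; offset < stream.Length; offset += maxSize)
        {
            int length = Math.Min(maxSize, stream.Length - offset);
            var payload = new byte[length];
            Array.Copy(stream, offset, payload, 0, length);
            segments.Add((offset, payload));
        }
        return segments;
    }

    // Puts the pieces back in sequence-number order and joins the payloads.
    public static byte[] Reassemble(IEnumerable<(int Seq, byte[] Payload)> segments)
    {
        return segments.OrderBy(s => s.Seq).SelectMany(s => s.Payload).ToArray();
    }
}

public static class ReassemblyDemo
{
    public static void Main()
    {
        string response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html><body>Hello</body></html>";
        byte[] stream = Encoding.ASCII.GetBytes(response);

        var segments = Reassembly.Split(stream, 16);
        segments.Reverse(); // simulate out-of-order arrival

        byte[] rebuilt = Reassembly.Reassemble(segments);
        Console.WriteLine(Encoding.ASCII.GetString(rebuilt) == response); // prints "True"
    }
}
```

Because each piece carries its position in the stream, the receiver can rebuild the original bytes no matter which piece arrives first.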

Below the IP layer is the data link layer, which we can think of as the Ethernet layer or a token ring network. When one host sends an Ethernet frame to another host on the same LAN, the destination interface is determined by the 48-bit Ethernet address. The device driver never looks at the destination IP address inside the IP datagram. ARP provides a dynamic mapping between an IP address and the corresponding hardware address. We say dynamic because the mapping happens automatically, and application users and system administrators normally do not have to care about it.

Exchanging frames at the hardware level requires the correct interface address. TCP/IP, however, has its own addressing: 32-bit IP addresses. Knowing a host's IP address is not enough for the kernel to send a frame to that host. The kernel (for example, the Ethernet driver) must know the destination's hardware address in order to send the data. The job of ARP is to provide a dynamic mapping between 32-bit IP addresses and the hardware addresses used by the various network technologies.

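Getting the ASCII codes of a string in C#:

using System.Text;  // Encoding lives in this namespace

string a = "Hello World";
byte[] data = Encoding.ASCII.GetBytes(a);  // one byte per character: 72, 101, 108, ...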

For an HTTP request, a TCP connection is established first; the content is then split up, wrapped layer by layer, and finally sent to the server.

I used to have the impression that TCP communication meant there was a pipe between A and B, and I wondered whether the contents would collide in the pipe if A and B sent messages at the same time. That picture is wrong. There is no pipe between A and B: packets are forwarded by IP-layer routing, and neither the sender nor the receiver specifies the route. Sending and receiving use separate buffers. Typically the sender includes a marker in the content that tells the receiver that this batch of data is complete, so the receiver can process it and send back a reply.

When we write code, there is a Read method for reading network data. I used to think it went out to the network to fetch the data. That is wrong. Read takes data that the operating system and network card have already received, unwrapped, and reassembled in a buffer, and copies it into the program's memory.
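
One way to see that the data is already waiting in a buffer before Read is called is to ask the socket how many bytes are available first. The following is a sketch over loopback with hypothetical names (BufferedReadDemo, Run); the two-second Poll timeout is an arbitrary choice for the example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class BufferedReadDemo
{
    // Sends a short message over loopback, then reports how many bytes the OS
    // had already buffered for the receiver before the receiver called Read.
    public static (int BufferedBeforeRead, string Text) Run()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var sender = new TcpClient(AddressFamily.InterNetwork))
            {
                sender.Connect(IPAddress.Loopback, port);
                using (TcpClient receiver = listener.AcceptTcpClient())
                {
                    byte[] message = Encoding.ASCII.GetBytes("Hello World");
                    sender.GetStream().Write(message, 0, message.Length);

                    // Wait up to 2 seconds until the OS reports readable data.
                    // The receiving code has not called Read yet.
                    receiver.Client.Poll(2_000_000, SelectMode.SelectRead);
                    int buffered = receiver.Available;

                    // Read only copies the buffered bytes into our own array.
                    var buffer = new byte[64];
                    int n = receiver.GetStream().Read(buffer, 0, buffer.Length);
                    return (buffered, Encoding.ASCII.GetString(buffer, 0, n));
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"bytes buffered before Read: {result.BufferedBeforeRead}");
        Console.WriteLine($"Read returned: {result.Text}");
    }
}
```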

Why is establishing a TCP connection costly?

This does not mean that a connection occupies a lot of bandwidth on the Internet. The cost here mainly refers to resources consumed on the computers at each end. When a TCP connection is established, the computer does a lot of preparation: it sets up the corresponding buffers and the storage associated with the port number. And because IP is unreliable, TCP has to keep some additional state to ensure reliability. That is the overhead.

To repeat the point: when a TCP channel is established, there is no actual channel. The data still travels by IP routing, and establishing the channel mainly means setting aside the corresponding space in each computer's memory. As long as a TCP connection persists, the corresponding buffers have not been reclaimed.

How is a TCP connection established between A and B?

This involves the three-way handshake. A program on machine B listens on a port, and B's operating system watches for incoming packets addressed to that port. Once the three handshake segments (SYN, SYN+ACK, ACK) have been exchanged, both sides allocate the state for the connection and the TCP connection is established.

Reprinted from: http://blog.csdn.net/sundacheng1989/article/details/52437128
