
The Protocols Underlying HTTP

1. The Internet protocol family consists of four main layers

A. Link Layer: handles the hardware details of interfacing with the physical communication medium, such as Ethernet and Asynchronous Transfer Mode (ATM).

B. Network Layer: processes each packet (IP packet) transmitted over the network. Network layer protocols are implemented in both routers and end hosts.

C. Transport Layer: coordinates communication between applications running on different hosts. In practice, transport layer protocols are generally implemented in the end host's operating system.

D. Application Layer: handles the details of a particular application. In practice, application layer protocols are generally implemented as part of application software (such as web browsers or web servers).

2. Three main protocols are involved in HTTP message transmission, starting at the network layer:

A. Internet Protocol (IP): IP is a network layer protocol. It coordinates the transmission of packets (units of information) from one host to another based on the destination host's IP address. IP is built on top of the link layer and is independent of the link layer's hardware technology.

B. Transmission Control Protocol (TCP): TCP is a transport layer protocol that coordinates the transmission of IP packets so as to provide a reliable, bidirectional abstract connection between two communicating applications. Although some applications use the User Datagram Protocol (UDP), the main transport protocol on the Internet is TCP.

C. Domain Name System (DNS): DNS is an application layer protocol that handles host-name resolution (for example, converting www.foo.com to an IP address, and vice versa). DNS provides this common service for many applications.

3. Internet Protocol (IP)

The IP protocol provides a framework for sending packets. A packet is a unit of information: a certain amount of data (in bytes) specified by the sender. A message is divided into multiple packets. Routers on the Internet process each packet independently and do not need to maintain state across successive packets. A series of IP packets transmitted from one host to another do not necessarily traverse the same network path, and packets may be lost, damaged, or delivered out of order.

End hosts implement the transport layer protocols on top of IP. These protocols coordinate data delivery between applications. The two main transport protocols are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). Both are standardized and implemented in many operating systems. TCP provides the main abstraction required by most Internet applications: logically reliable delivery of a sequence of bytes from sender to receiver. UDP provides a simpler abstraction of unreliable datagram delivery. A sending application directs the operating system to send a group of bytes to a remote application; the UDP datagram travels to the receiver inside an IP packet, which can be lost or delayed in the network. If necessary, the sending application itself retransmits lost datagrams. UDP is best suited to applications that can tolerate some data loss. For example, most multimedia applications use UDP packet streams to transmit audio and video content; packet loss may reduce the audio or video quality at the receiver, but the application can still play.

4. IP Address

An IP address is a 32-bit number (4 bytes) divided into two parts: the network number and the host number. Internet routing first uses the network number to deliver the packet to the corresponding network, and then uses the host number to deliver it to the destination host. Each IP packet header carries the source IP address, the destination IP address, and other important information.
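This split can be sketched in Python (assuming an illustrative /24 network; the actual boundary between network and host number depends on the network's prefix length):

```python
import ipaddress

# Splitting a 32-bit IPv4 address into network and host numbers.
# The /24 boundary is an illustrative assumption, not a fixed rule.
addr = ipaddress.ip_address("192.168.1.42")
net = ipaddress.ip_network("192.168.1.0/24")

as_int = int(addr)            # the address as one 32-bit number
network_number = as_int >> 8  # upper 24 bits: which network
host_number = as_int & 0xFF   # lower 8 bits: which host on that network

print(hex(as_int))        # 0xc0a8012a
print(host_number)        # 42
```

Routers only need the network number (the upper bits) to forward the packet; the host number matters only on the destination network.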

5. Transmission Control Protocol

The Transmission Control Protocol (TCP) coordinates data transmission between a pair of applications. Applications communicate through read/write sockets, which represent data as ordered and reliable byte streams.

6. Socket Abstraction

The socket abstraction provides a reliable, two-way communication channel between two applications, which generally run on different machines. This abstraction is provided by the Transmission Control Protocol (TCP), which is generally implemented by the operating system of each host connected to the Internet. An application uses TCP by creating a socket (similar to opening a file). Both endpoints must be able to identify the socket exactly, and knowing the IP addresses of the two machines is not enough: a single machine can run multiple applications, and a single application (such as a web server) can have multiple sockets. Therefore, each socket is also associated with a port number at each endpoint. A port number is a 16-bit integer in the range 0-65535. Ports 1023 and below are well-known ports reserved for specific application layer protocols (such as port 80 for HTTP); the remaining port numbers (1024-65535) can be used by any application.

A connection is identified by five pieces of information: two IP addresses (the machines running the two applications), two port numbers (the two application endpoints), and the protocol (TCP).

Socket = (IP address, port number)

Applications create sockets through system calls implemented by the operating system. Suppose application A (such as a web client) establishes a socket to a remote application B (a web server). Application A creates the socket as follows:

1) In a UNIX operating system, the socket() function creates a new socket.

2) The application then calls connect() to associate the socket with application B's IP address and port number.

3) During the connect() call, the operating system also selects an unused local port number (1024-65535) for application A. At this point, the operating system running application A knows two IP addresses and two port numbers, which uniquely identify the two-way connection between the two applications.

4) The operating system initiates the TCP connection to application B.

5) Once the connection is established, the connect() call returns, and application A can start reading from and writing to the socket.

Application B, by contrast, initially plays a passive role when establishing a socket: it listens for connection requests on a specific port number. In UNIX this involves creating a socket with socket(), calling bind() to allocate the local port (such as port 80), and then calling listen(), which indicates that B wants to wait for connections from remote applications. This directs the operating system to respond to TCP connection requests arriving on that port. The application then calls accept() to learn about new TCP connections; by default, accept() blocks until a new connection is available. Once a connection is available, the system call completes the creation of the socket, and application B can start reading from and writing to it.

Once the connection is established, the applications read and write data. In fact, the two applications can read and write at the same time, because the socket provides a two-way communication channel. On the web, the client (application A) initiates the communication: the client writes the HTTP request to the socket, while the server (application B) waits for data to arrive at its socket. Once the data arrives, the server reads the HTTP request from the socket. After processing the request, the server writes the HTTP response to the socket, while the client waits for data to reach its socket; the client then reads the response from the socket. This pattern is typical of client/server applications.
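The whole exchange can be sketched end-to-end in Python on one machine (the payloads are hypothetical; the loopback address and an OS-assigned port stand in for a real client and a web server on port 80):

```python
import socket
import threading

# Server side: socket(), bind(), listen(), then accept() in a thread.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # bind(): allocate a local port (0 = any free port)
server.listen(1)                # listen(): wait for connection requests
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()                  # blocks until a client connects
    request = conn.recv(1024)                  # server reads the "request"
    conn.sendall(b"response to " + request)    # server writes the "response"
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Client side: socket(), connect(), then write the request and read the reply.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))   # the OS picks the client's local port
client.sendall(b"request")            # client writes first
reply = client.recv(1024)             # then waits for the server's reply
client.close()
t.join()
server.close()

print(reply)   # b'response to request'
```

The same calls, in the same order, underlie a real HTTP exchange; only the addresses and the bytes written differ.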

The operating system processes the logical connection between two applications and coordinates the details of IP packet transfer:

1) For A, the operating system executes the socket() and connect() calls. If the remote host does not respond, the operating system notifies application A that creating the socket failed.

2) For B, the operating system executes socket(), bind(), listen(), and accept().

3) For both applications, the operating system coordinates packet sending and receiving to create the ordered, reliable byte stream abstraction.

7. Sliding Window Flow Control

The TCP sender restricts data transmission to avoid overflow of the buffer space of the receiver. Theoretically, TCP can transmit data whenever the application writes data to the socket. However, TCP restricts data transmission mainly because:

1) The volume of data transmitted by the sender should not exceed the buffer size of the receiver.

2) The sender's transmission rate should not exceed what the network can handle; sending too fast can overload the network, leading to congestion and increasing the likelihood of communication delay and packet loss. Each TCP sender uses sliding window flow control to limit its data transmission.
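The idea can be illustrated with a toy model (not real TCP; the window and segment sizes are arbitrary): the sender never has more than one window's worth of unacknowledged bytes outstanding.

```python
# Toy sliding-window sender: at most `window` unacknowledged bytes in flight.
def send_with_window(data: bytes, window: int, segment: int):
    """Return (offset, chunk) pairs in transmission order."""
    sent = []
    base = 0        # lowest unacknowledged byte
    next_byte = 0   # next byte to transmit
    while base < len(data):
        # Transmit while the window permits more bytes in flight.
        while next_byte < len(data) and next_byte - base < window:
            chunk = data[next_byte:next_byte + segment]
            sent.append((next_byte, chunk))
            next_byte += len(chunk)
        # In this toy model the receiver acknowledges one segment,
        # sliding the window forward so transmission can resume.
        base = min(base + segment, next_byte)
    return sent

segments = send_with_window(b"x" * 10, window=4, segment=2)
print(len(segments))   # 5 segments of 2 bytes each
```

The inner loop stalls whenever `next_byte - base` reaches the window size, which is exactly the condition that protects the receiver's buffer.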

8. Retransmission of Lost Packets

Retransmission of lost packets plays an important role in TCP's reliable delivery of byte streams. IP does not notify the TCP sender when a packet is lost; instead, the sender must infer that a packet has been lost from the receiver's acknowledgments (or their absence).

9. TCP congestion control

The TCP sender adjusts its data transmission according to the sliding window, which depends on both the available buffer space at the receiver and the available bandwidth of the network. These two factors are represented by the receiver window and the congestion window, respectively. The sender transmits based on the minimum of the two, to avoid both buffer overflow and network congestion. IP packets are lost on congested links; when packet loss is detected, the sender reduces the congestion window and hence its transmission rate. When no loss is detected, the TCP sender gradually increases the congestion window, allowing data to be transmitted more quickly.
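A toy sketch of the rule above (illustrative numbers; real TCP congestion control is considerably more involved):

```python
# Additive-increase, multiplicative-decrease (AIMD) in miniature:
# the effective send window is min(receiver window, congestion window);
# loss halves the congestion window, success grows it by one segment.
def effective_window(rwnd: int, cwnd: int) -> int:
    return min(rwnd, cwnd)

rwnd = 64_000   # receiver's advertised buffer space
cwnd = 8_000    # sender's current congestion window
mss = 1_000     # segment size (illustrative)

for loss in [False, False, True, False]:
    if loss:
        cwnd = max(mss, cwnd // 2)   # multiplicative decrease on loss
    else:
        cwnd += mss                  # additive increase otherwise
    window = effective_window(rwnd, cwnd)

print(cwnd, window)   # 6000 6000
```

Here the congestion window stays below the receiver window, so it is the network, not the receiver's buffer, that limits transmission.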

10. Four application-layer protocols that predate the web

A. Telnet allows a user to log in to an account on a remote machine. The client program running on the user's machine uses the Telnet protocol to communicate with the server program running on the remote machine.

B. File Transfer Protocol (FTP): FTP allows users to copy files to and from a remote machine. The client program sends commands to the server program, through which the user coordinates file copying between the two machines.

C. Simple Mail Transfer Protocol (SMTP): SMTP supports the delivery of e-mail messages. SMTP is used to send e-mail messages from the local mail server to a remote mail server, and also to send e-mail messages from the user's mail agent to the local mail server.

D. Network News Transfer Protocol (NNTP): NNTP supports the transfer of articles for electronic newsgroups. The user agent uses NNTP to communicate with the local news server.

11. SSL

Secure Sockets Layer (SSL) is a protocol that sits between the transport layer and the application layer. The main purpose of SSL is to let the client use TCP/IP as its transport layer connection while transmitting binary data securely to servers that understand SSL. Security comes from the encrypted connection established between authenticated endpoints: once a message is encrypted, only a party that knows how to decrypt it can read it as plaintext. Converting plaintext into ciphertext requires an algorithm and a key.

The SSL protocol has two parts: the record protocol and the handshake protocol. The handshake protocol is similar in spirit to the TCP handshake; its function is to establish the connection. In the handshake phase, this sub-protocol sets up the record layer and uses it to authenticate the endpoints. During the handshake, the format used for data exchange is determined by the record protocol, which is responsible for the necessary encryption, compression, and reassembly. The record protocol's important contribution is that the communicating parties can be confident that the data they exchange is encrypted and that message integrity is preserved.

The record protocol divides the data into blocks no larger than 16 KB, optionally compresses them, computes an integrity check, encrypts the result, and sends the data with a header. When a message returns from the web server, the steps run in reverse: decryption, integrity check, optional decompression, and delivery to the application.
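In Python, the standard ssl module exposes this layering directly: a TLS context wraps an ordinary TCP socket, and the handshake and record protocols run underneath the application's reads and writes. A minimal sketch (the hostname in the comment is a placeholder):

```python
import ssl

# Create a client-side TLS context with sensible defaults:
# certificate verification and hostname checking are both on.
context = ssl.create_default_context()

# To use it, a connected TCP socket would be wrapped, which performs the
# handshake and then encrypts everything the record layer carries:
#
#   import socket
#   with socket.create_connection(("www.example.com", 443)) as raw:
#       with context.wrap_socket(raw, server_hostname="www.example.com") as tls:
#           tls.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")

print(context.check_hostname)   # True: endpoint verification is on by default
```

Note that the application code above writes plain HTTP bytes; the encryption, integrity checks, and record framing all happen inside the wrapped socket.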

12. Bandwidth Optimization

Various methods have been developed to save bandwidth. In these methods, resources are transformed, trimmed, or not sent at all. Three HTTP mechanisms related to using bandwidth efficiently are:

A. Range Request Mechanism

Bandwidth can be saved when the entire resource does not need to be transmitted (for example, when only part of the resource is of interest). The protocol needs a way to specify the required part of a resource and transmit only that part.

While a resource is being downloaded, the connection may be interrupted partway through for some reason. A new request for the resource would then download the entire resource again from the beginning, wasting bandwidth and increasing the waiting time. A mechanism is therefore needed to request part of a resource rather than all of it. HTTP/1.1 introduces the Range header to specify the portion of the resource required by the request, implementing the range request mechanism.
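The server side of this mechanism can be sketched as follows (simplified: it handles only a single "bytes=start-end" range, not multipart or suffix ranges):

```python
# A client resuming an interrupted download sends, e.g., "Range: bytes=500-";
# the server answers 206 Partial Content with just the requested slice.
def serve_range(resource: bytes, range_header: str):
    """Return (status, body) for a simple single-range request."""
    unit, _, spec = range_header.partition("=")
    assert unit == "bytes"
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = int(end_s) if end_s else len(resource) - 1
    if start >= len(resource):
        return 416, b""                     # 416 Range Not Satisfiable
    return 206, resource[start:end + 1]     # end is inclusive in HTTP ranges

resource = b"0123456789"
status, body = serve_range(resource, "bytes=3-6")
print(status, body)   # 206 b'3456'
```

An open-ended range like "bytes=500-" is exactly how a client resumes from byte 500 after an interruption.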

B. Expect/Continue Mechanism

If the sender knows in advance that the receiver cannot process the message body, it is best not to send the resource at all. By exchanging some control information first, the sender can verify the situation and avoid unnecessary data transmission.

If the HTTP server cannot process a large request (for example, a large PUT or POST submission), it is very useful for the client to learn this before sending the body. The Expect mechanism provides this: it allows the client to learn whether the server can meet the expectation attached to a particular request. If the server can meet the expectation and process the request, it first notifies the client by sending a 100 Continue response with no response body. After receiving the 100 Continue response, the client can continue sending the (large) entity body on the open connection. If the server cannot process the request, it sends an appropriate response code depending on the situation. For example:

1) If the request is too large, the server sends 413 Request Entity Too Large.

2) If the client is forbidden from sending such a request, the server sends 403 Forbidden.

3) If the server receives an expectation it cannot meet, or knows that an upstream server cannot handle the Expect mechanism, it sends a 417 Expectation Failed response.

The Expect mechanism is hop-by-hop: an expectation applies only to the next hop, and if the next-hop server cannot meet it, that server should return 417 Expectation Failed. However, the Expect request header itself is an end-to-end header: when an intermediary (such as a proxy) cannot process the request itself and forwards it, the Expect request header must be forwarded along with it.
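The server's decision can be sketched as follows (the size limit and the helper function are illustrative assumptions, not part of HTTP):

```python
# Illustrative server-side policy for the Expect/Continue mechanism.
MAX_BODY = 1_000_000   # hypothetical limit on acceptable entity bodies

def respond_to_expect(expect: str, content_length: int) -> int:
    """Return the status code sent before the client transmits the body."""
    if expect.lower() != "100-continue":
        return 417   # Expectation Failed: an expectation we cannot meet
    if content_length > MAX_BODY:
        return 413   # Request Entity Too Large: do not send the body
    return 100       # Continue: the client may send the entity body

print(respond_to_expect("100-continue", 500))         # 100
print(respond_to_expect("100-continue", 2_000_000))   # 413
```

Either way, the control exchange costs one small round trip, while a rejected multi-megabyte body would have cost the full upload.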

C. Compression Technology

The resource is transformed before sending and reconstructed at the receiver. Transforming before sending can effectively reduce the resource's size.

The origin server can consider compressing a response before sending it. Similarly, a client whose request contains a large entity body can compress that entity.
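The round trip can be sketched with Python's standard gzip module (the body is a stand-in; on the wire the sender would also set a Content-Encoding: gzip header):

```python
import gzip

# Sender side: compress the entity body before transmission.
body = b"<html>" + b"hello world " * 100 + b"</html>"
compressed = gzip.compress(body)        # what actually travels on the wire

# Repetitive text compresses well, which is what saves the bandwidth.
print(len(compressed) < len(body))      # True

# Receiver side: reconstruct the original resource.
restored = gzip.decompress(compressed)
print(restored == body)                 # True
```

The receiver recovers the body byte-for-byte, so compression is transparent to the application once both sides agree on the encoding.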

13. Connection Management

With TCP as the transport protocol, a connection is established with a three-way handshake (followed by the slow-start phase) and closed with four more packets (the four-way close), so establishing and closing connections takes time and resources. For the short-lived connections common in HTTP message exchange (the connection is closed as soon as it is used, and a new connection is established for subsequent requests), TCP connections are not used to their full potential.

The solution to this problem is to keep the TCP connection alive so that subsequent requests can reuse it without establishing a new TCP connection.

A. HTTP/1.0 Connection: Keep-Alive mechanism

The principle of Keep-Alive is similar to that of persistent connections in HTTP/1.1. The client interested in keeping the connection open will actually request the original server not to close the connection. The specific implementation is as follows:

GET /home.html HTTP/1.0

...

Connection: Keep-Alive

If the server is also interested in keeping the connection open, it sends a response of the form:

HTTP/1.0 200 OK

Connection: Keep-Alive

...

<Response body>

However, if an HTTP/1.0 server transmitting dynamic content does not close the connection, the receiving client cannot detect the end of the response. Dynamically generated content usually omits the Content-Length header, because computing the content's length would increase the latency seen by the client.

B. Persistent connections in HTTP/1.1 have three main objectives:

1) reduce the cost of TCP connections (fewer establishments and teardowns)

2) reduce latency by avoiding multiple TCP Slow Start stages

3) avoid wasting bandwidth and reduce overall congestion

14. Message Transmission

An important goal of HTTP message exchange is to ensure that participants can tell when they have received a complete message. The response length is a very useful indicator: from it, the receiver knows when a complete response has arrived. The only mechanism an HTTP/1.0 origin server can use to specify the length of an entity body is the Content-Length field. The length of a static resource is easy to determine (usually a single operating system call), but for a dynamically generated response the actual length cannot be computed until the response is completely generated. Since the Content-Length header can be filled in correctly only after the entire response is generated, the origin server must either buffer the whole response (increasing end-user latency) or omit the field. In HTTP/1.0, the server instead closes the connection to indicate the end of dynamic content. But if closing the connection is the only way to signal the end of a response, persistent connections are impossible. HTTP/1.1 therefore introduces the chunked transfer encoding, which solves this basic problem of safe message transmission.

Chunked transfer encoding allows the sender to split the message body into chunks of any size and send them separately. Each chunk is preceded by its length, so the receiver can verify that the chunk was fully received. More importantly, at the end of the message the sender generates a chunk of length 0, from which the receiver can determine that the entire message has been transmitted safely.
