Principle Analysis of the HTTP Protocol

One: The origin of HTTP

The OSI model divides network communication into seven layers: the physical layer, data link layer, network layer, transport layer, session layer, presentation layer, and application layer. For people developing network applications it is easier to think in terms of a simplified five-layer model: the physical layer, data link layer, network layer, transport layer, and application layer (the top layer).


Computers on a network communicate layer by layer, with each layer talking to its counterpart on the other machine. For this layer-to-layer communication to work, every layer must follow a common set of rules; these rules are called protocols. Each of the five layers has its own protocols, which are described below.

Physical layer: the physical layer is the lowest of the five layers. It provides the transmission media and interconnection equipment for data communication between computers and offers a reliable environment for data transmission. The media include cable, optical fiber, wireless channels, and so on; the interconnection equipment refers to devices that connect the computer to the medium, such as modems, plugs, and sockets. The function of this layer is to transmit the bit stream (the raw binary stream) transparently, providing the physical connection over which the data link layer can send its data.


Data link layer: the data link layer is the second layer in the model. It groups the bit stream delivered by the physical layer into packets; one such group of electrical signals is called a frame. The data link layer transmits data in units of frames and passes the data up to the layer above it (the network layer). A frame consists of two parts: the frame header and the frame data. The header contains the physical address of the receiver (the address of the network card) and other network information, while the frame data is the payload to be transmitted. A frame can carry at most 1500 bytes of data, so longer data must be split across several frames.


Network layer: this layer establishes a connection between two nodes through addressing. As we all know, a computer gets an IP address once it connects to the network, and the IP address can be used to determine whether two computers are in the same subnet. A networked computer therefore has two kinds of addresses: a physical address (the MAC address) and a network address (the IP address). For two computers on the network to communicate, each must know "where" the other is: the network address is used first to determine whether they are in the same subnet, and the physical (MAC) address is then used to pinpoint the exact machine to communicate with.

The most familiar protocol in the network layer is the IP protocol (the protocol for network addresses). The widely used version today is IP version 4 (IPv4), in which a network address consists of 32 bits. The IP address can be configured manually or obtained automatically. An IP address is split into two parts, a network part and a host part; with a 24-bit network prefix the first 24 bits identify the network and the last 8 bits identify the host, so 192.168.254.1 and 192.168.254.2 are in the same subnet because their first 24 bits are identical.
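As an illustration of the network/host split, here is a minimal Python sketch, using only the example addresses and the /24 prefix from the paragraph above, that checks whether two addresses fall in the same subnet:

import ipaddress

# A /24 network: the first 24 bits identify the network,
# the last 8 bits identify the host.
net = ipaddress.ip_network("192.168.254.0/24")

a = ipaddress.ip_address("192.168.254.1")
b = ipaddress.ip_address("192.168.254.2")

# Both addresses share the same first 24 bits, so both belong to `net`.
print(a in net and b in net)   # prints: True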

The network layer transmits data in the form of IP packets. An IP packet also has two parts, a header and a data section, and the whole IP packet is placed inside the data section of a frame for transmission.


Transport layer: with the MAC and IP addresses we can locate any two hosts on the Internet and establish communication between them. But there is still a problem: once the host is found, many programs on that host may be using the network at the same time. For example, you may be listening to music while chatting on QQ; when a packet arrives from the network, how do we know whether it carries chat content or song content? A parameter is needed to indicate which program (process) the packet is intended for. This parameter is the port number, and the host uses port numbers to distinguish different programs (processes). A port is an integer between 0 and 65535; ports 0 to 1023 are reserved by the system, so user programs can only choose ports greater than 1023.

The function of the transport layer is to establish port-to-port communication, whereas the network layer establishes host-to-host communication. Once we have identified both the host and the port, communication between programs becomes possible. What we call socket programming is implementing this transport-layer communication in code, which is why initializing a socket object requires specifying an IP address and a port number.
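As a minimal sketch of what "initializing a socket with an IP address and a port number" looks like, the following Python snippet opens a TCP connection to a server; the address 127.0.0.1 and port 8080 are placeholder values, not taken from the original text:

import socket

# A socket endpoint is identified by an (IP address, port) pair: the network
# layer locates the host, the port locates the program on that host.
HOST, PORT = "127.0.0.1", 8080     # placeholder values for illustration

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))        # establish the port-to-port TCP connection
    s.sendall(b"hello")            # hand data to the transport layer
    reply = s.recv(1024)           # receive data from the peer's port
    print(reply)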

There are two very important protocols in the transport layer: the UDP protocol and the TCP protocol.

The UDP protocol transmits UDP packets. A UDP packet likewise consists of a header and a data section: the header mainly identifies the sending port and the receiving port, and the data section carries the actual content. The UDP packet is placed in the data section of an IP packet, which is in turn placed in a frame and transmitted over the network.
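A minimal Python sketch of sending a single UDP datagram; the destination address and port are made-up placeholders:

import socket

# UDP is connectionless: each sendto() hands one datagram (a header carrying
# the source/destination ports, plus the data) down to the IP layer.
DEST = ("127.0.0.1", 9999)         # placeholder destination for illustration

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(b"some payload", DEST)    # no handshake, no delivery guarantee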

Because the reliability of UDP is poor (the sender cannot tell whether the other side received the data), a more reliable protocol, TCP, was defined. TCP uses a handshake to ensure that the data is received.


Application layer: the application layer is the top layer of the model and is the interface between the user and the network; it fulfills the network user's needs through applications. Application-layer data is placed in the data section of a TCP packet. This layer defines a very important protocol, the HTTP protocol. Ordinary Web development is development on top of the application layer, so the rest of this article focuses on the HTTP protocol.

We now know that HTTP is an application-layer protocol that everyone uses: ordinary Web browsing, for example, is an application built on the HTTP protocol, which is simple and efficient.

Two: How the HTTP protocol works

We all know the general communication process: first the client sends a request to the server, and after receiving the request the server generates a response and returns it to the client.



1. The format of requests and responses

Request Format:

Request line
Request headers
Blank line
Optional message body

Note: the request line and each header must end with <CR><LF> (a carriage return followed by a line feed). The blank line must contain only <CR><LF> and no other whitespace. In the HTTP/1.1 protocol, all request headers except Host are optional.

Example:

GET / HTTP/1.1
Host: gpcuster.cnblogs.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
If-Modified-Since: Mon, May 2009 03:19:18 GMT

Response Format:

Status line
Response headers
Blank line
Optional message body

Example:

HTTP/1.1 200 OK
Cache-Control: private, max-age=30
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Mon, May 2009 03:20:33 GMT
Last-Modified: Mon, May 2009 03:20:03 GMT
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, May 2009 03:20:02 GMT
Content-Length: 12173

(message body omitted)
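To see the request and response formats above on the wire, here is a rough Python sketch that sends a raw GET request over a socket and prints the status line and response headers. The host example.com is only an example, and the headers you get back will of course differ from those shown above:

import socket

HOST = "example.com"               # example host; any HTTP server will do

request = (
    "GET / HTTP/1.1\r\n"           # request line, terminated by CRLF
    f"Host: {HOST}\r\n"            # Host is the only mandatory header in HTTP/1.1
    "Connection: close\r\n"
    "\r\n"                         # the blank line that ends the headers
)

with socket.create_connection((HOST, 80)) as s:
    s.sendall(request.encode("ascii"))
    data = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk

head, _, body = data.partition(b"\r\n\r\n")
print(head.decode("iso-8859-1"))   # status line plus response headers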


2. Ways to establish a connection

HTTP supports two ways of establishing a connection: non-persistent connections and persistent connections (the default in HTTP/1.1 is the persistent connection).

1) Non-persistent connection

Let's look at the steps for transferring a Web page from the server to the client over a non-persistent connection. Suppose the page consists of one base HTML file and 10 JPEG images, and that all of these objects are stored on the same server host. Assume the URL of the base HTML file is gpcuster.cnblogs.com/index.html.

Here are the concrete steps:

1. The HTTP client initiates a TCP connection to the HTTP server on the host gpcuster.cnblogs.com. The HTTP server listens on the default port 80 for connection requests from HTTP clients.

2. The HTTP client sends an HTTP request message through the local socket associated with that TCP connection. The message contains the path name /somepath/index.html.

3. The HTTP server receives the request message through the local socket associated with the TCP connection, retrieves the object /somepath/index.html from the host's memory or disk, and sends a response message containing the object through the same socket.

4. The HTTP server tells TCP to close the TCP connection (although TCP does not actually terminate the connection until the client has received the response message).

5. The HTTP client receives the response message through the same socket, and the TCP connection then terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file, parses it, and finds references to 10 JPEG objects.

6. Steps 1-4 are repeated for each of the referenced JPEG objects.

The steps above are described as using non-persistent connections because the corresponding TCP connection is closed each time the server sends an object; no connection persists long enough to be reused for other objects. Each TCP connection carries exactly one request message and one response message. In the example above, 11 TCP connections are created every time a user requests that Web page.


2) Persistent connection

Non-persistent connections have several drawbacks. First, the client must establish and maintain a new connection for every requested object; for each such connection TCP has to allocate buffers and maintain TCP state on both the client and the server, which places a significant burden on a Web server that may be serving hundreds of different clients at the same time. Second, as mentioned above, each object suffers a response delay of 2 RTTs: one RTT to establish the TCP connection and another to request and receive the object. Finally, every object suffers from TCP slow start, because each TCP connection begins in the slow-start phase. Using parallel TCP connections can, however, partially mitigate the RTT and slow-start delays.

With a persistent connection, the server keeps the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can then be sent over that connection. The entire Web page (in the example above, the base HTML file and the 10 images) can be sent over a single persistent TCP connection; in fact, multiple Web pages residing on the same server can be sent over one persistent TCP connection. Typically the HTTP server closes a connection after it has been idle for a certain amount of time, which is usually configurable. Persistent connections come in two variants: without pipelining and with pipelining. In the variant without pipelining, the client issues a new request only after it has received the response to the previous one, so each object referenced by the Web page (the 10 images in the example) incurs a delay of 1 RTT to request and receive it. This is an improvement over the 2 RTTs per object of non-persistent connections, but a persistent connection with pipelining can reduce the response delay even further. Another disadvantage of the non-pipelined variant is that after sending an object the server must wait for the next request, which does not arrive immediately, so the server's resources sit idle during that time.

RTT (round-trip time): the round-trip delay, an important performance metric in computer networks. It is the total delay from the moment the sender starts sending data until the sender receives an acknowledgement from the receiver (assuming the receiver sends the acknowledgement immediately after receiving the data).

HTTP/1.1 by default uses persistent connections with pipelining. In this mode, the HTTP client issues a request as soon as it encounters a reference, so it can send requests for the referenced objects one right after another; after receiving these requests, the server can likewise send the objects one right after another. If all of the requests and responses are sent back to back, then all of the referenced objects together experience a total delay of just 1 RTT (rather than 1 RTT per referenced object, as in the variant without pipelining). In addition, with pipelining the server spends less time idle. Compared with non-persistent connections, persistent connections, with or without pipelining, save 1 RTT of response delay, and their slow-start delay is also smaller. The reason is that all objects use the same TCP connection: after the server sends the first object it does not have to send the subsequent objects at the initial slow-start rate, but can continue at the rate the first object was being sent at.
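As a rough illustration of persistent connections (not of pipelining, which the Python standard library does not expose), the sketch below reuses one http.client connection for several requests, assuming the server keeps the connection open. The host and paths are placeholders:

import http.client

# One TCP connection, several request/response pairs carried over it.
conn = http.client.HTTPConnection("example.com")     # placeholder host

for path in ("/", "/image1.jpg", "/image2.jpg"):      # placeholder paths
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()            # drain the body before reusing the connection
    print(path, resp.status)

conn.close()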


3. Caching mechanism

The purpose of caching in HTTP/1.1 is, in many cases, to avoid sending a request at all and, in many other cases, to avoid sending a full response. The former reduces the number of network round trips, and HTTP uses an "expiration" mechanism for this purpose. The latter reduces the network bandwidth used, and HTTP uses a "validation" mechanism for this purpose.

HTTP defines 3 kinds of caching mechanisms:

• Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.

• Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see whether it has changed.

• Invalidation is usually a side effect of another request that passes through the cache. For example, if a URL associated with a cached response subsequently gets a POST, PUT, or DELETE request, the cached response will be invalidated.
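A minimal sketch of the validation mechanism using Python's standard library: the second request carries If-Modified-Since, and a 304 answer means the cached copy is still good. The URL is only an example, and the sketch assumes the server returns a Last-Modified header:

import urllib.request
import urllib.error

URL = "http://example.com/"        # example URL

# First request: remember the validator the server gives us.
with urllib.request.urlopen(URL) as resp:
    last_modified = resp.headers.get("Last-Modified")
    cached_body = resp.read()

# Later: revalidate instead of downloading the full response again.
req = urllib.request.Request(URL, headers={"If-Modified-Since": last_modified})
try:
    with urllib.request.urlopen(req) as resp:
        cached_body = resp.read()  # 200: the resource changed, new body returned
except urllib.error.HTTPError as e:
    if e.code == 304:
        pass                       # 304 Not Modified: keep using cached_body
    else:
        raise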


4. Authentication mechanisms

These mechanisms can be used by the server to challenge client requests and by the client to provide authentication information.

For more information, refer to RFC 2617: HTTP Authentication: Basic and Digest Access Authentication.
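As a hedged sketch of the Basic scheme from RFC 2617: the client sends an Authorization header whose value is "Basic " followed by the Base64 encoding of "user:password". The URL and credentials below are made up for illustration:

import base64
import urllib.request

URL = "http://example.com/protected"       # hypothetical protected resource
USER, PASSWORD = "alice", "secret"         # made-up credentials

token = base64.b64encode((USER + ":" + PASSWORD).encode("utf-8")).decode("ascii")
req = urllib.request.Request(URL, headers={"Authorization": "Basic " + token})

# A real server would normally first answer 401 with a WWW-Authenticate: Basic
# challenge; here the credentials are simply sent up front.
with urllib.request.urlopen(req) as resp:
    print(resp.status)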


5. HTTP-based Applications

Multi-threaded download: a download tool opens multiple threads, each of which issues an HTTP request for only part of the resource file, for example Content-Range: bytes 20000-40000/47000; the pieces downloaded by the threads are then merged into the complete file (see the sketch below).

HTTPS transport protocol: its principle rests on two basic kinds of encryption and decryption algorithms. Its advantage is that the key generated by the client can be obtained only by the client and the server, and the encrypted data can be decrypted into clear text only by the client and the server, so communication from client to server is secure.

Interaction between server and client also involves identity authentication and cookies.
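A minimal sketch of the partial download mentioned above: one thread asks for a byte range with the Range request header, and a server that supports ranges answers 206 Partial Content with a Content-Range header like the one quoted. The URL is a placeholder:

import urllib.request

URL = "http://example.com/file.bin"           # placeholder resource

# Ask for bytes 20000-40000 only; a multi-threaded downloader would issue
# several such requests in parallel and then merge the pieces.
req = urllib.request.Request(URL, headers={"Range": "bytes=20000-40000"})
with urllib.request.urlopen(req) as resp:
    print(resp.status)                         # 206 if ranges are supported
    print(resp.headers.get("Content-Range"))   # e.g. bytes 20000-40000/47000
    part = resp.read()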
HTTP request methods: a question we often run into is the difference between GET and POST.

Let's look at the differences between GET and POST:

1. Data submitted with GET is appended to the URL: a "?" separates the URL from the data and parameters are joined with "&", as in editposts.aspx?name=test1&id=123456. With POST, the submitted data is placed in the body of the HTTP message.

2. The amount of data that can be submitted with GET is limited (because browsers limit the length of the URL), while data submitted with POST has no such limit.

3. With GET, the values of the variables are read with Request.QueryString; with POST, they are read with Request.Form (the ASP.NET APIs).

4. Submitting data with GET raises security concerns. For example, if a login page submits its data with GET, the user name and password appear in the URL; if the page can be cached, or if other people can use the machine, the account and password can be recovered from the history.
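To make the difference concrete, here is a small Python sketch; the URL and parameter names mirror the example above but are otherwise hypothetical. GET encodes the parameters into the URL's query string, while POST puts the same data into the message body:

import urllib.parse
import urllib.request

params = {"name": "test1", "id": "123456"}
query = urllib.parse.urlencode(params)         # "name=test1&id=123456"

# GET: the data rides in the URL itself.
get_req = urllib.request.Request("http://example.com/editposts.aspx?" + query)

# POST: the same data goes into the body of the HTTP message.
post_req = urllib.request.Request(
    "http://example.com/editposts.aspx",
    data=query.encode("ascii"),                # request body
    method="POST",
)

print(get_req.get_full_url())   # parameters are visible in the URL
print(post_req.data)            # parameters are in the request body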





