HTTP protocol Analysis

Last Update:2018-08-12 Source: Internet

Author: User

HTTP protocol Introduction

HTTP (Hypertext Transfer Protocol) is one of the most widely used Internet protocols and is mainly used for Web services. The text it carries is typically formatted as HTML (Hypertext Markup Language).

Version of the HTTP protocol

HTTP 0.9: Used only to transmit HTML documents.

HTTP 1.0

1. Introduced the MIME (Multipurpose Internet Mail Extensions) mechanism. With it, HTTP can carry multimedia information such as video and audio, so HTTP is no longer limited to HTML and can send other formats as well.

2. Introduced the keep-alive mechanism to support persistent connections. (Keep-alive works by adding a field to the header; HTTP 1.0 does not support persistent connections natively.)

3. Introduced support for caching.

HTTP 1.1

Supports more request methods and finer-grained cache control, and natively supports persistent connections.

HTTP 2.0

Provides an optimized transport for HTTP semantics. SPDY is a technology introduced by Google to accelerate HTTP data exchange, especially over SSL, but SPDY is not used much now. The versions in common use at the time of writing are HTTP 1.0 and HTTP 1.1.

How HTML documents are generated

Static

The document is written and finalized in advance.

Dynamic

The HTML result is output by running a program.

Dynamic languages include PHP, JSP, ASP, and .NET.

Note: These scripts need a corresponding interpreter; for example, PHP scripts need the PHP interpreter.

How static and dynamic requests are handled

Static

1. The Web server registers a listening socket with the kernel.

2. The client, through a browser, sends a request to the Web server.

3. The Web server receives the request from the client.

4. If the requested resource is local to the server, the HTTP service makes a system call to the kernel.

5. The kernel reads the data from the local disk and passes it to the HTTP service.

6. The HTTP service wraps the requested resource in a response message and sends it to the client.
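The static case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `serve_static` and the document-root layout are hypothetical, and a real server would also handle the socket work from steps 1 to 3.

```python
from pathlib import Path


def serve_static(doc_root: Path, request_path: str) -> bytes:
    """Return a complete HTTP response for a file below doc_root."""
    # Resolve the path and refuse anything that escapes the document root.
    root = doc_root.resolve()
    target = (root / request_path.lstrip("/")).resolve()
    if root not in target.parents or not target.is_file():
        status, body = "404 Not Found", b"Not Found"
    else:
        # Steps 4-5: the read goes through the kernel to the local disk.
        status, body = "200 OK", target.read_bytes()
    # Step 6: wrap the resource in a response message.
    head = f"HTTP/1.1 {status}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + body
```

Calling `serve_static(Path("/var/www"), "/index.html")` (a hypothetical path) would return the status line, one header, a blank line, and the file contents.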

Dynamic

Unlike the static case, if the user requests dynamic content, the HTTP service invokes the back-end interpreter.

The user's request is processed by a program written in a dynamic language. If the request needs data, a call is made to the kernel to fetch the specified data from disk. The interpreter runs the program, and the result of the run is usually an HTML-formatted document. That result is then built into a response message and sent back to the client.
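As a rough sketch of the dynamic case, the following Python function generates the HTML at request time instead of reading a finished file. The function name `render_greeting` and the `name` query parameter are assumptions made for this example only.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def render_greeting(request_target: str) -> bytes:
    """Generate an HTML page at request time and wrap it in a response."""
    query = parse_qs(urlsplit(request_target).query)
    name = query.get("name", ["world"])[0]
    # The HTML is produced by running code, not read from a finished file.
    html = f"<html><body><h1>Hello, {escape(name)}!</h1></body></html>"
    body = html.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

The user-supplied value is escaped before it is placed in the page, which is a design choice any dynamic generator should make.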

HTTP protocol

HTTP messages

An HTTP message consists of multiple lines, usually made up of ASCII strings, and the length of each field is not fixed. HTTP messages are divided into two types: request messages and response messages.

1. Request message

Client → server

Sent by the client to the server to request a resource (for example, an HTML document) from a site.

2. Response message

Server → client

Sent by the server in response to a client request.

Request message format

Request line + request headers + blank line + request entity

1. Request line

The request line consists of the request method field <method>, the request URL field <request-url>, and the HTTP protocol version <version>. It identifies the method the client uses, the resource it requests, and the protocol version of the request. The three fields are separated by spaces.

Example versions: HTTP/1.0, HTTP/1.1

HTTP request messages can be captured and displayed with the Wireshark tool. In its display, the "\r\n" after each header represents a carriage return and a line feed, which separates that header from the next one. The request message can also be obtained with the curl command.

2. Request Header

A request header consists of a keyword and its value, separated by ":", in the format name:value. Request headers pass information about the request from the client to the server. A request can carry more than one header.

3. Blank Line

The request headers are followed by a blank line: the client sends a carriage return and a line feed to tell the server that no more request headers follow.

4. Request entity

The request entity (body) carries the data that the client sends with the request, for example form data in a POST request. Many requests, such as a typical GET, have no entity.
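To make the four parts concrete, here is a minimal Python sketch that assembles a request message and splits it again. The helper names `build_request` and `parse_request`, the host name, and the form data are illustrative assumptions, not part of any library.

```python
def build_request(method, target, headers, body=b""):
    """Assemble request line + headers + blank line + entity."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the final CRLF pair is the blank line
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a raw request back into its four parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body


raw = build_request(
    "POST", "/login",
    {"Host": "www.example.com", "Content-Length": "9"},
    b"user=anna",
)
```

Printing `raw` shows the same layout that a packet capture would display, with "\r\n" after the request line and after every header.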

Response message format

Start line + response headers + blank line + response entity

1. Start line

Also known as the status line. It carries the server's status information for the client's request and consists of the version <version>, the status code <status>, and the reason phrase <reason-phrase>, for example "HTTP/1.1 200 OK".

2. Response header

As in the request message, the start line is usually followed by several header fields. Each header field consists of a name and a value separated by a colon, in the format name:value.

3. Blank line

The last response header is followed by a blank line: the server sends a carriage return and a line feed to tell the client that no more headers follow.

4. Response entity

The response entity carries the data to be returned to the client. The data can be either text or binary (for example, video).
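The response layout can be sketched the same way. The following Python function is a simplified parser written for illustration; the name `parse_response` is hypothetical, and it assumes the whole message is already in memory.

```python
def parse_response(raw: bytes):
    """Split a raw response into status line fields, headers, and entity."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The reason phrase may itself contain spaces, so split at most twice.
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body
```

Header names are lower-cased here because they are case-insensitive in HTTP, which makes later lookups simpler.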

HTTP Request method

In HTTP communication, every request message contains a request method, which tells the server what operation the client wants to perform on the requested resource. Common request methods include GET and POST.

Status code for HTTP

Common Status Code description

HTTP Header Introduction

• General header

• Request header

• Response header

• Entity header: Describes the entity's resource, such as its type, length, and encoding format

• Extension header: A non-standard header, which programmers can define themselves

General Header

• Connection: Defines connection options for the requests and responses between client and server

In HTTP 1.0, a client that wants a persistent connection sets Connection: keep-alive

• Cache-Control: Finer-grained cache control; more common in HTTP 1.1
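The way the Connection header interacts with the protocol version can be expressed as a small decision function. This is a sketch: the name `is_persistent` is hypothetical, and header lookup is simplified to an exact-case dictionary key.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this message."""
    raw_value = headers.get("Connection", "")
    tokens = {token.strip().lower() for token in raw_value.split(",")}
    if "close" in tokens:
        return False
    if version == "HTTP/1.1":
        return True  # HTTP 1.1 is persistent by default
    return "keep-alive" in tokens  # HTTP 1.0 has to opt in
```

For example, `is_persistent("HTTP/1.0", {})` is false, while the same call with `{"Connection": "keep-alive"}` is true.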

Request Header

• Client-IP: The client's IP address

• Host: The requested host; useful when implementing name-based virtual hosts

• Referer: The URL of the resource from which the current resource was requested; Referer can be used for hotlink protection

• User-Agent: The user agent, generally a browser

• Accept headers: What the client is able to accept

  • Accept: The media types the client accepts, that is, the types the server may send

  • Accept-Charset: The character sets accepted

  • Accept-Encoding: The content encodings accepted

  • Accept-Language: The languages accepted

• Conditional request headers (most were introduced in HTTP 1.1): The request carries a condition, and the server carries out the request only if the condition is satisfied

• Security-related request headers:

  • Authorization

  • Cookie

Response header

• Age: How long, in seconds, the response has been held in a cache since the origin server generated or validated it

• Server: Tells the client the name and version of the server program

• Negotiation headers:

  • Vary: A list of request headers; the server uses them to pick the most appropriate version to send to the client

• Security-related:

  • WWW-Authenticate

  • Set-Cookie

Entity Header

• Location: The new location of the resource; typically used with a 302 response code

• Allow: The request methods allowed on this resource

• Content-related headers:

  • Content-Encoding

  • Content-Language

  • Content-Length

  • Content-Location: Where the content is located

  • Content-Type

• Cache-related headers:

  • ETag: Entity tag

  • Expires: Expiry time

  • Last-Modified: Last modified time

ETag explanation:

Besides the cache servers found on the network, the browser itself also has a cache. Caching rests on one premise: a picture does not change often. The first time a picture is requested, the server returns status code 200 together with the picture's ETag, which can be thought of as the picture's fingerprint. When the browser accesses the picture again, it sends the fingerprint to the server for verification (in the If-None-Match request header). If the picture has not changed, the server answers 304; on seeing 304, the browser knows it should take the picture from its local cache. This relieves the load on the server and saves the time needed to transfer the picture over the network.
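The exchange described above can be sketched as follows. The function names are hypothetical, and deriving the ETag from a SHA-256 hash of the content is an assumption for this example; real servers are free to compute the tag in other ways.

```python
import hashlib


def make_etag(content: bytes) -> str:
    # Assumption: a quoted hash prefix serves as the fingerprint.
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(request_headers: dict, content: bytes):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # Fingerprint matches: no body, the browser uses its cached copy.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(content))}, content
```

A first call with no request headers returns 200 and the ETag; repeating the call with that ETag in If-None-Match returns 304 with an empty body.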

Appendix: The most common HTTP request headers are as follows:

Accept: The MIME types the browser accepts;

Accept-Charset: The character sets the browser accepts;

Accept-Encoding: The content encodings the browser can decode, such as gzip;

Accept-Language: The languages the browser prefers;

Authorization: Authorization information, usually sent in reply to a WWW-Authenticate header from the server;

Connection: Indicates whether a persistent connection is required. If the value is "keep-alive", or if the request uses HTTP 1.1 (which is persistent by default), the persistent connection can be used, which significantly reduces download time when a page contains multiple elements such as applets and pictures;

Content-Length: The length of the request message body;

Cookie: One of the most important request headers;

Cookie-related HTTP extension headers

1) Cookie: The client sends the cookies set by the server back to the server;

2) Set-Cookie: The server sets a cookie on the client;

3) Cookie2 (RFC 2965): The client tells the server which cookie version it supports;

4) Set-Cookie2 (RFC 2965): The server sets a cookie on the client.

How cookies work: the server uses the Set-Cookie header to send the cookie contents to the client in a response message, and the client sends the same contents back to the server in the Cookie header of each new request. This is what allows a session to be maintained.
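The round trip can be sketched with Python's standard `http.cookies` module. The two helper functions and the cookie name `session_id` are hypothetical and only illustrate the flow; a real client would also honor attributes such as Path and expiry.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name: str, value: str) -> str:
    """Server side: build the value of a Set-Cookie response header."""
    jar = SimpleCookie()
    jar[name] = value
    jar[name]["path"] = "/"
    return jar[name].OutputString()


def cookie_header_from(set_cookie_value: str) -> str:
    """Client side: build the Cookie request header value to send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs travel back; attributes such as Path do not.
    return "; ".join(f"{key}={morsel.value}" for key, morsel in jar.items())


from_server = set_cookie_header("session_id", "abc123")
to_server = cookie_header_from(from_server)
```

Here `from_server` is what the server would place after "Set-Cookie:", and `to_server` is what the client would place after "Cookie:" on its next request.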

Host: The host and port from the original URL;

If-Modified-Since: The requested content is returned only if it has been modified after the specified date; otherwise the server answers 304 "Not Modified";

Referer: The URL of the page from which the user reached the currently requested page;

User-Agent: The browser type.

The most common HTTP response headers are as follows:

Allow: The request methods the server supports (such as GET and POST);

Content-Encoding: The encoding method of the document;

Content-Length: The content length. This is only required when the browser uses a persistent HTTP connection;

Content-Type: The MIME type of the document that follows;

Accept-Ranges: bytes: This response header indicates that the server supports Range requests with bytes as the unit (which is also the only available unit). It follows that the server supports resuming interrupted downloads and downloading several parts of a file at the same time, so a download tool can use Range requests to speed up a download.

Accept-Ranges: none: This response header indicates that the server does not support Range requests.

Date: The current time in GMT;

Expires: When the document should be considered expired and no longer be cached;

Last-Modified: When the document was last changed;

Location: Where the client should go to retrieve the document;

Refresh: After how many seconds the browser should refresh the document.
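To illustrate what Accept-Ranges: bytes makes possible, the sketch below answers a single-range request of the form "bytes=start-end" with a 206 Partial Content response. The function name `serve_range` is hypothetical, and the parser is deliberately incomplete: suffix ranges and multiple ranges are not handled.

```python
def serve_range(content: bytes, range_header: str):
    """Answer one 'bytes=start-end' range; returns (status, headers, body)."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdigit():
        # Anything this sketch cannot parse gets the full content.
        return 200, {"Accept-Ranges": "bytes"}, content
    start = int(start_text)
    end = int(end_text) if end_text.isdigit() else len(content) - 1
    end = min(end, len(content) - 1)
    if start > end:
        return 416, {"Content-Range": f"bytes */{len(content)}"}, b""
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(content)}",
    }
    return 206, headers, content[start:end + 1]
```

A download tool could request "bytes=0-4" and "bytes=5-" in parallel and join the two bodies, or resume an interrupted download by asking only for the bytes it is still missing.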

The most common HTTP entity headers

Entity headers serve as meta information about the entity content. They describe its attributes, including the type, length, compression method, last modification time, and validity of the data.

Allow: The allowed request methods, for example: GET, POST;

Content-Encoding: The encoding method of the document, for example: gzip;

Content-Language: The language of the content, for example: zh-CN;

Content-Length: The content length, for example: 80 (see the response headers above);

Content-Location: Where the client should go to retrieve the document, for example: http://www.dfdf.org/dfdf.html;

Content-MD5: An MD5 digest of the entity, used as a checksum. Both the sender and the receiver compute the MD5 digest, and the receiver compares the value it computes with the value carried in this header;

Content-Type: The MIME type of the entity being sent or received, in the form main type/subtype, for example: text/html; charset=gb2312.
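The Content-MD5 check can be sketched as follows. The header value is the base64 encoding of the 128-bit MD5 digest of the entity; the two function names are hypothetical helpers for this example.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Sender side: compute the Content-MD5 header value for an entity."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_content_md5(body: bytes, header_value: str) -> bool:
    """Receiver side: recompute the digest and compare with the header."""
    return content_md5(body) == header_value.strip()
```

MD5 is suitable here only as a checksum against accidental corruption, not as protection against deliberate tampering.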

Transactions for HTTP

An HTTP transaction consists of one HTTP request and the response to that request; in other words, it is one complete request-and-response exchange. By default (as in HTTP 1.0), each transaction opens and closes its own connection, which costs both time and bandwidth. Because of TCP slow start, every new connection also begins with reduced throughput, and the number of parallel connections that can be opened is limited. Using persistent connections is therefore better than not using them: they save the time spent establishing and tearing down a TCP connection for every request.
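The effect of a persistent connection can be observed with Python's standard library alone. The sketch below starts a throwaway server on the loopback interface, sends two requests through one `http.client.HTTPConnection`, and records the client port the server saw for each request; equal ports mean both transactions shared one TCP connection. The function name and the paths are assumptions for this demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def client_ports_for_two_requests():
    """Send two GETs over one connection; return the client port seen for each."""
    client_ports = []

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

        def do_GET(self):
            client_ports.append(self.client_address[1])
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence the default access log

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body so the socket can be reused
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return client_ports
```

The Content-Length header matters here: without it, the client could not tell where one response ends and would have to wait for the connection to close.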

HTTP Resources

A resource is any content that a user can request and obtain from a server over HTTP through a browser or other user agent, such as an HTML document or a picture.

Resource type: Identified by a MIME type

Format: major/minor (main type and subtype)
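As a small illustration of the major/minor format, the following sketch splits a Content-Type value into its main type, subtype, and parameters. The function name `parse_content_type` is hypothetical.

```python
def parse_content_type(value: str):
    """Split 'major/minor; key=value' into (major, minor, parameters)."""
    media_type, *param_parts = [part.strip() for part in value.split(";")]
    major, _, minor = media_type.partition("/")
    params = {}
    for part in param_parts:
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip()
    return major.lower(), minor.lower(), params
```

For the value "text/html; charset=gb2312" this yields the main type "text", the subtype "html", and the parameter charset=gb2312.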

Commonly used MIME types

URI and URL (URLs are a subset of URIs)

• A URI (Uniform Resource Identifier) is a string that identifies the name of an Internet resource and allows users to interact with that resource through a specific protocol. Every resource available on the Web, including HTML documents, images, video clips, and programs, is identified by a URI, so URIs can be used to name each resource.

• A URL (Uniform Resource Locator) describes the specific location of a resource on a particular server.

For example, the URL http://www.baidu.com:80/download/bash-4.3.1-1.rpm is divided into three parts:

I. Scheme (also called the protocol): http://

II. Internet address, typically the address of the server: www.baidu.com:80

III. The resource on that server: download/bash-4.3.1-1.rpm
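The three parts can be extracted with `urllib.parse.urlsplit` from the Python standard library. The wrapper function `split_url` is a hypothetical helper for this example.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Break a URL into scheme, server address, and resource path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


pieces = split_url("http://www.baidu.com:80/download/bash-4.3.1-1.rpm")
```

Note that `urlsplit` keeps the leading slash on the path, so the resource is reported as "/download/bash-4.3.1-1.rpm".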

CGI

When the Web server finds that a request requires running a script, it uses the CGI protocol to hand the request to the back-end application, which processes the user's request dynamically. The result is then returned to the HTTP server through the CGI protocol.
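A rough sketch of this hand-off: under CGI the server passes request details to the script in environment variables such as REQUEST_METHOD and QUERY_STRING, and the script writes headers, a blank line, and the body to standard output. The inline script and the `run_cgi` helper below are made up for this example, and only those two variables are set.

```python
import os
import subprocess
import sys

# A tiny stand-in for a CGI script; a real one would live in its own file.
CGI_SCRIPT = """
import os
print("Content-Type: text/plain")
print()
print("method=" + os.environ["REQUEST_METHOD"])
print("query=" + os.environ["QUERY_STRING"])
"""


def run_cgi(method: str, query_string: str):
    """Run the script the way a server would and split its output."""
    env = dict(os.environ, REQUEST_METHOD=method, QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env, capture_output=True, text=True, check=True,
    )
    # The script's output uses the same layout as an HTTP message:
    # headers, a blank line, then the body.
    header_text, _, body = result.stdout.partition("\n\n")
    headers = dict(line.split(": ", 1) for line in header_text.splitlines())
    return headers, body
```

The server would then add a status line and forward the headers and body to the client.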

Other knowledge you need to know

The process of a Web resource request

1. The user enters the address to be accessed in the Web browser.

2. The Web browser asks a DNS server to resolve the domain name in the address to the IP address of the Web server.

3. The client establishes a connection to the Web server (TCP three-way handshake).

4. After the TCP connection is established, the client sends the HTTP request.

5. The server receives the client's HTTP request and processes it.

6. The server accesses the resource specified in the request.

7. The server constructs the response message and sends it to the client.

8. The server records the request in its log.
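The steps above can be walked through end to end on one machine. The sketch below is only a demonstration under stated assumptions: it uses a throwaway server on the loopback interface in place of a real Web server, resolves the name "localhost" instead of querying a public DNS server, and all class and function names are hypothetical.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

access_log = []  # step 8: the server records each request here


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):  # steps 5-7 happen on the server side
        body = b"<h1>It works</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        access_log.append(format % args)


def fetch(host: str, port: int, path: str) -> bytes:
    # Step 2: resolve the host name to an IPv4 address.
    info = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    address = info[0][4]
    # Step 3: connecting performs the TCP three-way handshake.
    with socket.create_connection(address, timeout=5) as sock:
        # Step 4: send the HTTP request.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:  # read until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def run_demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("localhost", server.server_address[1], "/index.html")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` returns the raw response message, and `access_log` afterwards holds the line the server wrote for the request.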

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
