HTTP details (2)

Last Update:2018-06-02 Source: Internet

Author: User

Tags website performance

1. HTTP request format

Anyone who has done socket programming knows that when designing a communication protocol, splitting each message into a message header and a message body is very common: the message header tells the other party what the message is for, and the message body tells the other party what to do. The same is true for messages transmitted over HTTP. Each HTTP message is divided into two parts, the HTTP header and the HTTP body; the header is required and the body is optional. When we open a web page, right-click it, and select "View Source", the HTML code we see is the HTTP message body. The message header can be viewed with the browser's developer tools or a plug-in, such as Firebug for Firefox or HttpWatch for IE.

The client sends an HTTP request to the server for resource access. It transmits a data block to the server, that is, request information. An HTTP request consists of three parts: request line, request header, and request body.

Request Line: Request Method URI protocol/version

Request header

Request body

The following is the data of an HTTP request:

POST /index.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost/
Content-Length: 25
Content-Type: application/x-www-form-urlencoded

username=aa&password=1234
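The three-part structure described above can be illustrated with a small Python sketch. The function name split_http_request and the shortened sample request are illustrative assumptions, not part of any library; the sketch only shows that the blank line separates the headers from the body.

```python
def split_http_request(raw: str):
    """Split a raw HTTP request into (request_line, headers, body)."""
    # The header section and the body are separated by the first blank line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


# Shortened version of the sample request (an assumption for brevity).
raw_request = (
    "POST /index.php HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Content-Length: 25\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "username=aa&password=1234"
)

request_line, headers, body = split_http_request(raw_request)
print(request_line)
print(headers["content-length"], len(body))
```

Note that the Content-Length value matches the number of bytes in the body.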

1. Request Line: Request Method URI protocol/version

The first line of the request has the form "Method URI Protocol/Version" and ends with a line break (CRLF). Its elements are separated by spaces. For example:

POST /index.php HTTP/1.1

In the line above, "POST" is the request method, "/index.php" is the URI, and "HTTP/1.1" is the protocol and protocol version.

According to the HTTP standard, a request can use one of several request methods. HTTP/1.1 defines eight: GET, POST, HEAD, OPTIONS, PUT, DELETE, TRACE, and CONNECT. In Internet applications, the most common are GET and POST.

The URI specifies the network resource to be accessed. Usually only the path relative to the server's root directory is needed, so it starts with "/". Finally, the protocol version declares the HTTP version used for the communication.

Request Method

In HTTP, multiple Request methods can be used for HTTP requests. These methods indicate how to access the resources identified by Request-URI. The following table lists the request methods supported by HTTP1.1:

Request method in HTTP1.1:

Method	Function
GET	Requests the resource identified by the Request-URI.
POST	Asks the server to accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line.
HEAD	Requests only the response message header of the resource identified by the Request-URI.
PUT	Asks the server to store the enclosed resource under the Request-URI.
DELETE	Asks the server to delete the resource identified by the Request-URI.
TRACE	Asks the server to echo back the request it received. Mainly used for testing or diagnosis.
CONNECT	Reserved for use with proxies that can act as a tunnel.
OPTIONS	Queries the server's capabilities, or the options and requirements associated with a resource.

The following describes three methods: GET, POST, and HEAD:

(1) GET

The GET method retrieves the resource identified by the Request-URI. The common form is:

GET Request-URI HTTP/1.1
The GET method is the default HTTP request method. For example, when we access a web page by entering its URL directly in the browser's address bar, the browser uses the GET method to obtain the resource from the server.

We can also use the GET method to submit form data. Form data submitted with GET is simply URL-encoded and sent to the server as part of the URL, so submitting form data this way is a security risk. For example:
http://localhost/login.php?username=aa&password=1234

From the URL above, the content submitted by the form (everything after the ?) is easy to read. In addition, because data submitted with GET is part of the URL, the amount of data cannot be too large, since browsers limit the URL length.

The URL length limits of several common browsers are listed below (unit: characters):

IE: 2083

Firefox: 65536

Chrome: 8182

Safari: 80000

Opera: 190000
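As a minimal sketch of how GET form data ends up in the URL, the following Python snippet uses the standard urllib.parse module to encode the sample form fields into a query string and to read them back, which is roughly what a server-side framework does. The variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Form fields from the login example in this section.
form = {"username": "aa", "password": "1234"}

# GET: the encoded form data becomes the query string of the URL.
url = "http://localhost/login.php?" + urlencode(form)
print(url)

# Server side: recover the fields from the query string.
query = urlsplit(url).query
fields = parse_qs(query)
print(fields)
```

Because the whole form is part of the URL, its length counts toward the browser limits listed above.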

(2) POST

The POST method is an alternative to the GET method. It is mainly used to submit form data to the web server, especially large volumes of data. The request headers end with two consecutive CRLF sequences (that is, a blank line), and the data submitted by the form follows. For the request shown earlier, the POST form data is:

username=aa&password=1234

The POST method overcomes some shortcomings of the GET method. When form data is submitted with POST, the data is not part of the URL but is sent to the web server in the request body, which avoids both the exposure of the data in the URL and the small size limit of GET. Therefore, for security and out of respect for user privacy, POST is usually used for form submission.

From a programming perspective (for example in a CGI program), data submitted with GET is available in the QUERY_STRING environment variable, while data submitted with POST is read from the standard input stream.
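The layout of a POST request can be sketched in Python as follows. The helper build_post_request is a hypothetical name used only for illustration; it shows the blank line between the headers and the form data, and that Content-Length is the byte length of the body.

```python
from urllib.parse import urlencode


def build_post_request(path: str, host: str, form: dict) -> bytes:
    """Build the raw bytes of a form POST request (illustrative helper)."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line: end of headers, the body follows
    ).encode("ascii")
    return head + body


request_bytes = build_post_request(
    "/index.php", "localhost", {"username": "aa", "password": "1234"}
)
print(request_bytes.decode("ascii"))
```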

The GET and POST methods have the following differences:

1. On the client side, GET submits data through the URL, where it is visible; POST places the data in the body of the HTTP message.

2. The size of data submitted with GET is limited (because browsers limit the URL length), whereas POST has no such limit.

3. Security. As mentioned in (1), with GET the parameters are displayed in the address bar, but with POST they are not. Therefore, if the data is Chinese text and not sensitive, GET can be used; if the data is not Chinese text but is sensitive, POST is still the better choice.

4. The server reads the values differently. For example, PHP uses $_GET to read variables submitted with GET and $_POST to read variables submitted with POST.

(3) HEAD

The HEAD method and GET method are almost the same. The difference between them is that the HEAD method only requests the message header, not the complete content. For the response part of a HEAD request, the information contained in the HTTP header is the same as the information obtained through the GET request. With this method, you do not need to transmit the entire resource content to obtain the information of the resource identified by Request-URI. This method is usually used to test the validity, accessibility, and recent updates of hyperlinks.

Note that in an HTML document the form method values get and post may be written in uppercase or lowercase, but in the HTTP protocol the method names GET and POST must be uppercase.

2. Request Header

Each header field consists of a field name, a colon (:), and a field value. The field name is case-insensitive. Any number of space characters may precede the field value. A header field can be extended over multiple lines, in which case each continuation line begins with at least one space or tab character.
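The header field rules above can be sketched as a small Python parser. The function parse_header_block and the sample block are illustrative assumptions; the sketch handles case-insensitive names, optional spaces before the value, and continuation lines. (Multi-line header values are rarely seen in practice today.)

```python
def parse_header_block(block: str) -> dict:
    """Parse header lines into a dict keyed by lowercase field name."""
    headers = {}
    last_name = None
    for line in block.split("\r\n"):
        if not line:
            continue
        if line[0] in " \t" and last_name is not None:
            # A line starting with a space or tab continues the previous field.
            headers[last_name] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        last_name = name.strip().lower()  # field names are case-insensitive
        headers[last_name] = value.strip()  # leading spaces are not significant
    return headers


block = "HOST:    localhost\r\nAccept: text/html,\r\n\tapplication/xml\r\n"
parsed = parse_header_block(block)
print(parsed)
```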

The most common HTTP request headers are as follows:

Transport header fields

Connection:

Purpose: Indicates whether a persistent connection is required.

If the server sees the value "Keep-Alive", or the request uses HTTP/1.1 (where connections are persistent by default), it can use a persistent connection. When a page contains multiple elements (such as applets or images), this significantly reduces the download time. To make this work, the server needs to send a Content-Length header in the response. The simplest way is to write the content into a ByteArrayOutputStream first and measure its size before sending it.

For example, Connection: keep-alive means that after a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on this server again, it reuses the established connection.

For example, Connection: close means that after a request completes, the TCP connection used to transmit HTTP data between the client and the server is closed. When the client sends another request, a new TCP connection must be established.

Host (this header field is required when a request is sent)

The Host request header field specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL.

For example, for http://localhost/index.html
the request message sent by the browser contains the following Host request header field:
Host: localhost

Here the default port 80 is used. If port 8080 is specified, it becomes: Host: localhost:8080
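A minimal Python sketch of how a client might derive the Host value from a URL is shown below. The function host_header is a hypothetical helper, and the handling of https (default port 443) is an addition not covered in the text.

```python
from urllib.parse import urlsplit


def host_header(url: str) -> str:
    """Return the Host header value for a URL (illustrative helper)."""
    parts = urlsplit(url)
    # Default ports are omitted from the Host value; https/443 is an
    # assumption added here, the text only discusses http/80.
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    if parts.port is None or parts.port == default_port:
        return parts.hostname
    return f"{parts.hostname}:{parts.port}"


print(host_header("http://localhost/index.html"))
print(host_header("http://localhost:8080/index.html"))
```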

Client header fields

Accept:

Purpose: The media types (MIME types) that the browser can accept.

For example, Accept: text/html indicates that the browser can accept text/html, that is, HTML documents, as the type sent back by the server. If the server cannot return text/html data, it should return a 406 (Not Acceptable) error.

The wildcard * represents any type. For example, Accept: */* indicates that the browser can process all types of data (this is generally what browsers send to the server).
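The Accept value in the sample request earlier in this article also carries q parameters, which express relative preference (a type without q counts as 1.0). The following Python sketch, with the illustrative function name parse_accept, splits such a value and orders the media types by preference.

```python
def parse_accept(value: str):
    """Return (media_type, q) pairs ordered from most to least preferred."""
    entries = []
    for item in value.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        entries.append((media_type, q))
    # sorted() is stable, so types with equal q keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
preferences = parse_accept(accept)
print(preferences)
```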

Accept-Encoding:

Purpose: The browser declares the content encodings it accepts. This usually concerns compression: whether compression is supported and which methods (gzip, deflate) are supported. (Note: this is not character encoding.)

For example, Accept-Encoding: gzip, deflate. The server can return HTML pages encoded with gzip or deflate to browsers that support them. In many cases this can reduce the download time by a factor of 5 to 10 and also saves bandwidth.
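To get a feel for the effect of gzip, the following Python sketch compresses a repetitive HTML fragment with the standard gzip module and compares the sizes. The sample content is made up for illustration, and real pages will compress by different ratios.

```python
import gzip

# A made-up, highly repetitive HTML fragment (illustrative only).
html = ("<p>Hello HTTP!</p>\n" * 500).encode("utf-8")

# What a server does before sending "Content-Encoding: gzip".
compressed = gzip.compress(html)
print(len(html), len(compressed))

# What the browser does after receiving the compressed body.
restored = gzip.decompress(compressed)
```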

Accept-Language:

Purpose: The browser declares the languages it accepts.

The difference between a language and a character set: Chinese is a language, and Chinese has multiple character sets, such as big5, gb2312, and gbk.

For example, Accept-Language: zh-cn. If this header field is not set in the request message, the server assumes that the client accepts all languages.

User-Agent:

Purpose: Tells the HTTP server the name and version of the operating system and browser used by the client.

When we visit a forum, we often see a welcome message that lists the name and version of our operating system and browser, which many people find surprising. In fact, the server application obtains this information from the User-Agent request header field, which lets the client tell the server about its operating system, browser, and other attributes.

Example: User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; CBA; .NET CLR 2.0.50727; .NET CLR 3.0.20.6.2152; .NET CLR 3.5.30729; .NET4.0C; InfoPath.2; .NET4.0E)

Accept-Charset:

Purpose: The browser declares the character sets it accepts. These are the character sets and character encodings described earlier in this article, such as gb2312 and UTF-8 (when we say charset, we generally include the corresponding character encoding scheme).

For example: Accept-Charset: iso-8859-1, gb2312. If this field is not set in the request message, any character set is acceptable by default.

Authorization: Authorization information, usually sent in response to a WWW-Authenticate header from the server.

The Authorization request header field is used to prove that the client has the right to view a resource. When a browser accesses a page and the server responds with status code 401 (Unauthorized), the browser can send a request containing the Authorization request header field and ask the server to verify it.
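The text does not say which authentication scheme is used. As one common possibility, the sketch below assumes the Basic scheme, in which the credentials are joined with a colon and Base64-encoded. The function name and the sample credentials are illustrative.

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Build an Authorization value, assuming the Basic scheme."""
    # Base64 is an encoding, not encryption: the credentials are
    # trivially recoverable by anyone who sees the header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


authorization = basic_authorization("aa", "1234")
print("Authorization:", authorization)
```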

Cookie/Login header fields

Cookie:

Purpose: One of the most important headers; it sends the cookie value to the HTTP server.

Entity header fields

Content-Length

Purpose: The length of the data sent to the HTTP server, that is, the length of the request message body.

Example: Content-Length: 38

Content-Type:

Purpose: The media type of the request message body.

Example: Content-Type: application/x-www-form-urlencoded

Miscellaneous header fields

Referer:

Purpose: Provides the context of the request by telling the server which link the user came from. For example, if my homepage links to a friend's site, his server can use the HTTP Referer to count how many users follow the link on my homepage to his website each day.

Example: Referer: http://translate.google.cn/?hl=zh-cn&tab=wT

Cache header fields

If-Modified-Since:

Purpose: Sends the last modification time of the browser's cached page to the server. The server compares it with the last modification time of the actual file on the server. If the times are the same, 304 is returned and the client uses its local cached file directly. If they differ, 200 and the new file content are returned; the client discards the old file, caches the new one, and displays it in the browser.

For example, If-Modified-Since: Thu, 09 Feb 2012 09:07:57 GMT
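The server-side comparison can be sketched in Python with the standard email.utils date parser. The function conditional_status is a hypothetical name; the sketch returns 304 when the file has not been modified after the time the client sent, which covers the "times are the same" case described above.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since: str, last_modified: str) -> int:
    """Return 304 if the resource is unchanged, otherwise 200."""
    client_time = parsedate_to_datetime(if_modified_since)
    server_time = parsedate_to_datetime(last_modified)
    # Not modified after the client's copy: no need to resend the body.
    return 304 if server_time <= client_time else 200


cached = "Thu, 09 Feb 2012 09:07:57 GMT"
print(conditional_status(cached, "Thu, 09 Feb 2012 09:07:57 GMT"))
print(conditional_status(cached, "Fri, 10 Feb 2012 09:07:57 GMT"))
```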

If-None-Match:

Purpose: If-None-Match works together with ETag. The server adds ETag information to the HTTP response. When the user requests the resource again, the ETag value is sent in the If-None-Match field of the HTTP request. If the server verifies that the resource's ETag has not changed (the resource has not been updated), it returns a 304 status to tell the client to use its local cached file. Otherwise, it returns a 200 status with the new resource and ETag. This mechanism improves website performance.

For example, If-None-Match: "03f2b33c0bfc0:0"
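The ETag exchange can be sketched as follows. How a server derives an ETag is up to the server; the hash-based make_etag below is only one hypothetical scheme, and respond is an illustrative stand-in for a request handler.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive an ETag from the content (hypothetical scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match=None):
    """Return (status, headers, body) for a possibly conditional request."""
    etag = make_etag(content)
    if if_none_match == etag:
        # Resource unchanged: the client can use its cached copy.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content


# First request: no If-None-Match, full response with an ETag.
status1, headers1, body1 = respond(b"version 1")
# Second request: the client echoes the ETag back in If-None-Match.
status2, headers2, body2 = respond(b"version 1", headers1["ETag"])
# The resource changed on the server: full response again.
status3, headers3, body3 = respond(b"version 2", headers1["ETag"])
print(status1, status2, status3)
```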

Pragma:

Purpose: Prevents the page from being cached. In HTTP/1.1, it has the same effect as Cache-Control: no-cache.

Pragma has only one usage, for example: Pragma: no-cache

Note: HTTP/1.0 implements only Pragma: no-cache; it does not implement Cache-Control.

Cache-Control:

Purpose: A very important header that specifies the caching mechanism that requests and responses follow. The meaning of each directive is as follows:

Cache-Control: public means the content can be cached by any cache.

Cache-Control: private means the content is cached only in a private cache.

Cache-Control: no-cache means the content is not served from cache without first checking with the server.

2. HTTP response format

After receiving and interpreting the request message, the server returns an HTTP Response Message. Similar to HTTP requests, HTTP responses are composed of three parts: Status line, message header, and response body. For example:

HTTP/1.1 200 OK
Date: Sun, 17 Mar 2013 08:12:54 GMT
Server: Apache/2.2.8 (Win32) PHP/5.2.5
X-Powered-By: PHP/5.2.5
Set-Cookie: PHPSESSID=c0huq7pdkmm5gg6osoe3mgjmm3; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Length: 4393
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

HTTP response example
Hello HTTP!

1. Status line

The status line consists of the protocol version, a numeric status code, and the corresponding status description. The elements are separated by spaces, and the line ends with a carriage return and line feed. The format is as follows:

HTTP-Version Status-Code Reason-Phrase CRLF

HTTP-Version is the HTTP protocol version of the server, Status-Code is the response code sent back by the server, Reason-Phrase is the text description of the status code, and CRLF is the carriage return and line feed. For example:

HTTP/1.1 200 OK (CRLF)
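A status line in this format can be taken apart with a short Python function. The name parse_status_line is illustrative; the split is limited to two spaces because the reason phrase itself may contain spaces.

```python
def parse_status_line(line: str):
    """Split a status line into (version, status_code, reason_phrase)."""
    # maxsplit=2 keeps reason phrases such as "Not Found" in one piece.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```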

Status Code and description

The status code consists of three digits and indicates whether the request was understood and satisfied. The status description gives a brief text description of the status code. The first digit defines the response class; the last two digits have no categorization role. The first digit has five possible values:

1xx: Informational - the request has been received and processing continues.
2xx: Success - the request has been successfully received, understood, and accepted.
3xx: Redirection - further action is required to complete the request.
4xx: Client Error - the request has a syntax error or cannot be fulfilled.
5xx: Server Error - the server failed to fulfill a valid request.

Common status codes and their descriptions:
200 OK // The client request succeeded.
400 Bad Request // The client request has a syntax error and cannot be understood by the server.
401 Unauthorized // The request is unauthorized. This status code must be used with the WWW-Authenticate header field.
403 Forbidden // The server received the request but refuses to serve it.
404 Not Found // The requested resource does not exist, for example because an incorrect URL was entered.
500 Internal Server Error // An unexpected error occurred on the server.
503 Service Unavailable // The server cannot currently process the client's request and may recover after a period of time.
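The relationship between the first digit and the response class can be sketched in Python. The CLASSES table and the describe function are illustrative; the reason phrases come from the standard library's http.HTTPStatus enumeration.

```python
from http import HTTPStatus

# Response classes keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int):
    """Return (response class, standard reason phrase) for a status code."""
    return CLASSES[code // 100], HTTPStatus(code).phrase


for code in (200, 404, 503):
    print(code, describe(code))
```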

2. response body

The response body is the content of the resource returned by the server. The response headers and body must be separated by a blank line. For example:

HTTP response example
Hello HTTP!

3. Response Header Information

The most common HTTP Response Headers are as follows:

Cache header fields

Date:

Purpose: The date and time at which the message was generated, that is, the current time in GMT.

Example: Date: Sun, 17 Mar 2013 08:12:54 GMT

Expires:

Purpose: Specifies when the document is considered expired and should no longer be cached. Until that time, the browser can use its local cache.

Example: Expires: Thu, 19 Nov 1981 08:52:00 GMT

Vary

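Purpose: Lists the request header fields that the server used to select this response, so that caches keep separate copies for requests that differ in those fields.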

Example: Vary: Accept-Encoding

Cookie/Login header fields

P3P

Purpose: Allows cookies to be set for cross-origin access. This can solve the problem of cookies being blocked for cross-origin iframes.

Example: P3P: CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"

Set-Cookie

Purpose: A very important header used to send a cookie to the client browser. Each cookie written generates its own Set-Cookie header.

For example: Set-Cookie: PHPSESSID=c0huq7pdkmm5gg6osoe3mgjmm3; path=/
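As a small sketch of how a client can read a Set-Cookie value and send it back, the following Python snippet uses the standard http.cookies module on the example value above. The variable names are illustrative.

```python
from http.cookies import SimpleCookie

# Parse the value of the Set-Cookie header from the example above.
cookie = SimpleCookie()
cookie.load("PHPSESSID=c0huq7pdkmm5gg6osoe3mgjmm3; path=/")

morsel = cookie["PHPSESSID"]
print(morsel.value, morsel["path"])

# On later requests to matching paths, the client sends the
# name and value back in a Cookie request header.
cookie_header = f"Cookie: {morsel.key}={morsel.value}"
print(cookie_header)
```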

Entity header fields:

Attributes of the entity content, including its type, length, compression method, last modification time, and validity.

ETag:

Purpose: Used together with If-None-Match. (See the If-None-Match example in this article.)

Example: ETag: "03f2b33c0bfcc0:0"

Last-Modified:

Purpose: Indicates the last modification date and time of the resource. (See the If-Modified-Since example.)

Example: Last-Modified: Wed, 21 Dec 2011 09:09:10 GMT

Content-Type:

Purpose: The web server tells the browser the type and character set of the entity in the response.

For example:

Content-Type: text/html; charset=UTF-8

Content-Type: text/html; charset=GB2312

Content-Type: image/jpeg

Content-Length:

Purpose: Specifies the length of the entity body in bytes, written as a decimal number. When sending data to the client with Content-Length, the server must buffer all of the data in advance to know its length, and then send it to the client.

Example: Content-Length: 19847
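Because Content-Length counts bytes rather than characters, its value depends on the charset named in Content-Type. The Python sketch below illustrates this with a made-up body; the function name content_length is illustrative.

```python
def content_length(text: str, charset: str) -> int:
    """Return the Content-Length of text when encoded with charset."""
    return len(text.encode(charset))


# Made-up body: 12 ASCII characters followed by 2 Chinese characters.
body_text = "Hello HTTP! 你好"

print(len(body_text))                      # number of characters
print(content_length(body_text, "utf-8"))  # bytes with charset=UTF-8
print(content_length(body_text, "gb2312")) # bytes with charset=GB2312
```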

Content-Encoding:

Purpose: The encoding applied to the document, generally the compression method.

The web server indicates which compression method (gzip, deflate) it used to compress the entity in the response. Gzip compression can significantly reduce the download time of HTML documents.

Example: Content-Encoding: gzip

Content-Language:

Purpose: The web server tells the browser the language of the response entity.

For example, Content-Language: da

Miscellaneous header fields

Server:

Purpose: Specifies the software information of the HTTP server.

Example: Server: Apache/2.2.8 (Win32) PHP/5.2.5

X-Powered-By:

Purpose: Indicates the technology used to develop the website.

Example: X-Powered-By: PHP/5.2.5

Transport header fields

Connection:

For example, Connection: keep-alive means that after a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on this server again, it reuses the established connection.

For example, Connection: close means that after a request completes, the TCP connection used to transmit HTTP data between the client and the server is closed. When the client sends another request, a new TCP connection must be established.

Location header field

Location:

Purpose: Redirects the recipient to a new location, given as a new URL.

For more information about instances, see 304 status instances.

The difference between the stateless HTTP protocol and Connection: keep-alive

Stateless means that the protocol has no memory of previous transactions, and the server does not know the client's state. In other words, opening a web page on a server has no relationship to the pages you opened on that server before.

HTTP is a stateless, connection-oriented protocol. Stateless does not mean that HTTP cannot maintain a TCP connection, nor does it mean that HTTP uses UDP (which is connectionless).

Starting with HTTP/1.1, Keep-Alive is enabled by default to keep connections open. Put simply, after a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on the server again, it reuses the established connection.

Keep-Alive does not keep the connection open forever. It has a timeout, which can be configured in server software such as Apache.

3. Browser cache

Browser cache: includes the cache of page HTML, images, JS, CSS, and other resources. The browser cache saves this information on the hard disk of the user's local computer.

1. Advantages of caching:

1) Faster response: because requests are answered from the cache (which is closer to the client) rather than from the origin server, the process takes less time and the server appears to respond faster.

2) Lower network bandwidth consumption: when a cached copy is reused, less bandwidth is consumed. Customers save on bandwidth costs, and growth in bandwidth demand is kept under control and easier to manage.

1. Cache working principle

The cache state of a page is determined by HTTP headers, some in the browser's request and some in the server's response. The main ones are Pragma: no-cache, Cache-Control, Expires, Last-Modified, and If-Modified-Since. Pragma: no-cache is defined by HTTP/1.0, and Cache-Control by HTTP/1.1.

The working principle can be divided into three steps:

First request: the browser requests the resource, and the server's response carries Expires, Cache-Control, and Last-Modified/ETag in its HTTP headers. These values are recorded along with the cached copy.
Request again: when the browser requests the resource again, the request headers carry If-Modified-Since and If-None-Match (the recorded Last-Modified and ETag values).
Comparison: the server compares the resource's current Last-Modified/ETag with the If-Modified-Since/If-None-Match values sent in the second request to determine whether an update is needed. If these headers show that the local copy has not changed, the server returns a 304 response and the client does not need to download the file again.

Cache-related HTTP extended Message Headers

Expires: Sets the expiration time of the page, in Greenwich Mean Time (GMT).

Cache-Control: Finer-grained control over cached content.

Last-Modified: The last modification time of the requested object, used to determine whether the cache has expired. It is usually generated from the file's timestamp.

ETag: A validation value for the resource in the response, unique on the server within a certain period of time. An ETag is a token associated with a web resource. Like Last-Modified, it is an identifier, and it is generally used together with Last-Modified to make the server's decision more accurate.

Date: The server time.

If-Modified-Since: The last modification time of the resource as known to the client, compared against the server's Last-Modified.

If-None-Match: The validation value of the resource as known to the client, compared against the server's ETag.

Cache-Control parameters
Cache-Control: private/public: public responses may be cached and shared among multiple users; private responses can only be stored in a private cache and cannot be shared between users.
Cache-Control: no-cache: the cached copy is not used without checking with the server.
Cache-Control: max-age=x: the cache lifetime, in seconds.
Cache-Control: must-revalidate: if the page has expired, it is fetched from the server again.
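The parameters above can be combined in one header value. The following Python sketch parses such a value and makes a simplified freshness decision based on max-age; the function names are illustrative, and real caches apply more rules than shown here.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if not name:
            continue
        # Directives without an argument (e.g. private) are stored as True.
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


def is_fresh(directives: dict, age_seconds: int) -> bool:
    """Simplified check: may a cached copy of this age be reused as-is?"""
    if "no-cache" in directives or "no-store" in directives:
        return False
    max_age = directives.get("max-age")
    if not isinstance(max_age, str):
        return False  # no usable max-age value
    return age_seconds < int(max_age)


rules = parse_cache_control("private, max-age=3600, must-revalidate")
print(rules)
print(is_fresh(rules, 10), is_fresh(rules, 4000))
```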

2. Caching of images, CSS, JS, and Flash

This is mainly implemented through server configuration. If the Apache server is used, you can use the mod_expires module:

Compile the mod_expires module:

cd /root/httpd-2.2.3/modules/metadata

/usr/local/apache/bin/apxs -i -a -c mod_expires.c # compile and install the module

Edit httpd.conf and add the following content:

ExpiresActive on

ExpiresDefault "access plus 1 month"

ExpiresByType text/html "access plus 1 months"

ExpiresByType text/css "access plus 1 months"

ExpiresByType image/gif "access plus 1 months"

ExpiresByType image/jpeg "access plus 1 months"

ExpiresByType image/jpg "access plus 1 months"

ExpiresByType image/png "access plus 1 months"

ExpiresByType application/x-shockwave-flash "access plus 1 months"

ExpiresByType application/x-javascript "access plus 1 months"

# ExpiresByType video/x-flv "access plus 1 months"

Explanation: the first line enables the module.

The second line sets the default cache time to one month.

The remaining lines set the cache time for each type of resource.
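To illustrate what "access plus 1 month" means for the response, the following Python sketch computes an Expires value from the time of access. It assumes that one month is approximated as 30 days, and the function name expires_header is hypothetical; the access time reuses the Date value from the response example earlier in this article.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(access_time: datetime, days: int = 30) -> str:
    """Return an Expires value of access time plus the given days.

    Assumption: "1 month" is treated as 30 days.
    """
    return format_datetime(access_time + timedelta(days=days), usegmt=True)


# Access time taken from the Date header of the earlier response example.
access = datetime(2013, 3, 17, 8, 12, 54, tzinfo=timezone.utc)
expires = expires_header(access)
print("Expires:", expires)
```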

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
