Detailed explanation of the HTTP protocol
Author: small tank Source: blog Park
Related Articles:HTTP Compression
Web development technology today is a crowded field: ASP.NET, PHP, JSP, Perl, Ajax, and so on. However web technology develops in the future, it is important to understand the basic protocol that Web applications use to communicate, because it lets us understand how Web applications work internally. This article explains the HTTP protocol in detail with examples, and I hope you will read it patiently. I also hope it will be helpful for your development or testing work. You can use the Fiddler tool to easily capture HTTP requests and HTTP responses. For more information about how to use Fiddler, see my blog [Fiddler tutorial].
Reading directory
- What is HTTP?
- Web server, browser, Proxy Server
- URL details
- The HTTP protocol is stateless.
- HTTP message structure
- Difference between GET and POST methods
- Status Code
- HTTP Request Header
- HTTP Response Header
- The difference between the stateless HTTP protocol and Connection: keep-alive
What is HTTP?
A protocol is the set of rules that two computers must follow when they communicate over a computer network. The Hypertext Transfer Protocol (HTTP) is a communication protocol that allows Hypertext Markup Language (HTML) documents to be transferred from a Web server to the client's browser.
At the time of writing, HTTP/1.1 is the version in use.
Web server, browser, Proxy Server
When we open the browser and enter a URL in the address bar, the web page appears. How does this work?
In fact, after we enter the URL, the browser sends a request to the Web server. The Web server receives the request, processes it, generates the corresponding response, and sends it back to the browser. The browser then parses the HTML in the response, and we see the web page.
Our request may also pass through a proxy server before it reaches the Web server.
A proxy server is a relay station for network traffic. What functions does it provide?
1. Faster access. Most proxy servers have a cache.
2. Getting around access restrictions, that is, bypassing network blocking.
3. Hiding the client's identity.
URL details
A URL (Uniform Resource Locator) describes the address of a resource on a network. The basic format is as follows:
scheme://host[:port#]/path/.../[;url-params][?query-string][#anchor]
scheme: the protocol used at the lower layer (for example, http, https, or ftp)
host: the IP address or domain name of the HTTP server
port#: the default port of an HTTP server is 80, in which case the port number can be omitted. If another port is used, it must be specified, for example http://www.cnblogs.com:8080/
path: the path to the resource
url-params
query-string: the data sent to the HTTP server
anchor: the anchor
URL example
http://www.mywebsite.com/sj/test;id=8079?name=sviergn&x=true#stuff
Scheme: http
Host: www.mywebsite.com
Path: /sj/test
URL params: id=8079
Query string: name=sviergn&x=true
Anchor: stuff
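The same breakdown can be reproduced with Python's standard urllib.parse module. This is only a small sketch: the helper name split_url and the dictionary keys are chosen here for illustration, and the URL is the example above with its spacing normalized.

```python
from urllib.parse import urlparse, parse_qs


def split_url(url: str) -> dict:
    """Split a URL into the parts named in this section."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,              # None when the URL omits the port
        "path": parts.path,
        "url_params": parts.params,      # the ";id=8079" part, without the ";"
        "query": parse_qs(parts.query),  # each value is a list of strings
        "anchor": parts.fragment,
    }


example = split_url(
    "http://www.mywebsite.com/sj/test;id=8079?name=sviergn&x=true#stuff"
)
for name, value in example.items():
    print(f"{name}: {value}")
```

Note that the anchor is handled by the browser only; it is not sent to the server as part of the request.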
The HTTP protocol is stateless.
The HTTP protocol is stateless: a request from a client has no relationship to the previous request from the same client. The HTTP server does not know that two requests come from the same client. To solve this problem, Web programs introduced the cookie mechanism to maintain state.
HTTP message structure
First, let's look at the structure of the request message. A request message has three parts: the first is the request line, the second is the request headers, and the third is the body. An empty line separates the headers from the body. The structure is as follows:
Method Path-to-Resource HTTP/Version-number
Header-Name-1: value
Header-Name-2: value
(empty line)
Optional request body
The method in the first line is the request method, for example POST or GET. Path-to-Resource is the requested resource, and HTTP/Version-number is the HTTP protocol version.
When the GET method is used, the body is empty.
For example, the request for opening the blog homepage is as follows:
GET http://www.cnblogs.com/ HTTP/1.1
Host: www.cnblogs.com
We use Fiddler to capture a login request to blog Park and analyze its structure. The Inspectors tab displays the complete request message in Raw mode.
Now let's look at the structure of the response message, which is basically the same as that of the request message. It also has three parts: the first is the status line, the second is the response headers, and the third is the body. An empty line separates the headers from the body. The structure is as follows:
HTTP/Version-number Status-code Message
Header-Name-1: value
Header-Name-2: value
(empty line)
Optional response body
HTTP/Version-number is the HTTP protocol version. For details about the status code and message, see the Status Code section.
We use Fiddler to capture the response for the blog homepage and analyze its structure. Under the Inspectors tab, we can see the complete response message in Raw mode.
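To make the three-part layout concrete, here is a minimal Python sketch that splits a raw message into its first line, headers, and body. The function name split_http_message and both sample messages are invented for illustration (www.example.com is a placeholder host), and the parser deliberately ignores details such as repeated headers.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        # A repeated header simply overwrites the earlier one in this sketch.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


sample_request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 20\r\n"
    b"\r\n"
    b"name=test1&id=123456"
)
sample_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

request_line, request_headers, request_body = split_http_message(sample_request)
method, target, version = request_line.split(" ", 2)
print(method, target, version, request_headers["host"], request_body)

status_line, response_headers, response_body = split_http_message(sample_response)
version, status_code, message = status_line.split(" ", 2)
print(version, status_code, message, response_body)
```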
Difference between GET and POST methods
The HTTP protocol defines many methods for interacting with the server. There are four basic ones: GET, POST, PUT, and DELETE. A URL describes a resource on the network, and the GET, POST, PUT, and DELETE methods in HTTP correspond to four operations on that resource: query, modify, add, and delete. The most common are GET and POST. GET is generally used to obtain or query resource information, while POST is generally used to update resource information.
Let's look at the differences between GET and POST.
1. Data submitted with GET is placed after the URL. A ? separates the URL from the data, and the parameters are joined with &, for example EditPosts.aspx?name=test1&id=123456. The POST method places the submitted data in the body of the HTTP message.
2. The size of the data submitted with GET is limited (because browsers limit the URL length), whereas the POST method places no such limit on the submitted data.
3. With GET, the server uses Request.QueryString to obtain the values of the variables; with POST, it uses Request.Form.
4. Submitting data with GET can cause security problems. For example, if a login page submits data with GET, the user name and password appear in the URL. If the page can be cached, or other people can access the machine, they can obtain the user's account and password from the history.
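The first difference can be seen without sending anything over the network. The sketch below builds one GET and one POST request object with Python's urllib; the host www.example.com is a placeholder, and the form fields are the ones from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"name": "test1", "id": "123456"}
encoded = urlencode(form)  # parameters joined with "&"

# GET: the data travels in the URL, after the "?".
get_request = Request("http://www.example.com/EditPosts.aspx?" + encoded)

# POST: the same data travels in the message body instead.
post_request = Request(
    "http://www.example.com/EditPosts.aspx",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

for request in (get_request, post_request):
    print(request.get_method(), request.full_url, request.data)
```

Nothing is transmitted here; passing either object to urllib.request.urlopen would actually send it.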
Status Code
The first line of a response message is called the status line. It consists of three parts: the HTTP protocol version, the status code, and the status message.
The status code tells the HTTP client whether the HTTP server produced the expected response.
HTTP/1.1 defines five classes of status codes. A status code has three digits, and the first digit defines the class of the response.
1xx Informational: the request has been received and processing continues
2xx Success: the request has been successfully received, understood, and accepted
3xx Redirection: further action is required to complete the request
4xx Client error: the request has a syntax error or cannot be fulfilled
5xx Server error: the server failed to fulfill a valid request
Let's look at some common status codes.
200 OK
The most common successful response is status code 200, which indicates that the request completed successfully and the requested resource was sent back to the client.
For example, opening the blog homepage.
302 Found
Redirection. The new URL is returned in the Location header of the response, and the browser uses the new URL to send a new request.
For example, when we enter http://www.google.com in IE, the HTTP server returns 302. IE reads the new URL from the Location header of the response and sends a new request.
304 Not Modified
Indicates that the document was cached previously and the cached copy can still be used.
For example, when I open the blog homepage, I find that many responses have status code 304.
Tip: if you do not want to use the local cache, press Ctrl + F5 to force the page to refresh.
400 Bad Request
The client request has a syntax error and cannot be understood by the server.
403 Forbidden
The server received the request but refuses to provide the service.
404 Not Found
The requested resource does not exist (for example, the URL is wrong).
For example, enter a wrong URL in IE: http://www.cnblogs.com/tesdf.aspx
500 Internal Server Error
An unexpected error occurred on the server.
503 Service Unavailable
The server cannot process the client's request at the moment and may recover after a period of time.
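Because the first digit carries the class, a client can classify any code with integer division. The following sketch uses Python's http.HTTPStatus enum for the standard phrases; the CATEGORIES table and the describe helper are names made up for this illustration.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the status code with its standard phrase and its class."""
    category = CATEGORIES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Python's enum does not list this code.
        phrase = "(no standard phrase)"
    return f"{code} {phrase}: {category}"


for code in (200, 302, 304, 400, 403, 404, 500, 503):
    print(describe(code))
```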
HTTP Request Header
Fiddler makes it easy to view the request headers. Click the Inspectors tab > Request tab > Headers.
There are many headers, and they are hard to remember. Here we classify them the same way Fiddler does, which is clear and easy to remember.
Cache header field
If-Modified-Since
Purpose: sends the last modification time of the page in the browser cache to the server. The server compares it with the last modification time of the actual file on the server. If the times are the same, the server returns 304 and the client uses the local cached file directly. If the times differ, the server returns 200 and the new file content; the client discards the old file, caches the new one, and displays it in the browser.
Example: If-Modified-Since: Thu, 09 Feb 2012 09:07:57 GMT
If-None-Match
Purpose: If-None-Match works together with ETag. The server adds ETag information to the HTTP response. When the user requests the resource again, the browser adds an If-None-Match header (carrying the ETag value) to the HTTP request. If the server verifies that the resource's ETag has not changed (the resource has not been updated), it returns a 304 status to tell the client to use the local cached file. Otherwise, it returns a 200 status with the new resource and its ETag. Using this mechanism improves website performance.
Example: If-None-Match: "03f2b33c0bfc0:0"
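The server-side decision described for these two headers can be sketched in a few lines of Python. This is a simplified illustration, not how any particular server implements it: the function name conditional_status is invented, ETags are compared as plain strings (weak validators, lists of tags, and * are ignored), and an If-None-Match header, when present, is checked instead of If-Modified-Since.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # The ETag check takes priority over the date check.
        return 304 if client_etag == etag else 200

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Unchanged means the file is no newer than the client's copy.
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        return 304 if unchanged else 200

    # No validator was sent, so the full response is returned.
    return 200


current_etag = '"03f2b33c0bfc0:0"'
current_last_modified = "Thu, 09 Feb 2012 09:07:57 GMT"

print(conditional_status({"If-None-Match": current_etag},
                         current_etag, current_last_modified))
print(conditional_status({"If-Modified-Since": "Wed, 08 Feb 2012 00:00:00 GMT"},
                         current_etag, current_last_modified))
```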
Pragma
Purpose: prevents the page from being served from the cache. In HTTP/1.1, it has the same effect as Cache-Control: no-cache.
Pragma has only one usage, for example: Pragma: no-cache
Note: HTTP/1.0 implements only Pragma: no-cache and does not implement Cache-Control.
Cache-Control
Purpose: this is a very important header. It specifies the caching rules that requests and responses follow. The meaning of each directive is as follows:
Cache-Control: public the response can be cached by any cache
Cache-Control: private the content is cached only in a private cache
Cache-Control: no-cache a cached copy must not be reused without first being revalidated with the server
There are other directives as well, whose meaning I do not fully understand; please refer to other documents.
Client header field
Accept
Purpose: the media types the browser can accept.
For example, Accept: text/html means the browser can accept text/html, that is, an ordinary HTML document.
If the server cannot return text/html data, it should return a 406 error (Not Acceptable).
The wildcard * stands for any type.
For example, Accept: */* means the browser can handle all types. (This is generally what browsers send to the server.)
Accept-Encoding
Purpose: the browser declares the encodings it accepts. This usually refers to compression: whether compression is supported and which compression methods (gzip, deflate) are supported. (Note: this does not refer to character encoding.)
Example: Accept-Encoding: gzip, deflate
Accept-Language
Purpose: the browser declares the languages it accepts.
The difference between a language and a character set: Chinese is a language, and Chinese has multiple character sets, such as big5, gb2312, and gbk.
Example: Accept-Language: en-us
User-Agent
Purpose: tells the HTTP server the name and version of the operating system and browser used by the client.
When we log on to online forums, we often see a welcome message that lists the name and version of our operating system and browser, which many people find surprising. In fact, the server application obtains this information from the User-Agent request header field, which lets the client tell the server about its operating system, browser, and other attributes.
Example: User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; CBA; .NET CLR 2.0.50727; .NET CLR 3.0.20.6.2152; .NET CLR 3.5.30729; .NET4.0C; InfoPath.2; .NET4.0E)
Accept-Charset
Purpose: the browser declares the character sets it accepts. These are the character sets and character encodings mentioned earlier, such as gb2312 and utf-8 (when we say charset, we generally include the corresponding character encoding scheme).
For example:
Cookie/login header field
Cookie
Purpose: one of the most important headers; it sends the cookie value to the HTTP server.
Entity header field
Content-Length
Purpose: the length of the data sent to the HTTP server.
Example: Content-Length: 38
Content-Type
Purpose:
Example: Content-Type: application/x-www-form-urlencoded
Miscellaneous header field
Referer
Purpose: provides the context of the request, telling the server which link the visitor came from. For example, if my homepage links to a friend's site, his server can use the HTTP Referer to count how many users click the link on my homepage to visit his site each day.
Example: Referer: http://translate.google.cn/?hl=zh-cn&tab=wT
Transport header field
Connection
For example, Connection: keep-alive means that after a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on this server again, it continues to use the established connection.
For example, Connection: close means that after a request is completed, the TCP connection used to transmit HTTP data between the client and the server is closed. When the client sends another request, a new TCP connection must be established.
Host (this header field is required when a request is sent)
Purpose: specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL.
For example, we enter http://www.guet.edu.cn/index.html in the browser.
The request message sent by the browser contains the Host request header field, as follows:
Host: www.guet.edu.cn
The default port number 80 is used here. If a port number is specified, the header becomes Host: www.guet.edu.cn:port number
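As a rough illustration of how the Host value is extracted from the URL, the sketch below derives it with Python's urllib.parse. The helper name host_header and the DEFAULT_PORTS table are assumptions made for this example, and port 8080 is used only as a sample non-default port.

```python
from urllib.parse import urlsplit

# Ports that may be left out of the Host header for each scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url: str) -> str:
    """Build the Host request header line for a URL."""
    parts = urlsplit(url)
    if parts.port is None or parts.port == DEFAULT_PORTS.get(parts.scheme):
        # Default port: only the host name is sent.
        return f"Host: {parts.hostname}"
    return f"Host: {parts.hostname}:{parts.port}"


print(host_header("http://www.guet.edu.cn/index.html"))
print(host_header("http://www.guet.edu.cn:8080/index.html"))
```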
HTTP Response Header
Use Fiddler to view the response headers: click the Inspectors tab > Response tab > Headers.
Again, we classify the headers the same way Fiddler does, which is clear and easy to remember.
Cache header field
Date
Purpose: the date and time when the message was generated.
Example: Date: Sat, 11 Feb 2012 11:35:14 GMT
Expires
Purpose: the browser uses the local cache until the specified expiration time.
Example: Expires: Tue, 08 Feb 2022 11:35:14 GMT
Vary
Purpose:
Example: Vary: Accept-Encoding
Cookie/login header field
P3P
Purpose: used to set cookies across domains. This can solve the problem of an iframe accessing cookies across domains.
Example: P3P: CP=CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR
Set-Cookie
Purpose: a very important header, used to send a cookie to the client browser. Each cookie written generates one Set-Cookie header.
Example: Set-Cookie: sc=4c31523a; path=/; domain=.acookie.taobao.com
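The round trip between Set-Cookie in a response and Cookie in the next request is how state is kept on top of a stateless protocol. Here is a small sketch using Python's http.cookies module; the helper name cookie_header_from is invented, and the sample value is the Set-Cookie example above with its spacing normalized. A real browser also checks the path and domain attributes before sending a cookie back, which this sketch does not do.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value: str) -> str:
    """Build the Cookie request header a client would send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server; attributes such as
    # path and domain stay on the client side.
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    return "Cookie: " + "; ".join(pairs)


received = "sc=4c31523a; path=/; domain=.acookie.taobao.com"

parsed = SimpleCookie()
parsed.load(received)
print(parsed["sc"].value, parsed["sc"]["path"], parsed["sc"]["domain"])
print(cookie_header_from(received))
```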
Entity header field
ETag
Purpose: used together with If-None-Match. (See the If-None-Match example in the request header section.)
Example: ETag: "03f2b33c0bfcc0:0"
Last-Modified
Purpose: the date and time when the resource was last modified. (See the If-Modified-Since example in the request header section.)
Example: Last-Modified: Wed, 21 Dec 2011 09:09:10 GMT
Content-Type
Purpose: the Web server tells the browser the type and character set of the object in the response.
For example:
Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=gb2312
Content-Type: image/jpeg
Content-Length
Purpose: the length of the entity body, expressed as a decimal number of bytes. When a response is sent with Content-Length, the server has to buffer all the data in advance so that it knows the length, and only then sends all of it to the client.
Example: Content-Length: 19847
Content-Encoding
Purpose: the Web server states which compression method (gzip, deflate) it used to compress the object in the response.
Example: Content-Encoding: gzip
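The interplay of Accept-Encoding, Content-Encoding, and Content-Length can be sketched with Python's gzip module. The function name compress_response and the sample body are made up for illustration, and the Accept-Encoding parsing is deliberately naive (it ignores quality values such as q=0).

```python
import gzip


def compress_response(body: bytes, accept_encoding: str):
    """Return (headers, payload), compressing only if the client accepts gzip."""
    offered = [token.strip().lower() for token in accept_encoding.split(",")]
    headers = {}
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    # Content-Length is the size of what is actually sent on the wire,
    # so it is computed after compression.
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<html>" + b"hello " * 200 + b"</html>"

gzip_headers, gzip_payload = compress_response(page, "gzip, deflate")
plain_headers, plain_payload = compress_response(page, "identity")
print(len(page), gzip_headers, plain_headers)
```

The client reverses the step with gzip.decompress when it sees Content-Encoding: gzip.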
Content-Language
Purpose: the Web server tells the browser the language of the object in the response.
Example: Content-Language: da
Miscellaneous header field
Server
Purpose: the software information of the HTTP server.
Example: Server: Microsoft-IIS/7.5
X-AspNet-Version
Purpose: if the website is developed with ASP.NET, this header indicates the ASP.NET version.
Example: X-AspNet-Version: 4.0.30319
X-Powered-By
Purpose: indicates the technology used to develop the website.
Example: X-Powered-By: ASP.NET
Transport header field
Connection
For example, Connection: keep-alive means that after a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on this server again, it continues to use the established connection.
For example, Connection: close means that after a request is completed, the TCP connection used to transmit HTTP data between the client and the server is closed. When the client sends another request, a new TCP connection must be established.
Location header field
Location
Purpose: used to redirect to a new location; it contains the new URL.
For an example, see the 302 status code example.
The difference between the stateless HTTP protocol and Connection: keep-alive
Stateless means that the protocol has no memory of earlier transactions, and the server does not know the client's state. In other words, opening a web page on a server has no relationship to the page you opened on that server before.
HTTP is a stateless, connection-oriented protocol. Stateless does not mean that HTTP cannot keep a TCP connection open, nor does it mean that HTTP uses the connectionless UDP protocol.
Starting with HTTP/1.1, keep-alive is enabled by default to keep the connection open. Put simply, after a web page is opened, the TCP connection used to transmit HTTP data between the client and the server is not closed. If the client accesses a page on this server again, it continues to use the established connection.
Keep-alive does not keep the connection open forever. It has a retention time, which can be configured in the server software (such as Apache).
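Connection reuse can be observed on a single machine. The sketch below, which is only a demonstration under stated assumptions, starts a throwaway HTTP/1.1 server on the loopback address with Python's http.server, sends two requests over one http.client connection, and records the client port the server saw each time. The function name ports_seen_for_two_requests is invented for this example; the same port twice means both requests travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def ports_seen_for_two_requests():
    """Send two requests on one connection; return the client ports the server saw."""
    seen_ports = []

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # keep-alive is the default in HTTP/1.1

        def do_GET(self):
            seen_ports.append(self.client_address[1])
            body = b"ok"
            self.send_response(200)
            # Content-Length tells the client where the body ends, so the
            # connection can stay open for the next request.
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demonstration quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        for _ in range(2):
            conn.request("GET", "/")
            conn.getresponse().read()  # read fully before reusing the connection
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return seen_ports


ports = ports_seen_for_two_requests()
print("same TCP connection reused:", ports[0] == ports[1])
```

Even though the connection is reused, the server still treats the two requests as unrelated, which is exactly the distinction between keep-alive and statelessness.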