HTTP Proxy: the HTTP protocol in detail
2014-01-03 23:36 Source: Proxy IP Resource Network
Web development today spans many competing technologies: ASP, PHP, JSP, Perl, AJAX, and so on. However web technologies evolve, it is important to understand the basic protocol that web programs use to communicate, because it lets us understand how web applications work internally. This article explains the HTTP protocol in detail with examples. It is fairly long, so I hope you have the patience to read it through, and I hope it helps with your development or testing work. The examples rely on the Fiddler tool, which makes it very easy to capture HTTP requests and HTTP responses.
Contents
- What is the HTTP protocol
- Web server, browser, proxy server
- URL in detail
- The HTTP protocol is stateless
- Structure of the HTTP message
- The difference between the GET and POST methods
- Status code
- HTTP Request Header
- HTTP Response Header
- The difference between HTTP being stateless and Connection: keep-alive
What is the HTTP protocol
A protocol is the set of rules that two computers in a network must follow when they communicate. The Hypertext Transfer Protocol (HTTP) is a communication protocol that allows Hypertext Markup Language (HTML) documents to be transferred from a web server to a client's browser.
The version currently in use is HTTP/1.1.
Web server, browser, proxy server
When we open the browser and enter a URL in the address bar, the page appears. How does that work?
After we enter the URL, the browser sends a request to the web server. The web server receives the request, processes it, generates the corresponding response, and sends it back to the browser. The browser parses the HTML in the response, and we see the web page.
Our request may also pass through a proxy server before it finally reaches the web server.
A proxy server is a transit point for network traffic. What is it used for?
1. Improving access speed: most proxy servers have a cache.
2. Bypassing access restrictions, that is, getting around network filtering.
3. Hiding the client's identity.
URL in detail
A URL (Uniform Resource Locator) describes a resource on a network. Its basic format is:
scheme://host[:port#]/path/.../[;url-params][?query-string][#anchor]
scheme: the protocol to use (for example: http, https, ftp)
host: the IP address or domain name of the HTTP server
port#: the port of the HTTP server. The default is 80, in which case the port number can be omitted. Any other port must be specified, for example http://www.cnblogs.com:8080/
path: the path to the resource
url-params: parameters attached to the path
query-string: data sent to the HTTP server
anchor: the anchor (a position within the resource)
An example URL:
http://www.mywebsite.com/sj/test;id=8079?name=sviergn&x=true#stuff
Scheme: http
Host: www.mywebsite.com
Path: /sj/test
URL params: id=8079
Query string: name=sviergn&x=true
Anchor: stuff
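The same breakdown can be reproduced in code. The following is a minimal sketch in Python, which this article does not otherwise use; it relies on the standard library's urllib.parse module and on the example URL above.

```python
from urllib.parse import urlparse, parse_qs

url = "http://www.mywebsite.com/sj/test;id=8079?name=sviergn&x=true#stuff"
parts = urlparse(url)

print("scheme:", parts.scheme)
print("host:", parts.hostname)
print("port:", parts.port)        # None, because the URL relies on the default port
print("path:", parts.path)
print("url params:", parts.params)
print("query string:", parts.query)
print("anchor:", parts.fragment)

# The query string can be split further into individual parameters.
query_values = parse_qs(parts.query)
print(query_values)
```

Note that urlparse reports the port as None when the URL does not name one; it is up to the caller to fall back to 80 for http.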
The HTTP protocol is stateless
HTTP is stateless: a request from a client has no relationship to that client's previous request, so the HTTP server cannot tell that two requests came from the same client. To solve this problem, web applications introduced the cookie mechanism to maintain state.
Structure of the HTTP message
First look at the structure of a request message. It has three parts: the first is the request line, the second is the request headers, and the third is the body. A blank line separates the headers from the body.
In the request line, Method is the request method, such as "POST" or "GET"; Path-to-resource is the requested resource; and HTTP/version-number is the version of the HTTP protocol.
When the "GET" method is used, the body is normally empty.
For example, opening the cnblogs (Blog Garden) home page sends the following request:
GET http://www.cnblogs.com/ HTTP/1.1
Host: www.cnblogs.com
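To make the three-part structure concrete, here is a small Python sketch that splits a raw request into its request line, headers, and body. The function name parse_request is hypothetical, and the parser is deliberately simplified (it assumes CRLF line endings and one value per header); it is an illustration, not a complete HTTP parser.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into (method, target, version, headers, body)."""
    # The blank line (an empty CRLF-terminated line) separates headers from body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # First line is the request line: method, resource, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw_request = (
    "GET http://www.cnblogs.com/ HTTP/1.1\r\n"
    "Host: www.cnblogs.com\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_request(raw_request)
print(method, target, version)
print(headers)
print(repr(body))  # empty for this GET request
```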
We can use Fiddler to capture a login request to the blog site and analyze its structure. The Raw view under the Inspectors tab shows the complete request message.
Now look at the structure of a response message, which is basically the same as that of a request message. It also has three parts: the first is the status line, the second is the response headers, and the third is the body. A blank line again separates the headers from the body.
In the status line, HTTP/version-number is the version of the HTTP protocol; for the status code and message, see the [Status code] section below.
We can use Fiddler to capture the response for the blog home page and analyze its structure. The Raw view under the Inspectors tab shows the complete response message.
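A response can be taken apart the same way; only the first line differs. In the sketch below the sample response is invented for illustration (it is not a capture from the blog site), and parse_response is a hypothetical helper with the same simplifications as any toy parser.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into (version, code, message, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # First line is the status line: protocol version, status code, status message.
    version, code, message = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), message, headers, body


# Hypothetical response used only to exercise the parser.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)
version, code, message, headers, body = parse_response(raw_response)
print(version, code, message)
print(headers["content-type"], headers["content-length"])
```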
The difference between the GET and POST methods
HTTP defines a number of methods for interacting with the server. The four most basic are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and GET, POST, PUT, and DELETE correspond roughly to querying, updating, creating, and deleting that resource. GET and POST are the most common: GET is typically used to fetch or query resource information, and POST is typically used to update it.
The differences between GET and POST are:
1. Data submitted with GET is appended to the URL, with ? separating the URL from the data and & joining the parameters, as in editposts.aspx?name=test1&id=123456. POST places the submitted data in the body of the HTTP message.
2. The amount of data GET can submit is limited in practice, because browsers and servers limit the length of a URL. The protocol sets no comparable limit on data submitted with POST, although a server may impose its own.
3. In ASP.NET, data submitted with GET is read through Request.QueryString, while data submitted with POST is read through Request.Form.
4. Submitting data with GET can cause security problems. On a login page, for example, the user name and password appear in the URL, so if the page is cached or someone else has access to the machine, the account and password can be read from the browser history.
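The first difference is easy to see in code. This Python sketch builds one GET and one POST request with the standard urllib modules without sending either; the host example.com is a placeholder, and only the editposts.aspx path and its parameters come from the text above.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"name": "test1", "id": "123456"}
encoded = urlencode(params)  # "name=test1&id=123456"

# GET: the data travels in the URL, after the "?".
get_req = Request("http://example.com/editposts.aspx?" + encoded)

# POST: the same data travels in the message body instead.
post_req = Request("http://example.com/editposts.aspx", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Nothing is transmitted here; the Request objects only describe what would be sent, which is enough to see where the parameters end up.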
Status code
The first line of the response message is called the status line. It consists of the HTTP protocol version, the status code, and the status message.
The status code tells the HTTP client whether the HTTP server produced the expected response.
HTTP/1.1 defines five classes of status code. A status code is made up of three digits, and the first digit defines the class of the response:
1XX Informational: the request was received and processing continues
2XX Success: the request was successfully received, understood, and accepted
3XX Redirection: further action is required to complete the request
4XX Client error: the request has a syntax error or cannot be fulfilled
5XX Server error: the server failed to fulfill a valid request
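Because the class is determined by the first digit, it can be computed with integer division. The sketch below is a Python illustration; status_class and STATUS_CLASSES are hypothetical names, while http.HTTPStatus is the standard library's table of status codes and messages.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the class of a three-digit status code from its first digit."""
    return STATUS_CLASSES[code // 100]


for code in (200, 302, 304, 400, 403, 404, 500, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```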
Some common status codes:
200 OK
The most common status code. It indicates that the request completed successfully and the requested resource was sent back to the client.
For example, opening the blog home page.
302 Found
A redirect. The new URL is returned in the Location header of the response, and the browser sends a new request to that URL.
For example, enter http://www.google.com in IE. The HTTP server returns 302, and IE takes the new URL from the Location header of the response and sends the request again.
304 Not Modified
The document is already cached and the cached copy can still be used.
For example, when opening the blog home page, many responses have status code 304.
Tip: to bypass the local cache, force a refresh of the page with Ctrl+F5.
400 Bad Request
The client's request has a syntax error and cannot be understood by the server.
403 Forbidden
The server received the request but refuses to serve it.
404 Not Found
The requested resource does not exist (for example, a wrong URL was entered).
For example, enter an incorrect URL in IE, such as http://www.cnblogs.com/tesdf.aspx
500 Internal Server Error
An unexpected error occurred on the server.
503 Service Unavailable
The server is currently unable to process the request and may recover after some time.
HTTP Request Header
Fiddler makes it easy to inspect the request headers: click the Inspectors tab, then the Request tab, then Headers.
There are many headers and they are hard to remember, so we group them the way Fiddler does, which makes them clearer and easier to remember.
Cache header Field
If-Modified-Since
Role: Sends the last modification time of the page cached by the browser to the server, which compares it with the last modification time of the actual file on the server. If the times match, the server returns 304 and the client uses its local cached file directly. If they differ, the server returns 200 and the new file contents. The client then discards the old file, caches the new one, and displays it in the browser.
For example: If-Modified-Since: Thu, 09:07:57 GMT
If-None-Match
Role: If-None-Match works together with ETag. The server adds ETag information to the HTTP response. When the user requests the resource again, the browser adds If-None-Match (carrying the ETag value) to the HTTP request. If the server verifies that the ETag of the resource has not changed (the resource was not updated), it returns 304, telling the client to use its local cached file. Otherwise it returns 200 with the new resource and a new ETag. This mechanism improves the performance of a website.
Example: If-None-Match: "03f2b33c0bfcc1:0"
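The server-side decision behind both cache headers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular server implements it: conditional_status is a hypothetical function, the dates are invented, and the date comparison treats "not modified since" as "304" rather than requiring the two times to be exactly equal.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # ETag check: an unchanged tag means the resource was not updated.
        return 304 if client_etag == current_etag else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Date check: 304 if the file has not changed after the cached time.
        cached_time = parsedate_to_datetime(since)
        file_time = parsedate_to_datetime(last_modified)
        return 304 if file_time <= cached_time else 200
    return 200  # no validator sent, so return the full resource


etag = '"03f2b33c0bfcc1:0"'
modified = "Wed, 01 Jan 2014 09:07:57 GMT"  # hypothetical modification time

print(conditional_status({"If-None-Match": etag}, etag, modified))
print(conditional_status({"If-Modified-Since": "Tue, 31 Dec 2013 09:07:57 GMT"}, etag, modified))
```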
Pragma
Role: Prevents a cached copy of the page from being used. In HTTP/1.1 it has the same effect as Cache-Control: no-cache.
Pragma has only one usage: Pragma: no-cache
Note: HTTP/1.0 implements only Pragma: no-cache; it does not implement Cache-Control.
Cache-Control
Role: This is a very important header. It specifies the caching rules that requests and responses follow. The directives have the following meanings:
Cache-Control: public  the response may be stored by any cache
Cache-Control: private  the response may be stored only in a private cache
Cache-Control: no-cache  a cached copy must not be used without first revalidating it with the server (Cache-Control: no-store is the directive that forbids storing the content at all)
There are other directives that are not covered here; please refer to other references.
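A Cache-Control value is a comma-separated list of directives, some of which carry an argument. The Python sketch below shows one way to split such a value; parse_cache_control is a hypothetical helper, and the max-age directive in the example is one of the directives this article does not describe.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            # Directives without "=" (public, private, no-cache) map to None.
            directives[name.lower()] = arg.strip('"') or None
    return directives


print(parse_cache_control("public"))
print(parse_cache_control("private, max-age=0"))
```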
Client header Field
Accept
Role: The media types the browser can accept.
For example: Accept: text/html means the browser accepts responses of type text/html, which is what we usually call an HTML document.
If the server cannot return data of type text/html, it should return a 406 (Not Acceptable) error.
The wildcard * stands for any type.
For example, Accept: */* means the browser can handle all types (this is what browsers generally send to the server).
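The matching rule, including the wildcard, can be sketched as follows in Python. choose_type is a hypothetical function, and it is a simplification: it ignores any parameters after a semicolon (such as quality values) and simply returns the first type the server offers that the client accepts.

```python
def choose_type(accept_header: str, available: list):
    """Return the first available media type the client accepts, or None.

    A None result corresponds to the 406 (Not Acceptable) case.
    """
    accepted = [item.split(";")[0].strip() for item in accept_header.split(",")]
    for media_type in available:
        main_type = media_type.split("/")[0]
        for pattern in accepted:
            # Match exactly, by main type (text/*), or by the full wildcard.
            if pattern in ("*/*", media_type, main_type + "/*"):
                return media_type
    return None


print(choose_type("text/html", ["text/html"]))
print(choose_type("text/html", ["image/jpeg"]))  # nothing acceptable
print(choose_type("*/*", ["image/jpeg"]))
```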
Accept-Encoding
Role: The browser declares the encodings it can receive, which usually means the compression methods it supports (gzip, deflate). Note that this is not a character encoding.
Example: Accept-Encoding: gzip, deflate
Accept-Language
Role: The browser declares the languages it can receive.
The difference between a language and a character set: Chinese is a language, and Chinese has several character sets, such as Big5, GB2312, and GBK.
Example: Accept-Language: en-us
User-Agent
Role: Tells the HTTP server the name and version of the operating system and browser the client is using.
Forums often display a welcome message that lists the name and version of your operating system and browser, which many people find almost magical. In fact, the server application reads this information from the User-Agent request header, which lets the client tell the server about its operating system, browser, and other properties.
For example: User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; CIBA; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; InfoPath.2; .NET4.0E)
Accept-Charset
Role: The browser declares the character sets it can receive, such as the gb2312 and utf-8 character sets mentioned above (a charset normally implies the corresponding character encoding scheme).
For example:
Cookie/login header Field
Cookie
Role: One of the most important headers. It sends the cookie values to the HTTP server.
Entity header Field
Content-Length
Role: The length of the data sent to the HTTP server.
Example: Content-Length: 38
Content-Type
Role: The media type of the data sent to the HTTP server.
Example: Content-Type: application/x-www-form-urlencoded
Miscellaneous header Field
Referer
Role: Provides context for the request by telling the server which page the link was followed from. For example, if my home page links to a friend's site, his server can use the HTTP Referer to count how many users reach his website each day by clicking the link on my page.
Example: Referer: http://translate.google.cn/?hl=zh-CN&tab=wT
Transport header Field
Connection
Example: Connection: keep-alive. When a web page is opened, the TCP connection that carries HTTP data between the client and the server is not closed. If the client requests another page from the same server, it reuses the established connection.
Example: Connection: close. After a request completes, the TCP connection that carries HTTP data between the client and the server is closed, and a new TCP connection must be established when the client sends another request.
Host (this header field is required in every HTTP/1.1 request)
Role: Specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL.
For example, if we enter http://www.guet.edu.cn/index.html in the browser,
the request message sent by the browser contains the following Host request header field:
Host: www.guet.edu.cn
The default port number 80 is used here. If a port number is specified, the header becomes: Host: www.guet.edu.cn:port-number
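Deriving the Host value from a URL can be sketched as follows in Python. host_header is a hypothetical helper that assumes a plain http URL, so it treats 80 as the only default port; both test URLs are taken from this article.

```python
from urllib.parse import urlsplit


def host_header(url: str) -> str:
    """Build the Host header value for an http URL."""
    parts = urlsplit(url)
    # The port is left out when the URL uses the default port 80.
    if parts.port is None or parts.port == 80:
        return parts.hostname
    return f"{parts.hostname}:{parts.port}"


print("Host:", host_header("http://www.guet.edu.cn/index.html"))
print("Host:", host_header("http://www.cnblogs.com:8080/"))
```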
HTTP Response Header
Response headers can be viewed in Fiddler in the same way: click the Inspectors tab, then the Response tab, then Headers.
We again group the headers the way Fiddler does, which makes them clearer and easier to remember.
Cache header Field
Date
Role: The exact date and time at which the message was generated.
Example: Date: Sat, 11:35:14 GMT
Expires
Role: The browser uses its local cached copy until the specified expiration time.
For example: Expires: Tue, 2022 11:35:14 GMT
Vary
Role: Lists the request headers that the server used to select this response, so that caches keep a separate copy for each value.
Example: Vary: Accept-Encoding
Cookie/login header Field
P3P
Role: Used to set cookies across domains, which solves the problem of cross-domain cookie access from an iframe.
Example: P3P: CP=CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR
Set-Cookie
Role: A very important header, used to send cookies to the client browser. Each cookie that is written produces one Set-Cookie header.
For example: Set-Cookie: sc=4c31523a; path=/; Domain=.acookie.taobao.com
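The round trip between Set-Cookie and Cookie is how state is maintained on top of a stateless protocol. The Python sketch below parses the Set-Cookie value shown above with the standard http.cookies module and then builds the Cookie header a client would send back; it is a simplified illustration that ignores the path and domain matching a real browser performs.

```python
from http.cookies import SimpleCookie

# Parse the value of the Set-Cookie response header.
cookie = SimpleCookie()
cookie.load("sc=4c31523a; path=/; Domain=.acookie.taobao.com")

morsel = cookie["sc"]
print(morsel.value)      # the cookie value
print(morsel["path"])    # attribute: path
print(morsel["domain"])  # attribute: domain

# On later requests the client returns only the name=value pairs.
request_header = "Cookie: " + "; ".join(
    f"{name}={item.value}" for name, item in cookie.items()
)
print(request_header)
```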
Entity header Field
ETag
Role: Used together with If-None-Match. (See the If-None-Match example in the request header section.)
For example: ETag: "03f2b33c0bfcc1:0"
Last-Modified
Role: Indicates the last modification date and time of the resource. (See the If-Modified-Since example in the request header section.)
Example: Last-Modified: Wed, Dec 09:09:10 GMT
Content-Type
Role: The web server tells the browser the type and character set of the object in the response.
For example:
Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=gb2312
Content-Type: image/jpeg
Content-Length
Role: The length of the entity body in bytes, written as a decimal number. When the body is generated dynamically, sending Content-Length means the server has to buffer all of the data first so that it knows the size, and then send it to the client in one go.
Example: Content-Length: 19847
Content-Encoding
Role: The web server indicates which compression method (gzip, deflate) it used to compress the object in the response.
Example: Content-Encoding: gzip
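The interaction between Content-Encoding and Content-Length can be sketched with Python's standard gzip module. The body below is invented sample data; the point is that Content-Length describes the compressed bytes that actually travel over the connection, and the client decompresses them based on Content-Encoding.

```python
import gzip

# Hypothetical response body, repetitive so that it compresses well.
body = b"<html><body>" + b"hello " * 200 + b"</body></html>"

# Server side: compress the body and describe it in the headers.
compressed = gzip.compress(body)
response_headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),  # length of the compressed bytes
}

# Client side: undo the compression named in Content-Encoding.
if response_headers.get("Content-Encoding") == "gzip":
    restored = gzip.decompress(compressed)
else:
    restored = compressed

print(len(body), "bytes before compression")
print(response_headers["Content-Length"], "bytes on the wire")
```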
Content-Language
Role: The web server tells the browser the language of the object in the response.
Example: Content-Language: da
Miscellaneous header Field
Server
Role: Identifies the software of the HTTP server.
Example: Server: Microsoft-IIS/7.5
X-AspNet-Version
Role: If the website was developed with ASP.NET, this header gives the ASP.NET version.
Example: X-AspNet-Version: 4.0.30319
X-Powered-By
Role: Indicates the technology the website was developed with.
Example: X-Powered-By: ASP.NET
Transport header Field
Connection
Example: Connection: keep-alive. When a web page is opened, the TCP connection that carries HTTP data between the client and the server is not closed. If the client requests another page from the same server, it reuses the established connection.
Example: Connection: close. After a request completes, the TCP connection that carries HTTP data between the client and the server is closed, and a new TCP connection must be established when the client sends another request.
Location Header Field
Location
Role: Used to redirect to a new location; it contains the new URL.
For example, see the 302 status code example.
The difference between HTTP being stateless and Connection: keep-alive
Stateless means that the protocol has no memory of previous transactions, and the server does not know the state of the client. In other words, opening a web page on a server has no connection to the pages you previously opened on that server.
HTTP is a stateless protocol that runs over connection-oriented TCP. Being stateless does not mean that HTTP cannot keep a TCP connection open, and it certainly does not mean that HTTP uses UDP (which is connectionless).
Starting with HTTP/1.1, keep-alive is enabled by default. In short, when a web page is opened, the TCP connection that carries HTTP data between the client and the server is not closed, and if the client requests another page from the same server it reuses the established connection.
Keep-alive does not hold the connection open forever. It has a timeout that can be configured in the server software (such as Apache).
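The default behaviour of each protocol version can be summarized in a short Python sketch. is_persistent is a hypothetical function, and it simplifies matters by expecting the header name spelled exactly as "Connection" and a single value in it; it says nothing about the timeout, which is a server setting.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the TCP connection stays open after this message."""
    connection = headers.get("Connection", "").lower()
    if version == "HTTP/1.1":
        # HTTP/1.1 keeps the connection open unless told to close it.
        return connection != "close"
    # Earlier versions close the connection unless keep-alive is requested.
    return connection == "keep-alive"


print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"Connection": "close"}))
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))
```

Whether the connection persists is independent of state: even on a reused connection, each request is still handled without any memory of the previous one.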