First, the concept
1. HTTP protocol: the Hypertext Transfer Protocol (HTTP) is a set of rules governing communication between a browser and a Web server. It allows Hypertext Markup Language (HTML) documents to be delivered from the Web server to the client's browser.
It is designed to make this exchange efficient and to reduce network traffic. It not only ensures that hypertext documents are transmitted correctly and quickly, but also lets the client determine which part of a document is transferred and which content is displayed first (such as text before graphics).
HTTP is an application-layer protocol built from requests and responses, following the standard client/server model. HTTP is a stateless protocol.
Traffic on the Internet is carried over TCP/IP, and HTTP is no exception: it is an application-layer protocol in the TCP/IP model. HTTP usually runs directly on top of TCP; when it runs on top of a TLS or SSL layer instead, the result is what we usually call HTTPS.
Early specifications describe HTTP as an object-oriented, application-layer protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was proposed in 1990 and has been improved and extended ever since. HTTP/1.0 was followed by HTTP/1.1, which has long since been standardized; HTTP-NG (Next Generation of HTTP) was also proposed but was never adopted as a standard.
2. Stateless protocol:
The state of a protocol is its ability to "remember", during the next transmission, information from the current one.
HTTP does not keep the information exchanged over one connection for use by the next, which spares the server from holding per-client state in memory.
For example, a user fetches a Web page, closes the browser, launches it again, and logs on to the site; the server has no idea that the client closed the browser in between.
A Web server is accessed concurrently by many browsers. To improve its capacity for concurrent access, HTTP was designed so that the server sends response messages and documents without saving any state about the browser process that made the request. A browser may well request the same object twice within a few seconds; the server does not refuse the second request on the grounds that it has just sent a reply, it simply serves the object again. Because the Web server keeps no information about the browser process that sent the request, HTTP is a stateless protocol.
3. The difference between HTTP being stateless and Connection: keep-alive:
Stateless means that the protocol has no memory of earlier transactions, and the server does not know what state the client is in. In other words, opening a Web page on a server has no relation to the pages you previously opened on that server.
HTTP is a stateless protocol that runs over a connection-oriented transport. Stateless does not mean that HTTP cannot keep a TCP connection open, nor does it mean that HTTP uses UDP (a connectionless protocol).
From HTTP/1.1 onwards, keep-alive is enabled by default so that connections persist. In short, after a Web page has been loaded, the TCP connection used to carry HTTP data between the client and the server is not closed; if the client requests another page from the same server, it reuses the established connection.
Keep-alive does not hold the connection open forever. It has a timeout that can be configured in the server software (such as Apache).
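The keep-alive rule described above can be sketched as a small decision function. This is only an illustration: the name `is_persistent` and the plain-dict representation of headers are assumptions made for the example, not part of any library, and the HTTP/1.0 opt-in branch reflects the common keep-alive extension rather than anything stated in this article.

```python
def is_persistent(http_version: str, headers: dict) -> bool:
    """Decide whether the TCP connection stays open after this message.

    http_version is a string such as "HTTP/1.1"; headers maps field names
    to values. Both the function and this representation are hypothetical.
    """
    # Header field names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    # The Connection header may carry several comma-separated tokens.
    raw = normalized.get("connection", "")
    tokens = {token.strip().lower() for token in raw.split(",")}

    if "close" in tokens:
        return False               # the sender asked to close the connection
    if http_version == "HTTP/1.1":
        return True                # HTTP/1.1 is persistent by default
    return "keep-alive" in tokens  # older versions have to opt in
```

For instance, `is_persistent("HTTP/1.1", {})` is true, while `is_persistent("HTTP/1.1", {"Connection": "close"})` is false.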
4. The Chinese translation of HTTP
The name Hypertext Transfer Protocol sounds like a transport-layer protocol, yet everyone knows that HTTP, like FTP, is an application-layer protocol. If it belongs to the application layer, why does it carry such a misleading name? It is easy to be confused by this when you are not yet familiar with the TCP/IP protocol suite. The wiki has this passage:
In mainland China, HTTP is translated with a phrase meaning "Hypertext Transmission Protocol", because "transfer" was read in the sense of "transmission". However, in section 6.5.3 of his dissertation [1], Dr. Roy Fielding, one of the authors of HTTP, specifically stresses that "transfer" refers to the transfer of state (as in representational state transfer), not to transport. The usual Chinese translation therefore reflects a misunderstanding; a rendering closer to the literal meaning would express "transfer" in the sense of handing over state rather than transmitting data.
Second, the characteristics
The main features of the HTTP protocol can be summarized as follows:
1. Supports the client/server model. Supports Basic authentication and secure authentication.
2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. Common request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, an HTTP server program can be small, and communication is fast.
3. Flexible: HTTP allows any type of data object to be transferred. The type being transferred is indicated by Content-Type.
4. HTTP 0.9 and 1.0 use non-persistent connections: each connection handles only one request. Once the server has processed the client's request and the client has received the response, the connection is closed.
HTTP 1.1 uses persistent connections: a new connection does not have to be created for each Web object, and one connection can carry multiple objects, which saves transfer time.
5. Stateless: HTTP is a stateless protocol, meaning that it has no memory of earlier transactions. If later processing needs information from an earlier exchange, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need earlier information.
6. The client always initiates the request, and the server returns the response. This limits how HTTP can be used: the server cannot push a message to the client when the client has not sent a request.
7. The default port number is 80 for HTTP and 443 for HTTPS.
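As a small illustration of the default ports in point 7, the following sketch works out which port a client would connect to for a given URL. The helper name `effective_port` is hypothetical; `urlsplit` is the standard-library URL parser.

```python
from urllib.parse import urlsplit

# Default ports named in the list above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port              # an explicit port in the URL wins
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the default
```

With this sketch, `effective_port("http://luyucheng.cnblogs.com/index.html")` gives 80, and a URL that names a port explicitly returns that port instead.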
Third, the work flow
An HTTP operation is called a transaction, and its working process can be divided into four steps:
1. First, the client and the server establish a connection. As soon as you click a hyperlink, HTTP starts working.
2. Once the connection is established, the client sends a request to the server. The request consists of the request method, a Uniform Resource Identifier (URI), and the protocol version number, followed by MIME-style information containing request modifiers, client information, and possibly a body.
3. When the server receives the request, it replies with a response. The response starts with a status line containing the protocol version number and a success or error code, followed by MIME-style information containing server information, entity information, and possibly a body.
4. The client receives the information returned by the server, the browser shows it on the user's display, and then the client disconnects from the server.
If an error occurs in any of these steps, the error information is returned to the client and shown on the display. From the user's point of view, HTTP handles all of this by itself; the user only clicks with the mouse and waits for the information to appear.
HTTP runs on top of TCP, a transport-layer protocol, and TCP is an end-to-end, connection-oriented protocol. End-to-end can be understood as process-to-process communication. HTTP therefore has to establish a TCP connection before it can transfer anything, and establishing a TCP connection requires the so-called three-way handshake. Once the handshake is complete, the TCP connection exists and HTTP data can be sent. An important concept here is connection orientation: HTTP does not tear down the TCP connection between transfers. In HTTP/1.1 this is the default behavior (it is controlled by the Connection header).
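The four steps above can be sketched without any real network by using a pair of already-connected sockets. This is a deliberately simplified illustration, not a real client or server: `socket.socketpair()` stands in for a TCP connection whose handshake has already completed, and the function name, request path, and body are assumptions chosen for the example.

```python
import socket


def run_transaction() -> bytes:
    """Run one request/response exchange over two connected sockets."""
    # Step 1: a connection exists. socketpair() returns two endpoints that
    # are already connected, standing in for an established TCP connection.
    client, server = socket.socketpair()
    try:
        # Step 2: the client sends a request line, headers, and an empty line.
        client.sendall(
            b"GET /index.html HTTP/1.1\r\n"
            b"Host: luyucheng.cnblogs.com\r\n"
            b"\r\n"
        )

        # Step 3: the server reads the request and replies with a status
        # line, headers, an empty line, and a body.
        server.recv(4096)
        body = b"hello"
        response = (
            b"HTTP/1.1 200 OK\r\n"
            + b"Content-Length: %d\r\n" % len(body)
            + b"Connection: close\r\n"
            + b"\r\n"
            + body
        )
        server.sendall(response)
        server.close()

        # Step 4: the client reads until the server closes the connection.
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        client.close()
        server.close()
```

Calling `run_transaction()` returns the raw response bytes, starting with the status line and ending with the body.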
Fourth, header fields
Each header field consists of three parts: a field name, a colon (:), and a field value. Field names are case-insensitive, and any amount of whitespace may precede the field value. The original HTTP/1.1 specification also allowed a header field to continue over multiple lines, provided each continuation line begins with at least one space or tab; this line folding has since been deprecated.
HTTP messages consist of requests from client to server and responses from server to client. Both request and response messages are made up of a start line (the request line for a request, the status line for a response), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).
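The message layout just described (start line, headers, empty line, body) can be illustrated with a minimal parser. This is a sketch under simplifying assumptions: the function name `parse_message` is hypothetical, repeated header fields overwrite each other, and folded (multi-line) headers are not handled.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]

    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        # Field names are case-insensitive; surrounding whitespace is dropped.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

Because names are lowercased, `HOST:` and `Host:` end up under the same key, which mirrors the case-insensitivity rule above.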
1. Request message
The HTTP request consists of three parts: the request line, the message headers, and the request body. The request message has the following format:
A request line, such as GET /images/logo.gif HTTP/1.1, which requests the file logo.gif from the /images directory.
Request headers, each made up of name + ":" + space + value; header field names are case-insensitive. For example: Accept-Language: en
An empty line.
An optional message body.
The request line and every header must end with <CR><LF> (a carriage return followed by a line feed). The empty line must contain only <CR><LF> and no other whitespace. In HTTP/1.1, all request headers except Host are optional.
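To make the format concrete, here is a minimal sketch that assembles a request message from its parts. The function name `build_request` and its parameters are assumptions for illustration; it performs no validation of the values it is given.

```python
def build_request(method: str, target: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble request line, headers, empty line, and optional body."""
    lines = [f"{method} {target} HTTP/1.1"]          # the request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF forms the empty line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, `build_request("GET", "/images/logo.gif", {"Host": "luyucheng.cnblogs.com"})` produces the request line, one Host header, and the terminating empty line.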
(1) Request line
The request line starts with the request method, followed by the requested URI and the protocol version, each separated by a space.
The format is as follows: Method Request-URI HTTP-Version CRLF
Method is the request method;
Request-URI is a uniform resource identifier;
HTTP-Version is the HTTP protocol version of the request;
CRLF is a carriage return followed by a line feed (apart from this terminating CRLF, a lone CR or LF character is not allowed).
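The request-line format above can be checked with a few lines of code. This is a hedged sketch: `parse_request_line` is a hypothetical helper that enforces only the rules stated here (single spaces, a terminating CRLF, no stray CR or LF).

```python
def parse_request_line(line: str):
    """Parse 'Method SP Request-URI SP HTTP-Version CRLF' into a 3-tuple."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    content = line[:-2]
    # Apart from the terminating CRLF, a lone CR or LF is not allowed.
    if "\r" in content or "\n" in content:
        raise ValueError("stray CR or LF in request line")
    parts = content.split(" ")
    if len(parts) != 3:
        raise ValueError("expected Method, Request-URI and HTTP-Version")
    method, request_uri, http_version = parts
    if not http_version.startswith("HTTP/"):
        raise ValueError("malformed HTTP-Version")
    return method, request_uri, http_version
```

A line with a missing version, or one that does not end in CRLF, raises `ValueError` instead of returning a partial result.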
A. Request methods:
The HTTP/1.1 protocol defines eight methods (sometimes called "actions") that indicate different ways of operating on the resource identified by the Request-URI:
GET: requests the specified resource. Note: GET should not be used for operations that produce side effects, for example in Web apps. One reason is that GET requests may be issued at will by web spiders and similar clients.
POST: submits data to the specified resource for processing (such as submitting a form or uploading a file). The data is carried in the request body. A POST request may create new resources and/or modify existing ones.
HEAD: asks the server for the same response as a GET request, except that the response body is not returned. This makes it possible to obtain the meta information in the response headers without transferring the entire content. It is commonly used to test whether hyperlinks are valid, whether they are reachable, and whether they have been updated recently.
PUT: uploads the latest content to the specified resource location.
DELETE: asks the server to delete the resource identified by the Request-URI.
TRACE: echoes back the request received by the server, mainly for testing or diagnostics.
CONNECT: reserved in HTTP/1.1 for proxy servers that can switch the connection to tunnel mode.
OPTIONS: returns the HTTP request methods that the server supports for a specific resource. Sending a request for "*" to the Web server can also be used to test the server's functionality.
Note: An HTTP server should implement at least the GET and HEAD methods; the other methods are optional. In addition to the methods above, a particular HTTP server may also support custom extension methods.
B. The difference between GET and POST:
Data submitted with GET is placed after the URL: a ? separates the URL from the data, and parameters are joined with &, as in editposts.aspx?name=test1&id=123456. POST puts the submitted data in the body of the HTTP message.
GET submissions are limited in size in practice. HTTP itself sets no limit, but browsers and servers restrict the length of a URL (the often-quoted figure of 1024 bytes is such a browser limit). The protocol likewise sets no limit on data submitted with POST, although a server may enforce one.
On the server side the two kinds of data are usually read separately. In ASP.NET, for example, GET variables are read through Request.QueryString and POST variables through Request.Form.
Submitting data with GET raises security concerns. On a login page, for example, data submitted with GET puts the user name and password in the URL; if the page can be cached or other people can use the machine, the account and password can be recovered from the browser history.
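The difference in where the data travels can be shown with the standard-library URL helpers. This sketch reuses the editposts.aspx example from the text; the variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"name": "test1", "id": "123456"}
encoded = urlencode(params)  # parameters joined with "&"

# GET: the encoded data is appended to the URL after "?".
get_url = "/editposts.aspx?" + encoded

# POST: the same encoded data travels in the message body instead,
# described by Content-Type and Content-Length headers.
post_body = encoded.encode("ascii")
post_headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(post_body)),
}

# On the receiving side, the query string can be decoded again.
query = parse_qs(urlsplit(get_url).query)
```

Note that `parse_qs` returns a list for every key, because a parameter name may legally appear more than once in a query string.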
(2) Request header
The request headers allow the client to pass the server additional information about the request and about the client itself.
Common request headers:
Accept: the MIME types the browser can accept. For example, Accept: image/gif indicates that the client wants a resource in GIF image format; Accept: text/html indicates that the client wants HTML text.
Accept-Charset: the character sets the client accepts. For example: Accept-Charset: iso-8859-1,gb2312. If the field is not set in the request, any character set is acceptable by default.
Accept-Encoding: the content encodings the browser accepts, usually compression methods: whether compression is supported and which methods (gzip, deflate).
Accept-Language: the languages the browser accepts. Language and character set are different things: Chinese is a language, and it has several character sets, such as Big5, GB2312, and GBK. For example: Accept-Language: en-us. If the field is not set in the request, the server assumes that every language is acceptable to the client.
User-Agent: tells the HTTP server the name and version of the operating system and browser the client is using. For example: User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; CIBA; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; InfoPath.2; .NET4.0E)
Authorization: authorization credentials, usually sent in reply to a WWW-Authenticate header received from the server. It is mainly used to prove that the client has permission to view a resource. When a browser requests a page and receives a 401 (Unauthorized) response, it can send a request containing the Authorization header and ask the server to validate it.
Host: (required when sending a request) specifies the Internet host and port number of the requested resource, usually extracted from the HTTP URL. An HTTP/1.1 request must contain the Host header, or the server returns a 400 status code. For example, if we enter http://luyucheng.cnblogs.com/index.html in the browser, the request message sent by the browser contains the header Host: luyucheng.cnblogs.com. The default port number 80 is used here; if a port number is specified, the header becomes Host: luyucheng.cnblogs.com:<port number>.
Cookie: one of the most important request headers; it sends the cookie values to the HTTP server.
Content-Length: the length of the request message body. For example: Content-Length: 38.
Content-Type: the media type of the request body. For example: Content-Type: application/x-www-form-urlencoded.
From: the email address of the sender of the request. It is used by some special Web clients and is not used by browsers.
Range: requests one or more sub-ranges of an entity. For example:
the first 500 bytes: bytes=0-499
the second 500 bytes: bytes=500-999
the last 500 bytes: bytes=-500
everything from byte 500 onwards: bytes=500-
the first and the last byte: bytes=0-0,-1
several ranges at once: bytes=500-600,601-999
The server may ignore this header. If an unconditional GET contains a Range header and the server honors it, the response is returned with status code 206 (Partial Content) instead of 200 (OK).
If-Modified-Since: sends the server the last-modified time of the page cached by the browser. The server compares it with the last-modified time of the actual file. If the times match, it returns 304 and the client uses its locally cached file directly. If they differ, it returns 200 and the new file content; the client then discards the old file, caches the new one, and displays it in the browser. For example: If-Modified-Since: Thu, 09:07:57 GMT
If-None-Match: works together with ETag. The server adds ETag information to the HTTP response; when the user requests the resource again, the browser adds If-None-Match (carrying the ETag value) to the HTTP request. If the server finds that the resource's ETag has not changed (the resource was not updated), it returns status 304, telling the client to use its locally cached file. Otherwise it returns status 200 with the new resource and its ETag. Using this mechanism improves the performance of a website. For example: If-None-Match: "03f2b33c0bfcc1:0"
Referer: contains the URL of the page from which the user reached the currently requested page. It gives the server context about the request, that is, which link the visitor followed. For example, if my home page links to a friend's site, his server can use the Referer header to count how many visitors per day reach his site by clicking the link on my page. For example: Referer: http://luyucheng.cnblogs.com/
Pragma: the value no-cache indicates that the server must return a fresh document, even if the request passes through a proxy server that already holds a local copy of the page. In HTTP/1.1 it behaves exactly like Cache-Control: no-cache. Pragma has only one usage: Pragma: no-cache. Note: HTTP/1.0 implements only Pragma: no-cache and does not implement Cache-Control.
Connection: for example, Connection: keep-alive. After a Web page has been loaded, the TCP connection used to carry HTTP data between the client and the server is not closed, and if the client requests another page from the server it reuses the established connection. HTTP 1.1 uses persistent connections by default. Thanks to persistent connections, the download time drops noticeably when a page contains several elements (such as applets and pictures). To make this work, a servlet needs to send a Content-Length header in its response; the simplest implementation is to write the content to a ByteArrayOutputStream first and compute its size before actually writing it out. Connection: close means that once a request has completed, the TCP connection used to carry HTTP data is closed, and a new TCP connection has to be established when the client sends its next request.
UA-Pixels, UA-Color, UA-OS, UA-CPU: nonstandard request headers sent by some versions of Internet Explorer to indicate screen size, color depth, operating system, and CPU type.
Cache-Control: specifies the caching rules that requests and responses follow. Cache directives are unidirectional (a directive that appears in the response need not appear in the request) and independent (setting Cache-Control in a request or response does not change the caching behavior applied to another message). The directives for requests are no-cache, no-store, max-age, max-stale, min-fresh, and only-if-cached; the directives for responses are public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, and s-maxage.
Cache-Control: public means the response may be stored by any cache.
Cache-Control: private means the content may be stored only in a private cache.
Cache-Control: no-cache means a cached copy must not be used without first revalidating it with the server.
Cache-Control: no-store is used to prevent sensitive information from being released inadvertently. Sent in a request, it means that neither the request nor its response may be stored by a cache.
Cache-Control: max-age means the client accepts a response whose age is no greater than the specified time (in seconds).
Cache-Control: min-fresh means the client accepts a response that will remain fresh for at least the specified time from now.
Cache-Control: max-stale means the client accepts a response that has passed its expiration time. If a value is given for max-stale, the client accepts a response that has been expired for no longer than the specified value.
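The If-None-Match / ETag exchange described in the list above can be sketched from the server's side as follows. This is an illustration under stated assumptions: the function names are hypothetical, the ETag is derived from a hash of the content (only one of many possible schemes), and only an exact single-tag match is handled, not tag lists or weak validators.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive a quoted entity tag from the content (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def conditional_get(request_headers: dict, content: bytes):
    """Return (status_code, response_headers, body) for a GET request."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # Unchanged: the client may reuse its cached copy; no body is sent.
        return 304, {"ETag": etag}, b""
    # First visit, or the resource changed: send the full content and ETag.
    return 200, {"ETag": etag, "Content-Length": str(len(content))}, content
```

A first request without If-None-Match gets 200 and an ETag; repeating the request with that ETag gets 304 as long as the content is unchanged.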
2. Response messages
The HTTP response consists of three parts: the status line, the response header, and the response body.
After the client sends a request, the server responds with a status line that includes the protocol version of the message and a success or error code, followed by server information, entity meta information, and, where needed, the entity content. Depending on the category of the response, it may contain entity content, but not every response does.
(1) Status line
The first line of the response is the status line, in the following format:
HTTP-Version Space Status-Code Space Reason-Phrase CRLF
HTTP-Version is the HTTP version, for example HTTP/1.1.
Status-Code is the result code of the response, expressed as three digits.
Reason-Phrase is a short text description explaining the Status-Code. The Status-Code is meant for automatic processing by machines, and the Reason-Phrase for human readers. The first digit of the Status-Code identifies the category of the response and can take 5 different values. The last two digits have no classification role; they describe the specific situation within that category.
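The status-line format and the role of the first digit can be sketched like this. The helper names `parse_status_line` and `response_class` are hypothetical, and the category labels in `CLASSES` are informal descriptions chosen for the example.

```python
def parse_status_line(line: str):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into parts."""
    # The reason phrase may itself contain spaces, so split at most twice.
    http_version, status_code, reason_phrase = line.rstrip("\r\n").split(" ", 2)
    if len(status_code) != 3 or not status_code.isdigit():
        raise ValueError("Status-Code must be three digits")
    return http_version, int(status_code), reason_phrase


# The first digit of the status code selects one of five categories.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def response_class(status_code: int) -> str:
    """Return the category named by the first digit of the status code."""
    return CLASSES[status_code // 100]
```

For example, parsing "HTTP/1.1 404 Not Found" yields the code 404, which `response_class` places in the client error category.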
A. Status codes:
Whenever you browse a Web page, your computer obtains the requested data from a server that speaks HTTP. Before the page you asked for appears in the browser, the Web server hosting it returns an HTTP header containing a status code. This status code gives information about the state of the requested page. If everything works, a normal Web page comes back with a status code such as 200. Our aim here, of course, is not to study the 200 response code but to look at the codes that signal errors, such as 404 ("the specified page was not found"). The sub-codes written with a decimal point below (401.1, 403.1, 500.11, and so on) are specific to Microsoft IIS.
1xx (informational): the request has been received and processing continues.
100: the client should continue with its request.
101: the client has asked the server to switch protocols.
2xx (success): the action was successfully received, understood, and accepted.
200: the request completed successfully and the requested resource is sent back to the client.
201: the request succeeded and the URL of the newly created resource is known.
202: the request has been accepted for processing, but processing is not yet complete.
203: the returned information is not authoritative or is incomplete.
204: the request was received, but there is no content to return.
205: the server completed the request and the user agent must reset the document it is currently showing.
206: the server has fulfilled a partial GET request.
3xx (redirection): further action is needed to complete the request.
300: the requested resource is available at multiple locations.
301: the page has moved permanently to another URL.
302: the requested page has moved temporarily to a new address, but the client should keep using the original URL for future requests. The new URL is returned in the Location header of the response, and the browser issues a new request to it.
303: the client is advised to use another URL or access method.
304: the requested page has not been modified since the last request. The server returns this response without the page content; the previous copy is cached and can continue to be used.
305: the requested resource must be obtained through the address specified by the server.
306: a code used in an earlier version of HTTP; the current version no longer uses it.
307: the requested resource has been moved temporarily.
4xx (client error): the request contains bad syntax or cannot be carried out.
400: the client request has a syntax error and cannot be understood by the server.
401: the request is not authorized. This status code must be used together with the WWW-Authenticate header.
HTTP 401.1 Unauthorized: logon failed
HTTP 401.2 Unauthorized: logon failed due to server configuration
HTTP 401.3 Unauthorized: access to the resource denied by ACL
HTTP 401.4 Unauthorized: authorization denied by a filter
HTTP 401.5 Unauthorized: authorization failed in an ISAPI or CGI application
402: reserved; intended as the response to a valid ChargeTo header.
403: forbidden; the server received the request but refuses to serve it.
HTTP 403.1 Forbidden: execute access denied
HTTP 403.2 Forbidden: read access denied
HTTP 403.3 Forbidden: write access denied
HTTP 403.4 Forbidden: SSL required
HTTP 403.5 Forbidden: SSL 128 required
HTTP 403.6 Forbidden: IP address rejected
HTTP 403.7 Forbidden: client certificate required
HTTP 403.8 Forbidden: site access denied
HTTP 403.9 Forbidden: too many users connected
HTTP 403.10 Forbidden: invalid configuration
HTTP 403.11 Forbidden: password change
HTTP 403.12 Forbidden: access denied by the mapper
HTTP 403.13 Forbidden: client certificate revoked
HTTP 403.15 Forbidden: client access licenses exceeded
HTTP 403.16 Forbidden: client certificate untrusted or invalid
HTTP 403.17 Forbidden: client certificate expired or not yet valid
404: the server can be reached, but it cannot find the requested page; the requested resource does not exist, for example because a wrong URL was entered.
405: the method given in the Request-Line is not allowed for the resource.
406: the requested resource cannot be delivered in a form that satisfies the Accept headers sent by the client.
407: like 401, but the client must first authenticate with the proxy server.
408: the client did not complete its request within the time the server was prepared to wait.
409: the request cannot be completed because it conflicts with the current state of the resource.
410: the server no longer has this resource and knows no forwarding address.
411: the server refuses a request that does not define Content-Length.
412: one or more precondition header fields in the request evaluated to false.
413: the request entity is larger than the server allows.
414: the URL of the requested resource is longer than the server allows.
415: the request is in a format that the requested resource does not support.
416: the request contains a Range header, none of the requested ranges falls within the current extent of the resource, and the request does not contain an If-Range header.
417: the server cannot meet the expectation given in the Expect request header; if the server is a proxy, it may be the next-hop server that cannot meet it.
5xx (server error): the server failed to carry out an apparently valid request.
500: the server encountered an error and could not complete the request.
HTTP 500.11: server shutting down
HTTP 500.12: application restarting
HTTP 500.13: server too busy
HTTP 500.14: invalid application
HTTP 500.15: requests for Global.asa are not allowed
HTTP 500.100: internal server error, ASP error
501: not implemented.
502: bad gateway.
503: the server is currently unavailable because of overload or maintenance and may return to normal after a while.
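Python's standard library ships a table of registered status codes, which gives a quick way to look up the reason phrase for a code from the list above. The wrapper function `describe` is a hypothetical helper; `http.HTTPStatus` itself is part of the standard library and does not know the IIS-specific sub-codes.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the code with its standard reason phrase, if it has one."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not define.
        return f"{code} (not a registered status code)"
```

For example, `describe(404)` gives "404 Not Found" and `describe(304)` gives "304 Not Modified".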
(2) Response header
The server needs to pass on a lot of additional information that cannot be placed entirely in the status line. Therefore, the response header needs to be defined separately to describe the additional information. The response header mainly describes the server's information and Request-uri information.
Common response headers:
Location: Redirects the recipient to a new location. The Location response header is commonly used when a site changes its domain name.
Server: Identifies the software the HTTP server uses to process the request. For example: Server: Microsoft-IIS/7.5, Server: Apache-Coyote/1.1. This field can contain multiple product identifiers and comments, and the product identifiers are generally listed in order of importance.
Refresh: Indicates after how many seconds the browser should refresh the document.
WWW-Authenticate: Must be included in a 401 (Unauthorized) response. When the client receives the 401 response, it can resend the request with an Authorization header so that the server can validate it. For example, WWW-Authenticate: Basic realm="Basic Auth Test!" shows that the server uses the Basic authentication mechanism for the requested resource.
Connection: For example, Connection: keep-alive means that after a Web page is opened, the TCP connection used to carry HTTP data between the client and the server is not closed; if the client accesses a page on that server again, it reuses the established connection. Connection: close means that once the request completes, the TCP connection between the client and the server is closed, and a new TCP connection must be established when the client sends another request.
Referer: Strictly speaking a request header sent by the client. It contains the URL of the page from which the user reached the currently requested page, which gives the server context about the request. For example, if my home page links to a friend's site, his server can use the Referer header to count how many users arrive each day by clicking the link on my page. For example: Referer: http://luyucheng.cnblogs.com/
Content-Encoding: The Web server indicates which compression method (gzip, deflate) it used to compress the object in the response. The content in the media type given by the Content-Type header can be obtained only after decoding. Compressing documents with gzip can significantly reduce the download time of HTML documents. For example: Content-Encoding: gzip
Content-Language: The Web server tells the browser the natural language of the object in the response. If this field is not set, the entity content is considered to be intended for readers of all languages. For example: Content-Language: da
Content-Range: Specifies where a partial entity belongs within the full entity, and also gives the length of the full entity. When the server returns a partial response to the client, it must describe both the range covered by the response and the full length of the entity. General format: Content-Range: bytes-unit SP first-byte-pos "-" last-byte-pos "/" entity-length. For example, sending the first 500 bytes: Content-Range: bytes 0-499/1234. If an HTTP message contains this field (for example, a response to a range request or to a request for a set of overlapping ranges), Content-Range gives the range being transferred.
Content-Type: The media type of the entity body sent to the recipient. The format of a media type is type/subtype, such as text/html. For example: Content-Type: text/html;charset=utf-8 and Content-Type: image/jpeg
Last-Modified: The date and time when the resource was last modified.
ETag: Used in conjunction with If-None-Match.
Expires: The date and time after which the response is considered expired. Before that time the local cache is used; after it, the document is no longer served from the cache but fetched from the server again, and the cache is updated. Serving previously visited pages from the cache shortens response times and reduces server load, and the Expires entity header lets a proxy server or browser know when to refresh its copy. For example: Expires: Thu, 15 Sep 2006 16:23:12 GMT. HTTP/1.1 clients and caches must treat invalid date formats (including 0) as already expired. For example, to keep the browser from caching a page, the Expires entity header can be set to 0; in a JSP program: response.setDateHeader("Expires", 0);
Allow: The request methods the server supports for the resource (such as GET, POST, and so on).
Date: The time at which the message was sent, in the format defined by RFC 822. For example: Date: Mon, 31 Dec 2001 04:25:57 GMT. The time is expressed in GMT, so converting it to local time requires knowing the user's time zone. In a servlet or JSP program, setDateHeader can set this header and avoids the trouble of formatting the date by hand.
P3P: Used to set cookies across domains, which resolves the problem of cookies not being accessible in a cross-domain iframe. For example: P3P: CP=CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR
Set-Cookie: A very important header, used to send cookies to the client browser. Each cookie that is written generates one Set-Cookie header. For example: Set-Cookie: sc=4c31523a; path=/; domain=.acookie.taobao.com
IANA (the Internet Assigned Numbers Authority) defines 8 top-level categories of media types: application (for example: application/vnd.ms-excel), audio (for example: audio/mpeg), image (for example: image/png), message (for example: message/http), model (for example: model/vrml), multipart (for example: multipart/form-data), text (for example: text/html), and video (for example: video/quicktime).
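To make the header format concrete, here is a minimal Python sketch that splits a raw response head into its status line and headers. The function name parse_response_head, the sample response, and the second cookie (lang=da) are assumptions made for illustration; real clients rely on a full HTTP library rather than string splitting.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (version, status, reason, headers)."""
    lines = raw.split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        # A header such as Set-Cookie may repeat, so keep a list per name.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return version, int(status), reason, headers


# Hypothetical response head built from headers described in the text.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache-Coyote/1.1\r\n"
    "Content-Type: text/html;charset=utf-8\r\n"
    "Content-Encoding: gzip\r\n"
    "Set-Cookie: sc=4c31523a; path=/\r\n"
    "Set-Cookie: lang=da; path=/\r\n"
    "\r\n"
)
version, status, reason, headers = parse_response_head(raw)
```

Keeping a list per header name matters mostly for Set-Cookie, since each cookie arrives in its own header line.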
Five, how caching is implemented
A Web cache sits between the Web server and the client and saves copies of responses to requests, such as HTML pages, pictures, and files. When the next request arrives for the same URL, the cache answers it directly from the saved copy instead of sending the request to the origin server again.
The HTTP protocol defines a set of message headers so that Web caches can work as effectively as possible.
1. Benefits of Caching
Reduced latency: Because requests are answered by the cache (which is closer to the client) instead of the origin server, responses take less time and the Web server appears faster.
Reduced network bandwidth consumption: Reusing a cached copy reduces the bandwidth the client consumes. Customers save on bandwidth costs, and bandwidth demand grows more slowly and is easier to manage.
2. The usual flow when client-side caching takes effect
When the server receives a request, it returns the resource's Last-Modified and ETag headers in the 200 OK response; the client saves the resource in its cache and records both values. When the client needs to send the same request again, it includes the If-Modified-Since and If-None-Match headers, whose values are the Last-Modified and ETag values from the earlier response. The server uses these two headers to check whether the resource has changed. If it has not, the server returns a 304 response and the client does not need to download the resource again.
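The server side of this flow can be sketched in Python as follows. This is a simplified illustration rather than a real server: the function conditional_get, the dictionary layout of resource, and the sample values are assumptions, and the ETag comparison is a plain string match.

```python
from email.utils import parsedate_to_datetime


def conditional_get(request_headers, resource):
    """Return (status, headers, body) for a GET on `resource`."""
    validators = {
        "ETag": resource["etag"],
        "Last-Modified": resource["last_modified"],
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # When both headers are present, If-None-Match takes precedence.
        unchanged = if_none_match == resource["etag"]
    elif if_modified_since is not None:
        unchanged = (parsedate_to_datetime(resource["last_modified"])
                     <= parsedate_to_datetime(if_modified_since))
    else:
        unchanged = False  # first request: no validators to compare

    if unchanged:
        return 304, validators, b""  # client reuses its cached copy
    return 200, validators, resource["body"]


# Hypothetical resource; the first request is a plain GET.
resource = {
    "etag": '"v1"',
    "last_modified": "Mon, 31 Dec 2001 04:25:57 GMT",
    "body": b"<html>hello</html>",
}
status1, validators, body1 = conditional_get({}, resource)

# The second request echoes the validators saved from the first response.
revalidate = {
    "If-None-Match": validators["ETag"],
    "If-Modified-Since": validators["Last-Modified"],
}
status2, _, body2 = conditional_get(revalidate, resource)
```

In this sketch the first call returns the full body with 200, and the second returns 304 with an empty body because neither validator has changed.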
3. Web caching mechanism
The purpose of caching in HTTP/1.1 is to eliminate the need to send a request in many cases, and to eliminate the need to send a full response in many other cases. The former reduces the number of network round trips; HTTP uses an "expiration" mechanism for this purpose. The latter reduces the network bandwidth required; HTTP uses a "validation" mechanism for this purpose.
HTTP defines 3 kinds of caching mechanisms:
(1) Freshness: Allows a response to be reused without rechecking it with the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives the time after which a document is considered stale, and the max-age directive in Cache-Control gives the maximum time the response may be cached;
(2) Validation: Used to check whether a cached response is still valid. For example, if a response has a Last-Modified header, the cache can send If-Modified-Since to ask whether the resource has changed, and so determine whether the full response needs to be sent again;
(3) Invalidation: Usually a side effect of another request passing through the cache. For example, if a URL has a cached response and a subsequent POST, PUT, or DELETE request for that URL passes through the cache, the cached response is invalidated.
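The freshness and invalidation mechanisms can be sketched in Python as below. This is a deliberately simplified model, not a full HTTP cache: the names is_fresh and invalidate_if_unsafe are made up for the example, the age of a response is measured only from the moment it was stored, and a response without max-age or Expires is conservatively treated as stale.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(response_headers, stored_at, now):
    """Return True if a cached response may be reused without revalidation."""
    match = re.search(r"max-age=(\d+)", response_headers.get("Cache-Control", ""))
    if match:
        # max-age takes priority over Expires when both are present.
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    expires = response_headers.get("Expires")
    if expires is None:
        return False  # no explicit lifetime: revalidate (assumption of this sketch)
    try:
        return now < parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return False  # invalid dates such as "0" count as already expired


def invalidate_if_unsafe(cache, method, url):
    """Drop the cached entry for `url` when a POST, PUT, or DELETE passes through."""
    if method.upper() in {"POST", "PUT", "DELETE"}:
        cache.pop(url, None)


# Hypothetical timestamps for illustration.
stored_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
fresh_after_30s = is_fresh({"Cache-Control": "max-age=60"}, stored_at,
                           stored_at + timedelta(seconds=30))
fresh_after_61s = is_fresh({"Cache-Control": "max-age=60"}, stored_at,
                           stored_at + timedelta(seconds=61))
```

When is_fresh returns False, a cache would normally fall back to validation with If-Modified-Since or If-None-Match rather than discard the stored copy outright.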
Six, applications
1. How resumable downloads work
The GET method of the HTTP protocol supports requesting only part of a resource;
The 206 Partial Content response carries that part;
The Range request header gives the range of the resource being requested;
The Content-Range response header gives the range of the resource contained in the response;
When a connection is dropped, the client requests only the part of the resource that has not yet been downloaded instead of requesting the whole resource again. This is how a download is resumed from the point where it broke off.
Examples of requesting a resource in ranges:
Eg1: Range: bytes=306302-: requests the part of the resource from byte 306302 to the end;
Eg2: Content-Range: bytes 306302-604047/604048: the response carries bytes 306302-604047 of the resource, which is 604,048 bytes in total;
By requesting different ranges of the same resource concurrently, a client can download the resource in chunks in parallel and so speed up the download. Popular download tools such as FlashGet and Thunder (Xunlei) basically work on this principle.
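The two header formats in the examples above can be handled with a few lines of Python. The helper names parse_content_range and resume_range_header are assumptions for this sketch, which covers only the single-range "bytes" form.

```python
import re


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server sends '*' for an unknown length.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


def resume_range_header(bytes_already_downloaded):
    """Build the Range value that asks for everything not yet downloaded."""
    # Byte offsets start at 0, so the next missing byte has this index.
    return f"bytes={bytes_already_downloaded}-"


first, last, total = parse_content_range("bytes 306302-604047/604048")
range_value = resume_range_header(306302)
```

A client that already holds the first 306,302 bytes would send the value in range_value as its Range header and expect a 206 response whose Content-Range starts at byte 306302.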
2. How multi-threaded download works
The download tool starts multiple threads, each of which issues HTTP requests;
Each HTTP request asks for only one part of the resource file using the Range header, and the matching response identifies that part, for example: Content-Range: bytes 20000-40000/47000;
The tool then merges the parts downloaded by each thread into one file.
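These steps can be sketched in Python with a thread pool. To keep the example self-contained, the network is replaced by a stand-in function fake_fetch that slices an in-memory byte string; in a real tool that function would send a GET with a Range header. All names here (split_ranges, download, fake_fetch, RESOURCE) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, parts):
    """Split a resource into at most `parts` inclusive (first, last) byte ranges."""
    if total_size <= 0:
        return []
    chunk = -(-total_size // parts)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def download(fetch_range, total_size, parts=4):
    """Fetch every range on its own thread and merge the pieces in order."""
    ranges = split_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() yields results in input order, so the pieces can be joined directly.
        pieces = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(pieces)


# Stand-in for the remote file: 47,000 bytes held in memory.
RESOURCE = bytes(i % 251 for i in range(47000))


def fake_fetch(first, last):
    """Pretend to request 'Range: bytes=first-last' and return that slice."""
    return RESOURCE[first:last + 1]


merged = download(fake_fetch, len(RESOURCE), parts=3)
```

Because byte ranges are inclusive at both ends, each range ends one byte before the next one starts; an off-by-one error here would duplicate or drop a byte at every boundary.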
3. HTTP proxy
HTTP proxy server:
A proxy server obtains network information on behalf of network users. Figuratively speaking, it is a relay station for network information.
The proxy server sits between the browser and the Web server. With a proxy in place, the browser does not fetch a Web page directly from the Web server; instead it sends its request to the proxy server, which retrieves the information the browser needs and passes it back to the browser.
Moreover, most proxy servers also cache data. A proxy acts like a large cache with a lot of storage space and keeps saving newly fetched data to its local storage. If the data the browser requests is already in that storage and is up to date, the proxy does not fetch it from the Web server again but sends the stored copy straight to the user's browser, which can significantly improve browsing speed and efficiency. More importantly, the proxy server is an important security feature provided by Internet circuit-level gateways, and it works mainly at the session layer of the Open Systems Interconnection (OSI) model.
The main functions of an HTTP proxy server:
(1) Getting around access restrictions on one's own IP address to reach foreign websites. For example, users of the education network, the 169 network, and similar networks can access foreign websites through a proxy;
(2) Accessing the internal resources of some organizations or groups, such as a university's FTP server (provided the proxy's address is within the range allowed to access the resource). With a free proxy server in the education network's address range, one can use the various FTP download and upload services and the information query and sharing services that are open to the education network;
(3) Getting around China Telecom's IP blocking: China Telecom users are restricted from accessing many websites. The restriction is set manually, and different servers block different addresses, so when a site cannot be reached, a foreign proxy server is worth trying;
(4) Improving access speed: a proxy server usually has a large disk cache. When outside information passes through, the proxy also saves a copy in the cache; when other users request the same information, it is taken directly from the cache and passed to the user, which improves access speed;
(5) Hiding the real IP address: Internet users can also hide their IP address this way to protect themselves from attack.
To the client browser, the HTTP proxy server acts as the server.
To the Web server, the HTTP proxy server plays the role of the client.
4. Virtual hosting
Virtual host: a portion of disk space on a network server is set aside for users to place their sites, application components, and so on, providing the necessary site functions along with data storage and transfer.
A so-called virtual host, also known as "web space", is the result of dividing one server running on the Internet into a number of "virtual" servers. Each virtual host has its own domain name and full Internet server functionality (WWW, FTP, e-mail, and so on). The virtual hosts on one server are independent of each other and are managed by their own users. However, one server can support only a certain number of virtual hosts; beyond that number, users will notice a sharp drop in performance.
How virtual hosting is implemented:
Virtual hosting is a technique in which the same Web server serves sites under different domain names. Apache, Tomcat, and similar servers can be configured to do this.
Related HTTP message header: Host.
Example: Host: luyucheng.cnblogs.com
When the client sends an HTTP request, it carries the Host header, which records the domain name the user entered. Based on the Host header, the server can tell which domain name the client is accessing.
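Dispatching on the Host header can be sketched in Python as a simple dictionary lookup. The function pick_site, the second domain, and the directory paths are hypothetical and serve only as an illustration; real servers such as Apache and Tomcat do this through their own configuration.

```python
def pick_site(request_headers, sites, default):
    """Choose the site for a request based on its Host header."""
    host = request_headers.get("Host", "")
    # Drop an optional ":port" suffix; host names are case-insensitive.
    # (IPv6 literals such as [::1]:8080 are not handled in this sketch.)
    hostname = host.split(":", 1)[0].strip().lower()
    return sites.get(hostname, default)


# Hypothetical mapping from domain name to document root.
SITES = {
    "luyucheng.cnblogs.com": "/srv/sites/luyucheng",
    "example.org": "/srv/sites/example",
}
DEFAULT_SITE = "/srv/sites/default"

root = pick_site({"Host": "luyucheng.cnblogs.com"}, SITES, DEFAULT_SITE)
```

Both domain names can resolve to the same IP address; the Host header is the only thing that lets the server tell the two sites apart.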