HTTP cache mechanism, HTTP message structure, HTTP request and response process, browser cache control
Benefits of HTTP Cache
A page may receive hundreds of thousands of requests. If the server has to respond to every one of them, its load may become too high and it may even go down, degrading the user experience. With browser cache control, data that does not need to be real-time can be cached, so the browser sends fewer requests, or can even display the data without contacting the server again. Benefits:
Reduced latency: because requests are served from the client cache instead of the origin server, they complete sooner, which makes the website feel faster.
Reduced network load: because cached files can be reused, a lot of bandwidth is saved, which also reduces users' data usage.
HTTP message structure
1. Request message: generally divided into three parts: the request line, the request headers, and the request body. Note that the blank line between the request headers and the request body is also part of the HTTP request specification. The request line consists of three parts: method, resource path, and protocol version.
2. Response message: also divided into three parts: the status line, the response headers, and the response body. The blank line between the response headers and the response body is likewise part of the HTTP specification. The status line consists of three parts: protocol version, status code, and status description (reason phrase).
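The three-part layout described above can be illustrated with a small parser. This is only a sketch: the function name parse_http_message and the sample messages (including the host www.example.com) are made up for illustration, and real parsers also handle repeated headers, chunked bodies, and malformed input.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    # The blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The first line has three space-separated parts; the third part
    # (for example the reason phrase "Not Modified") may contain spaces.
    start_line = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical sample messages, written out by hand for the example.
request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"\r\n"
)
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

req_line, req_headers, req_body = parse_http_message(request)
status_line, resp_headers, resp_body = parse_http_message(response)
print(req_line)     # ['GET', '/index.html', 'HTTP/1.1']
print(status_line)  # ['HTTP/1.1', '200', 'OK']
print(resp_body)    # b'hello'
```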
HTTP request and response process
Initial Request
Request again
Browser cache control
The HTTP protocol defines several header fields that can be used to control the browser cache: Expires, Pragma: no-cache, Cache-Control, Last-Modified, and ETag.
1. Expires: + expiration time
Expires is a header field of the web server's response message. When responding to an HTTP request, it tells the browser that, until the expiration time, the browser can use its cached copy directly without requesting the resource again. However, Expires is an HTTP 1.0 feature. Browsers now use HTTP 1.1 by default, and when Cache-Control: max-age is also present, Expires is ignored. One disadvantage of Expires is that the expiration time it returns is based on the server's clock. If the client's clock differs greatly from the server's (for example, the clocks are not synchronized, or the client's time zone is set incorrectly), the error can be very large. For this reason, starting with HTTP 1.1, it is superseded by Cache-Control: max-age=seconds.
The expiration time must be in the HTTP date format. Any other value is treated as a time in the past, so the cached copy expires immediately. An HTTP date must be expressed in Greenwich Mean Time (GMT), not local time. Example:
Expires: Fri, 30 Oct 2009 14:19:41 GMT
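As a rough illustration, an Expires value in the HTTP date format can be produced and checked with Python's standard email.utils module. The helper names make_expires and is_expired are hypothetical, and the sketch assumes the caller passes timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_expires(now: datetime, lifetime_seconds: int) -> str:
    """Return an Expires value lifetime_seconds after now, in HTTP date format."""
    # usegmt=True emits the literal "GMT" suffix; it requires a UTC datetime.
    return format_datetime(now + timedelta(seconds=lifetime_seconds), usegmt=True)


def is_expired(expires_value: str, now: datetime) -> bool:
    """Return True if the cached copy may no longer be used without a request."""
    try:
        expires_at = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # A value that is not a valid date counts as already expired.
        return True
    if expires_at.tzinfo is None:
        # No zone given; this sketch assumes GMT, as HTTP dates require.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now >= expires_at


now = datetime(2009, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
print(make_expires(now, 3600))  # Fri, 30 Oct 2009 14:19:41 GMT
print(is_expired("Fri, 30 Oct 2009 14:19:41 GMT", now))  # False
print(is_expired("0", now))  # True
```

Note that the check compares against the local clock of whoever runs it, which is exactly the weakness of Expires described above.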
2. Pragma: no-cache
To stay compatible with HTTP 1.0, you can send the Pragma: no-cache header to tell the browser not to cache content. Many people believe that setting Pragma: no-cache on a response controls whether it is cached. This is not entirely correct. The HTTP specification does not define Pragma for response headers; it only discusses Pragma in request headers. Although a few caches honor it, most do not, and it has no effect there. Use the headers below instead. (Its effect is disputed, so it is best not to rely on it.)
3. Cache-Control:
As its name suggests, Cache-Control controls caching. This HTTP header accepts several values:
1) max-age=[seconds]: the maximum amount of time for which the cached copy is considered fresh. Like Expires, it sets an expiry, but it is relative to the time of the request rather than an absolute time. [seconds] is the number of seconds from the time of the request until the copy expires.
2) s-maxage=[seconds]: similar to max-age, except that it applies only to shared caches (such as proxy servers).
3) public: marks authenticated responses as cacheable. Normally, responses that can be accessed only after HTTP authentication are automatically treated as private.
4) no-cache: forces the cache to submit each request to the origin server for validation before it releases a cached copy. This is useful for applications that must enforce authentication (it can be combined with public), or that must always serve the latest data, without giving up all the benefits of caching. Despite its name, no-cache does not mean "do not cache": the response can still be stored, it just cannot be reused without validation.
5) no-store: forces caches not to keep a copy under any circumstances.
6) must-revalidate: tells caches that they must obey the freshness information you give for a copy. HTTP allows caches to return stale data under certain conditions. By specifying this directive, you tell the cache that you want it to follow your rules strictly.
7) proxy-revalidate: similar to must-revalidate, except that it applies only to proxy caches.
Example: Cache-Control: max-age=3600, must-revalidate
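To make the directives more concrete, here is a minimal sketch of how a cache might read a Cache-Control value and decide whether a stored copy is still fresh. The function names are hypothetical, the comma split ignores quoted arguments that contain commas, and a real cache applies more rules (heuristic freshness, Expires fallback, and so on).

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def can_store(directives: dict) -> bool:
    """no-store forbids keeping any copy at all."""
    return "no-store" not in directives


def is_fresh(directives: dict, age_seconds: int, shared_cache: bool = False) -> bool:
    """Decide whether a stored copy may be reused without asking the server."""
    if "no-cache" in directives:
        return False  # must be validated with the origin server every time
    lifetime = None
    if shared_cache and "s-maxage" in directives:
        lifetime = directives["s-maxage"]  # shared caches prefer s-maxage
    elif "max-age" in directives:
        lifetime = directives["max-age"]
    if lifetime is None or not lifetime.isdigit():
        return False  # no usable lifetime: treated as stale in this sketch
    return age_seconds < int(lifetime)


cc = parse_cache_control("max-age=3600, must-revalidate")
print(cc)                  # {'max-age': '3600', 'must-revalidate': None}
print(is_fresh(cc, 10))    # True
print(is_fresh(cc, 3600))  # False
```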
4. Last-Modified/If-Modified-Since: Last-Modified and If-Modified-Since work together with Cache-Control.
Last-Modified: the time at which the requested resource was last modified. When the web server responds to a request, it tells the browser this time.
If-Modified-Since: when the cached resource has expired (according to the max-age set by Cache-Control) and the browser finds that it carries a Last-Modified value, the browser adds an If-Modified-Since header, containing that Last-Modified value, to its next request to the web server. After receiving the request, the web server compares the If-Modified-Since value with the last modification time of the requested resource. If the resource's last modification time is newer, the resource has changed, and the server responds with HTTP 200 and the entire resource in the response body. If the times are the same, the resource has not changed, and the server responds with HTTP 304 (no body is needed, which saves bandwidth), telling the browser to keep using its cached copy.
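The server-side comparison described above might look roughly like the following. This is a sketch, not how any particular web server implements it: respond_if_modified_since is a hypothetical helper, headers are passed as a plain dict, and last_modified is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_if_modified_since(request_headers: dict, last_modified: datetime, body: bytes):
    """Return (status, headers, body) for a resource last changed at last_modified."""
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
            if since_dt.tzinfo is None:
                since_dt = since_dt.replace(tzinfo=timezone.utc)
            if last_modified <= since_dt:
                return 304, headers, b""  # unchanged: no body is sent
        except (TypeError, ValueError):
            pass  # an invalid date is ignored and the full response is sent
    return 200, headers, body


mtime = datetime(2009, 10, 30, 14, 19, 41, tzinfo=timezone.utc)
status, headers, _ = respond_if_modified_since({}, mtime, b"hello")
print(status, headers["Last-Modified"])  # 200 Fri, 30 Oct 2009 14:19:41 GMT

# The browser echoes the Last-Modified value back on its next request.
status, _, _ = respond_if_modified_since(
    {"If-Modified-Since": headers["Last-Modified"]}, mtime, b"hello"
)
print(status)  # 304
```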
5. ETag/If-None-Match: ETag and If-None-Match also work together with Cache-Control.
ETag: when the web server responds to a request, it tells the browser the unique identifier of the current resource on the server (the generation rule is decided by the server). In Apache, the ETag value is by default derived from the file's inode, size, and modification time (INode, Size, MTime).
If-None-Match: when the cached resource has expired (according to the max-age set by Cache-Control) and the browser finds that it carries an ETag value, the browser adds an If-None-Match header, containing that ETag value, to its next request to the web server. After receiving the request, the web server compares the If-None-Match value with the current validator of the requested resource and decides whether to return 200 or 304.
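The 200-or-304 decision for ETag can be sketched in the same way. The helper names below are hypothetical, the ETag is treated as an opaque string supplied by the caller, and the comma split assumes that no ETag contains a comma.

```python
def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Compare an If-None-Match value with the current ETag (weak comparison)."""
    if if_none_match.strip() == "*":
        return True  # "*" matches any existing representation

    def opaque(tag: str) -> str:
        tag = tag.strip()
        # The weak marker W/ is ignored for this kind of comparison.
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [opaque(t) for t in if_none_match.split(",") if t.strip()]
    return opaque(current_etag) in candidates


def respond_if_none_match(request_headers: dict, current_etag: str, body: bytes):
    """Return (status, headers, body) depending on whether the ETag still matches."""
    headers = {"ETag": current_etag}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and etag_matches(if_none_match, current_etag):
        return 304, headers, b""  # the cached copy is still valid
    return 200, headers, body


print(respond_if_none_match({"If-None-Match": '"abc"'}, '"abc"', b"x")[0])  # 304
print(respond_if_none_match({"If-None-Match": '"old"'}, '"abc"', b"x")[0])  # 200
```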
PS: differences between ETag and Last-Modified:
1. Last-Modified is only accurate to the second. If a file is modified several times within one second, Last-Modified cannot reflect each modification.
If a file is regenerated periodically, its content may stay the same while its Last-Modified time changes, so the cached copy cannot be reused.
The server may fail to obtain the file's modification time accurately, or its clock may be inconsistent with that of a proxy server.
2. An ETag is a unique identifier for the resource, generated automatically by the server or by the developer on the server side, so it allows more precise cache control. When Last-Modified and ETag are used together, the server checks the ETag first.
Yahoo's YSlow rules advise setting ETag with caution. In a distributed system, the Last-Modified time of a file must be kept consistent across machines, so that validation does not fail when load balancing sends requests to different machines. Yahoo also recommends that distributed systems turn ETag off where possible, because the ETag generated on each machine differs: besides the last-modified time, the inode is hard to keep consistent.
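One way a developer-generated ETag can sidestep the per-machine inode problem is to derive it from the content alone. The sketch below is an illustration under that assumption, not something the YSlow rules prescribe: content_etag and metadata_etag are hypothetical helpers, the metadata format is invented for contrast, and the inode numbers are made up.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Build a quoted ETag from the response body alone."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def metadata_etag(inode: int, size: int, mtime: int) -> str:
    """Inode/size/mtime style ETag, shown only for contrast (invented format)."""
    return '"%x-%x-%x"' % (inode, size, mtime)


body = b"<html>same file</html>"

# The same file deployed on two machines usually gets different inode numbers,
# so a metadata-based ETag differs even though the content is identical.
print(metadata_etag(1001, len(body), 1256912381) == metadata_etag(2002, len(body), 1256912381))  # False

# A content-based ETag depends only on the bytes, so every machine agrees.
print(content_etag(body) == content_etag(bytes(body)))  # True
```

The trade-off is that hashing requires reading the whole body, so it suits static or cacheable content better than large files generated on every request.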
User behavior and Cache
Browser cache behavior also depends on what the user does.
HTML configuration no-cache
HTML-level configuration is not part of the HTTP protocol. Web developers can add a meta tag to the head node of an HTML page to ask the browser not to cache it. Only some browsers support this, and caching proxy servers do not, because they do not parse HTML content.
No-cache Configuration