Detailed analysis of browser HTTP protocol cache principles

Source: Internet
Author: User

Analysis of the principle of browser HTTP caching

Browser caching principle

Text description

① The browser requests the server resource /index.html for the first time

The file is not yet cached in the browser, so the request is sent directly to the server.

The server returns 200 OK with the content of index.html as the entity body, and sets a cache expiration time, a file modification time, and an entity tag (ETag for short) computed from the content of index.html.

The browser caches the response for the path /index.html locally.

② The browser requests the server resource /index.html a second time

Because a cached file for this path already exists locally, the request is not sent straight to the server this time.

First comes the cache expiration judgment. The browser uses the cache expiration time set in ① to decide whether the cached file has expired.

Scenario one: if it has not expired, no request is sent to the server and the cached result is used directly. The browser console shows "(from cache)". In this case the cache is used in full, and the browser and the server do not interact at all.

Scenario two: if it has expired, the browser sends a request to the server, and the request carries the file modification time and the ETag set in ①.

Then comes the resource update judgment. From the file modification time passed by the browser, the server determines whether the file has been modified since the browser's last request; from the ETag, it determines whether the file's content has changed since the last request.

Case one: if both judgments conclude that the file has not been modified, the server does not send the content of index.html. It simply tells the browser that the file has not changed and that the local cache should be used: 304 Not Modified. The browser then reads the content of index.html from its local cache. This case is called protocol caching, and it involves one request between the browser and the server.

Case two: if either the modification time check or the content check fails, the server handles the request in full, and what follows is the same as in ①.
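
The two judgments described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: `CachedCopy`, `ServerFile` and `fetch` are hypothetical names, times are plain integers in seconds, and the browser and the server are collapsed into a single function.

```python
from dataclasses import dataclass

@dataclass
class CachedCopy:            # what the browser keeps after step ①
    body: str
    etag: str
    last_modified: int       # seconds, as reported by the server
    expires_at: int          # seconds, end of the freshness period

@dataclass
class ServerFile:            # current state of /index.html on the server
    body: str
    etag: str
    last_modified: int

def fetch(cached, server_file, now, max_age=3600):
    """Return (how the page was served, body, cache entry to keep)."""
    def full_response():
        copy = CachedCopy(server_file.body, server_file.etag,
                          server_file.last_modified, now + max_age)
        return "200 OK", copy.body, copy

    if cached is None:                       # ① first visit, nothing cached
        return full_response()
    if now < cached.expires_at:              # scenario one: not expired
        return "from cache", cached.body, cached
    # Scenario two: expired, so the server compares both validators.
    unchanged = (cached.etag == server_file.etag
                 and cached.last_modified >= server_file.last_modified)
    if unchanged:                            # case one: 304, reuse local body
        refreshed = CachedCopy(cached.body, cached.etag,
                               cached.last_modified, now + max_age)
        return "304 Not Modified", cached.body, refreshed
    return full_response()                   # case two: same as ①
```

Calling `fetch` repeatedly with a growing `now` walks through the first visit, the "(from cache)" hit, and the 304 revalidation in turn.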

My writing ability may be limited, so to make the process as clear as possible, the picture below sums it all up.

[Figure: flowchart of the browser cache process]



Cache-related header fields

Request header fields related to caching


① Cache-Control: used for the cache expiration judgment

Common directives:

no-cache: do not use the cache directly; always send a request to the server

max-age: the cache expiration time, a time value in seconds such as 3600; setting it to 0 has the same effect as no-cache

s-maxage: intended for caching proxies; it has no effect on a server that returns the resource directly, and when s-maxage is in effect the value of max-age is ignored. Note that the HTTP specification defines s-maxage only as a response directive

only-if-cached: use the cache only; if the cache cannot supply the file, the request fails as well

② Pragma: used for the cache expiration judgment

Its value can be no-cache

This is a field left over from HTTP 1.0; when it appears together with Cache-Control, Cache-Control overrides it.

③ If-Match / If-None-Match: used for the resource update judgment

This header passes the cached ETag to the server, which compares it with the ETag of the server-side resource. If they differ, the resource has been modified and the server must respond with 200 OK

④ If-Modified-Since: used for the resource update judgment

This header passes the last modification time of the cached file to the server, which determines whether the file has been modified after that point in time. If it has, the server must respond with 200 OK
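
As an illustration of fields ③ and ④, the following sketch builds the conditional headers a client could send when revalidating an expired copy. The function name `conditional_headers` and its parameters are hypothetical; only `email.utils.formatdate` from the standard library is relied on to produce the HTTP date format.

```python
from email.utils import formatdate

def conditional_headers(cached_etag=None, cached_mtime=None):
    """Build revalidation headers from what the cache stored earlier.

    cached_etag:  the ETag string received with the cached response
    cached_mtime: the Last-Modified time as seconds since the epoch
    """
    headers = {}
    if cached_etag is not None:
        headers["If-None-Match"] = cached_etag
    if cached_mtime is not None:
        # usegmt=True yields the "GMT" suffix that HTTP dates use
        headers["If-Modified-Since"] = formatdate(cached_mtime, usegmt=True)
    return headers
```

For example, `conditional_headers('"abc"', 0)` yields an `If-None-Match` of `"abc"` and an `If-Modified-Since` of `Thu, 01 Jan 1970 00:00:00 GMT`.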

Response header fields related to caching


① Cache-Control: used to set the cache expiration time

Common directives:

no-cache: the client must not use the cache directly and always has to check with the server. In the author's captured request the directive is omitted and the client still does not use the cache directly, but this should not be relied on as a default: without explicit freshness information a browser may choose a heuristic lifetime, for example based on Last-Modified.

max-age: the cache expiration time, a time value in seconds such as 3600; setting it to 0 has the same effect as no-cache

s-maxage: intended for caching proxies; it has no effect on a server that returns the resource directly. When s-maxage is in effect, the value of max-age is ignored

public / private: the default is private, meaning the response is cached in a single user's browser only; when set to public, the cached response can be shared by multiple users

② ETag: used to set an entity tag generated from the resource content

The value can be a strong tag or a weak tag, which differ in how they are computed; only a strong tag is guaranteed to change as soon as the resource changes. The If-Match / If-None-Match field in the request header sends this value back to the server

③ Age

This field tells the client how long ago the response was created, in seconds. A cache server must add this field when it returns a resource
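
To make the interplay of max-age, s-maxage, private and Age concrete, here is a minimal sketch. `parse_cache_control` and `remaining_freshness` are hypothetical helpers, and the parsing is deliberately naive (it splits on commas and does not handle quoted values that contain commas). A response without any lifetime is treated as not fresh, which ignores heuristic freshness.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

def remaining_freshness(cache_control, age=0, shared_cache=False):
    """Seconds for which a cache may still serve the response unvalidated."""
    d = parse_cache_control(cache_control)
    if "no-cache" in d or "no-store" in d:
        return 0
    if shared_cache and "private" in d:
        return 0                      # a shared cache must not reuse it
    lifetime = None
    if shared_cache and d.get("s-maxage") is not None:
        lifetime = int(d["s-maxage"])  # s-maxage overrides max-age for proxies
    elif d.get("max-age") is not None:
        lifetime = int(d["max-age"])
    if lifetime is None:
        return 0                      # simplification: no heuristic freshness
    return max(0, lifetime - age)     # Age counts against the lifetime
```

With `max-age=3600, s-maxage=600` and an Age of 100, a browser would get 3500 remaining seconds while a shared proxy would get 500.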

Entity header fields related to caching

The response headers may also include entity headers, which immediately follow the response headers.

① Last-Modified: used to set the time the resource was last modified

② Expires: sets the file expiration time

This field serves the same purpose as Cache-Control; the difference is that it specifies an absolute expiration point, which makes it sensitive to the client's clock.

It is also a legacy field and is overridden when Cache-Control is present
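
The precedence between the two fields can be sketched as follows. `freshness_lifetime` is a hypothetical helper; it takes the lifetime from max-age when present and otherwise falls back to Expires minus the response's Date header, which is one way to keep the client's own clock out of the calculation.

```python
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if unknown."""
    cache_control = headers.get("Cache-Control", "")
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return int(arg)            # Cache-Control wins over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None
```

A response dated 07:28:00 with an Expires of 08:28:00 gets a lifetime of 3600 seconds, but adding `Cache-Control: max-age=60` reduces it to 60.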

Some considerations for cache configuration

① Only GET requests are cached; POST requests are not

② ETag: when a resource is distributed across multiple machines, different servers may generate different ETags for the same resource. The 304 protocol cache then fails and the client fetches the resource from the server in full. You can change how the server generates ETags so that the same ETag is derived from the content of the resource.

③ When the system goes online or resources are updated, you can append the resource's modification time, SVN revision number, file MD5 or similar information to the resource URI, so that users do not end up with old cached files

④ Observing Chrome's behavior shows that when a page is reached through a link or the address bar, the browser first judges whether the cache has expired and then whether the cached resource has been updated; on an F5 refresh, the expiration judgment is skipped and the server is asked directly whether the resource has been updated.
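
Points ② and ③ both come down to deriving an identifier from the file content. The sketch below shows one possible approach; `content_etag` and `versioned_url` are hypothetical names, SHA-256 is used here in place of the MD5 mentioned above, and the query parameter name `v` is an arbitrary choice.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def content_etag(body: bytes) -> str:
    """ETag that depends only on the bytes, so every server agrees on it."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def versioned_url(url: str, body: bytes) -> str:
    """Append a short content digest so a changed file gets a new URL."""
    digest = hashlib.sha256(body).hexdigest()[:8]
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("v", digest))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because the URL changes whenever the content changes, the old cached file is simply never requested again, regardless of its remaining lifetime.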



Detailed explanation of the browser HTTP caching mechanism


1. Classification of caches

Caches are divided into server-side caches (for example Nginx and Apache) and client-side caches (for example a web browser).

Server-side caches are further divided into proxy caches and reverse proxy caches (also known as gateway caches, for example an Nginx reverse proxy or Squid). The widely used CDN is in fact also a server-side cache. The purpose is to let users take a "shortcut", and what is cached is mostly pictures, files and other static resources.

Client-side caching generally means the browser cache, whose purpose is to speed up access to all kinds of static resources. Consider today's large websites: any page issues one or two hundred requests, and daily PV is at the billion level. Without caching the user experience would drop sharply, and both server load and network bandwidth would be under severe strain.


2. Browser caching mechanism in detail

There are two kinds of browser cache control mechanisms: HTML meta tags and HTTP header information

2.1 Controlling the cache with HTML meta tags

The browser caching mechanism is mainly the caching mechanism defined by the HTTP protocol (for example Expires and Cache-Control). There are also caching mechanisms not defined by HTTP, such as HTML meta tags: web developers can add a <meta> tag to the <head> section of the HTML page:
<meta http-equiv="Pragma" content="no-cache">

The purpose of the code above is to tell the browser not to cache the current page, so that every visit fetches it from the server. It is simple to use, but only some browsers support it, and caching proxy servers do not support it at all because a proxy does not parse the HTML content itself. HTTP header information is what is widely used to control caching, so I will mainly introduce the caching mechanism defined by the HTTP protocol.

2.2 Controlling the cache with HTTP header information

2.2.1 Browser request process

Flowchart of the browser's first request:


Flowchart of a repeated request by the browser:

2.2.2 Several important concepts explained

Expires policy: Expires is a web server response header field. In the response to an HTTP request it tells the browser that, until the expiration time, the browser can take the data directly from its cache without requesting it again. However, Expires comes from HTTP 1.0 and browsers now use HTTP 1.1 by default, so its role is largely ignored. A drawback of Expires is that the expiration time it returns is a server-side time. If the client's clock differs greatly from the server's (for example because the clocks are not synchronized, or across time zones), the error becomes very large, so HTTP 1.1 uses Cache-Control: max-age=seconds instead.
Cache-Control policy (key point): Cache-Control plays the same role as Expires: it indicates how long the current resource is valid and controls whether the browser takes the data directly from its cache or sends a request to the server. Cache-Control offers more options and finer-grained settings, and if both are set it takes priority over Expires.

Values can be public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age
The directives have the following meanings:
public indicates that the response can be cached by any cache.
private indicates that all or part of the response message is intended for a single user and cannot be handled by a shared cache. This allows the server to describe only part of a user's response message, which is not valid for other users' requests.
no-cache indicates that a cached copy must not be used without first revalidating it with the server. The name is easy to take too literally: it does not mean "do not cache".
no-store is used to prevent important information from being released inadvertently. When it is sent in a request message, neither the request nor the response message is stored in a cache.
max-age indicates that the client can accept a response whose age is not greater than the specified time in seconds.
min-fresh indicates that the client can accept a response that will still be fresh for at least the specified number of seconds.
max-stale indicates that the client can accept a response that has exceeded its freshness lifetime. If a value is specified for max-stale, the client accepts a response that has exceeded its freshness lifetime by no more than that value.
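
The three request directives max-age, min-fresh and max-stale can be read as conditions on the age of a cached response. The following sketch shows that reading; `cache_may_answer` is a hypothetical helper, all values are in seconds, and a max-stale without a value (any staleness accepted) is not modeled.

```python
def cache_may_answer(age, lifetime, max_age=None, min_fresh=None, max_stale=None):
    """Decide whether a cached response satisfies the request directives.

    age:      how long the response has been in the cache
    lifetime: freshness lifetime granted by the server
    """
    if max_age is not None and age > max_age:
        return False                       # older than the client accepts
    remaining = lifetime - age             # negative once the copy is stale
    if min_fresh is not None:
        return remaining >= min_fresh      # must stay fresh a while longer
    if remaining >= 0:
        return True                        # still fresh
    if max_stale is not None:
        return -remaining <= max_stale     # stale, but within tolerance
    return False
```

For a response with a 300-second lifetime that is 400 seconds old, `max_stale=150` allows the cache to answer while `max_stale=50` does not.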

Last-Modified / If-Modified-Since: these need to be used together with Cache-Control.

Last-Modified: indicates the last modification time of the resource in this response. When the web server responds to a request, it tells the browser when the resource was last modified.
If-Modified-Since: when the resource has expired (according to the max-age given by Cache-Control) and was found to carry a Last-Modified declaration, the browser requests the web server again with an If-Modified-Since header, whose value is the Last-Modified time it received earlier. When the web server receives the request and finds the If-Modified-Since header, it compares it with the last modification time of the requested resource. If the last modification time is newer, the resource has changed and the server responds with the full resource content (in the response body) and HTTP 200. If the last modification time is the same or older, the resource has no new changes and the server responds with HTTP 304 (no body is needed, which saves traffic), telling the browser to keep using its saved cache.
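
The server-side comparison described above could look roughly like this. `status_for_if_modified_since` is a hypothetical function; it assumes the resource's modification time is available as a timezone-aware `datetime` and falls back to a full 200 response when the header cannot be parsed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def status_for_if_modified_since(header_value, resource_mtime):
    """Return 304 if the resource is unchanged since the given date, else 200."""
    try:
        since = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return 200                         # unparseable header: ignore it
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates carry whole seconds only, so drop sub-second precision
    mtime = resource_mtime.replace(microsecond=0)
    return 200 if mtime > since else 304
```

Truncating to whole seconds matters because the header cannot express fractions of a second; without it a file saved at 07:28:00.5 would never match its own Last-Modified value.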

ETag / If-None-Match: these also need to be used together with Cache-Control.

ETag: when the web server responds to a request, it tells the browser the unique identifier of the current resource on the server (the generation rule is decided by the server). In Apache, the ETag value is by default obtained by hashing the file's inode number (INode), size (Size) and last modification time (MTime).
If-None-Match: when the resource has expired (according to the max-age given by Cache-Control) and was found to carry an ETag declaration, the browser requests the web server again with an If-None-Match header (whose value is the ETag). When the web server receives the request and finds the If-None-Match header, it compares it with the corresponding checksum string of the requested resource and decides whether to return 200 or 304.
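
A minimal sketch of that 200-or-304 decision follows. `status_for_if_none_match` is a hypothetical function; it uses a weak comparison (the `W/` prefix is ignored) and a naive comma split, which is an assumption that no tag contains a comma.

```python
def status_for_if_none_match(header_value, current_etag):
    """Return 304 if any tag in If-None-Match matches the current ETag."""
    def opaque(tag):
        tag = tag.strip()
        # Weak comparison: a weak tag matches a strong tag with the same value
        return tag[2:] if tag.startswith("W/") else tag

    if header_value.strip() == "*":
        return 304                          # "*" matches any existing resource
    candidates = [opaque(t) for t in header_value.split(",") if t.strip()]
    return 304 if opaque(current_etag) in candidates else 200
```

The header may list several tags, which is why the function checks membership rather than simple equality.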

Why was ETag introduced when Last-Modified already existed? You may think that Last-Modified is enough to let the browser know whether the local cached copy is new enough, so why is ETag (the entity tag) needed? ETag appeared in HTTP 1.1 mainly to solve several problems that are hard to solve with Last-Modified:

The modification time recorded by Last-Modified is only accurate to the second. If a file is modified several times within one second, Last-Modified cannot mark the modification accurately.
If a file is regenerated periodically, its content sometimes does not change while Last-Modified does, so the cache cannot be used for the file.
The server may not obtain the file modification time accurately, or its time may be inconsistent with that of a proxy server.
ETag is a unique identifier for the resource, generated automatically by the server or by the developer on the server side, and it allows more precise cache control. When Last-Modified and ETag are used together, the server gives priority to ETag.
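
The one-second granularity problem can be demonstrated with two versions of a file written within the same second. In this sketch `validators` is a hypothetical helper, the timestamps are made up, and the ETag is a truncated SHA-256 of the content, which is only one of many possible generation rules.

```python
import hashlib
from email.utils import formatdate

def validators(body: bytes, mtime: float):
    """Return the two validators a server might send for this content."""
    return {
        "Last-Modified": formatdate(mtime, usegmt=True),   # whole seconds only
        "ETag": '"' + hashlib.sha256(body).hexdigest()[:16] + '"',
    }

# Two different versions saved 0.8 seconds apart, inside the same second
first = validators(b"version 1", 1_700_000_000.10)
second = validators(b"version 2", 1_700_000_000.90)

same_date = first["Last-Modified"] == second["Last-Modified"]
same_etag = first["ETag"] == second["ETag"]
```

Here `same_date` comes out true while `same_etag` comes out false: a client revalidating by date alone would keep the stale first version, whereas the ETag exposes the change.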

Yahoo's YSlow rules advise caution when setting ETag: note that in a distributed system the Last-Modified values of a file must be consistent across machines, so that load balancing to a different machine does not make the comparison fail. Yahoo recommends turning ETag off in distributed systems as far as possible, because every machine generates a different ETag: besides Last-Modified, the inode is also hard to keep consistent.
The Pragma line is there for compatibility with HTTP 1.0; its effect is the same as Cache-Control: no-cache.
Finally, the differences between the following status codes are summarized:


3. User behavior and caching

Browser caching behavior also depends on what the user does. If you still remember the forced refresh (Ctrl+F5), you will understand what I mean right away.

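| User action                    | Expires/Cache-Control                             | Last-Modified/ETag                               |
|--------------------------------|---------------------------------------------------|--------------------------------------------------|
| Press Enter in the address bar | Effective                                         | Effective                                        |
| Page link jump                 | Effective                                         | Effective                                        |
| Open in new window             | Effective                                         | Effective                                        |
| Forward / back                 | Effective                                         | Effective                                        |
| F5 refresh                     | Invalid (browser resets max-age=0)                | Effective                                        |
| Ctrl+F5 refresh                | Invalid (browser resets Cache-Control: no-cache)  | Invalid (the request header discards this option) |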
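
The table can also be expressed as a function from the user action to the request headers the browser sends. This is a sketch of the behavior described in the table only, not of any particular browser; `request_headers` and the action names are hypothetical.

```python
def request_headers(action, cache_is_fresh, etag=None, last_modified=None):
    """Return the cache-related request headers, or None if no request is sent.

    action: "navigate" (address bar, link, new window, forward/back),
            "refresh" (F5) or "hard_refresh" (Ctrl+F5)
    """
    if action == "hard_refresh":
        # Both mechanisms are bypassed: validators are dropped entirely
        return {"Cache-Control": "no-cache"}
    validators = {}
    if etag:
        validators["If-None-Match"] = etag
    if last_modified:
        validators["If-Modified-Since"] = last_modified
    if action == "refresh":
        # Expiration is ignored, but the server may still answer 304
        return {"Cache-Control": "max-age=0", **validators}
    if cache_is_fresh:
        return None                        # served straight from the cache
    return validators                      # expired: revalidate with the server
```

Only the normal navigation path can avoid the network completely; F5 always costs one round trip, and Ctrl+F5 always costs a full download.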


For details, see reference [6] at the end.

4. References:

[1] Browser caching mechanism

http://www.cnblogs.com/skynet/archive/2012/11/28/2792503.html

[2] Web caching knowledge that Web developers need to know

http://www.oschina.net/news/41397/web-cache-knowledge

[3] Browser cache explained in detail: Expires, Cache-Control, Last-Modified, ETag

http://blog.csdn.net/eroswang/article/details/8302191

[4] The difference between Enter in the browser address bar, F5 and Ctrl+F5 when refreshing web pages

http://cloudbbs.org/forum.php?mod=viewthread&tid=15790

http://blog.csdn.net/yui/article/details/6584401

[5] Cache Control and ETag

https://blog.othree.net/log/2012/12/22/cache-control-and-etag/

[6] The story of the cache

http://segmentfault.com/blog/animabear/1190000000375344

[7] Google's PageSpeed website optimization theory mentions that using ETag can reduce server load

https://developers.google.com/speed/docs/pss/AddEtags

[8] Yahoo's YSlow rules advise caution when setting ETag

http://developer.yahoo.com/performance/rules.html#etags

