Caching mechanisms defined by non-HTTP protocols
Browser caching is, in fact, mainly the caching defined by the HTTP protocol (for example, the Expires and Cache-Control headers). But there are also caching mechanisms that HTTP does not define, such as HTML meta tags. Web developers can add the following to the <head> of an HTML page:
<meta http-equiv="Pragma" content="no-cache">
This tells the browser not to cache the current page, so every visit must fetch it from the server again. It is easy to use, but only some browsers support it, and no caching proxy does, because proxies do not parse the HTML content itself. The rest of this article covers the caching mechanisms defined by HTTP.
An Informal Look at Browser Caching
Browser caching has always inspired both love and hate. On the one hand, it greatly improves the user experience; on the other, it sometimes shows "wrong" content because a stale copy was read from the cache, and during development we often try to disable it. If you have never heard of browser caching or are unsure what it is for, you may first want to read the article "The role and types of web caching".
So how does browser caching work? The core idea is to store content locally so that the same request does not have to go to the server every time. Imagine opening the same page repeatedly: if the JS, CSS, images, and other files downloaded the first time are saved locally and every later request reads them from there, isn't that far more efficient? Real browsers do not simply keep whole copies of the content, and each browser does it differently. Firefox, for example, stores cache entries as key-value pairs in a way similar to InnoDB, and you can see the cached files by entering about:cache in the address bar; Chrome saves cached files in a folder called User Data. But always reading from the cache causes a problem: what if the file on the server has been updated? To handle this, the server and the client agree on a validity period. For example, the server tells the client that its file will not change for one day, so the client can safely read the cache, and for that whole day every identical request is served from the cached file. Once the day has passed, the client sees that the agreed expiration time has been reached and sends a request to the server to download a new file. However, the file on the server may not actually have changed, in which case the cache could still have been used. So how can the client find out whether the file on the server has been updated?
There are two ways. In the first, when the server agrees on the validity period, it also tells the client when the file was last modified. When the client next tries to download the file, the server checks whether the file has been updated since then (by comparing modification times); if not, the client reads its cache. In the second, the server tells the client a version number for the file along with the validity period, and changes that version number whenever the file is updated. The next request checks whether the version numbers match; if they do, the client can read its cache directly.
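The two approaches can be sketched as a small client-side cache. This is a minimal illustration, not how any real browser is implemented: the `fetch` callable, its `(status, body, meta)` return shape, and the `max_age`, `etag`, and `last_modified` keys in `meta` are hypothetical stand-ins for a real HTTP transport.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

@dataclass
class Entry:
    body: bytes
    expires_at: float                    # end of the validity period agreed with the server
    last_modified: Optional[str] = None  # way 1: the file's last modification time
    etag: Optional[str] = None           # way 2: a version identifier for the file

# Hypothetical transport: fetch(url, conditional_headers) -> (status, body, meta)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, bytes, dict]]

class BrowserCache:
    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.time):
        self.entries: Dict[str, Entry] = {}
        self.fetch = fetch
        self.clock = clock

    def get(self, url: str) -> Tuple[bytes, str]:
        entry = self.entries.get(url)
        if entry is not None and self.clock() < entry.expires_at:
            return entry.body, "local"  # still within the validity period: no request
        # Expired (or never cached): ask the server, sending any validators we hold.
        headers: Dict[str, str] = {}
        if entry is not None and entry.etag:
            headers["If-None-Match"] = entry.etag
        if entry is not None and entry.last_modified:
            headers["If-Modified-Since"] = entry.last_modified
        status, body, meta = self.fetch(url, headers)
        if status == 304 and entry is not None:
            # Not modified: keep the cached body and start a new validity period.
            entry.expires_at = self.clock() + meta.get("max_age", 0)
            return entry.body, "revalidated"
        self.entries[url] = Entry(body, self.clock() + meta.get("max_age", 0),
                                  meta.get("last_modified"), meta.get("etag"))
        return body, "downloaded"
```

With a one-day `max_age`, the first call downloads, calls within the day are served locally without contacting the server, and the first call after the day has passed sends the stored validators and, on a 304, reuses the cached body.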
That is roughly how real browser caching works; the following sections examine each part in turn.
Note that the server answers the browser's first request, and by setting the right headers on these responses we can minimize, or even avoid, fetching resources from the server on later requests. The browser relies on the header information in requests and responses to control caching.
Expires and Cache-Control
The server uses Expires and Cache-Control to agree with the client on how long a cached copy remains valid.
For example, in the response headers shown earlier, Expires specifies when the cached copy expires (Date gives the current time), while the max-age directive of Cache-Control specifies how long it remains valid (2552 s). In theory both should yield the same validity period (here they appear to be inconsistent). Expires comes from HTTP/1.0 and Cache-Control from HTTP/1.1, and HTTP/1.1 specifies that when max-age and Expires are both present, max-age takes precedence. Cache-Control accepts a number of directives; see "Browser caching mechanism" for a list.
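The precedence rule can be expressed as a small freshness check. This sketch assumes response headers arrive as a plain dict with canonical header names and ignores other directives such as no-cache; `parse_cache_control`, `freshness_lifetime`, and `is_fresh` are illustrative names, not a library API.

```python
from email.utils import parsedate_to_datetime

def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

def freshness_lifetime(headers):
    """Seconds the response may be served from cache without asking the server."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    max_age = cc.get("max-age")
    if max_age is not None:
        return int(max_age)  # HTTP/1.1 max-age overrides HTTP/1.0 Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return 0  # no explicit lifetime: treat as already stale

def is_fresh(headers, age_seconds):
    """True if a cached copy that is age_seconds old can still be used directly."""
    return age_seconds < freshness_lifetime(headers)
```

Computing Expires relative to Date, rather than to the client's clock, sidesteps clock differences between client and server, which is one reason the relative max-age is generally preferred.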
Last-Modified/If-Modified-Since
Last-Modified/If-Modified-Since is the first of the two ways described above for checking, once the validity period has expired, whether a file on the server has been updated; it is used together with Cache-Control. For example, the first time you visit my home page (simplify the life), a jQuery file is requested, and the response headers return the following information:
Then I press Ctrl+R on the home page to refresh it. By default, Ctrl+R skips the max-age and Expires checks and sends the request straight to the server (how each kind of refresh uses the cache is explored below). Here is the request:
The request headers contain If-Modified-Since, whose value matches the Last-Modified value in the previous response headers. That date is back in 2013, which means the jQuery file has not been modified since then. The server compares the If-Modified-Since date with the file's last modification time on the server: if they are the same, it responds with HTTP 304 and the browser reads the data from its cache; if the file has been updated, it responds with HTTP 200, returns the data, and sends an updated Last-Modified value in the response headers for the next comparison.
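On the server side, this comparison can be sketched with the date helpers in Python's standard `email.utils` module. The handler `respond_last_modified` and its arguments are hypothetical; following the HTTP/1.1 rule, it answers 304 when the file is not newer than the If-Modified-Since date, rather than requiring exact equality.

```python
from email.utils import formatdate, parsedate_to_datetime

def respond_last_modified(request_headers, mtime, body):
    """Hypothetical handler; mtime is the file's modification time as a Unix timestamp."""
    mtime = int(mtime)  # HTTP dates carry whole seconds only
    last_modified = formatdate(mtime, usegmt=True)
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims).timestamp()
        except (TypeError, ValueError):
            since = None  # a malformed date is ignored, as if the header were absent
        if since is not None and mtime <= since:
            return 304, {"Last-Modified": last_modified}, b""  # unchanged: use the cache
    return 200, {"Last-Modified": last_modified}, body  # new or changed: send the file
```

The client stores the Last-Modified value from the 200 response and echoes it back in If-Modified-Since on its next conditional request.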
ETag/If-None-Match
ETag/If-None-Match is the second way of checking whether a file on the server has been updated, and it is also used together with Cache-Control. An ETag is not really a version number of the file but a string that uniquely identifies it (in older Apache versions, the default ETag is derived from the file's index node (inode), size, and last modification time (mtime)). When the client finds that the period during which it may read the cache directly has expired, it sends an If-None-Match header whose value is the ETag from the previous response. The server compares this value with the string that currently identifies the file (which changes whenever the file changes). If they are the same, it responds with HTTP 304 and the client reads its cache directly; if not, it responds with HTTP 200, sends the new data, and the client updates its stored ETag.
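A minimal ETag sketch follows. Unlike Apache, it assumes the ETag is a SHA-1 hash of the response body, which is one common alternative; `make_etag`, `etag_matches`, and `respond_etag` are hypothetical names. The comparison strips the `W/` weak prefix, because If-None-Match uses weak comparison.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Assumption: a strong ETag from a content hash, not Apache's inode/size/mtime scheme.
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def _opaque(tag: str) -> str:
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag  # weak comparison ignores W/

def etag_matches(if_none_match: str, current: str) -> bool:
    if if_none_match.strip() == "*":
        return True  # "*" matches any existing representation
    # Simplification: split on commas, which is safe for hex-digest tags like ours.
    return any(_opaque(t) == _opaque(current) for t in if_none_match.split(","))

def respond_etag(request_headers, body: bytes):
    etag = make_etag(body)
    inm = request_headers.get("If-None-Match")
    if inm is not None and etag_matches(inm, etag):
        return 304, {"ETag": etag}, b""  # content unchanged: client reads its cache
    return 200, {"ETag": etag}, body     # content changed: send it with the new ETag
```

Because the tag is computed from the content itself, regenerating a file with identical bytes leaves the ETag unchanged, which is exactly the case the list of Last-Modified problems mentions.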
In the example above, once the period during which the server allowed the client to read its local cache directly has passed, the client sends a new request with an If-None-Match header. The server matches the string, finds no change (compare the ETag in the response headers), and responds with HTTP 304, so the browser reads its cache. You may notice that this request also carries If-Modified-Since; when both are present, If-None-Match takes precedence and If-Modified-Since is ignored. You might then ask: since the two do similar or even identical jobs, why do both exist? ETag was introduced in HTTP/1.1 mainly to solve several problems that are hard to handle with Last-Modified:
- Last-Modified is only accurate to the second, so if a file is modified several times within one second, its modification time cannot be recorded accurately.
- Some files are regenerated periodically without their content changing; only the modification time changes, yet Last-Modified changes with it and the cached copy can no longer be used.
- The server may not be able to obtain an accurate file modification time, or its clock may disagree with that of a proxy server.
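The precedence rule and the first problem in this list can be demonstrated together. In this hypothetical `evaluate_conditional` (again assuming a content-hash ETag, and using exact tag matching for brevity), two versions of a file written within the same second share a Last-Modified value, so If-Modified-Since alone wrongly reports 304, while If-None-Match catches the change.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime

def evaluate_conditional(request_headers, body, mtime):
    """Return 304 or 200 for a GET, letting If-None-Match override If-Modified-Since."""
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'  # assumed content-hash ETag
    mtime = int(mtime)  # Last-Modified has one-second resolution
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match is present, so If-Modified-Since is ignored entirely.
        tags = [t.strip() for t in inm.split(",")]
        return 304 if "*" in tags or etag in tags else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        since = parsedate_to_datetime(ims).timestamp()
        return 304 if mtime <= since else 200
    return 200
```

For example, if the client cached version `b"v1"` with `Last-Modified` equal to `formatdate(1700000000, usegmt=True)` and the server rewrote the file as `b"v2"` at timestamp 1700000000.9, the date-only check returns 304 and the client keeps the stale copy; adding the old ETag in If-None-Match yields 200.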
Requests that cannot be cached
Of course, not all requests can be cached.
Requests that cannot be cached by the browser:
- Responses whose headers contain Cache-Control: no-cache, Pragma: no-cache (HTTP/1.0), Cache-Control: max-age=0, or similar directives telling the browser not to cache the response
- Dynamic requests whose content depends on cookies, authentication information, and the like
- HTTPS requests (some people have tested and found that IE can cache HTTPS resources when Cache-Control: max-age is added to the headers, and Firefox when Cache-Control: public is added; see "Seven misconceptions about HTTPS")
- POST requests
- Responses whose headers contain neither Last-Modified/ETag nor Cache-Control/Expires
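The rules in this list can be turned into a rough predicate. This is a simplification of what the article describes, not a browser's actual algorithm: `is_cacheable` and its `uses_auth_or_cookies` flag are hypothetical, the flag stands in for the dynamic-content rule, the HTTPS case is left out because it varies by browser, and `no-store` (which forbids storing a response at all) is checked alongside the directives the list names.

```python
def is_cacheable(method, response_headers, uses_auth_or_cookies=False):
    """Rough check following the list above; real browsers apply more nuanced rules."""
    if method.upper() == "POST":
        return False
    if uses_auth_or_cookies:  # hypothetical flag for cookie/auth-dependent content
        return False
    # Header names are case-insensitive; values are lowercased only for directive checks.
    h = {k.lower(): v.lower() for k, v in response_headers.items()}
    directives = [d.strip() for d in h.get("cache-control", "").split(",")]
    if "no-cache" in directives or "no-store" in directives or "max-age=0" in directives:
        return False
    if "no-cache" in h.get("pragma", ""):  # HTTP/1.0 equivalent
        return False
    has_validator = "last-modified" in h or "etag" in h
    has_lifetime = "cache-control" in h or "expires" in h
    return has_validator or has_lifetime
```

The last line mirrors the final rule: a response with neither a validator nor an explicit lifetime gives the browser no basis for reusing it.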
User Behavior and caching
Browser caching also depends on user behavior. Take the jQuery request on my home page (simplify the life) mentioned above. If you press Enter in the address bar, the response is HTTP 200 (from cache), because the validity period has not expired and the cache is read directly. If you refresh with Ctrl+R, the response is HTTP 304 (Not Modified): the local cache is still used, but one extra request is made. If you force a reload with Ctrl+Shift+R, the file is downloaded fresh from the server with HTTP 200.
As the table above shows, when the user presses F5 to refresh, the browser ignores Expires/Cache-Control and sends the request to the server again, while Last-Modified/ETag remain in effect, so the server returns 304 or 200 depending on whether the file has changed. Only when the user forces a reload with Ctrl+F5 are all caching mechanisms bypassed and the resource fetched again from the server.
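The behavior described above can be modelled as a small decision function. This is a hypothetical sketch of the article's description (the `Action` enum and `plan_request` are illustrative names); actual browsers differ between vendors and versions, so treat it as an illustration of the three cases rather than a specification.

```python
from enum import Enum

class Action(Enum):
    NAVIGATE = "Enter in the address bar"
    REFRESH = "F5 / Ctrl+R"
    FORCE_RELOAD = "Ctrl+F5 / Ctrl+Shift+R"

def plan_request(action, cache_is_fresh, cached_validators):
    """Return (contact_server, conditional_headers) for one cached resource."""
    if action is Action.FORCE_RELOAD:
        return True, {}   # every cache mechanism bypassed: expect a full 200
    if action is Action.NAVIGATE and cache_is_fresh:
        return False, {}  # served locally: "200 (from cache)"
    # REFRESH ignores Expires/Cache-Control; an expired NAVIGATE also revalidates.
    headers = {}
    if "ETag" in cached_validators:
        headers["If-None-Match"] = cached_validators["ETag"]
    if "Last-Modified" in cached_validators:
        headers["If-Modified-Since"] = cached_validators["Last-Modified"]
    return True, headers  # server answers 304 (unchanged) or 200 (changed)
```

A REFRESH of a still-fresh resource therefore costs one round trip that usually ends in 304, matching the Ctrl+R case in the jQuery example.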
For more details, see "Browser caching mechanism".
Summary
The two diagrams borrowed from "Browser caching mechanism" illustrate the whole process very clearly.
References
- Remember: browser cache (from cache) and 304, a summary
- Web caching mechanisms, part 2: the web browser caching mechanism
- Browser caching mechanism (Wu Qin)
- Browser caching mechanism
- On the HTTP/1.1 cache system
- Analysis of browser caching mechanism