Thoroughly understanding the HTTP caching mechanism: a three-factor decomposition method based on cache strategy


Lead

HTTP caching is an important means of web performance optimization, and anyone working in web development needs to master it. Recently, however, I ran into several questions about cache header settings and found that I answered some of them wrong; for others I knew the correct answer but still did not understand the reason behind it, which was quite frustrating. To check whether it was only my own understanding that was shallow, I asked several colleagues and found that their situation was more or less the same as mine.

To avoid keeping you in suspense, here are two of the questions for you to try:

The content of page.html is as follows:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Page page</title>
</head>
<body>
    <img src="images/head.png" />
    <a href="page.html">Revisit Page page</a>
</body>
</html>

On the first visit to this page, the response headers for head.png are as follows:

HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: image/png
Last-Modified: Tue, 08 Nov 2016 06:59:00 GMT
Accept-Ranges: bytes
Date: Thu, 10 Nov 2016 02:48:50 GMT
Content-Length: 3534

    • Question 1: When I click the "Revisit Page page" link to reload the page, how is head.png loaded the second time?

    • Question 2: What happens if Cache-Control in the headers above is set to private instead?

If you can answer both questions correctly (and please check carefully that you know why, in case it was a fluke), then congratulations: you understand this material thoroughly and can skip the rest of this article. Otherwise, please read on.

Back to the point raised at the start: many of us (myself included) get HTTP cache questions wrong. I think the root cause is that we have not absorbed this knowledge systematically. We usually memorize it as isolated facts, what this cache header is for and what that one is for, but in practice several cache headers usually work together as one complete system.

Today I will look at how the HTTP cache headers work together from a system-level perspective (if anything is wrong, please correct me, but please be gentle):

HTTP Cache Architecture

First, I divide the HTTP caching system into the following three parts:

1. Cache Storage Policy

Determines whether the HTTP response content can be cached by clients, and by which clients.

This policy has only one function: deciding whether the HTTP response content can be cached on the client.

The public, private, no-cache, max-age and no-store directives of Cache-Control all indicate whether the response content can be stored by the client. With the first four, the file data is cached locally (no-cache is better understood as "do not use the local copy without checking with the server first"; the data is still cached locally), whereas with no-store no response data is cached on the client. no-cache and max-age are also a bit special: I think of them as hybrids, and I will come back to them below.

With Cache-Control: public, the HTTP response data can be stored locally, but this does not mean that the browser will later read the data straight from the cache and use it. Why? Because it cannot tell whether the locally cached data is still usable (it may have become invalid), so a validation mechanism is needed to confirm it. That mechanism is the "cache expiration policy" described next.
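The storage decision on its own can be sketched in a few lines of Python. This is only an illustration of the idea above, not how any particular browser implements it; the function names parse_cache_control and may_store and the shared_cache flag are made up for this example, and directives with arguments such as private="field" are not handled.

```python
def parse_cache_control(header_value: str) -> dict:
    """Split a Cache-Control value into a {directive: argument-or-None} mapping."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(header_value: str, shared_cache: bool = False) -> bool:
    """Storage policy only: may this cache keep a copy of the response?"""
    directives = parse_cache_control(header_value)
    if "no-store" in directives:
        return False  # nothing may be kept
    if shared_cache and "private" in directives:
        return False  # only the end user's own cache may keep it
    return True  # public, private (in a browser), no-cache and max-age all store


print(may_store("no-cache"))   # True: no-cache still stores the data
print(may_store("no-store"))   # False
```

Note that may_store says nothing about whether the stored copy may be used without asking the server; that is the job of the expiration policy.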

2. Cache Expiration Policy

Used by the client to determine whether the locally stored cached data has expired, and hence whether a request must be sent to the server to fetch the data.

This policy also has only one function: deciding whether the client can load and display data directly from the local cache, or must send a request to the server for it.

As explained above, cached data has to be checked before it is used. What criterion does the browser use? The answer is Expires. Expires gives the absolute time until which the cached data is valid. It tells the client that the local cache is obsolete after this point in time (compared against the client's clock); before this point, the client can treat the cached data as valid and load it directly from the cache.

However, the HTTP cache headers are not designed quite that neatly. In the Cache-Control header mentioned above (added in HTTP/1.1), no-cache and max-age are special cases: each carries both a cache storage policy and a cache expiration policy. Take max-age as an example. It actually corresponds to:

Cache-Control: public/private (I am not sure which of the two)
Expires: current client time + max-age

and Cache-Control: no-cache is, for our purposes, equivalent to Cache-Control: max-age=0 (the unit is seconds).

It is important to note that:

    1. A cache expiration policy specified in Cache-Control takes precedence over Expires; when both are present, Expires is overridden.

    2. Marking cached data as expired only tells the client not to read it directly from the local cache any more and to send a request to the server for confirmation. It does not mean the local cached data is useless; in some cases it will be used again even after it has expired, as explained below.
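The precedence rule in point 1 can be sketched as follows. This is a minimal illustration, not a complete cache: the function name freshness_lifetime is hypothetical, and the sketch measures Expires relative to the response's Date header (the way RFC 7234 defines the lifetime) rather than against the client clock.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[timedelta]:
    """Return the explicit freshness lifetime, or None if the response gives none."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            return timedelta(seconds=int(arg))  # max-age wins over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return expires - date
    return None  # no explicit expiration policy was supplied


headers = {
    "Cache-Control": "max-age=60",
    "Expires": "Thu, 10 Nov 2016 03:48:50 GMT",
    "Date": "Thu, 10 Nov 2016 02:48:50 GMT",
}
print(freshness_lifetime(headers))  # 0:01:00, Expires is ignored
```

A return value of None corresponds to the situation in the second question, where no expiration item is present at all.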

3. Cache Comparison Policy

The client sends the identifier of its cached data to the server, and the server uses it to judge whether the client's cached data is still valid and so whether the data needs to be sent again.

When the client detects that the data has expired, or when the browser is refreshed, it usually issues a new HTTP request to the server. The server does not rush to return the data; it first checks whether the request headers carry an identifier (If-Modified-Since, If-None-Match). If the identifier is still valid, it returns 304 to tell the client to use its locally cached data. (Note that the server must have sent the corresponding headers, Last-Modified or ETag, in the first response.) So, as stated above, locally cached data that is considered expired is not necessarily useless from then on.

With regard to Last-Modified, note that this response header may also affect the cache expiration policy; I will explain why when answering the two questions from the opening.
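The server-side check described above might look roughly like this. It is a simplified sketch under stated assumptions: the function name validate and its parameters are invented for illustration, ETags are compared as plain strings (weak validators with a W/ prefix are not handled), and only the status code is returned.

```python
from email.utils import parsedate_to_datetime


def validate(request_headers: dict, current_etag: str, last_modified: str) -> int:
    """Return 304 if the client's copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When both validators are sent, If-None-Match is the one that counts.
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        matched = current_etag in client_tags or "*" in client_tags
        return 304 if matched else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        unchanged = (parsedate_to_datetime(last_modified)
                     <= parsedate_to_datetime(if_modified_since))
        return 304 if unchanged else 200
    return 200  # no identifier sent: return the full data


last_modified = "Tue, 08 Nov 2016 06:59:00 GMT"
print(validate({"If-Modified-Since": last_modified}, '"abc"', last_modified))  # 304
```

On a 304 the client keeps using the body it already stored, which is why the first response has to carry Last-Modified or ETag.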

These are the caching policies I know of. Next, I combine the three elements of the cache strategy with the commonly used cache headers, to show how they relate to each other.

This mapping makes it clear which cache headers belong to which policy category. There is some overlap, which means that some cache headers carry more than one policy. So when analyzing cache headers, besides the ordinary single-policy headers, we also need to decompose these dual-policy headers.

Finally, let's return to the two questions from the beginning and work through them together:

First question:

HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: image/png
Last-Modified: Tue, 08 Nov 2016 06:59:00 GMT
Accept-Ranges: bytes
Date: Thu, 10 Nov 2016 02:48:50 GMT
Content-Length: 3534

Looking through the HTTP response headers above, we find the following two cache-related items:

Cache-Control: no-cache
Last-Modified: Tue, 08 Nov 2016 06:59:00 GMT

As discussed, Cache-Control: no-cache is equivalent to Cache-Control: max-age=0, and both are multi-policy headers, so we need to decompose it:

Cache-Control: no-cache equals Cache-Control: max-age=0,
and Cache-Control: max-age=0 can be decomposed into:

Cache-Control: public/private (not sure which of the two)
Expires: current time

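Finally, we get the complete cache strategy in terms of the three elements:

    Cache storage policy: stored locally (no-cache still stores the data)
    Cache expiration policy: expires immediately (no-cache, i.e. max-age=0)
    Cache comparison policy: Last-Modified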

So the end result is: the browser sends a request to the server again, carrying the time given by Last-Modified (in If-Modified-Since) for the server to compare:

    • a) Comparison fails: the server returns 200 and sends the data again; the client displays the data once it is received and refreshes the local cache.

    • b) Comparison succeeds: the server returns 304 without sending the data again, and on receiving the 304 status code the client reads the data from its local cache.

The question itself is not hard, but if you believe that no-cache prevents the data from being cached locally, you run into a contradiction: if the file data were not cached locally, the image could not be displayed when the server returned 304, yet in fact it displays normally. This is good evidence that no-cache also caches the data locally.

Second question:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: image/png
Last-Modified: Tue, 08 Nov 2016 06:59:00 GMT
Accept-Ranges: bytes
Date: Thu, 10 Nov 2016 02:48:50 GMT
Content-Length: 3534

The approach is the same as above. First, find the cache-related items:

Cache-Control: private
Last-Modified: Tue, 08 Nov 2016 06:59:00 GMT

This time we find no cache expiration policy item at all. Will the answer be the same as above? For the moment it cannot be worked out by analysis, so we can only test it in practice.

A packet capture in Chrome shows that the browser's subsequent requests are served directly from the local cache, so some kind of cache expiration policy seems to be at work. (By the theory of the cache expiration policy above, if the browser loads cached data directly from local storage, it considers the local cache valid, so there must be some condition by which it judges expiration.) This puzzled me for a long time, until I happened to find the answer in the Caching tab of Fiddler's response inspector:

It turns out that when no cache expiration policy is provided, the browser follows a heuristic cache expiration policy:

10% of the time difference between Date and Last-Modified in the response headers is taken as the cache lifetime.

Here is the description from the Caching panel:

HTTP/1.1 Cache-Control Header is present: private
HTTP Last-Modified Header is present: Tue, 08 Nov 2016 06:59:00 GMT
No explicit HTTP Cache Lifetime information was provided.
Heuristic expiration policies suggest defaulting to: 10% of the delta between Last-Modified and Date.
That's '05:15:02' so this response will heuristically expire 2016/11/11 0:46:01.
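The 10% rule is easy to work through by hand. The sketch below applies it to the Date and Last-Modified values quoted in the second question; the function name heuristic_lifetime is made up for this example. The Fiddler panel reports a different figure, which would be consistent with that capture coming from a later response with a later Date, but that is a guess rather than something the article states.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date: str, last_modified: str) -> timedelta:
    """Heuristic freshness lifetime: 10% of (Date - Last-Modified)."""
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    return delta / 10


lifetime = heuristic_lifetime(
    date="Thu, 10 Nov 2016 02:48:50 GMT",
    last_modified="Tue, 08 Nov 2016 06:59:00 GMT",
)
print(lifetime)  # 4:22:59
```

The difference between the two headers is 1 day 19:49:50, or 157790 seconds, and a tenth of that is 15779 seconds, so with these headers the image would be treated as fresh for about 4 hours 23 minutes after the response.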

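Finally, we get the complete cache strategy in terms of the three elements:

    Cache storage policy: stored locally (private)
    Cache expiration policy: heuristic, 10% of the difference between Date and Last-Modified
    Cache comparison policy: Last-Modified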

Final result

The browser derives a cache lifetime from the time difference between Date and Last-Modified. During that period it uses the locally cached data instead of requesting the server (except on a forced refresh). After the cache expires, it requests the server again, carrying the time given by Last-Modified for the server to compare, and decides from the server's response status whether to load the locally cached data.

Summary

HTTP cache settings are not complicated, but they are easy to underestimate. Using two questions, this article analyzed and dissected the relevant cache headers and examined the HTTP caching mechanism from a systematic perspective: the HTTP caching mechanism is really the combined interaction of the three elements (dimensions) of the HTTP cache policy. So when reading or setting the cache headers of an HTTP message, as long as we can accurately decompose them into the three elements of the cache, we can predict very accurately what result the cache settings will produce.
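As a recap, the decomposition used for both questions can be written as one small function. This is a sketch of the article's reasoning, not of real browser logic; the name decompose and the wording of the returned labels are invented here, and the heuristic branch simply assumes the 10% rule described above.

```python
def decompose(headers: dict) -> dict:
    """Split a response's cache headers into the article's three elements."""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg

    # Element 1: storage policy
    if "no-store" in directives:
        storage = "not stored"
    elif "private" in directives:
        storage = "stored (private caches only)"
    else:
        storage = "stored"

    # Element 2: expiration policy
    if "no-cache" in directives or directives.get("max-age") == "0":
        expiration = "expired immediately"
    elif "max-age" in directives:
        expiration = "max-age=" + directives["max-age"]
    elif "Expires" in headers:
        expiration = "Expires"
    elif "Last-Modified" in headers:
        expiration = "heuristic (10% of Date - Last-Modified)"
    else:
        expiration = "none"

    # Element 3: comparison policy
    validators = [name for name in ("Last-Modified", "ETag") if name in headers]
    return {"storage": storage, "expiration": expiration, "validators": validators}


question_1 = {"Cache-Control": "no-cache",
              "Last-Modified": "Tue, 08 Nov 2016 06:59:00 GMT"}
question_2 = {"Cache-Control": "private",
              "Last-Modified": "Tue, 08 Nov 2016 06:59:00 GMT"}
print(decompose(question_1))
print(decompose(question_2))
```

For the first question the expiration element comes out as "expired immediately", so every reload goes back to the server with Last-Modified; for the second it falls through to the heuristic branch, which is why the image is served from the local cache for a while.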

Reprinted from: https://mp.weixin.qq.com/s/qOMO0LIdA47j3RjhbCWUEQ
