Thoroughly Understanding the HTTP Caching Mechanism: A Three-Factor Decomposition Based on Caching Policies


http://geek.csdn.net/news/detail/131318


Introduction

HTTP caching is an important technique for Web performance optimization, and every Web developer needs to master it. Recently, though, I ran into a few questions about cache header settings and got several of them wrong. For some, I still did not understand the reason even after learning the correct answer, which was rather depressing. To check whether it was only my own understanding that was shallow, I asked several colleagues and found that they were in much the same situation.

To avoid keeping you in suspense, here are two of the questions. Try to answer them:

The following is the content of page.html, which embeds the image head.png and a "Re-access page" link:

<!DOCTYPE html>

On the first visit to this page, the response headers for head.png are as follows:

HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: image/png
Last-Modified: Tue, Nov 2016 06:59:00 GMT
Accept-Ranges: bytes
Date: Thu, Nov 2016 02:48:50 GMT
Content-Length: 3534

Question 1: When you click the "Re-access page" link to load the page again, how is head.png loaded the second time?

Question 2: If Cache-Control in the response above is set to private instead, what will the result be?

If you can answer both questions correctly (and please make sure you know why, in case it was a fluke), congratulations: you already understand this material thoroughly and can skip the rest. Otherwise, please read on.

First, back to the question of why so many of us (myself included) stumble over HTTP caching. I think the root cause is that we have never absorbed this knowledge as a system. We usually memorize it as isolated facts: this cache header does this, that cache header does that. In practice, however, several cache headers usually work together as one complete system.

Today I will explain, from a system perspective and according to my own understanding, how the HTTP cache headers work together (corrections are welcome where I am wrong, but please be gentle).

HTTP Caching System

First, I divide the HTTP caching system into the following three parts:

1. Cache Storage Policy

Used to determine whether the HTTP response content may be cached by clients, and by which clients.

This policy has only one job: to decide whether the HTTP response content may be stored on the client.

The public, private, no-cache, max-age, and no-store directives of the Cache-Control header indicate whether the response content may be stored by the client. The first four all allow the file data to be cached (no-cache should be understood as "do not use the local copy without checking with the server first"; the data is still cached locally), whereas no-store forbids storing any response data on the client. no-cache and max-age are a little special: I regard them as hybrids, which I will discuss below.
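The storage decision described above can be sketched in a few lines of Python. This is only a simplified illustration: the function names `parse_cache_control` and `may_store` and the `shared_cache` flag are my own, and real caches also look at the request method, the status code, and other headers.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {lowercase directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if arg else None
    return directives


def may_store(cache_control, shared_cache=False):
    """Storage policy only: may this cache keep a copy of the response?"""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return False   # nothing may be kept on the client
    if shared_cache and "private" in directives:
        return False   # only the end user's own cache may keep it
    return True        # public, private, no-cache and max-age all allow storage


print(may_store("no-cache"))   # the response is still stored
print(may_store("no-store"))   # the response is not stored
```

Note that `may_store` says nothing about whether the stored copy may be used later; that is the job of the expiration policy.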

Cache-Control: public allows the HTTP response data to be stored locally, but that does not mean the browser will read the data straight from the cache on later requests. The browser cannot yet tell whether the locally cached data is still usable (it may have expired), so the data must also pass a check, which is the "cache expiration policy" below.

2. Cache Expiration Policy

Used by the client to determine whether the locally stored cached data has expired, and hence whether a request must be sent to the server to get the data.

This policy has only one job: to decide whether the client may load and display data directly from the local cache (otherwise it sends a request to the server).

As explained above, data cached locally must still be checked before it can be used. By what condition does the browser make this check? The answer is Expires. Expires specifies the absolute time until which the cached data is valid and tells the client that the local cache becomes invalid after that point (as compared with the client's clock). Before that point, the client can assume the cached data is valid and load it directly from the cache.

The HTTP cache headers, however, are not designed quite so neatly. The no-cache and max-age directives of Cache-Control mentioned earlier (a header introduced in HTTP/1.1) carry both a cache storage policy and a cache expiration policy. Take max-age as an example; it actually corresponds to:

Cache-Control: public/private (I am not sure exactly which)
Expires: current client time + max-age

Cache-Control: no-cache and Cache-Control: max-age=0 (the unit is seconds) are roughly equivalent: in both cases the cached copy must be checked with the server before it is used.
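To make the decomposition concrete, here is a minimal Python sketch of the expiration check. The helper names `expiry_time` and `is_fresh`, the lowercase header keys, and the sample times are assumptions made for illustration. The sketch treats no-cache like max-age=0, as described above, and lets max-age take priority over Expires, which is how HTTP/1.1 caches behave. Malformed header values are not handled.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expiry_time(headers, received_at):
    """Return the moment the cached copy stops being fresh, or None if unspecified."""
    cache_control = headers.get("cache-control", "").lower()
    directives = [d.strip() for d in cache_control.split(",")]
    if "no-cache" in directives:
        return received_at   # treated like max-age=0: expired immediately
    for d in directives:
        if d.startswith("max-age="):
            # max-age is relative: time of receipt + N seconds
            return received_at + timedelta(seconds=int(d.split("=", 1)[1]))
    if "expires" in headers:
        # Expires is absolute and is only consulted when max-age is absent
        return parsedate_to_datetime(headers["expires"])
    return None


def is_fresh(headers, received_at, now):
    """Expiration policy: may the cached copy be used without asking the server?"""
    expires_at = expiry_time(headers, received_at)
    return expires_at is not None and now < expires_at


# Illustrative values only (not taken from the example in this article).
received = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_fresh({"cache-control": "max-age=60"}, received, received + timedelta(seconds=30)))
print(is_fresh({"cache-control": "no-cache"}, received, received))
```

When `expiry_time` returns None, no explicit expiration policy was given at all; that case needs separate handling.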

Note that the cache expiration policy specified in Cache-Control takes precedence over Expires: when both are present, Expires is overridden. Marking cached data as expired only tells the client that it can no longer read the cache directly and must send a request to the server to confirm. It does not mean the locally cached data is useless; in some cases expired data is used again, as described below.

3. Cache Comparison Policy

The client sends an identifier for its cached data to the server; the server uses the identifier to determine whether the client's cached data is still valid and then decides whether to send the data again.

When the client detects that the data has expired, or the browser is refreshed, it usually issues a new HTTP request to the server. The server does not rush to return the data; it first checks whether the request carries an identifier (If-Modified-Since, If-None-Match). If the identifier is still valid, the server returns 304 to tell the client to use its locally cached data (note that the corresponding headers, Last-Modified and ETag, must have been sent to the client in the first response). This is why locally cached data, even when considered expired, is not necessarily useless.

Note that Last-Modified can also affect the cache expiration policy. I will explain the reason later, when answering the two questions from the beginning.
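The server side of this comparison might look roughly like the following Python sketch. The function `respond` and its parameters are hypothetical, the ETag comparison is simplified (weak validators with a W/ prefix are not handled), and a malformed date in the request would raise an exception here instead of being ignored.

```python
from email.utils import parsedate_to_datetime


def respond(request_headers, last_modified, etag, body):
    """Return (status, headers, body) for a GET, honouring conditional headers."""
    validators = {"Last-Modified": last_modified, "ETag": etag}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # When both identifiers are sent, the ETag comparison takes precedence.
        tags = [t.strip() for t in if_none_match.split(",")]
        unchanged = etag in tags or "*" in tags
    elif if_modified_since is not None:
        # Unchanged if the resource was not modified after the client's copy.
        unchanged = (parsedate_to_datetime(last_modified)
                     <= parsedate_to_datetime(if_modified_since))
    else:
        unchanged = False   # no identifier: an ordinary first request

    if unchanged:
        return 304, validators, b""   # client should reuse its local copy
    return 200, validators, body      # send the full data again


# Illustrative values only.
stamp = "Mon, 01 Jan 2024 12:00:00 GMT"
print(respond({"If-Modified-Since": stamp}, stamp, '"v1"', b"image bytes")[0])
```

The 304 response carries no body, which is why the client must still hold the data locally for this mechanism to work.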

Those are the caching policies I know of. Combining the three elements of the caching strategy with a few commonly used cache headers gives a clearer picture of how they relate:

From the diagram it is clear which cache policy category each cache header belongs to. Some of them overlap, which means those cache headers carry more than one caching policy. So when analyzing cache headers in practice, besides handling the ordinary headers, we need to decompose the items that carry a dual caching policy.
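As a rough illustration of this classification, the following Python sketch sorts a response's cache headers into the three categories, following the grouping given in the prose of this article. The function name `decompose` and the category keys are my own choices; the possible influence of Last-Modified on expiration is deliberately left out of this sketch.

```python
def decompose(headers):
    """Group a response's cache headers under the three policy categories."""
    h = {k.lower(): v for k, v in headers.items()}
    directives = [d.strip().lower()
                  for d in h.get("cache-control", "").split(",") if d.strip()]
    storage, expiration, comparison = [], [], []
    for d in directives:
        name = d.split("=", 1)[0]
        if name in ("public", "private", "no-store", "no-cache", "max-age"):
            storage.append(d)
        if name in ("no-cache", "max-age"):
            expiration.append(d)   # dual-policy directives land in both lists
    if "expires" in h:
        expiration.append("expires")
    for validator in ("last-modified", "etag"):
        if validator in h:
            comparison.append(validator)
    return {"storage": storage, "expiration": expiration, "comparison": comparison}


print(decompose({"Cache-Control": "no-cache", "Last-Modified": "(some date)"}))
```

An empty "expiration" list in the result is a useful warning sign: the response gives the browser no explicit rule for when the cached copy goes stale.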

Finally, let's return to the two questions from the beginning and work through them together:

First question:

HTTP/1.1 200 OK
Cache-Control: no-cache
Content-Type: image/png
Last-Modified: Tue, Nov 2016 06:59:00 GMT
Accept-Ranges: bytes
Date: Thu, Nov 2016 02:48:50 GMT
Content-Length: 3534

Analyzing the HTTP response headers above, we find that the following two items are related to caching:

Cache-Control: no-cache
Last-Modified: Tue, Nov 2016 06:59:00 GMT

As discussed, Cache-Control: no-cache is equivalent to Cache-Control: max-age=0, and both are multi-policy headers, so we need to decompose them:

Cache-Control: no-cache is equivalent to Cache-Control: max-age=0.
Cache-Control: max-age=0 can then be decomposed into:

Cache-Control: public/private (I am not sure which)
Expires: current time

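Finally, we obtain the three elements of the complete caching strategy:

Storage policy: Cache-Control: public/private
Expiration policy: Expires: current time
Comparison policy: Last-Modified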

So the end result is that the browser requests the server again, carrying the time given in Last-Modified (in the If-Modified-Since request header) for the server to compare. Comparison fails: the server returns 200 and sends the data again; the client receives the data, displays it, and refreshes the local cache. Comparison succeeds: the server returns 304 without sending the data again, and the client reads the locally cached data after receiving the 304 status code. This is what a simulated request for this scenario shows.

The question itself is not difficult, but if you believe that no-cache prevents the data from being cached locally, you will find the result contradictory: if the file data were not cached locally, the browser could not display the image after receiving a 304, yet in fact the image is displayed normally. This question is good proof that no-cache still caches the data locally.
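The client side of this exchange can be imitated with a tiny in-memory cache. The class `TinyCache` and its methods are hypothetical stand-ins for a browser cache, written in Python only to show why a 304 response is useless unless the data was stored on the first visit.

```python
class TinyCache:
    """In-memory stand-in for a browser cache, keyed by URL."""

    def __init__(self):
        self.entries = {}   # url -> (last_modified, body)

    def conditional_headers(self, url):
        """Headers to add when asking the server about a stored copy."""
        entry = self.entries.get(url)
        if entry is not None and entry[0] is not None:
            return {"If-Modified-Since": entry[0]}
        return {}

    def handle_response(self, url, status, headers, body):
        """Return the bytes to display and update the stored copy if allowed."""
        if status == 304:
            return self.entries[url][1]   # body comes from the local copy
        if "no-store" not in headers.get("Cache-Control", "").lower():
            self.entries[url] = (headers.get("Last-Modified"), body)
        return body


# Illustrative values only.
cache = TinyCache()
stamp = "Mon, 01 Jan 2024 12:00:00 GMT"
cache.handle_response("/head.png", 200,
                      {"Cache-Control": "no-cache", "Last-Modified": stamp}, b"PNG DATA")
print(cache.conditional_headers("/head.png"))
print(cache.handle_response("/head.png", 304, {}, b""))
```

On the second request the server sends no body at all, yet `handle_response` still returns the image bytes, because the no-cache response was stored the first time.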

Second question:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: image/png
Last-Modified: Tue, Nov 2016 06:59:00 GMT
Accept-Ranges: bytes
Date: Thu, Nov 2016 02:48:50 GMT
Content-Length: 3534

The approach is the same as before. First find the cache-related items:

Cache-Control: private
Last-Modified: Tue, Nov 2016 06:59:00 GMT

This time we find no cache expiration policy item at all. Will the answer be the same as above? It is hard to work out the answer by analysis alone, so the only option is to test it in practice:

Then look at the capture in Chrome:

As you can see, the browser's subsequent requests are served directly from the cache, so some cache expiration policy must be in effect (by my definition of the cache expiration policy, if the browser loads data straight from the local cache, it believes the cached data is still valid, so there must be some condition for judging expiration). This puzzled me for a long time, until by chance I found the answer in the Caching tab of Fiddler's response inspector:

It turns out that when the response provides no cache expiration policy, the browser follows a heuristic cache expiration policy:

Take the difference between the two times Date and Last-Modified in the response headers, and use 10% of that value as the cache lifetime.

Here is the description from the Caching panel:

HTTP/1.1 Cache-Control Header is present: private
HTTP Last-Modified Header is present: Tue, Nov 2016 06:59:00 GMT
No explicit HTTP Cache Lifetime information was provided.
Heuristic expiration policies suggest defaulting to: 10% of the delta between Last-Modified and Date.
That's '05:15:02' so this response will heuristically expire 2016/11/11 0:46:01.
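The 10% rule is simple arithmetic, sketched below in Python. The function name `heuristic_lifetime` and the `percent` parameter are my own, and the two dates are made-up values chosen so that the gap is 52 hours, 30 minutes and 20 seconds, which is the gap that yields the 05:15:02 lifetime reported by Fiddler. Actual browsers may apply additional limits of their own.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date, last_modified, percent=10):
    """Cache lifetime as a percentage of the gap between Date and Last-Modified."""
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    delta = max(delta, timedelta(0))   # guard against a Last-Modified in the future
    return delta * percent / 100


# Made-up dates whose gap is 52:30:20.
lifetime = heuristic_lifetime("Wed, 03 Jan 2024 04:30:20 GMT",
                              "Mon, 01 Jan 2024 00:00:00 GMT")
print(lifetime)   # 5:15:02
```

The older the file was when it was served, the longer the browser will keep using its copy without asking the server.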

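Finally, we obtain the three elements of the complete caching strategy:

Storage policy: Cache-Control: private
Expiration policy: heuristic, 10% of the difference between Date and Last-Modified
Comparison policy: Last-Modified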

Final Result

For a period equal to 10% of the time difference between Date and Last-Modified, the browser uses the locally cached data directly instead of requesting the server (except for forced refreshes). After the cache expires, it requests the server again, carrying the time given in Last-Modified, and decides whether to load the locally cached data based on the server's response status.

Summary

HTTP cache settings are not complicated, but they are easy to underestimate. Using two questions, this article analyzed and dissected the relevant cache headers and gave a fairly complete account of the HTTP caching mechanism from a system perspective: the HTTP caching mechanism is the interaction of the three elements (dimensions) of the HTTP caching policy. So when analyzing or setting cache headers in an HTTP message, as long as we can accurately decompose them into the three cache elements, we can predict quite accurately what the cache settings will finally achieve.
