Original address: https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=zh-cn
The ability to cache and reuse previously acquired resources is a key aspect of performance optimization.
Every browser ships with an HTTP cache implementation. All you need to do is ensure that each server response provides the correct HTTP header directives to tell the browser when it can cache the response and for how long.
HTTP headers exchanged between the browser and the server:
When the server returns a response, it also sends a set of HTTP headers that describe the response's content type, length, caching directives, validation token, and so on. For example, in one such exchange, the server returns a 1024-byte response, instructs the client to cache it for up to 120 seconds, and provides a validation token ("X234DFF") that can be used to check whether the resource has been modified after the response expires.
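The response described above can be written out as a small sketch. The header values are taken from the example in the text; the `describe_response` helper is a hypothetical name used only for illustration, and the parsing is deliberately simplified.

```python
# Response headers from the example in the text: a 1024-byte body,
# cacheable for up to 120 seconds, with validation token "X234DFF".
RESPONSE_HEADERS = {
    "Content-Length": "1024",
    "Cache-Control": "max-age=120",
    "ETag": '"X234DFF"',
}


def describe_response(headers):
    """Extract the body size, cache lifetime, and validation token.

    Hypothetical helper: real caches apply many more rules than this.
    """
    max_age = None
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            max_age = int(value)
    return {
        "length": int(headers["Content-Length"]),
        "max_age": max_age,
        # ETag values are quoted on the wire; strip the quotes for display.
        "etag": headers.get("ETag", "").strip('"') or None,
    }


print(describe_response(RESPONSE_HEADERS))
```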
Validating cached responses with ETags
- The server uses the ETag HTTP header to communicate a validation token.
- Validation tokens enable efficient resource update checks: no data is transferred if the resource has not changed.
How the validation token works:
Assume that 120 seconds have passed since the initial fetch and the browser initiates a new request for the same resource. First, the browser checks the local cache and finds the previous response. Unfortunately, that response has now expired and cannot be used. At this point, the browser could simply issue a new request and fetch the full response again. However, that is inefficient: if the resource has not changed, there is no reason to download the exact same information that is already in the cache.
This is exactly the problem that validation tokens, as specified in the ETag header, are designed to solve. The server generates and returns an arbitrary token, which is typically a hash or some other fingerprint of the contents of the file. The client does not need to know how the fingerprint is generated; it only needs to send it back to the server on the next request. If the fingerprint is still the same, the resource has not changed and the download can be skipped.
In the example above, the client automatically provides the ETag token in the "If-None-Match" HTTP request header. The server checks the token against the current resource. If the resource has not changed, the server returns a "304 Not Modified" response, which tells the browser that the response in its cache has not changed and can be renewed for another 120 seconds. Note that the response does not have to be downloaded again, which saves time and bandwidth.
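The server side of this exchange can be sketched as follows. This is a minimal illustration, not a real server: `make_etag` and `respond` are hypothetical names, the token is assumed to be a truncated SHA-256 hash of the body, and the comparison only handles a single exact tag (a real implementation also has to handle lists of tags, `*`, and weak validators).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validation token from the content (assumed: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=120"}
    if request_headers.get("If-None-Match") == etag:
        # Token matches: the cached copy is still current, so send no body.
        return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


css = b"body { color: black; }"

# First request: no token yet, so the full response is returned.
status, headers, _ = respond(css, {})

# Revalidation with the stored token: nothing is downloaded again.
revalidated = respond(css, {"If-None-Match": headers["ETag"]})

# The resource changed on the server: the old token no longer matches.
changed = respond(b"body { color: navy; }", {"If-None-Match": headers["ETag"]})

print(status, revalidated[0], changed[0])
```

The browser performs the client half of this automatically; the sketch only shows why an unchanged resource costs a round trip but no download.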
As a web developer, how do you take advantage of efficient revalidation? The browser does all the work on your behalf: it automatically detects whether a validation token was previously specified, appends the validation token to the outgoing request, and updates the cache timestamps as necessary based on the response received from the server. The only thing left to do is to make sure that the server provides the necessary ETag tokens. Check your server documentation for the necessary configuration flags.
Tip: The HTML5 Boilerplate project contains sample configuration files for the most popular servers, with detailed comments for each configuration flag and setting. Find your favorite server in the list, look for the appropriate settings, and copy them or confirm that your server is configured with the recommended settings.
Cache-Control
- Each resource can define its caching policy through the Cache-Control HTTP header.
- Cache-Control directives control who can cache the response, under which conditions, and for how long.
From a performance optimization standpoint, the best request is one that never needs to reach the server: a local copy of the response eliminates all network latency and avoids data charges for the transfer. To achieve this, the HTTP specification allows the server to return Cache-Control directives that control how, and for how long, the browser and other intermediate caches can cache an individual response.
Note: The Cache-Control header was defined as part of the HTTP/1.1 specification and supersedes previous headers (such as Expires) used to define response caching policies. All modern browsers support Cache-Control, so that is all you need.
Directives:
"no-cache" and "no-store"
"no-cache" means that the returned response cannot be used to satisfy a subsequent request for the same URL without first checking with the server whether the response has changed. As a result, if a proper validation token (ETag) is present, no-cache incurs a round trip to validate the cached response, but can avoid the download if the resource has not changed.
By contrast, "no-store" is much simpler. It simply prohibits the browser and all intermediate caches from storing any version of the returned response, for example, one that contains private personal or banking data. Every time the user requests this asset, a request is sent to the server and the full response is downloaded.
"public" and "private"
If the response is marked as "public", it can be cached even if it has HTTP authentication associated with it, and even when the response status code is not normally cacheable. In most cases, "public" is not necessary, because explicit caching information (such as "max-age") already indicates that the response is cacheable.
By contrast, "private" responses can be cached by the browser. However, these responses are typically intended for a single user, so intermediate caches are not allowed to cache them. For example, a user's browser can cache an HTML page that contains private user information, but a CDN cannot.
"max-age"
This directive specifies the maximum time, in seconds, that the fetched response is allowed to be reused, counted from the time of the request. For example, "max-age=60" means that the response can be cached and reused for the next 60 seconds.
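To make the directives above concrete, here is a rough sketch of how a cache might decide whether a stored response can be reused without contacting the server. The function names are hypothetical, the parser ignores edge cases such as quoted values that contain commas, and the freshness check covers only the directives discussed in this section.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """True if a stored response may be reused without asking the server."""
    directives = parse_cache_control(cache_control)
    # no-store: nothing may be kept; no-cache: always revalidate first.
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False
    # The response is fresh while its age is below the allowed lifetime.
    return age_seconds < int(max_age)


print(parse_cache_control("private, max-age=600"))
print(is_fresh("max-age=60", 30), is_fresh("max-age=60", 61))
```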
Defining the optimal Cache-Control policy
Follow the decision tree above to determine the optimal caching policy for a particular resource, or set of resources, that your application uses. Ideally, you should aim to cache as many responses as possible on the client for the longest possible period, and provide validation tokens for each response to enable efficient revalidation.
| Cache-Control directives | Explanation |
| --- | --- |
| max-age=86400 | The response can be cached by the browser and any intermediate caches (that is, it is "public") for up to 1 day (60 seconds x 60 minutes x 24 hours). |
| private, max-age=600 | The response can be cached only by the client's browser, for up to 10 minutes (60 seconds x 10 minutes). |
| no-store | The response is not allowed to be cached and must be fetched in full on every request. |
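The choices discussed in this section can be sketched as a small function that assembles a Cache-Control value. This is one possible reading of the decisions involved, not a reproduction of the original decision tree; the function name and its parameters are assumptions made for illustration.

```python
def choose_cache_control(reusable, revalidate_each_time=False,
                         shared_cacheable=True, max_age=None):
    """Assemble a Cache-Control value from a few yes/no decisions.

    Hypothetical helper; the parameter names are illustrative only.
    """
    if not reusable:
        # The response must never be stored anywhere.
        return "no-store"
    parts = []
    if revalidate_each_time:
        parts.append("no-cache")
    if not shared_cacheable:
        # Only the user's browser may keep a copy; "public" is usually
        # implied by max-age, so it is not emitted explicitly here.
        parts.append("private")
    if max_age is not None:
        parts.append(f"max-age={max_age}")
    return ", ".join(parts)


# The three rows of the table above:
print(choose_cache_control(True, max_age=86400))
print(choose_cache_control(True, shared_cacheable=False, max_age=600))
print(choose_cache_control(False))
```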
According to HTTP Archive, among the top 300,000 sites (by Alexa rank), nearly half of all downloaded responses can be cached by the browser, which is a huge savings for repeat pageviews and visits. Of course, that does not mean that 50% of the resources in your particular application can be cached. Some sites can cache more than 90% of their resources, while others have a lot of private or time-sensitive data that cannot be cached at all.
Audit your pages to identify which resources can be cached, and ensure that they return the appropriate Cache-Control and ETag headers.
Invalidating and updating cached responses
- A locally cached response is used until the resource expires.
- You can force the client to update to a new version of the response by embedding a file content fingerprint in the URL.
- For best performance, each application needs to define its own cache hierarchy.
All HTTP requests that the browser makes are first routed to the browser cache to check whether there is a valid cached response that can be used to fulfill the request. If there is a match, the response is read from the cache, which eliminates both the network latency and the data costs that the transfer incurs.
However, what if you want to update or invalidate a cached response? For example, suppose you have told your visitors to cache a CSS stylesheet for up to 24 hours (max-age=86400), but your designer has just committed an update that you want to make available to all users. How do you notify all the visitors who now hold a "stale" cached copy of the CSS to update their caches? You cannot, at least not without changing the URL of the resource.
After the browser caches the response, the cached version is used until it is no longer fresh, as determined by max-age or Expires, or until it is evicted from the cache for some other reason, such as the user clearing the browser cache. As a result, different users might end up using different versions of the file when the page is constructed: users who just fetched the resource use the new version, while users who cached an earlier (but still valid) copy use an older version of its response.
So, how do you get the best of both worlds: client-side caching and quick updates? You change the URL of the resource whenever its content changes, which forces the user to download the new response. Typically, you do this by embedding a fingerprint of the file, or a version number, in its filename, for example, style.x234dff.css.
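A build step could derive such a filename from the file contents. The sketch below is only an illustration: `fingerprinted_name` is a hypothetical helper, and the choice of a truncated SHA-256 digest with 8 characters is an assumption, not something the text prescribes.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a content fingerprint before the extension, e.g. style.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = fingerprinted_name("style.css", b"body { color: black; }")
new = fingerprinted_name("style.css", b"body { color: navy; }")

# Same content always maps to the same URL; changed content gets a new one,
# so a previously cached copy is simply never requested again.
print(old)
print(new)
```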
The ability to define a cache policy for each resource lets you define a cache hierarchy that controls not only how long each response is cached, but also how quickly visitors see new versions. To illustrate, let's analyze the above example together:
- The HTML is marked "no-cache", which means that the browser always revalidates the document on each request and fetches the latest version if the contents change. Also, within the HTML markup, fingerprints are embedded in the URLs of the CSS and JavaScript assets: if the contents of those files change, the HTML of the page changes as well and a new copy of the HTML response is downloaded.
- The CSS is allowed to be cached by browsers and intermediate caches (for example, a CDN), and is set to expire in 1 year. Note that you can safely use a "far-future expiration" of 1 year because the file's fingerprint is embedded in its filename: if the CSS is updated, the URL changes as well.
- The JavaScript is also set to expire in 1 year, but is marked as private, perhaps because it contains some private user data that the CDN should not cache.
- The image is cached without a version or unique fingerprint and is set to expire in 1 day.
The combination of ETag, Cache-Control, and unique URLs lets you get the best of all worlds: long expiration times, control over where the response can be cached, and on-demand updates.
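The hierarchy described in the list above could be expressed as a lookup from URL pattern to Cache-Control value. The patterns, file names, and the `cache_control_for` helper are assumptions made for this sketch; a real site would configure the equivalent rules in its server or CDN.

```python
import re

ONE_YEAR = 60 * 60 * 24 * 365  # 31536000 seconds
ONE_DAY = 60 * 60 * 24         # 86400 seconds

# Hypothetical URL patterns; the first match wins.
POLICIES = [
    # HTML: always revalidate so new asset URLs are picked up quickly.
    (re.compile(r"\.html$"), "no-cache"),
    # Fingerprinted CSS: safe to cache anywhere for a year.
    (re.compile(r"\.[0-9a-z]{6,}\.css$"), f"max-age={ONE_YEAR}"),
    # Fingerprinted JavaScript: long-lived, but only in the user's browser.
    (re.compile(r"\.[0-9a-z]{6,}\.js$"), f"private, max-age={ONE_YEAR}"),
    # Images without a fingerprint: cache for one day.
    (re.compile(r"\.(png|jpe?g|gif)$"), f"max-age={ONE_DAY}"),
]


def cache_control_for(path):
    """Return the Cache-Control value for a URL path, or None if no rule applies."""
    for pattern, policy in POLICIES:
        if pattern.search(path):
            return policy
    return None


for path in ("/index.html", "/style.x234dff.css", "/script.8sd34ffa.js", "/photo.jpg"):
    print(path, "->", cache_control_for(path))
```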
Caching checklist
There is no single best cache policy. Depending on your traffic patterns, the type of data served, and application-specific requirements for data freshness, you must define and configure the appropriate per-resource settings, as well as the overall cache hierarchy.
Here are some tips and tricks to keep in mind when developing a caching strategy:
- Use consistent URLs: if you serve the same content on different URLs, that content is fetched and stored multiple times. Tip: note that URLs are case sensitive.
- Ensure that the server provides a validation token (ETag): validation tokens eliminate the need to transfer the same bytes when a resource has not changed on the server.
- Identify which resources can be cached by intermediaries: resources whose responses are identical for all users are great candidates to be cached by a CDN and other intermediaries.
- Determine the optimal cache lifetime for each resource: different resources may have different freshness requirements. Audit and determine the appropriate max-age for each one.
- Determine the best cache hierarchy for your site: the combination of resource URLs with content fingerprints and short or no-cache lifetimes for HTML documents lets you control how quickly the client picks up updates.
- Minimize churn: some resources are updated more frequently than others. If there is a particular part of a resource (for example, a JavaScript function or a set of CSS styles) that is often updated, consider delivering that code as a separate file. This way, the remainder of the content (for example, library code that does not change very often) can be fetched from the cache each time an update is fetched, which minimizes the amount of downloaded content.
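The first item in the checklist can be illustrated with a toy cache keyed by the exact URL string. This is a simplified model with hypothetical names and example URLs, not how any particular browser stores responses, but it shows why two spellings of the same address cost two downloads.

```python
class UrlKeyedCache:
    """A toy cache that uses the exact URL string as its key."""

    def __init__(self):
        self.entries = {}
        self.fetches = 0

    def get(self, url, fetch):
        if url not in self.entries:
            # Cache miss: go to the origin and store the result.
            self.fetches += 1
            self.entries[url] = fetch(url)
        return self.entries[url]


def fake_origin(url):
    """Hypothetical origin that serves the same bytes for every URL."""
    return b"body { color: black; }"


cache = UrlKeyedCache()
cache.get("https://example.com/css/style.css", fake_origin)
cache.get("https://example.com/css/style.css", fake_origin)  # cache hit
cache.get("https://example.com/css/Style.css", fake_origin)  # different case: fetched again

print(cache.fetches, len(cache.entries))  # prints "2 2"
```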
Front-end learning notes: HTTP caching