Viewing HTTP caching through the browser

As front-end developers, we may seem to have little to do with the caching mechanism of our site or application, yet caching is part of the performance we care about: if the site has no caching at all, our pages may be slow to download and render their resources. And when a page is slow, everyone turns to the front end to fix it rather than to the server-side developers. It is therefore worth understanding the relevant caching mechanisms and making full use of them.

There are many caching mechanisms on the web. Here I am only studying and collating the browser-based HTTP caching mechanism to see how it works.

Article directory:

1. Types of web cache
2. Why do we need a browser cache, and what do we need to do?
3. Using the ETag to validate a cached HTTP response
4. What is Cache-Control, and how do I define a Cache-Control policy?
5. How are cached responses updated or invalidated?
6. What can we do now to make good use of caching?
7. Further reading

1. Types of web cache

1.1 Database Cache

You may have heard of memcached, which is a caching scheme at the database level. When the relationships within a web application are complex and the database has many tables, frequent queries can easily overwhelm the database. To improve query performance, query results are cached in memory; the next time the same query is made, the result is returned directly from the memory cache, which improves response time.

1.2 CDN Cache

CDN caches are typically deployed by webmasters themselves to make their sites easier to scale and to achieve better performance. Typically, the browser sends a web request to the CDN gateway. Behind the gateway server sit one or more load-balanced origin servers, and the gateway dynamically forwards each request to an appropriate origin server based on their load. From the browser's perspective, the entire CDN is a single origin server, so the caching mechanism between browser and server applies equally in this architecture.

1.3 Proxy Server Cache

The proxy server is an intermediary between the browser and the origin server. The browser sends a web request to the proxy, which processes it (permission validation, cache matching, and so on) before forwarding it to the origin server. A proxy server cache works like a browser cache, but on a larger scale.

1.4 Browser Cache

Every browser implements an HTTP cache. When we interact with a server through the browser over HTTP, the browser caches responses according to a set of rules agreed with the server.

1.5 Application-Tier Cache

The application-tier cache refers to caching that we do at the code level. Code logic caches the data or resources that have already been requested, and when the same data is needed again, that logic selects usable cached data instead of fetching it again.
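As a rough illustration of caching at the code level, here is a minimal Python sketch. The class name `TTLCache`, its `get_or_load` method, and the `loader` callback are all hypothetical names chosen for this example; a real application would more likely use an existing caching library.

```python
import time


class TTLCache:
    """Tiny in-process cache that remembers each value for ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self._entries = {}          # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # still fresh: reuse the cached value
        value = loader(key)         # missing or expired: fetch it again
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

A caller would wrap an expensive lookup, for example `cache.get_or_load(user_id, load_user_from_db)`, where `load_user_from_db` stands for whatever slow function the application already has.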

2. Why do we need a browser cache, and what do we need to do?

We know that with HTTP it takes time to establish a connection between the client and the server, and a large response needs multiple round trips between the two before it arrives in full, which delays the moment the browser can consume and process the content. This raises the cost of accessing the server's data and resources, so reusing previously fetched data through the browser's caching mechanism is an important performance optimization.

So what is the recommendation?

Specify an explicit caching policy for each resource that defines whether the resource can be cached, who may cache it, how long it can be cached, and how it can be efficiently revalidated once the cache time expires. When the server returns a response, it needs to provide Cache-Control and ETag in the response headers.

The caching mechanism in the browser is, in effect, the caching mechanism defined by the HTTP protocol, because the browser implements it for us. When we think of cache-related HTTP headers, we usually think of Expires, Cache-Control, Last-Modified, If-Modified-Since, and ETag.

But above we said that the server should return a response with the necessary Cache-Control and ETag headers. Why these two?

Because Cache-Control plays the same role as Expires, and ETag plays a role similar to Last-Modified. The difference is that Expires and Last-Modified date from HTTP/1.0, while Cache-Control and ETag were introduced in HTTP/1.1. Browsers now use HTTP/1.1 by default, so we can largely set Expires and Last-Modified aside and work with Cache-Control and ETag.

Of course, user behavior, such as refreshing the page, also affects how the browser uses its cache.

But let's set the effect of user actions aside and look at how the Cache-Control and ETag response headers provided by the server determine the way the cache works.

3. Using the ETag to validate a cached HTTP response

In general, requesting a resource goes roughly like this: the browser sends a request, and the server returns a response whose headers tell the browser how to cache it.

While studying Ajax I sorted out some of the request and response header parameters of an HTTP request; here we look at what the ETag does.

3.1 The main role of the ETag

The server passes a validation token in the ETag HTTP header, typically a string such as 'X123cef'. When the browser requests the resource again after it has expired, it automatically sends that token back in the If-None-Match request header. The token makes update checks efficient: if the resource has not changed, no data is transferred.

The ETag is primarily used to check whether a resource has been modified after the cached response expires.

3.2 How the ETag works

For example, suppose the server's first response allowed the resource to be cached for 120 seconds, and the browser requests the same resource again after those 120 seconds have passed. The browser first checks its local cache and finds the previous response, but unfortunately that response has now expired and cannot be used. At this point the browser could simply make a new request and fetch a complete new response, but that is inefficient: if the resource has not changed, there is no reason to download exactly the same bytes that are already in the cache.

This is where the ETag comes in. The server usually generates a validation token and returns it in the ETag header, often a hash of the file contents or some other fingerprint. The client does not need to know how the fingerprint is generated; it only has to send it to the server on the next request (the browser adds it automatically). If the fingerprint is still the same, the resource has not been modified, and the server returns 304 Not Modified, so the download can be skipped, the cached resource is used, and it is cached for another 120 seconds.
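The exchange described above can be sketched on the server side roughly as follows. This is a simplified illustration, not a full HTTP implementation: the function names `make_etag` and `respond` are hypothetical, the fingerprint is assumed to be a truncated SHA-256 hash of the body, and the comparison ignores weak validators and lists of several tags in If-None-Match.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Fingerprint of the content; ETag values are quoted strings in HTTP.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, payload) for a GET of one resource."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=120"}
    if request_headers.get("If-None-Match") == etag:
        # Fingerprint unchanged: tell the browser to reuse its cached copy.
        return 304, headers, b""
    # First request, or the content changed: send the full body.
    return 200, headers, body
```

On the first request the browser sends no If-None-Match header and receives 200 with the body and the ETag. Once the 120 seconds are up, it sends the stored ETag back; if the file is unchanged, it gets 304 with an empty payload and keeps using its cached copy.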

4. What is Cache-Control, and how do I define a Cache-Control policy?

When the server responds to a browser request, the Cache-Control response header lets each resource define its own caching policy. Its directives tell the browser and any intermediate caches under what conditions the response can be cached and for how long.

4.1 Meaning of the Cache-Control directives (Cache-Control in the response header)

1 no-cache: The returned response must be validated with the server before it can be used to satisfy subsequent requests for the same URL. If a suitable validation token (ETag) is present, no-cache therefore costs a round trip to validate the cached response, but the download is avoided if the resource has not changed.
2 no-store: Forbids caching the response at all, so every time the user requests the resource a request is sent to the server and the full response is downloaded.
3 public: The response can be cached even if it has HTTP authentication associated with it, and even if the response status code is not normally cacheable.
4 private: The browser can cache the response, but it is intended for a single user, so no intermediate cache such as a proxy server may cache it. For example, the user's browser can cache an HTML page containing the user's private information, but a CDN cannot.
5 max-age: The maximum time, in seconds, for which the response can be cached.
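To make the directives more concrete, here is a small sketch of how a cache might read a Cache-Control value and decide whether it is allowed to store the response. The helper names `parse_cache_control` and `may_store` are made up for this example, and the logic covers only the directives listed above, not the full rules of the HTTP caching specification.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. no-store) are stored as True.
        directives[name.strip().lower()] = arg.strip().strip('"') or True
    return directives


def may_store(directives: dict, shared_cache: bool) -> bool:
    """Decide whether a browser (shared_cache=False) or proxy/CDN may store."""
    if "no-store" in directives:
        return False                 # nobody may keep a copy
    if shared_cache and "private" in directives:
        return False                 # only the user's own browser may
    return True
```

For instance, a response with `private, max-age=600` may be stored by the browser but not by a CDN, while `no-store` may not be stored by either.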

4.2 How to use Cache-Control

Typically, we set the Cache-Control response header in server-side code when building the response.
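How the header is set depends on the server framework in use, so the following is only a framework-neutral sketch. The function `cache_control_header` and its parameters are hypothetical; it simply assembles a header value from the directives described in 4.1.

```python
def cache_control_header(*, visibility=None, max_age=None,
                         no_cache=False, no_store=False) -> str:
    """Build a Cache-Control header value from a few common directives."""
    if no_store:
        return "no-store"            # nothing else matters if storing is forbidden
    parts = []
    if visibility is not None:
        if visibility not in ("public", "private"):
            raise ValueError("visibility must be 'public' or 'private'")
        parts.append(visibility)
    if no_cache:
        parts.append("no-cache")
    if max_age is not None:
        parts.append(f"max-age={int(max_age)}")
    return ", ".join(parts)
```

A handler could then do something like `headers["Cache-Control"] = cache_control_header(visibility="public", max_age=86400)` for a stylesheet, or `cache_control_header(visibility="private", no_cache=True)` for a page containing personal data.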

5. How are cached responses updated or invalidated?

In general, every HTTP request made by the browser is first routed to the browser's cache to check whether it holds a valid response that can satisfy the request. If there is a match, the response is read directly from the cache, which avoids both the network latency and the data cost of a transfer. But what if we want to update or discard a cached response?
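The lookup step can be pictured with a toy model of the browser cache. This is only an assumption-laden sketch: the dictionary-based cache and the `store` and `lookup` functions are invented for illustration, and freshness is judged by max-age alone.

```python
def store(cache: dict, url: str, body: bytes, max_age: float, now: float) -> None:
    """Remember a response together with its max-age and the time it was stored."""
    cache[url] = {"body": body, "max_age": max_age, "stored_at": now}


def lookup(cache: dict, url: str, now: float) -> tuple:
    """Return ("fresh" | "stale" | "miss", body or None) for a request."""
    entry = cache.get(url)
    if entry is None:
        return "miss", None          # never fetched: go to the network
    age = now - entry["stored_at"]
    if age < entry["max_age"]:
        return "fresh", entry["body"]    # serve from cache, no request at all
    return "stale", entry["body"]        # expired: must revalidate or refetch
```

Only the "fresh" case avoids the network entirely, which is why a long max-age also means that users can keep an outdated copy for a long time.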

Suppose we have told visitors that a CSS stylesheet can be cached for up to 24 hours (max-age=86400), but the designer has just submitted an update that we want all users to get. How do we tell every visitor that the cached copy of the CSS is stale and needs updating?

A new user who has never requested the resource will get the updated version, but a user who has already requested it will keep getting the old cached copy until it expires or until they manually clear the browser cache. Manually clearing the browser cache is something only programmers tend to do, so how can we get the updated resource to everyone else?

In fact, we can change the URL of the resource whenever its content changes, which forces the user to download the new response. For example, we can append a parameter, such as a version number or a content fingerprint, to the resource link.
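One way to change the URL automatically is to derive the parameter from the file contents. The sketch below assumes a query parameter named `v` and a truncated SHA-256 hash as the fingerprint; both are arbitrary choices for this example, as is the function name `fingerprinted_url`.

```python
import hashlib


def fingerprinted_url(url: str, content: bytes) -> str:
    """Append a short content hash so the URL changes whenever the file does."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    separator = "&" if "?" in url else "?"   # respect an existing query string
    return f"{url}{separator}v={digest}"
```

The build step or template would call this when generating the page, producing a link of the form `/css/style.css?v=` followed by the hash. As long as the stylesheet is unchanged the URL stays the same and the cached copy is reused; as soon as the designer's update changes the bytes, the URL changes and browsers fetch the new file.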

6. What can we do now to make good use of caching?

While browsing the material I found a caching checklist, and we can follow its recommendations to make sensible use of the caching mechanism:

1 Use consistent URLs: If you serve the same content on different URLs, it will be fetched and stored multiple times. Tip: URLs are case-sensitive!
2 Ensure that the server provides a validation token (ETag): With a validation token, the same bytes do not have to be transferred again if the resource on the server has not changed.
3 Determine which resources intermediate caches can cache: Resources whose responses are identical for all users are good candidates for caching by a CDN or other proxy cache.
4 Determine the optimal cache lifetime for each resource: Different resources may have different update requirements. Review and determine the appropriate max-age for each one.
5 Determine the best cache hierarchy for your site: You can control how quickly clients pick up updates by combining resource URLs that contain a content fingerprint with a short or no-cache lifetime for the HTML document.
6 Minimize churn: Some resources are updated more frequently than others. If a specific part of a resource, such as a JavaScript function or a set of CSS styles, is updated often, consider serving that code as a separate file. That way, each time an update is fetched, the rest, such as library code that rarely changes, can still come from the cache, keeping the amount of downloaded content to a minimum.

7. Further reading

[Web caching mechanism series]

[Google Developer Browser Caching]

[HTTP Caching]

[Caching Tutorial]

[HTTP Caching FAQ MDN]

[Browser caching mechanism]

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.