Web Page Caching and Cache Control

Source: Internet
Author: User

What is a cache?

A cache sits between the client and the server, or between two servers. It decides whether to keep a copy of a fetched resource, how to use that copy, and when to update it. Cached resources include a page's HTML, images, files, and so on.

There are two main reasons to use a cache:

    • Reduce response latency and make pages appear faster: the cache is closer to the client than the origin server, so answering a request from the cache takes less time than fetching it from the origin, and the site feels faster;
    • Reduce network bandwidth consumption: when copies are reused, fewer resources are fetched from the origin server, which reduces bandwidth usage.

How does a cache work?

All caches follow these basic rules:

    1. Keeping a copy. A cache usually does not store a copy if:
      • the response headers tell the cache not to keep it;
      • the request requires authentication or is sent over an encrypted connection, in which case shared caches do not store it;
      • the response carries no validator (an ETag or Last-Modified header) and no explicit freshness information. The cache then has no basis for checking the copy later, so it will usually treat the content as uncacheable.
    2. Using a copy. A copy is used only if it is fresh enough. A copy counts as fresh if:
      • it has a complete expiry time or another age-controlling header (such as Cache-Control), and it is still within its freshness period;
      • the browser has already used the cached copy and has checked its freshness once in the current session;
      • a caching proxy has used the copy recently, and the content was last modified well before that.
      Fresh copies are sent directly from the cache, without any request to the origin server.
    3. Updating a copy:
      • if the cached copy is too old (stale), the cache sends a validation request to the origin server to find out whether the current copy can still be served.
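The three rules above can be sketched as a small decision function. This is a simplified illustration, not how any particular cache is implemented. The names CachedCopy, should_store and decide are hypothetical, and freshness is reduced to a single lifetime in seconds (for example, taken from max-age).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCopy:
    body: bytes
    stored_at: float           # when the copy was stored (seconds since the epoch)
    freshness_lifetime: float  # how long it stays fresh, e.g. taken from max-age


def should_store(no_store: bool, authenticated: bool, shared_cache: bool,
                 has_validator: bool, has_freshness_info: bool) -> bool:
    """Rule 1: decide whether to keep a copy of a response at all."""
    if no_store:
        return False  # the response headers forbid keeping a copy
    if authenticated and shared_cache:
        return False  # shared caches do not store authenticated/secure content
    if not has_validator and not has_freshness_info:
        return False  # nothing to judge or re-check the copy by later
    return True


def decide(copy: Optional[CachedCopy], now: float) -> str:
    """Rules 2 and 3: serve a fresh copy, otherwise revalidate it."""
    if copy is None:
        return "fetch"       # nothing stored: go to the origin server
    age = now - copy.stored_at
    if age < copy.freshness_lifetime:
        return "serve"       # fresh: answer from the cache, no origin request
    return "revalidate"      # stale: ask the origin whether the copy is still good


copy = CachedCopy(body=b"<html>...</html>", stored_at=1000.0, freshness_lifetime=3600)
print(decide(copy, now=1060.0))   # serve
print(decide(copy, now=8200.0))   # revalidate
```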

In summary, freshness and validation are the two most important ways a cache decides whether content can be used.

Some of these rules are defined by the HTTP protocols (HTTP/1.0 and HTTP/1.1). Others are set by the cache administrator, meaning the browser's user or the proxy server's administrator.

How to control caching

Caching is generally controlled on the server side, with only a few options on the client side. Here is an example of client-side control:

Example 1: use HTML meta tags in the page to tell the browser not to cache it:

<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="0">

The settings above are the minimum set that works across the major browsers, and they are recommended.

Of these:

* The no-cache in Cache-Control is there specifically for IE6. If you do not need to support IE6, omit it.

* The no-cache in Pragma is for old HTTP/1.0 clients. If you do not need to support HTTP/1.0, omit it.

* Expires is for HTTP/1.0 clients and proxies. If you do not need to support HTTP/1.0, omit it.

If you drop support for all three of the above, the following is what remains. This is not recommended, because in practice users run a wide mix of browser versions:

<meta http-equiv="Cache-Control" content="no-store, must-revalidate">

HTML meta tags are written into the HTML file and are easy to use, but they are not very effective. Only a few browsers honor them (those that actually read the HTML). Caching proxies almost never do, because they generally do not parse the HTML in the document. Adding a Pragma: no-cache meta tag to a page will not necessarily keep it fresh.
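Because proxies ignore meta tags, the same directives are more reliable when sent as real HTTP headers. The sketch below uses a hypothetical helper, no_cache_headers. Its two flags mirror the notes above on IE6 and HTTP/1.0 support. It only builds the header values and leaves sending them to whatever server you use.

```python
from typing import Dict


def no_cache_headers(support_ie6: bool = True, support_http10: bool = True) -> Dict[str, str]:
    """HTTP response headers equivalent to the meta tags above (hypothetical helper)."""
    directives = ["no-store", "must-revalidate"]
    if support_ie6:
        directives.insert(0, "no-cache")  # only needed for IE6
    headers = {"Cache-Control": ", ".join(directives)}
    if support_http10:
        headers["Pragma"] = "no-cache"    # for HTTP/1.0 clients
        headers["Expires"] = "0"          # invalid date, which caches treat as already expired
    return headers


print(no_cache_headers())
# {'Cache-Control': 'no-cache, no-store, must-revalidate', 'Pragma': 'no-cache', 'Expires': '0'}
print(no_cache_headers(support_ie6=False, support_http10=False))
# {'Cache-Control': 'no-store, must-revalidate'}
```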


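Your site may be hosted at an ISP whose hosting environment does not let you control HTTP headers (such as Expires and Cache-Control). In that case, ask the provider for that access, because these headers are the essential tools for controlling caching.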

On the server side:


HTTP headers are usually generated automatically by the Web server, but depending on the server you use, you can control them to some extent. They are not visible in the HTML code. HTTP headers give you much more control over how browsers and proxy servers handle copies of your content.

HTTP headers are sent before the HTML code and are seen only by the browser and any intermediate caches. A typical HTTP/1.1 response header looks like this:

HTTP/1.1 200 OK
Date: Fri, Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, Oct 1998 14:19:41 GMT
Last-Modified: Mon, Jun 1998 02:28:12 GMT
ETag: "3E86-410-3596FBBC"
Content-Length: 1040
Content-Type: text/html
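In this example, Expires is exactly one hour after Date, which matches max-age=3600. The sketch below converts a relative max-age into the equivalent absolute Expires value using Python's standard email.utils date helpers. The function name expires_from and the sample date are hypothetical.

```python
from datetime import timedelta
from email.utils import format_datetime, parsedate_to_datetime


def expires_from(date_header: str, max_age: int) -> str:
    """Absolute Expires value equivalent to a relative max-age (in seconds)."""
    date = parsedate_to_datetime(date_header)  # timezone-aware (UTC) for a GMT value
    return format_datetime(date + timedelta(seconds=max_age), usegmt=True)


print(expires_from("Mon, 05 Jan 2015 10:00:00 GMT", 3600))
# Mon, 05 Jan 2015 11:00:00 GMT
```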

Cache-Control HTTP headers

HTTP/1.1 introduced a new class of headers, the Cache-Control response headers. They give site publishers more complete control over their content and address the limitations of absolute expiry times (Expires). Useful Cache-Control directives include:

    • max-age=[seconds] - the maximum amount of time a copy is considered fresh. It is similar to Expires, but it is relative to the time of the request rather than an absolute expiry date: [seconds] is the number of seconds, counted from the time of the request, for which the copy stays fresh;
    • s-maxage=[seconds] - similar to max-age, except that it applies only to shared caches (such as proxy servers);
    • public - marks authenticated responses as cacheable. Normally, content accessed through HTTP authentication is automatically not cached;
    • no-cache - forces caches to send the request to the origin server for validation every time before releasing a cached copy. This is useful for ensuring that authentication is respected (in combination with public). It is also useful for maintaining strict freshness without giving up all of the benefits of caching;
    • no-store - instructs caches not to keep a copy under any circumstances;
    • must-revalidate - tells caches to strictly obey the freshness information you give for a copy. HTTP allows caches to serve stale copies under certain conditions, and specifying this directive tells caches to follow your rules strictly;
    • proxy-revalidate - similar to must-revalidate, except that it applies only to caching proxy servers.
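As a rough illustration of how a cache might read these directives, the sketch below parses a Cache-Control value and picks a freshness lifetime, letting s-maxage override max-age in shared caches. The function names are hypothetical. The parser is deliberately simple and does not handle commas inside quoted arguments.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):  # simplified: ignores commas inside quotes
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(directives: Dict[str, Optional[str]], shared: bool) -> Optional[int]:
    """Seconds a response stays fresh, or None if no lifetime directive is present."""
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])  # s-maxage wins in shared caches
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None


d = parse_cache_control("max-age=3600, s-maxage=600, Must-Revalidate")
print(d)                                     # {'max-age': '3600', 's-maxage': '600', 'must-revalidate': None}
print(freshness_lifetime(d, shared=True))    # 600
print(freshness_lifetime(d, shared=False))   # 3600
```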

Validators and validation

Validation is the mechanism by which the server and caches communicate whether a copy has changed. With it, a cache avoids re-downloading the entire resource when its copy is in fact still up to date.


* If there is neither a validator nor freshness information (Expires or Cache-Control), the cache will not store any copy;

* When a cached copy has Last-Modified information (the time the content was last modified), the cache uses it to ask the server, through an If-Modified-Since request header, whether the copy has changed since that time;
* HTTP/1.1 introduced another validator, the ETag: a unique identifier generated by the server that changes every time the content changes. Because the server controls how the ETag is generated, a cache can send an If-None-Match request, and if the ETag still matches, it can be sure the copy is exactly the same as the original;
* Almost all caches use Last-Modified times to determine whether a copy is fresh, and ETag validation is becoming more prevalent;
* Most modern Web servers automatically generate ETag and Last-Modified headers for static content (files), so you don't have to configure anything;

* However, servers do not know how to generate this information for dynamic content (for example, CGI, ASP, or database-driven sites).
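A revalidation round trip can be sketched as follows. The cache copies its stored validators into If-None-Match and If-Modified-Since, and the server answers 304 Not Modified if they still match. The helper names are hypothetical. As HTTP/1.1 specifies, If-None-Match takes precedence when both headers are present. Only exact (strong) ETag matches are handled here.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping


def conditional_headers(stored: Mapping[str, str]) -> Dict[str, str]:
    """Cache side: turn a stored copy's validators into conditional request headers."""
    headers: Dict[str, str] = {}
    if "ETag" in stored:
        headers["If-None-Match"] = stored["ETag"]
    if "Last-Modified" in stored:
        headers["If-Modified-Since"] = stored["Last-Modified"]
    return headers


def not_modified(request: Mapping[str, str], etag: str, last_modified: str) -> bool:
    """Server side: True if a 304 Not Modified response is appropriate."""
    if "If-None-Match" in request:
        # If-None-Match takes precedence; only exact (strong) matches are handled.
        tags = [t.strip() for t in request["If-None-Match"].split(",")]
        return "*" in tags or etag in tags
    if "If-Modified-Since" in request:
        since = parsedate_to_datetime(request["If-Modified-Since"])
        return parsedate_to_datetime(last_modified) <= since
    return False


stored = {"ETag": '"v1"', "Last-Modified": "Mon, 05 Jan 2015 10:00:00 GMT"}
request = conditional_headers(stored)
print(not_modified(request, '"v1"', stored["Last-Modified"]))  # True: answer 304
```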

Tips for building a cache-friendly site

Besides using freshness information and validation, there are many things you can do to make your site more cache-friendly:

    • Use URLs consistently: this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, use the same URL for it. This is the simplest and most effective way to make your site cache-friendly. For example, if you reference "/index.html" in your pages, always use that exact address;
    • Use a common library for images and other page elements, and reference them from every page that uses them;
    • Enable caching for images and pages that do not change often, and use the Cache-Control: max-age header to give them a long expiry time;
    • For content that is updated on a regular schedule, set a max-age or expiry time that caches recognize;
    • If a resource (especially a downloadable file) changes, change its name. That way you can give it an expiry time far in the future and still guarantee that the correct version is served. Only the page that links to the file needs a short expiry time;
    • Do not change files unnecessarily, or they will get a misleadingly recent Last-Modified date. For example, when you update your site, do not copy over every file in the site; upload only the files you have changed;
    • Use cookies only where necessary: cookies are very difficult to cache and are unnecessary in most situations. If you must use cookies, limit them to dynamic pages;
    • Minimize the use of SSL: encrypted pages are not stored by any shared cache, so use them only when necessary and use fewer images on SSL pages;
    • Use a cacheability checking tool; it can help you put many of the concepts in this article into practice.
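One common way to apply the tip about renaming changed files is to put a hash of the file's content into its name, so a new version automatically gets a new URL. The sketch below assumes that scheme. The function name and the 8-character SHA-256 prefix are arbitrary choices and are not part of the original tips.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Return e.g. 'static/app.<hash>.js' for 'static/app.js' (hypothetical scheme)."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = fingerprinted_name("static/app.js", b"console.log(1);")
new = fingerprinted_name("static/app.js", b"console.log(2);")
print(old != new)  # True: changed content gets a new name, so a new URL
```

Files named this way can be given a long max-age (for example, a year). The HTML page that links to them keeps a short one, so visitors pick up the new name soon after a release.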

Writing cache-friendly scripts

By default, scripts return no validators (Last-Modified or ETag headers) and no other freshness information (Expires or Cache-Control). Some scripts really are dynamic (their response is different every time), but many more (search engines, database-driven sites) can benefit from being cache-friendly.
Generally speaking, if a script produces output that is reproducible for some period of time (minutes or days), it can be cached. If the output changes only when the URL changes, it is cacheable. If the output varies with cookies, authentication information, or other external conditions, it is not.

    • The most cache-friendly approach is to write the script's output to a static file whenever it changes. The Web server can then treat it like any other page, generating and checking validators for it, which makes things easier. Only rewrite the file when the content has actually changed, so that its Last-Modified time stays meaningful;
    • Another way to make a script cacheable is to set a relative lifetime header that keeps the content fresh for some period. This can also be done with Expires, but it is easier with Cache-Control: max-age, which keeps the response fresh for a given time after the request;
    • If you cannot do either, have the script generate a validator and respond to If-Modified-Since and/or If-None-Match requests. Parse those request headers and return 304 Not Modified when the content has not changed. Unfortunately, this takes more work than the first two approaches.
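The second and third approaches can be combined in a script. The sketch below is a hypothetical build_response function. It sets Cache-Control: max-age, derives an ETag from a hash of the body, and returns 304 with no body when the request's If-None-Match matches. It is framework-neutral and only computes the status, headers, and body.

```python
import hashlib
from typing import Dict, Mapping, Tuple


def build_response(body: bytes, request_headers: Mapping[str, str],
                   max_age: int = 600) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a cache-friendly dynamic response."""
    # A strong ETag derived from the content: it changes whenever the body does.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Cache-Control": f"max-age={max_age}",  # fresh for max_age seconds after the request
        "ETag": etag,
    }
    sent = request_headers.get("If-None-Match") or ""
    if etag in [t.strip() for t in sent.split(",")]:
        return 304, headers, b""                # unchanged: no body needed
    headers["Content-Length"] = str(len(body))  # byte length of the body
    return 200, headers, body


status, headers, _ = build_response(b"<p>results</p>", {})
print(status)                                           # 200
status, _, body = build_response(b"<p>results</p>", {"If-None-Match": headers["ETag"]})
print(status, body)                                     # 304 b''
```

With Python's http.server, for example, a handler could pass self.headers as request_headers, since its .get lookup is case-insensitive. It would then call send_response, send_header, and end_headers with the results. Because the ETag is a hash of the body, the script still has to generate the page; the savings are in bandwidth, not in generation work.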

Other tips:

    • Avoid POST unless it is truly necessary: responses to POST requests are not stored by most caches. If you send the information in the URL path and query string (via GET), caches can store the response for later use;
    • Do not embed user-specific information in the URL unless the content really is different for each user;
    • Do not assume that all requests from one address come from the same user, because caches often work on behalf of many users;
    • Generate and return a Content-Length header when convenient. It lets your script's responses be used over persistent connections, so a client can request multiple copies over a single TCP/IP connection instead of opening a new connection for each request, which makes your site much faster.
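Two of these tips can be shown in code. The first puts the query in a GET URL, with parameters in a fixed order so the same query always produces the same URL and cache key. The second computes Content-Length as the number of encoded bytes rather than characters. The function names and the /search path are hypothetical.

```python
from urllib.parse import urlencode


def search_url(base: str, **params: str) -> str:
    """Put the query in the URL (GET), sorted so equal queries give equal URLs."""
    return f"{base}?{urlencode(sorted(params.items()))}"


def content_length(body: str, encoding: str = "utf-8") -> int:
    """Content-Length counts encoded bytes, not characters."""
    return len(body.encode(encoding))


print(search_url("/search", q="cache control", page="2"))  # /search?page=2&q=cache+control
print(content_length("héllo"))                              # 6 ("é" is two bytes in UTF-8)
```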

MORE:

http://www.chedong.com/tech/cache_docs.html

https://www.mnot.net/cache_docs/
