Web cache (i)-HTTP protocol cache

Source: Internet
Author: User

Why Use Web Caching

Web caches are generally divided into browser caches, proxy server caches, and gateway caches. This article is mainly about the browser cache; the other two are mentioned only for background.

A web cache sits between servers and clients. The server may be an origin server (the server where the resource resides), and there may be one or more servers and one or more clients. The cache watches the requests passing between them and saves copies of the responses (for example HTML pages, images, and files). If a later request is for the same URL, the cache serves the saved copy directly instead of bothering the origin server again.
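The idea of serving a saved copy for a repeated URL can be sketched in a few lines. This is a minimal illustration only: createUrlCache and fetchFromOrigin are hypothetical names, the "origin server" is a stand-in function, and no expiration is handled yet.

```javascript
// Hypothetical sketch: a cache that stores response copies keyed by URL.
// fetchFromOrigin is a stand-in for a real request to the origin server.
function createUrlCache(fetchFromOrigin) {
  const copies = new Map();
  return function get(url) {
    if (copies.has(url)) {
      // Same URL requested again: serve the saved copy.
      return { body: copies.get(url), fromCache: true };
    }
    const body = fetchFromOrigin(url);
    copies.set(url, body);
    return { body, fromCache: false };
  };
}

// Usage with a fake origin that counts how often it is contacted.
let originHits = 0;
const get = createUrlCache((url) => {
  originHits += 1;
  return `content of ${url}`;
});
const first = get('/index.html');
const second = get('/index.html');
console.log(first.fromCache, second.fromCache, originHits); // false true 1
```

The second request never reaches the origin, which is where both the latency and the bandwidth savings described below come from.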

There are two main reasons to use caching:

    • Reduced latency: the cache is closer to the client, so fetching content from the cache takes less time than fetching it from the origin server. Pages render faster and the site feels more responsive.
    • Reduced network traffic: copies are reused, which greatly reduces the bandwidth the user consumes. This is also an indirect saving if traffic is billed, and it keeps bandwidth demand at a lower, more manageable level.

Consider a large modern site: a single page may issue one or two hundred requests, and daily page views (PV) are in the billions. Without caching, the user experience would degrade sharply (time spent waiting for requests), while server load and network bandwidth would both be severely tested.

Browser cache control mechanism

There are three browser cache control mechanisms: HTML5 offline storage and local caching, HTML Meta tags, and HTTP protocol caching.

HTML5 offline storage and local cache

This mechanism uses APIs introduced with HTML5 to support offline applications and data caching, such as AppCache, sessionStorage, and localStorage.

AppCache (since deprecated and removed from modern browsers) lists the resources to download and cache in a manifest file. An example manifest file:

CACHE MANIFEST
# Comment
file.js
file.css

Then reference it in the HTML:

<html manifest="./xxx.manifest">

The basic usage of sessionStorage and localStorage is as follows:

// localStorage is used the same way
sessionStorage.setItem('name', 'laixiangran'); // store data
sessionStorage.getItem('name'); // read data: 'laixiangran'

These APIs are not covered in detail here; I will introduce them separately in a later article.

HTML Meta Tags

Using HTML Meta tags, web developers can add a <meta> tag to the <head> node of an HTML page, with code like the following:

<META HTTP-EQUIV="Pragma" CONTENT="no-cache">

This code tells the browser not to cache the current page, so every visit has to fetch it from the server.

It is easy to use, but only some browsers support it, and no caching proxy supports it, because proxies do not parse the HTML content itself.

HTTP Protocol Cache

HTTP protocol caching is the focus of this article. It controls caching through HTTP header fields, which gives you more control over how browsers and proxy servers handle copies of your content. These headers are invisible in the HTML code and are typically generated automatically by the web server. Depending on the server you use, however, you can configure them to some extent.

Browser Request Process

The first time the browser requests a resource (flowchart):

This process is relatively simple. On the first request the browser has no cached copy, so it requests the resource directly from the server. When the response arrives, the browser stores the data in memory or on disk according to the HTTP header information.

When the browser requests the resource again:

This process is more complex. Based on the HTTP header information, the browser has to decide whether to read the data directly from the cache or to request it from the server.

Differences between the status codes:

Here we explain the HTTP header fields used in HTTP protocol caching by looking at the two statuses that appear in this process: 200 (from cache) and 304.

200 (from cache)

This status, which is shown by the browser's developer tools rather than sent by the server, means that the server was not contacted and the data was read directly from the cache (memory or disk).

[Figure: two screenshots of requests served from the cache]

In the two screenshots the status differs slightly: 200 (from memory cache) versus 200 (from disk cache). The difference is that one copy is read from memory and the other from disk. The browser looks in memory first and then on disk. Here we refer to both as 200 (from cache).

For 200 (from cache), we need to focus on two HTTP header fields: Expires and Cache-Control.

Expires

Expires means "expiration time". It tells the browser until when the cached copy is valid. Once that time has passed, the cache checks with the origin server to determine whether the file has changed.

The only valid value for the Expires header is an HTTP date; any other value is invalid and the response is treated as already expired. Note: the time is in Greenwich Mean Time (GMT), not local time. For example:

Expires: Mon, 29 Oct 2018 03:53:10 GMT

Looking at the Expires value in the two screenshots above, the copy expires at 2018-10-29 03:53:10 GMT, and our request was made at 2018-04-29 03:53:10, so this request reads the data directly from the cache and shows 200 (from cache).
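The Expires check described above can be sketched as follows. This is an illustration under stated assumptions, not how any particular browser implements it: isFreshByExpires is a hypothetical helper, and it relies on JavaScript's Date.parse, which accepts more formats than the strict HTTP date grammar.

```javascript
// Returns true while a cached copy is still fresh according to its Expires header.
// A value that cannot be parsed as a date is treated as already expired.
function isFreshByExpires(expiresHeader, nowMs) {
  const expiresMs = Date.parse(expiresHeader);
  if (Number.isNaN(expiresMs)) {
    return false;
  }
  return nowMs < expiresMs;
}

const expires = 'Mon, 29 Oct 2018 03:53:10 GMT';
// 2018-04-29 03:53:10 GMT (months are zero-based in Date.UTC).
const requestTime = Date.UTC(2018, 3, 29, 3, 53, 10);

console.log(isFreshByExpires(expires, requestTime)); // true: read from cache
console.log(isFreshByExpires(expires, Date.UTC(2018, 10, 1))); // false: ask the server
console.log(isFreshByExpires('invalid', requestTime)); // false: invalid value
```

Because the comparison uses an absolute time, its result depends on the clock that supplies nowMs, which is the first limitation listed below.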

Although the Expires header is useful, it has some limitations:

    • Because an absolute time is involved, the clocks of the web server and the cache must be synchronized. Otherwise the expected result may not be achieved: the cache may treat expired data as fresh, or fresh data as expired.
    • It is easy to forget that a specific expiration time was set for some content. If the expiration time is not updated when the content is returned, then once it has passed every request goes to the server, which increases load and response time instead of reducing them.
    • Finally, Expires dates from HTTP 1.0. Browsers now use HTTP 1.1 by default, so its role has largely been taken over by Cache-Control.

Cache-Control

Cache-Control serves the same purpose as Expires: it indicates how long the current resource is valid, and therefore whether the browser reads the data directly from its cache or sends a new request to the server. Cache-Control offers more options and finer-grained settings, and if both headers are set, it takes priority over Expires.

Useful Cache-Control response directives include:

    • max-age=[seconds]: the cached copy is fresh and does not need to be updated within this time frame. Similar to Expires, but the time is relative rather than absolute: the copy stays fresh for this number of seconds after the request succeeds.
    • s-maxage=[seconds]: similar to max-age, except that it applies only to shared caches such as proxies.
    • public: the response may be cached even if it is authenticated. Normally, responses to requests with HTTP authentication are automatically private (not stored by shared caches).
    • private: allows caches dedicated to a single user, such as a browser cache, to store the response; shared caches such as proxies may not.
    • no-cache: the cache must check with the origin server for validation every time before releasing a cached copy. This is useful to keep authentication effective (in combination with public) or to keep content strictly up to date without giving up all the benefits of caching, for example the refreshed timeline of a microblog service such as Twitter.
    • no-store: instructs caches not to keep any copy under any circumstances.
    • must-revalidate: tells caches that they must strictly obey the freshness information you give them. HTTP allows caches to serve stale data under certain conditions; with this directive, the cache must follow your rules strictly.
    • proxy-revalidate: similar to must-revalidate, except that it applies only to proxy caches.

An example:

Cache-Control: max-age=15811200

Looking at the Cache-Control value in the two screenshots above, the copy is valid for 15,811,200 seconds (183 days) after the request succeeds, so this request reads the data directly from the cache and shows 200 (from cache). Once 15,811,200 seconds have passed since that request, the browser requests new data from the server.
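To make the directive handling concrete, here is a small sketch that parses a Cache-Control value and applies the max-age rule. parseCacheControl and canUseWithoutRevalidation are hypothetical helpers, and the parser is deliberately simplified: it splits on commas and does not handle quoted values that themselves contain commas.

```javascript
// Parse "public, max-age=60" into { public: true, 'max-age': '60' }.
// Simplified: quoted strings containing commas are not supported.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (rawName === '') continue;
    const name = rawName.toLowerCase();
    directives[name] = rawValue === undefined ? true : rawValue.replace(/^"|"$/g, '');
  }
  return directives;
}

// True if a copy that is ageSeconds old may be served without contacting the server.
function canUseWithoutRevalidation(headerValue, ageSeconds) {
  const directives = parseCacheControl(headerValue);
  if (directives['no-store'] || directives['no-cache']) {
    return false; // never stored, or must always be revalidated
  }
  const maxAge = Number.parseInt(directives['max-age'], 10);
  if (Number.isNaN(maxAge)) {
    return false; // no usable max-age in this sketch
  }
  return ageSeconds < maxAge;
}

const header = 'max-age=15811200';
console.log(15811200 / 86400); // 183 (days)
console.log(canUseWithoutRevalidation(header, 3600)); // true
console.log(canUseWithoutRevalidation(header, 15811200)); // false
console.log(canUseWithoutRevalidation('no-cache, max-age=60', 10)); // false
```

A real cache would also fall back to Expires when max-age is absent and would distinguish shared from private caches (s-maxage, public, private); those cases are left out to keep the sketch short.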

304

When the browser determines from Expires or Cache-Control that the cached copy has expired, it must send a request to the server and let the server decide whether the cached copy can still be used.

When the server determines that the cached copy is no longer valid, it returns the new data with HTTP status code 200.

When the server determines that the cached copy is still valid, it returns HTTP status code 304 (with no message body, which saves traffic), telling the browser to keep using the cache.

So which HTTP header fields determine whether 200 or 304 is returned? That brings us to the next key players: Last-Modified/If-Modified-Since and ETag/If-None-Match. Both pairs are used together with Cache-Control.

Last-Modified/If-Modified-Since

    • Last-Modified: the time this resource was last modified. When the web server responds to a request, it tells the browser the resource's last modification time.

    • If-Modified-Since: when the cached copy has expired (per the Cache-Control max-age) and the copy carries a Last-Modified value, the browser sends that Last-Modified time in an If-Modified-Since header when it requests the resource again. When the web server receives the request, it compares If-Modified-Since with the last modification time of the requested resource. If the last modification time is newer, the resource has been modified, and the server responds with HTTP 200 and the full resource in the message body. If it is not newer, the resource has no new modifications, and the server responds with HTTP 304 (no message body, which saves traffic), telling the browser to keep using the cache.
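The server-side comparison described above can be sketched like this. respondToConditionalGet is a hypothetical function that returns a plain object instead of writing a real HTTP response, and the modification time is passed in as a millisecond timestamp rather than read from a file.

```javascript
// Decide between 200 and 304 from the If-Modified-Since request header.
// lastModifiedMs is the resource's modification time in milliseconds.
function respondToConditionalGet(ifModifiedSince, lastModifiedMs, body) {
  // HTTP dates have one-second resolution, so compare whole seconds.
  const lastModifiedSec = Math.floor(lastModifiedMs / 1000);
  const headers = {
    'Last-Modified': new Date(lastModifiedSec * 1000).toUTCString(),
  };
  const sinceMs = Date.parse(ifModifiedSince); // NaN if the header is missing
  const notModified =
    !Number.isNaN(sinceMs) && lastModifiedSec <= Math.floor(sinceMs / 1000);
  if (notModified) {
    return { status: 304, headers }; // no body: the browser reuses its copy
  }
  return { status: 200, headers, body };
}

const modifiedAt = Date.UTC(2018, 3, 1, 12, 0, 0);
const firstResponse = respondToConditionalGet(undefined, modifiedAt, 'v1');
const validator = firstResponse.headers['Last-Modified'];

console.log(firstResponse.status); // 200
console.log(respondToConditionalGet(validator, modifiedAt, 'v1').status); // 304
console.log(respondToConditionalGet(validator, modifiedAt + 60000, 'v2').status); // 200
```

The browser simply echoes back the Last-Modified value it received earlier; it does not compute a time of its own.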

ETag/If-None-Match

This validator was introduced in HTTP 1.1.

    • ETag: when the web server responds to a request, it tells the browser an identifier that uniquely identifies the current resource on the server (the generation rule is decided by the server). In Apache, the ETag value is by default derived from the file's inode number (INode), size, and last modification time (MTime).

    • If-None-Match: when the cached copy has expired (per the Cache-Control max-age) and the copy carries an ETag value, the browser sends that ETag value in an If-None-Match header when it requests the resource again. When the web server receives the request, it compares If-None-Match with the current validator string of the requested resource and decides whether to return 200 or 304.
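A sketch of the ETag comparison follows. The generation rule here (a SHA-1 hash of the content) is only an assumption for illustration, since the rule is up to the server; makeEtag and respondToIfNoneMatch are hypothetical names. The code uses Node.js's built-in crypto module, and the comma split is a simplification that assumes tags contain no commas.

```javascript
const crypto = require('crypto');

// Assumed generation rule for this sketch: a quoted hash of the content.
function makeEtag(body) {
  return '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';
}

// Decide between 200 and 304 from the If-None-Match request header.
function respondToIfNoneMatch(ifNoneMatch, body) {
  const currentEtag = makeEtag(body);
  // The header may list several tags; a W/ prefix marks a weak validator.
  const candidates = (ifNoneMatch || '')
    .split(',')
    .map((tag) => tag.trim().replace(/^W\//, ''));
  const headers = { ETag: currentEtag };
  if (candidates.includes('*') || candidates.includes(currentEtag)) {
    return { status: 304, headers }; // no body: the browser reuses its copy
  }
  return { status: 200, headers, body };
}

const tag = respondToIfNoneMatch(undefined, 'hello').headers.ETag;
console.log(respondToIfNoneMatch(tag, 'hello').status); // 304
console.log(respondToIfNoneMatch(tag, 'hello, world').status); // 200
```

Because the tag is derived from the content rather than from a timestamp, rewriting a file with identical content still yields 304 in this sketch.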

ETag takes precedence over Last-Modified

You might think that Last-Modified is enough for the browser to learn whether its local copy is fresh, so why is an ETag (entity tag) needed? ETag was introduced in HTTP 1.1 mainly to solve several problems that Last-Modified handles poorly:

    • Last-Modified is only accurate to the second. If a file is modified several times within one second, Last-Modified cannot reflect those modifications accurately.

    • Some files are regenerated periodically. Sometimes the content does not change but Last-Modified does, so the cached copy cannot be reused.

    • The server may not obtain the file modification time accurately, or its clock may differ from that of a proxy server.

An ETag is an identifier for the resource that the server generates automatically or that the developer supplies, which allows more precise control of the cache. Last-Modified and ETag can be used together: the server first checks the ETag and, if it matches, goes on to compare Last-Modified before finally deciding whether to return 304. (The HTTP specification itself only requires the ETag check when If-None-Match is present; whether Last-Modified is also compared depends on the server.)

Tips for creating a cache-enabled Web site

With the introduction above we understand the HTTP protocol caching mechanism. Its purpose is to give you more flexible and finer-grained control over the browser cache, so that your site is more cache-friendly and the user experience is better.

The following tips can also make your site more cache-friendly:

    • Keep URLs stable: this is the golden rule of caching. If you serve the same content on different pages, to different users, or from different sites, it should use the same URL. This is a simple but very effective method. For example, if your HTML references "/index.html", always use that address.
    • Use a common library of images and other elements, and reference it from different places.
    • Enable caching for images and pages that do not change frequently by giving the Cache-Control: max-age header a larger value.
    • For content that is updated periodically, let caches handle it by specifying max-age or an expiration time.
    • Change the name when a resource changes (especially downloadable files). Such a resource typically has a long expiration time and the correct version is always on the server, so the page that links to it needs a shorter expiration time. Otherwise the resource on the server is new but the cached page still links to the old address, and old and new versions may conflict.
    • Do not change files unnecessarily: otherwise they get a new Last-Modified value. Likewise, when you update the site, upload only the files that changed rather than overwriting the whole site.
    • Use cookies only where necessary: cookies are difficult to cache and are unnecessary in most situations. If you have to use cookies, limit them to dynamic pages.
    • Reduce the use of SSL: shared caches cannot store encrypted pages, so use SSL only when necessary and reduce the number of images on SSL pages.

SSL: Secure Sockets Layer, developed by Netscape to secure data transmission on the Internet. It uses data encryption so that data cannot be intercepted or eavesdropped on while it travels over the network. The general-purpose specification at the time used a weaker key length, while the US had introduced a standard with a stronger key length but restricted its export. SSL is supported by Internet Explorer (IE) and Netscape browsers from version 3.0 onward.

    • Use REDbot to check your site: it can help you apply some of the concepts described in this article.

REDbot: REDbot = RED + robot. It is a robot that checks HTTP resources to see how they will behave, points out common problems, and suggests improvements. Although it is not an HTTP conformance tester, it can find many HTTP-related issues.
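The tip above about changing the name when a resource changes is commonly applied by putting a fingerprint of the content into the file name. The following is a minimal sketch of that idea, assuming Node.js; versionedName is a hypothetical helper, and the hash algorithm and eight-character length are arbitrary choices.

```javascript
const crypto = require('crypto');
const path = require('path');

// Build a file name that changes whenever the content changes,
// e.g. "app.js" becomes "app.<8 hex characters>.js".
function versionedName(fileName, content) {
  const hash = crypto
    .createHash('sha256')
    .update(content)
    .digest('hex')
    .slice(0, 8);
  const ext = path.extname(fileName);
  const base = path.basename(fileName, ext);
  return `${base}.${hash}${ext}`;
}

console.log(versionedName('app.js', 'console.log(1);'));
console.log(versionedName('app.js', 'console.log(2);'));
```

With names like these, the resource itself can be given a long max-age, while the page that links to it keeps a short one so that it picks up the new name quickly.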

User Behavior and caching

Some user actions in the browser affect how its cache is used.

Complete flowchart
