Browser HTTP protocol caching mechanism

Source: Internet
Author: User
Tags http 200 browser cache nginx reverse proxy

    • 1. Classification of caches
    • 2. Detailed browser caching mechanism
    • 2.1 Controlling the cache with HTML meta tags
    • 2.2 Controlling the cache with HTTP headers
    • 2.2.1 Browser request flow
    • 2.2.2 Several important concepts explained
    • 3. User behavior and caching
    • 4. References

Https://www.cnblogs.com/520yang/articles/4807408.html

While preparing to optimize log requests recently, I ran into some puzzling problems. Why does a response header sometimes contain two Cache-Control fields? Why is a request still sent even though no-cache is clearly set? Why do repeated visits sometimes send a request with an ETag and sometimes without one? And so on.

After reading up on the topic and verifying things hands-on with colleagues, I finally have a clear understanding of these questions, and I am writing it down here so that I do not forget it.

1. Classification of caches

Caches are divided into server-side caches (such as Nginx and Apache) and client-side caches (such as a web browser).

Server-side caches are further divided into proxy server caches and reverse proxy server caches (also called gateway caches, such as an Nginx reverse proxy or Squid). The widely used CDN is also a server-side cache. Its purpose is to give user requests a "shortcut", and what it caches is images, files, and other static resources.

Client-side caching generally means the browser cache, whose purpose is to speed up access to static resources. Consider today's large sites: a single page can issue one or two hundred requests, and daily page views (PV) are counted in the billions. Without caching, the user experience would degrade sharply, and both server load and network bandwidth would be put under severe strain.

2. Detailed browser caching mechanism

Browser caching is controlled by two mechanisms: HTML meta tags and HTTP header fields.

2.1 Controlling the cache with HTML meta tags

The browser caching mechanism is, for the most part, the caching mechanism defined by the HTTP protocol (for example Expires and Cache-Control). There are also caching mechanisms that HTTP does not define, such as HTML meta tags. A web developer can add the following tag to an HTML page:

    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">

This tag tells the browser not to cache the current page, so that every visit fetches it from the server again. It is simple to use, but only some browsers support it, and caching proxies do not support it at all because a proxy does not parse the HTML content itself. Controlling the cache through HTTP header fields is far more widely used, so the rest of this article focuses on the caching mechanism defined by the HTTP protocol.

2.2 Controlling the cache with HTTP headers

2.2.1 Browser request flow
    • Flowchart of the browser's first request:

    • Flowchart of a repeat request:

2.2.2 Several important concepts explained
    • Expires policy: Expires is a response header field set by the web server. It tells the browser that, until the expiration time, it may serve the resource directly from its own cache without sending another request. Expires dates from HTTP 1.0, however, and browsers now use HTTP 1.1 by default, so it has largely been superseded. One drawback of Expires is that the expiration time it returns is a server-side time: if the client clock differs significantly from the server clock (for example because the clocks are out of sync or a time zone is misconfigured), the error can be large. HTTP 1.1 therefore introduced Cache-Control: max-age=<seconds> as a replacement.
    • Cache-Control policy (the focus): Cache-Control serves the same purpose as Expires. It states how long the current resource remains valid, which determines whether the browser serves the resource from its cache or sends a new request to the server. Cache-Control offers more options and finer-grained settings, and when both headers are set, it takes priority over Expires. Its main directives are:

        • public: the response may be cached by any cache.
        • private: all or part of the response is intended for a single user and must not be stored by a shared cache. This lets the server mark a response as specific to one user, so that it is not reused for another user's request.
        • no-cache: a cached copy must not be used without first revalidating it with the server. It does not mean "do not cache", which is easy to misread if the name is taken literally.
        • max-age: the client is willing to accept a response whose age is no greater than the specified time (in seconds).
        • min-fresh: the client is willing to accept a response that will remain fresh for at least the specified number of seconds.
        • max-stale: the client is willing to accept a response that has passed its expiration time. If a value is given, the response may be stale by no more than that many seconds.
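The priority rule above (max-age over Expires) can be sketched as a small freshness calculation. This is only an illustration under stated assumptions: the function names `parse_cache_control` and `freshness_lifetime` are hypothetical, the headers are assumed to arrive as a plain dict with exactly these key spellings, and treating no-cache as a lifetime of 0 is a simplification of "always revalidate".

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (public, no-cache, ...) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers):
    """Return how long (in seconds) a response may be reused without
    contacting the server, or None if the headers do not say."""
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-cache" in directives:
        return 0  # simplification: the copy must be revalidated on every use
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return int(max_age)  # max-age takes priority over Expires
    if "Expires" in headers and "Date" in headers:
        # Measure Expires against the server's own Date header rather than
        # the client clock, which sidesteps the clock-skew problem.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None
```

For example, a response carrying both `Cache-Control: public, max-age=600` and an Expires one hour after Date yields 600 here, because max-age wins.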
    • Last-Modified/If-Modified-Since: this pair is used together with Cache-Control.

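        • Last-Modified: marks the time at which the response resource was last modified. When the web server responds to a request, it tells the browser when the resource was last changed.
        • If-Modified-Since: when the resource has expired (according to the max-age given in Cache-Control) and the browser finds that it carries a Last-Modified declaration, the browser adds the If-Modified-Since header to its next request to the web server, carrying the Last-Modified value it received earlier. When the server sees the If-Modified-Since header, it compares that value with the last modification time of the requested resource. If the last modification time is newer, the resource has changed, and the server responds with the full resource in the message body and HTTP 200. If the last modification time is not newer, the resource has not changed, and the server responds with HTTP 304 (no body, which saves bandwidth), telling the browser to keep using its saved cache.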
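The server-side comparison described above can be sketched as follows. This is a minimal illustration, not any real server's implementation: `respond_if_modified_since` is a hypothetical name, the request headers are assumed to be a plain dict, and `mtime` is assumed to be the file's modification time in seconds since the epoch.

```python
from email.utils import formatdate, parsedate_to_datetime


def respond_if_modified_since(request_headers, mtime, body):
    """Return (status, headers, body) for a GET on one resource."""
    # HTTP dates have one-second resolution, so truncate the mtime first.
    mtime = int(mtime)
    response_headers = {"Last-Modified": formatdate(mtime, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_ts = parsedate_to_datetime(since).timestamp()
        except (TypeError, ValueError):
            since_ts = None  # unparseable date: behave as if it were absent
        if since_ts is not None and mtime <= since_ts:
            # Not changed since the browser's copy: headers only, no body.
            return 304, response_headers, b""
    # Changed, or no validator was sent: return the full resource.
    return 200, response_headers, body
```

The first request has no If-Modified-Since header, so it gets a 200 together with a Last-Modified value. Sending that value back in a later request produces a 304 as long as the file has not changed.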
    • ETag/If-None-Match: this pair is also used together with Cache-Control.

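        • ETag: when the web server responds to a request, it tells the browser the unique identifier of the current resource on the server (the generation rule is decided by the server). In Apache, the ETag value is by default a hash of the file's inode, size, and last modification time (mtime).
        • If-None-Match: when the resource has expired (according to the max-age given in Cache-Control) and the browser finds that it carries an ETag declaration, the browser adds the If-None-Match header (with the ETag value) to its next request to the web server. When the server sees the If-None-Match header, it compares the value with the current validator of the requested resource and decides whether to return 200 or 304.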
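A rough sketch of the ETag round trip follows. The hashing scheme in `make_etag` is an assumption made for illustration: it combines the same three inputs the text mentions (inode, size, mtime) but is not Apache's actual format. `respond_if_none_match` is likewise a hypothetical helper, and the comma split assumes the tags themselves contain no commas.

```python
import hashlib


def make_etag(inode, size, mtime):
    """Build a quoted entity tag by hashing inode, size, and mtime."""
    raw = f"{inode}-{size}-{mtime}".encode("ascii")
    return '"' + hashlib.sha1(raw).hexdigest() + '"'


def _opaque(tag):
    """Drop the weak-validator prefix so that W/"x" matches "x"."""
    return tag[2:] if tag.startswith("W/") else tag


def respond_if_none_match(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET on one resource."""
    response_headers = {"ETag": current_etag}
    header = request_headers.get("If-None-Match")
    if header is not None:
        # If-None-Match may list several tags, or "*" to match anything.
        sent = {_opaque(tag.strip()) for tag in header.split(",")}
        if "*" in sent or _opaque(current_etag) in sent:
            return 304, response_headers, b""
    return 200, response_headers, body
```

Because the tag is derived from the file's metadata, any change to size or mtime produces a different tag, and the next conditional request falls through to a full 200 response.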
    • Why have ETag when Last-Modified already exists? You might think Last-Modified is enough for the browser to know whether its local cached copy is fresh enough, so why is an ETag (entity tag) needed? ETag was introduced in HTTP 1.1 mainly to solve several problems that Last-Modified handles poorly:

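        • The modification time recorded by Last-Modified is only accurate to the second. If a file is modified several times within one second, Last-Modified cannot mark the change accurately.
        • Some files are regenerated periodically. Sometimes their content does not change at all, yet Last-Modified does, so the cached copy cannot be used.
        • The server may fail to obtain the file's modification time accurately, or its clock may disagree with that of a proxy server.

An ETag is a unique server-side identifier for a resource, generated automatically by the server or by the developer, and it allows more precise cache control. When Last-Modified and ETag are used together, the server gives priority to the ETag.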
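The priority of ETag over Last-Modified can be sketched as a single decision function. This is an illustration under stated assumptions: `is_not_modified` is a hypothetical name, headers are assumed to be a plain dict, both dates are assumed to be well-formed HTTP date strings, and the tag comparison is an exact string match.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, last_modified):
    """Return True when the server may answer 304 Not Modified.

    last_modified is the resource's Last-Modified value as an HTTP date string.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # An ETag validator was sent, so it alone decides the outcome and
        # If-Modified-Since is not consulted.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return "*" in tags or current_etag in tags
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Fall back to the date comparison only when no ETag was sent.
        return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
    return False
```

With this ordering, a request whose date validator still matches but whose ETag does not is treated as modified, which covers the sub-second edit case listed above.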
    • Yahoo's YSlow rules advise configuring ETag with care. In a distributed system, Last-Modified must be identical across machines, otherwise requests that are load-balanced to different machines will fail the comparison. Yahoo recommends turning ETag off in distributed systems where possible, because the ETag generated by each machine will differ: beyond Last-Modified, the inode is hard to keep consistent across machines.
    • Pragma: the Pragma header exists for compatibility with HTTP 1.0, and Pragma: no-cache has the same effect as Cache-Control: no-cache.
    • Finally, a summary of the differences between several status codes:

3. User behavior and caching

Browser caching behavior also depends on what the user does. If you remember the forced refresh (Ctrl+F5), you will see at once what I mean.

User action           | Expires/Cache-Control                                | Last-Modified/ETag
Address bar + Enter   | Effective                                            | Effective
Following a page link | Effective                                            | Effective
Opening a new window  | Effective                                            | Effective
Forward and back      | Effective                                            | Effective
F5 refresh            | Not effective (browser sets max-age=0)               | Effective
Ctrl+F5 refresh       | Not effective (browser sets Cache-Control: no-cache) | Not effective (request drops these headers)
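The behavior listed above can be sketched as a function that returns the cache-related request headers for each user action. Everything here is illustrative: `request_headers_for` and the action names are hypothetical, `cached` stands for the validators saved from an earlier response, and real browsers differ in the details and have changed over time.

```python
NORMAL_ACTIONS = {"address_bar_enter", "link", "new_window", "back_forward"}


def request_headers_for(action, cached, is_fresh):
    """Return the cache-related request headers for a user action,
    or None when the cached copy is used without any request."""
    if action == "ctrl_f5":
        # Forced refresh: bypass the cache and send no validators at all.
        return {"Cache-Control": "no-cache"}
    headers = {}
    if action == "f5":
        # Normal refresh: treat the cached copy as expired, but still revalidate.
        headers["Cache-Control"] = "max-age=0"
    elif action in NORMAL_ACTIONS:
        if is_fresh:
            return None  # Expires/Cache-Control still valid: no request is sent
    else:
        raise ValueError(f"unknown action: {action}")
    # Attach whichever validators the cached response carried.
    if "ETag" in cached:
        headers["If-None-Match"] = cached["ETag"]
    if "Last-Modified" in cached:
        headers["If-Modified-Since"] = cached["Last-Modified"]
    return headers
```

An F5 refresh therefore still allows a 304 response, while Ctrl+F5 always forces a full 200 because no validator reaches the server.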

For details, see reference [6] at the end of this article.

4. References

[1] Browser caching mechanism

Http://www.cnblogs.com/skynet/archive/2012/11/28/2792503.html

[2] What web developers need to know about web caching

Http://www.oschina.net/news/41397/web-cache-knowledge

[3] Browser cache in detail: Expires, Cache-Control, Last-Modified, ETag

http://blog.csdn.net/eroswang/article/details/8302191

[4] The difference between pressing Enter in the browser address bar, F5, and Ctrl+F5 when refreshing a page

http://cloudbbs.org/forum.php?mod=viewthread&tid=15790

http://blog.csdn.net/yui/article/details/6584401

[5] Cache Control and ETag

https://blog.othree.net/log/2012/12/22/cache-control-and-etag/

[6] The cached story

http://segmentfault.com/blog/animabear/1190000000375344

[7] Google's Pagespeed website optimization theory mentions the use of ETag to reduce server burden

Https://developers.google.com/speed/docs/pss/AddEtags

[8] Yahoo's YSlow rule hints at setting the ETag carefully

Http://developer.yahoo.com/performance/rules.html#etags
