Analysis of the browser cache mechanism
Non-HTTP Cache Mechanism
The browser cache mechanism is mainly the one defined by the HTTP protocol (through headers such as Expires and Cache-Control). There is, however, also a non-HTTP cache mechanism: using HTML meta tags, web developers can add a <meta> tag to the <head> section of an HTML page, for example:
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
The code above tells the browser not to cache the current page and to fetch it from the server on every visit. It is easy to use, but only some browsers honor it, and caching proxy servers ignore it entirely, because a proxy does not parse the HTML content itself. The rest of this article describes the cache mechanism defined by HTTP.
Browser caching in plain terms
Developers have always had a love-hate relationship with the browser cache. On the one hand, it greatly improves the user experience; on the other hand, it sometimes displays "wrong" (stale) content because it reads from the cache, so during development I have tried every means to disable it.
How does the browser cache work? The core idea is to save content locally so that the same request does not have to be sent to the server every time. Imagine opening the same page repeatedly: the first time, the downloaded js, css, and images are saved locally, and subsequent requests are served from the local copy. Isn't that much more efficient? Real browsers do not simply save complete files locally, and each browser does it differently. Firefox, for example, uses a key-value storage model similar to InnoDB; enter "about:cache" in the address bar to view the cached files. Chrome stores its cached files in a folder named "User Data".
However, reading from the cache every time causes problems of its own. What if a file on the server is updated? To handle this, the server agrees on a validity period with the client. For example, if the server tells the client that the file will not change for one day, the client can safely read it from the cache, so for every identical request that day it happily reads the file from its cache. Once the day has passed, the client notices that the agreed validity period has expired and sends a request to the server to download the file again. Quite often, though, the file on the server has not actually changed, and the cached copy would still be usable. How can the client tell whether the file on the server has been updated? There are two methods. The first: when the server tells the client the validity period, it also tells the client the file's last modification time. When the client tries to download the file again, that modification time is compared to check whether the file has been updated; if it has not, the client reads the file from its cache.
The second method: along with the validity period, the server also tells the client the file's version number, and it changes the version number whenever the file on the server is updated. When the client sends the request again, the version numbers are compared; if they are the same, the client can read the file directly from its cache.
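The idea above (a validity period, then a version check once it has expired) can be modeled in a few lines. This is a toy in-memory sketch, not real HTTP: `ToyServer`, `ToyClient`, and the string version numbers are hypothetical names invented for illustration, and time is passed in as plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedFile:
    body: str
    version: str       # version number handed out by the (hypothetical) server
    expires_at: float  # end of the agreed validity period, in seconds


class ToyServer:
    """Hypothetical server holding a single file and its current version."""

    def __init__(self, body: str, version: str):
        self.body = body
        self.version = version

    def get(self, known_version: Optional[str]):
        # The client already holds the current version: tell it to keep its copy.
        if known_version == self.version:
            return "not modified", None, self.version
        return "ok", self.body, self.version


class ToyClient:
    """Hypothetical client that may reuse its cached copy for `ttl` seconds."""

    def __init__(self, server: ToyServer, ttl: float):
        self.server = server
        self.ttl = ttl
        self.cache: Optional[CachedFile] = None
        self.requests = 0  # how many times the server was actually contacted

    def fetch(self, now: float) -> str:
        # Still inside the validity period: read the local copy, no request.
        if self.cache is not None and now < self.cache.expires_at:
            return self.cache.body
        # Expired or never fetched: ask the server, sending the version we hold.
        self.requests += 1
        known = self.cache.version if self.cache is not None else None
        status, body, version = self.server.get(known)
        if status == "not modified":
            body = self.cache.body  # unchanged on the server: reuse the cached body
        # Either way, a new validity period starts now.
        self.cache = CachedFile(body, version, now + self.ttl)
        return body
```

With a one-day `ttl`, a second fetch within the day makes no request at all; a fetch after the day triggers a version check, which reuses the cached body unless the server's version has changed.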
In fact, the real browser cache mechanism works in almost exactly this way. Next, let's match each of these ideas to the corresponding HTTP headers.
Note that the browser receives a response after its first request to the server. By setting the appropriate headers on that response on the server side, we can minimize, or even eliminate, requests for the same resources in the future. The browser controls caching based on the header information in requests and responses.
Expires and Cache-Control
The server uses Expires and Cache-Control to tell the client how long its cached copy remains valid.
For example, in the preceding response header, Expires specifies the cache expiration time (Date is the current time), while the max-age value of Cache-Control specifies how long the cached copy stays valid (2552 s). In theory, the validity periods derived from the two values should be the same (although here they appear not to match). Expires comes from HTTP/1.0, while Cache-Control was introduced in HTTP/1.1, which specifies that when both max-age and Expires are present, max-age takes priority. Cache-Control accepts many other values as well (refer to the browser cache mechanism article for a list).
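The max-age-over-Expires rule above can be sketched as a small Python function that derives a response's freshness lifetime from its headers. It is a minimal sketch assuming the headers arrive as a plain `dict` with canonical capitalization; `freshness_lifetime` is a hypothetical name, it uses only the standard library's `email.utils.parsedate_to_datetime`, and it ignores every other Cache-Control directive.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[int]:
    """Seconds a response may be served from cache, or None if unspecified.

    Cache-Control: max-age (HTTP/1.1) takes priority over Expires (HTTP/1.0).
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Fall back to Expires, measured relative to the response's Date header.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None
```

For a response carrying `Cache-Control: max-age=2552` and an `Expires` one hour after `Date`, this returns 2552; without the Cache-Control header it falls back to the Expires/Date difference of 3600 seconds.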
Last-Modified/If-Modified-Since
Last-Modified/If-Modified-Since is the first of the two methods described earlier for checking, once the validity period has expired, whether a file on the server has been updated. It works together with Cache-Control. For example, the first time you visit my homepage, simplify the life, a jquery file is requested, and the response header returns the following information:
Then I press Ctrl+R on the homepage to refresh. By default, Ctrl+R skips the max-age and Expires checks and sends a request straight to the server (how the cache is used after each kind of refresh is described below). Let's look at the request:
The request header contains an If-Modified-Since field whose value is the same as the Last-Modified value in the response header of the previous request. The date goes as far back as January 1, 2013; in other words, this jquery file has not been modified since then. The server compares the If-Modified-Since date with the file's last modification date on the server. If they are the same, it returns HTTP 304 and the browser reads the data from its cache; otherwise, it returns the new data with HTTP 200 and an updated Last-Modified value in the response header (for future comparisons).
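The server-side comparison described above can be sketched as follows. `handle_if_modified_since` and its `(status, headers, body)` return shape are hypothetical; the check uses "not newer than" (`<=`), which is how HTTP defines If-Modified-Since and which covers the equal-dates case in the text.

```python
from email.utils import formatdate, parsedate_to_datetime


def handle_if_modified_since(request_headers: dict, file_mtime: float, body: bytes):
    """Return (status, headers, body) for a GET on a file modified at file_mtime."""
    # HTTP dates have one-second resolution, so truncate the modification time.
    last_modified = formatdate(int(file_mtime), usegmt=True)
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims).timestamp()
        except (TypeError, ValueError):
            since = None  # unparsable date: ignore the header
        if since is not None and int(file_mtime) <= since:
            # Not modified since the client's copy: empty body, client uses its cache.
            return 304, {"Last-Modified": last_modified}, b""
    # First request, or the file changed: send it with a fresh Last-Modified.
    return 200, {"Last-Modified": last_modified}, body
```

A first request gets 200 plus a Last-Modified header; echoing that value back as If-Modified-Since gets 304 until the file's modification time moves forward.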
ETag/If-None-Match
The ETag/If-None-Match approach is the second of the two methods described earlier for checking whether a file on the server has been updated. It also works together with Cache-Control. Strictly speaking, an ETag is not the file's version number but a string that uniquely identifies the file (in Apache, the default ETag is derived by hashing the file's inode, size, and last modification time, MTime). When the client finds that the period during which it may read directly from the cache has passed, it sends an If-None-Match header in its request, whose value is the ETag from the previous response header. The server compares this value with the file's current ETag, which changes whenever the file changes. If the two are the same, the server responds with HTTP 304 and the client reads directly from its cache; if they differ, the server responds with HTTP 200, sends the correct data, and supplies the updated ETag value.
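ETag validation can be sketched in the same spirit. `file_etag` is loosely modeled on the Apache default mentioned above (inode, size, and modification time) but hashes them with SHA-256, so its format does not match Apache's; `handle_if_none_match` is a hypothetical name, and it does an exact string match, ignoring the tag lists and weak (`W/`) validators that real If-None-Match headers may carry.

```python
import hashlib
import os


def file_etag(path: str) -> str:
    """ETag derived from inode, size and modification time (hashed).

    Loosely modeled on Apache's default; not Apache's exact format.
    """
    st = os.stat(path)
    raw = f"{st.st_ino}-{st.st_size}-{st.st_mtime_ns}".encode()
    return '"' + hashlib.sha256(raw).hexdigest()[:32] + '"'


def handle_if_none_match(request_headers: dict, path: str):
    """Return (status, headers, body) for a GET using ETag / If-None-Match."""
    etag = file_etag(path)
    if request_headers.get("If-None-Match") == etag:
        # Same ETag: the file has not changed, so the client reads its cache.
        return 304, {"ETag": etag}, b""
    # Different (or missing) ETag: send the file together with its current ETag.
    with open(path, "rb") as f:
        return 200, {"ETag": etag}, f.read()
```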
As shown above, once the period during which the browser may read directly from its local cache (as agreed with the server) has passed, it sends a new request to the server with an If-None-Match header, and the server matches that string. Here nothing has changed (as the ETag value in the response header shows), so the server responds with HTTP 304 and the file is read from the cache. The request may also carry an If-Modified-Since header; if both are present, If-None-Match takes precedence over If-Modified-Since. You may ask why it takes precedence, and why two mechanisms that do nearly the same job exist at all. ETag was introduced in HTTP/1.1 mainly to solve several problems that Last-Modified cannot handle well:
- Last-Modified is only accurate to the second. If a file is modified several times within one second, Last-Modified cannot reflect those changes accurately.
- Some files are regenerated periodically even though their content does not change (only the modification time does). Last-Modified still changes, so the cached copy cannot be reused.
- The server may be unable to obtain a file's modification time accurately, or its clock may disagree with that of a proxy server.
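The first problem in this list, together with the precedence rule mentioned earlier, can be seen in a short sketch. Here `content_etag` hashes the file body, which is an assumption for illustration (the Apache default described above uses file metadata instead); `is_not_modified` is a hypothetical helper that ignores If-Modified-Since whenever If-None-Match is present and, as HTTP specifies for If-None-Match, compares tags weakly (ignoring a `W/` prefix).

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def content_etag(body: bytes) -> str:
    """Hypothetical content-hash ETag (Apache's default uses file metadata)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    # Weak comparison: ignore surrounding spaces and a leading W/ marker.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: dict, body: bytes, mtime: float) -> bool:
    """True if a conditional GET may be answered with 304.

    If-None-Match takes precedence: when present, If-Modified-Since is ignored.
    """
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        tags = [_opaque(t) for t in inm.split(",")]
        return "*" in tags or _opaque(content_etag(body)) in tags
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims).timestamp()
        except (TypeError, ValueError):
            return False
        # HTTP dates carry whole seconds only, so sub-second edits are invisible.
        return int(mtime) <= since
    return False
```

If a file is saved at 1357000000.2 and edited again at 1357000000.7, both saves yield the same Last-Modified string (from `formatdate(mtime, usegmt=True)`), so If-Modified-Since alone wrongly reports "not modified"; sending the old ETag as If-None-Match lets the check notice the change.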
Requests that cannot be cached
Of course, not all requests can be cached.
Requests that cannot be cached by the browser:
- Requests whose HTTP headers contain Cache-Control: no-cache, Pragma: no-cache (HTTP/1.0), Cache-Control: max-age=0, or other directives telling the browser not to cache.
- Dynamic requests whose content depends on cookies or authentication information.
- Requests secured with HTTPS (although some tests found that IE can cache HTTPS resources when Cache-Control: max-age is added to the header, and Firefox can when Cache-Control: public is added).
- POST requests.
- Requests whose HTTP response headers contain none of Last-Modified/ETag or Cache-Control/Expires.
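The header-based rules in this list can be turned into a rough predicate. This is a simplification that follows the list as written rather than the full HTTP caching rules; `is_cacheable` is a hypothetical name, headers are assumed to use canonical capitalization, and the cookie/authentication and HTTPS cases are not modeled.

```python
def is_cacheable(method: str, response_headers: dict) -> bool:
    """Rough check following the list above (a simplification).

    Cookie/authentication-dependent responses and HTTPS are not modeled.
    """
    if method.upper() == "POST":
        return False
    cache_control = response_headers.get("Cache-Control", "")
    # Normalize directives such as "max-age = 0" to "max-age=0".
    directives = {d.replace(" ", "").lower() for d in cache_control.split(",")}
    if "no-cache" in directives or "max-age=0" in directives:
        return False
    if response_headers.get("Pragma", "").lower() == "no-cache":
        return False
    # Without any freshness or validation header there is nothing to cache by.
    return any(
        h in response_headers
        for h in ("Last-Modified", "ETag", "Cache-Control", "Expires")
    )
```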
User behavior and Cache
How the browser uses its cache also depends on user behavior. Take the jquery request on my homepage, simplify the life, mentioned above. If you press Enter in the address bar, the response is HTTP 200 (from cache), because the validity period has not yet passed and the file is read directly from the cache. If you refresh with Ctrl+R, the response is HTTP 304 (Not Modified): the file is still read from the local cache, but only after an extra request to the server. If you press Ctrl+Shift+R, the browser downloads the file from the server again and the response is HTTP 200.
As shown in the preceding table, when you press F5 to refresh, the Expires/Cache-Control settings are ignored and the request is sent to the server again; Last-Modified/ETag still apply, and the server decides whether to return 304 or 200 depending on whether the file has changed. When the user forces a refresh with Ctrl+F5, however, all cache mechanisms are bypassed and resources are fetched from the server again.
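The three cases can be summarized as a small decision function that returns the extra request headers a browser would send, or `None` when it serves the file from cache without any request. `Action` and `plan_request` are hypothetical names; the `Cache-Control: no-cache` and `Pragma: no-cache` headers for a forced reload reflect what mainstream browsers typically send, but exact behavior differs between browsers and versions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ENTER = "Enter in the address bar"
    RELOAD = "F5 / Ctrl+R"
    FORCE_RELOAD = "Ctrl+F5 / Ctrl+Shift+R"


def plan_request(action: Action, cached: Optional[dict], fresh: bool) -> Optional[dict]:
    """Return the extra request headers to send, or None to use the cache directly.

    `cached` holds the validators saved from the previous response (if any);
    `fresh` says whether its Expires/max-age period is still running.
    """
    if cached is None:
        return {}  # nothing cached: plain request, expect HTTP 200
    if action is Action.FORCE_RELOAD:
        # Bypass every cache mechanism and download the file again.
        return {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    if action is Action.ENTER and fresh:
        return None  # "200 (from cache)": no request at all
    # F5, or an expired entry: revalidate with ETag/Last-Modified (304 or 200).
    headers = {}
    if "ETag" in cached:
        headers["If-None-Match"] = cached["ETag"]
    if "Last-Modified" in cached:
        headers["If-Modified-Since"] = cached["Last-Modified"]
    return headers
```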
For more information, see the browser cache mechanism article.
Summary
Two diagrams borrowed from elsewhere summarize the browser cache mechanism at a glance.