"Take doctrine" when we talk about Web caching, what are we talking about?

Source: Internet
Author: User
Tags http 200 response code browser cache chrome developer sessionstorage

Part One: What is Web caching

Scenario 1: While testing a feature, a tester asks why the page looks broken in her browser and why her interface differs from everyone else's. The person next to her offers the usual reminder: clear the cache and try again.

Scenario 2: After code is deployed to an environment, the change does not take effect. The first reaction is to clear the browser cache. When that does not help, further checking shows that the reverse proxy cache is to blame.

So, what are we talking about when we talk about Web caching? Where can content be cached? Which cache should be used when? How can we avoid the problems caused by improper use?

To answer these questions, we first need to look at what Web caching actually is.

Caching: Caching means putting the data we need somewhere it can be accessed faster. Both front-end and back-end developers should be familiar with it: on either side, we use caches to improve performance.

Web caching: Following the same logic, Web caching stores pages or data somewhere they can be fetched faster in order to speed up page access. In the broad sense, Web caching also includes server-side caches; this article leaves them out to keep the two apart.

Part Two: Types of Web caches

In a typical Web application, a request initiated by the browser passes through several hops (the CDN and the reverse proxy are optional). The places where content can be cached follow directly from this: the browser, the CDN, and the reverse proxy.

Part Three: Browser cache

Every Web application uses browser caching. The browser has many cache types, which we can inspect with the developer tools it provides.

Take Chrome as an example: open the Chrome developer tools and select "Resources" (named "Application" in newer versions of Chrome) to see all the cache types:

I. Frames

The frames cache is a browser file-level cache based on the HTTP protocol.

When the browser requests a file, it decides whether to fetch the file from the server or read it from the local cache. The decision is based mainly on Expires and ETag, and the process looks like this:

As the flowchart shows, three headers affect the browser's file cache: Expires, ETag, and Last-Modified. All three are defined by the HTTP protocol.

(i) Headers that control caching

HTTP/1.0 uses Expires to decide whether a cached file may be used. HTTP/1.1 adds Cache-Control and ETag, which work alongside Last-Modified/If-Modified-Since. The definition of each header follows:

1. Expires

Sets the expiration time of a static resource as an absolute date.

2. Cache-Control

Cache-Control controls whether a response may be cached, who may cache it, and how long it stays valid. It offers more options and finer-grained settings than Expires, and when both are set, Cache-Control takes priority over Expires.

(1) public: the response may be cached by any cache, including shared caches such as proxies and CDNs.

(2) private: the response may be cached only by a non-shared cache, that is, by the browser that sent the request and not by any intermediary.

(3) no-cache: the response may be stored, but a cache must revalidate it with the server before every reuse. It does not mean "do not cache".

(4) no-store: the response must not be cached or written to disk at all. It is typically used for sensitive data to prevent the data from being copied.

(5) must-revalidate: once the cached response has expired, a cache must revalidate it with the server before using it again. During revalidation the browser sends an If-Modified-Since (or If-None-Match) header. If the server confirms that the cached data is still current, it returns 304 Not Modified; otherwise it sends the response data again.

(6) proxy-revalidate: similar to must-revalidate, but applies only to shared caches.

(7) max-age (in seconds): the response becomes stale max-age seconds after it was generated. It plays the same role as the Expires header from HTTP/1.0, and when a response sets both max-age and Expires, max-age wins. (Note: Nginx's expires directive also emits a max-age value.)
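The priority rule between max-age and Expires can be made concrete with a small sketch. The function names, the lower-cased header keys, and the responseDate parameter (standing in for the response's Date header) are assumptions made for illustration; a real browser cache applies more rules than this.

```javascript
// Parse a Cache-Control header value into a map of directive -> value (or true).
function parseCacheControl(header) {
  const directives = {};
  for (const part of (header || "").split(",")) {
    const [name, value] = part.trim().split("=");
    if (name) {
      directives[name.toLowerCase()] = value === undefined ? true : value.replace(/"/g, "");
    }
  }
  return directives;
}

// Freshness lifetime in seconds. max-age wins over Expires when both are present.
// Assumption: headers is a plain object with lower-cased header names.
function freshnessLifetime(headers, responseDate) {
  const cc = parseCacheControl(headers["cache-control"]);
  if (cc["no-store"] || cc["no-cache"]) return 0; // never reused without asking the server
  if (cc["max-age"] !== undefined) return Math.max(0, parseInt(cc["max-age"], 10));
  if (headers["expires"]) {
    const expires = Date.parse(headers["expires"]);
    if (!Number.isNaN(expires)) {
      return Math.max(0, (expires - responseDate.getTime()) / 1000);
    }
  }
  return 0;
}
```

Calling freshnessLifetime with both headers set returns the max-age value, and the Expires date is only consulted when max-age is absent.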

3. Last-Modified/If-Modified-Since

Last-Modified: the time the resource in this response was last modified. The Web server sends it to the browser along with the response.

If-Modified-Since: when a cached resource has expired (according to the max-age in Cache-Control) and it carries a Last-Modified value, the browser's next request to the Web server includes an If-Modified-Since header holding that value. The server compares the header with the resource's current last modification time. If the resource has been modified since then, the server returns the whole resource in the response body with HTTP 200. If it has not, the server returns HTTP 304 with no body, which saves bandwidth and tells the browser to keep using its cached copy.
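The server-side comparison described above might look roughly like the following sketch. The function name and the shape of the returned object are hypothetical; they only illustrate the 200 versus 304 decision.

```javascript
// Decide how to answer a conditional GET that carries If-Modified-Since.
// lastModified is a Date; ifModifiedSince is the raw header string or undefined.
function respondToIfModifiedSince(lastModified, ifModifiedSince) {
  const since = ifModifiedSince ? Date.parse(ifModifiedSince) : NaN;
  // HTTP dates have one-second resolution, so drop milliseconds before comparing.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000;
  if (!Number.isNaN(since) && modified <= since) {
    // Not modified since the browser's copy: no body is sent.
    return { status: 304, headers: {}, body: null };
  }
  // Modified (or no usable header): send the full resource again.
  return {
    status: 200,
    headers: { "Last-Modified": lastModified.toUTCString() },
    body: "full resource content",
  };
}
```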

4. ETag/If-None-Match

ETag/If-None-Match is also used together with Cache-Control.

ETag: when the Web server responds to a request, it sends the browser an identifier that uniquely identifies the current version of the resource on the server (the server decides how it is generated). Nginx adds an ETag by default; to turn it off, set the following in the configuration file: etag off;

If-None-Match: when a cached resource has expired (according to the max-age in Cache-Control) and it carries an ETag, the browser's next request to the Web server includes an If-None-Match header holding the ETag value. The server compares the header with the current ETag of the requested resource and decides whether to return 200 or 304.
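As a rough illustration of that comparison, the sketch below checks an If-None-Match header against the resource's current ETag. The function name and return shape are made up for the example, and it assumes the resource exists and that a weak comparison (ignoring a W/ prefix) is acceptable.

```javascript
// Compare the If-None-Match request header with the resource's current ETag.
function respondToIfNoneMatch(currentEtag, ifNoneMatch) {
  // The header may list several ETags separated by commas.
  const candidates = (ifNoneMatch || "").split(",").map((tag) => tag.trim());
  // Weak comparison: ignore a leading W/ prefix on either side.
  const strip = (tag) => tag.replace(/^W\//, "");
  const matched = candidates.some(
    (tag) => tag === "*" || strip(tag) === strip(currentEtag)
  );
  return matched
    ? { status: 304, headers: { ETag: currentEtag } }
    : { status: 200, headers: { ETag: currentEtag }, body: "full resource content" };
}
```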

(ii) User behavior and caching

Browser caching behavior also depends on what the user does:

User action                  Expires/Cache-Control   Last-Modified/ETag
Press Enter in address bar   valid                   valid
Follow a page link           valid                   valid
Open a new window            valid                   valid
Forward / back               valid                   valid
F5 refresh                   invalid                 valid
Ctrl+F5 refresh              invalid                 invalid

(ii) How to control caching

There are three ways to set up caching:

1. Web server Configuration

Take Nginx as an example. Set the following in nginx.conf:

location ~ .*\.(gif|jpg|png|htm|html|css|js|flv|ico|swf)(.*) {
    # static files matched above expire after one day
    expires 1d;
}

The configuration above makes these static files expire after 1 day. To disable caching altogether, set the value to a negative number, for example expires -1; the response is then sent with Cache-Control: no-cache.

2. Server-side code

For example:

response.setHeader("Cache-Control", "no-cache");

3. HTML meta tags

<meta http-equiv="Cache-Control" content="max-age=7200"/>

(iii) Issues and solutions for caching

1. Introducing a cache brings two main problems:

(1) The browser cannot tell that a resource has been updated, so it keeps using the old file in its cache.

(2) Cache policies differ from file to file. Among related files, some are loaded from the server and some come straight from the browser cache, which can leave the interface in a broken, inconsistent state.

2. Solutions

(1) ETag or Last-Modified

The ETag is a string the server generates from the file's information. It changes whenever the file on the server is updated, which guarantees that the browser picks up the new content after an update.

The drawback of the ETag approach is that a request still has to be sent to the server so that the server can make the decision.

Last-Modified works in a similar way.

(2) File name suffix

During the build, each generated file gets a random suffix in its name, the references in the main entry HTML are replaced with the suffixed file names, and the entry file itself is configured not to be cached.

When a file is updated on the server, its name changes, so nothing in the browser cache matches and the browser fetches it from the server. When a file has not changed, the browser keeps serving it from its cache.

This works better, but it requires a build step, so it suits Web applications that already use a front-end build.
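A build step of this kind could be sketched as follows. Note one deliberate swap: instead of a random suffix, the sketch derives the suffix from the file content, so the name changes only when the content changes. The hash function (a 32-bit FNV-1a over UTF-16 code units) and all function names are assumptions chosen to keep the example free of dependencies; real build tools typically use a stronger hash.

```javascript
// 32-bit FNV-1a hash of a string, returned as 8 hex characters.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Insert the content hash before the extension: app.js -> app.<hash>.js
function hashedFileName(fileName, content) {
  const dot = fileName.lastIndexOf(".");
  const hash = fnv1a(content);
  return dot === -1
    ? `${fileName}.${hash}`
    : `${fileName.slice(0, dot)}.${hash}${fileName.slice(dot)}`;
}

// Rewrite references in the entry HTML so they point at the hashed names.
function rewriteEntryHtml(html, nameMap) {
  let result = html;
  for (const [original, hashed] of Object.entries(nameMap)) {
    result = result.split(original).join(hashed);
  }
  return result;
}
```

The entry HTML produced by rewriteEntryHtml is the one file that should be served without caching, so that browsers always discover the current file names.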

II. Cookies

A cookie is a mechanism that lets a Web server store a small amount of data on the client's disk or in its memory and read it back later. When we browse a website, the Web server places a very small text file on our disk, which can record information such as the user ID, password, pages visited, and time spent.

Cookies are stored as key-value pairs and are limited in both number and size: the allowed number varies by browser, and the size cannot exceed 4 KB.

(i) How cookies are set:

1. Browser

The browser provides an API for cookies, through which they can be set, read, and deleted. A cookie can also be given an expiration time.

In the browser, cookies are read through:

document.cookie
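document.cookie returns all readable cookies as one string of the form "name=value; other=value". A small helper for turning such a string into an object might look like this; parseCookies is a hypothetical name, and the sketch assumes values were URL-encoded when they were written.

```javascript
// Parse a cookie string such as the one returned by document.cookie
// ("name=value; other=value") into a plain object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip empty or malformed segments
    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    if (name) cookies[name] = decodeURIComponent(value);
  }
  return cookies;
}
```

In a browser it would be called as parseCookies(document.cookie).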

2. Server

Cookies are very often used to assist with session management. After a successful login, the server writes the session ID into a cookie, every subsequent request from the client carries that cookie, and the server checks the session ID in it to decide whether the request is legitimate. In Java, for example, the server writes a cookie like this:

Cookie cookie = new Cookie("SessionID", URLEncoder.encode("fejerwiie2234", "UTF-8"));
response.addCookie(cookie);

(ii) Attributes of the cookie

Attribute         Meaning
Name              Name of the cookie.
Value             Value of the cookie.
Domain            Domain from which this cookie can be accessed.
Path              Path of the pages that can access this cookie. If Domain is abc.com and Path is /test, only pages under /test can read this cookie.
Expires/Max-Age   Expiration of this cookie. If set to a time, the cookie expires when that time is reached. If not set, the default is Session, meaning the cookie expires with the session, when the whole browser (not just a tab) is closed.
Size              Size of this cookie.
HttpOnly          If true, the cookie is only sent in HTTP request headers and cannot be read through document.cookie.
Secure            Whether this cookie may only be sent over HTTPS.
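On the wire, these attributes travel in the Set-Cookie response header. The sketch below assembles such a header value from an options object; buildSetCookie and the option names are assumptions made for illustration, not part of any particular framework.

```javascript
// Assemble a Set-Cookie header value from a name, a value and optional attributes.
function buildSetCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (options.domain) parts.push(`Domain=${options.domain}`);
  if (options.path) parts.push(`Path=${options.path}`);
  if (options.maxAge !== undefined) parts.push(`Max-Age=${options.maxAge}`); // seconds
  if (options.expires) parts.push(`Expires=${options.expires.toUTCString()}`); // a Date
  if (options.httpOnly) parts.push("HttpOnly"); // hidden from document.cookie
  if (options.secure) parts.push("Secure");     // only sent over HTTPS
  return parts.join("; ");
}
```

Leaving out both maxAge and expires produces a session cookie, matching the default described in the table.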

III. localStorage

localStorage is a feature introduced with HTML5. It serves mainly as local storage in the browser and addresses the limited capacity of cookies. localStorage is persistent storage.

Like cookies, localStorage stores data as key-value pairs.

(i) Methods the browser provides for adding, querying, and deleting localStorage entries

Add/modify: window.localStorage.setItem("username", "admin");

Query: window.localStorage.getItem("username");

Delete: window.localStorage.removeItem("username");

(ii) Precautions when using localStorage

Values stored in localStorage can only be strings. To store an object, convert it to a string first and then save it.
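One common way to do that conversion is JSON. The sketch below uses a small in-memory stand-in (MemoryStorage, a made-up class with the same getItem/setItem/removeItem shape as window.localStorage) so that it also runs outside a browser; saveObject and loadObject are likewise hypothetical helper names.

```javascript
// Minimal in-memory stand-in for window.localStorage.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  setItem(key, value) { this.items.set(key, String(value)); } // values are coerced to strings
  getItem(key) { return this.items.has(key) ? this.items.get(key) : null; }
  removeItem(key) { this.items.delete(key); }
}

// Serialize the object to JSON before storing it.
function saveObject(storage, key, object) {
  storage.setItem(key, JSON.stringify(object));
}

// Read the JSON string back and turn it into an object again.
function loadObject(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}
```

In a browser, window.localStorage can be passed wherever the stand-in is used, for example saveObject(window.localStorage, "user", { name: "admin" }).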

IV. sessionStorage

sessionStorage stores data locally for the duration of a session. The data can only be accessed by pages in the same session and is destroyed when the session ends, so sessionStorage is session-level storage rather than persistent local storage.

The browser provides the same methods for adding, querying, and deleting sessionStorage entries as for localStorage; the only difference is that they are reached through window.sessionStorage.

V. IndexedDB

IndexedDB also arrived with HTML5. It can store large amounts of structured data in a client-side database and provides APIs for efficient retrieval. Its initial quota is 50 MB and can be increased (the exact figure varies by browser), so in terms of capacity it far outstrips the other storage options.

Its drawback is just as obvious: not every mainstream browser supports it well. IE9 does not support it, and IE10 and IE11 support it only partially, so if part of your user base still uses the IE series, IndexedDB is hard to adopt.

IndexedDB has its own set of APIs, which are not covered in detail here. See:

https://developer.mozilla.org/zh-CN/docs/Web/API/IndexedDB_API

VI. Web SQL

This approach is deprecated and no longer maintained. The alternative is IndexedDB.

VII. Application Cache

This feature has been removed from the Web standards.

VIII. Cache Storage

This is an experimental feature that not all browsers support.

CacheStorage is defined in the Service Worker specification. It holds the Cache objects declared by each service worker and has five core methods: open, match, has, delete, and keys, which let you respond differently depending on which Cache object matches.

IX. Service Worker

Service workers bring a number of new capabilities that let Web apps offer the same offline and push-message experience as native apps. Service Worker is also an experimental feature and not supported by all browsers.

A service worker can handle:

    1. Background message delivery

    2. Network proxying: forwarding requests and forging responses

    3. Offline caching

    4. Message push

See: https://developer.mozilla.org/zh-CN/docs/Web/API/Service_Worker_API/Using_Service_Workers

Part Four: CDN cache

Besides the number and size of resources, a large share of a site's loading time goes to network transmission, and transmission time depends directly on the distance between the user's browser and the server holding the resources. One way to speed up loading is therefore to place the resources as close to the user geographically as possible.

CDN: the full name is Content Delivery Network. The basic idea is to route around the bottlenecks and links on the Internet that may hurt the speed and stability of data transmission, so that content is delivered faster and more reliably. A CDN comprises four elements: distributed storage, load balancing, redirection of network requests, and content management, with content management and global traffic management at its core. By serving each request as efficiently as possible, a CDN lets users get the content they need quickly, relieves Internet congestion, and improves how fast websites respond to users.

The topology diagram for the CDN is as follows:

I. The CDN cache mechanism

The caching policy of CDN edge nodes varies by provider, but it generally follows the HTTP standard: the Cache-Control: max-age field in the HTTP response header sets how long an edge node caches the data. When a client requests data from a CDN node, the node checks whether its cached copy has expired. If not, it returns the cached data directly; otherwise it sends a back-to-source request to the origin server, pulls the latest data, updates its local cache, and returns the latest data to the client. So when we modify content, it is best to add a version number to the resource so that the CDN fetches it again, which avoids unnecessary trouble.

CDN providers usually offer finer-grained cache management, letting users specify cache times by file suffix, directory, and other dimensions. The CDN cache time directly affects the back-to-source rate. If it is too short, data on the edge nodes expires often, which causes frequent back-to-source requests, raises the load on the origin, and increases access latency. If it is too long, updates are slow to reach users. Developers need to tune cache times to the specific business.

II. The problem with CDNs

The offloading function of a CDN reduces both the user's access latency and the load on the origin server.

Its main drawback is cache synchronization: when the site is updated but the data on the CDN nodes is not refreshed in time, users still see stale or broken content even if they press Ctrl+F5 to invalidate the browser-side cache, because the edge node has not yet synchronized the latest data.

III. How to solve the CDN problem

The main problem with CDNs is that caches are not synchronized in time. There are two ways to update the cache:

(i) Custom caching policies

When the origin server returns static files, it sets headers such as Expires and Cache-Control to define the CDN's caching policy.

(ii) Actively refreshing the CDN cache when resources on the origin server are updated

CDN edge nodes are transparent to developers. Whereas the browser's local cache can be invalidated with a forced refresh (Ctrl+F5), the edge node cache has to be cleared through the cache refresh interface offered by the CDN provider. After updating data, developers can use this refresh feature to force the cached data on the CDN nodes to expire, which ensures that clients pull the latest data on their next access.
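The edge-node behavior described in this part (serve from cache while fresh, go back to the source once expired, and allow an active refresh) can be modeled with a toy simulation. EdgeNode, fetchFromOrigin, and the injectable clock are all invented for this sketch and do not correspond to any provider's API.

```javascript
// Simplified edge node: serves from its cache while the entry is fresh,
// otherwise goes back to the origin and refreshes the entry.
class EdgeNode {
  constructor(fetchFromOrigin, now = () => Date.now()) {
    this.fetchFromOrigin = fetchFromOrigin; // (url) => { body, maxAge }, maxAge in seconds
    this.now = now;                         // injectable clock, handy for testing
    this.cache = new Map();
    this.originRequests = 0;                // counts back-to-source requests
  }

  get(url) {
    const entry = this.cache.get(url);
    if (entry && this.now() < entry.expiresAt) {
      return { body: entry.body, from: "edge" };
    }
    this.originRequests += 1;
    const { body, maxAge } = this.fetchFromOrigin(url);
    this.cache.set(url, { body, expiresAt: this.now() + maxAge * 1000 });
    return { body, from: "origin" };
  }

  // Active refresh: drop the cached copy so the next request goes back to the origin.
  purge(url) {
    this.cache.delete(url);
  }
}
```

With a long maxAge the node keeps returning the old body after the origin changes, which is the synchronization problem described above; calling purge, or requesting a new versioned URL, forces a fresh fetch.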

Part Five: Reverse proxy cache

Reverse proxy: with this mechanism the Web server is hidden behind a proxy server, and the server that implements the mechanism is called the reverse proxy server. The Web server then becomes the back-end server, and the reverse proxy server is called the front-end server.

One purpose of introducing a reverse proxy server is cache-based acceleration. Content can be cached on the reverse proxy server, and all of the caching mechanisms are still implemented through the HTTP/1.1 protocol.

(i) Reverse proxy cache configuration

Taking the commonly used reverse proxy Nginx as an example, caching is configured with the following directives:

1. proxy_cache_path

Syntax: proxy_cache_path path [levels=number] keys_zone=zone_name:zone_size [inactive=time] [max_size=size];

Default value: None

Context: http

This directive specifies the cache path and a few other parameters. Cached data is stored in files, and the hash of the proxied URL is used as both the key and the file name. The levels parameter specifies the subdirectory levels of the cache, for example:

proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=one:10m;

The resulting file names look like this:

/data/nginx/cache/c/29/b7f54b2df7773722d382f4809d65029c

levels defines the directory structure. Each level is named with 1 or 2 characters, in the form X, X:X, or X:X:X, for example "2", "2:2", or "1:1:2", and there can be at most three levels.
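The example path above shows how levels maps a hash to directories: the directory names are cut from the end of the hash, first one character, then the two before it. The sketch below reproduces that mapping; cacheFilePath is a hypothetical helper written for illustration, not something Nginx exposes.

```javascript
// Build the on-disk path for a cache entry from its hex hash and a levels
// string such as "1:2". Directory names are taken from the END of the hash.
function cacheFilePath(root, levels, hash) {
  const dirs = [];
  let end = hash.length;
  for (const width of levels.split(":").map(Number)) {
    dirs.push(hash.slice(end - width, end));
    end -= width;
  }
  return [root, ...dirs, hash].join("/");
}
```

Called with "/data/nginx/cache", "1:2", and the hash from the example, it yields the same path as shown above.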

2. proxy_cache

Syntax: proxy_cache zone_name;

Default value: None

Context: http, server, location

Sets the name of the cache zone to use. The same zone can be used in several places.

3. proxy_cache_valid

Syntax: proxy_cache_valid reply_code [reply_code ...] time;

Default value: None

Context: http, server, location

Sets different cache times for different response codes, for example:

proxy_cache_valid 200 302 10m;

proxy_cache_valid 404 1m;

This caches responses with code 200 or 302 for 10 minutes and responses with code 404 for 1 minute.

If only the time is given:

proxy_cache_valid 5m;

then only responses with code 200, 301, or 302 are cached.

The any parameter can be used to match any response code:

proxy_cache_valid 200 302 10m;

proxy_cache_valid 301 1h;

proxy_cache_valid any 1m;
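To make the effect of these rules easier to follow, here is a small model of how a response code could be mapped to a cache time. It is only a sketch: the rule objects and cacheTimeFor are invented for illustration, and the choice to check explicit codes before any is an assumption of this model, not a statement about Nginx's internal matching order.

```javascript
// Each rule mirrors one proxy_cache_valid line: a list of codes (or "any") and a time.
// A rule with no codes applies to 200, 301 and 302, as described above.
function cacheTimeFor(status, rules) {
  for (const rule of rules) {
    const codes = rule.codes.length === 0 ? [200, 301, 302] : rule.codes;
    if (codes.includes(status)) return rule.time;
  }
  // Assumption of this sketch: "any" is only used when no explicit code matched.
  const fallback = rules.find((rule) => rule.codes.includes("any"));
  return fallback ? fallback.time : null; // null: the response is not cached
}
```

With the three rules from the last example, a 200 response maps to 10m, a 301 to 1h, and everything else to 1m.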
