CDN Caching: What You Need to Know (reprint)
Original address: http://bbs.qcloud.com/forum.php?mod=viewthread&tid=3775
Note: This is a copy of the original text, kept only as a backup for my own study. Please don't flame, thank you ~
What is a CDN?
The role of a CDN can be illustrated by how buying a train ticket has changed over the past 8 years:
Eight years ago there were no train ticket agency outlets, let alone 12306.cn. Train tickets could only be bought in the ticket hall of a railway station. The small county I lived in had no railway, so tickets had to be bought at the railway station in the city, and the round trip from the county to the city was a 4-hour drive, simply a waste of life. Later, ticket agency outlets appeared in the county and tickets could be bought there directly, which was far more convenient: people no longer had to make the tiring trip to the city and queue for tickets.
A CDN can be understood as the ticket agency outlets spread across every county. When a user browses a site, the CDN selects the edge node nearest to the user to answer the request, so a request from a Hainan Mobile user does not have to travel all the way to a server in a Beijing Telecom data center (assuming the origin server is deployed in a Beijing Telecom data center).
The advantages of a CDN are obvious: (1) CDN nodes solve the problem of cross-operator and cross-region access, greatly reducing access latency; (2) most requests are served at the CDN edge nodes, so the CDN offloads traffic and reduces the load on the origin server.
What is a cache?
This article is not a deep dive into the architecture behind a CDN, nor does it discuss how a CDN schedules traffic globally. It focuses on how data is cached once a CDN is in place. Caching is a space-for-time trade-off that shows up everywhere: by using extra space, we get faster access.
First, consider how the user's browser interacts with the server when the website does not use a CDN. When users browse the website, the browser can save a local copy of the site's images and other files, so that when the user visits the site again the browser does not have to download every file. Downloading less means the page loads faster.
If a CDN layer is added in between, the browser interacts with the server as follows. The client browser first checks whether its local cache has expired. If it has, the browser sends a request to the CDN edge node. The edge node checks whether its cached copy of the requested data has expired. If it has not, the edge node answers the user's request directly and the HTTP request is complete. If the data has expired, the CDN must send a back-to-source request to the origin server to pull the latest data.
[Figure: typical CDN topology diagram. Photo source: http://grefr.iteye.com/blog/2004248]
As you can see, when a CDN is present the data passes through two caches: the client (browser) cache and the CDN edge node cache. The following sections analyze these two caching stages in detail.
Client (browser) caching
Disadvantages of client-side caching
Client-side caching reduces requests to the server, avoids downloading the same files repeatedly, and significantly improves the user experience. However, when the site is updated (for example, CSS, JS, and image files are replaced), the browser may still hold the old versions of those files, leading to unpredictable results.
Once upon a time, whenever a page loaded with its elements in the wrong positions or with buttons that did not respond to clicks, the front-end guys would habitually ask: "Did you clear the cache?" And after a Ctrl+F5, everything was fine. But sometimes, if we simply press Enter in the browser address bar or just press F5 to refresh, the problem remains. Did you know that these three operations trigger different cache refresh policies in the browser?
How does the browser decide whether to use a local file or fetch a new file from the server? Here are a few ways it makes that decision.
Browser cache policy
Expires
Expires: Sat, 20:30:54 GMT
If Expires is set in the HTTP response, the browser avoids contacting the server until the Expires time has passed. Until then, the browser does not need to send a request to the server; it only needs to check whether the copy it holds is out of date, which adds no load to the server.
Cache-Control: max-age
The Expires approach works well, but we have to compute an exact point in time every time. The max-age directive makes expiration times easier to handle: all we need to say is "you may use this copy for one week".
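The two freshness rules introduced so far, an absolute Expires date and a relative max-age lifetime, can be sketched roughly as follows. The function name `is_fresh`, the sample dates, and the header values are illustrative assumptions, not taken from the original article. When both headers are present the sketch lets max-age win, as HTTP/1.1 specifies.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
import re


def is_fresh(headers, stored_at, now):
    """Return True if a cached response may be reused without contacting the server.

    headers   -- dict of response headers saved with the cached copy
    stored_at -- timezone-aware datetime at which the response was cached
    now       -- timezone-aware datetime of the current request
    """
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        # max-age is a relative lifetime in seconds and takes precedence.
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    if "Expires" in headers:
        # Expires is an absolute HTTP date.
        return now < parsedate_to_datetime(headers["Expires"])
    # No explicit lifetime: treat as stale (real browsers may apply heuristics).
    return False


stored = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical cache time
week = {"Cache-Control": "max-age=604800"}           # 7 days, in seconds
print(is_fresh(week, stored, stored + timedelta(days=6)))
print(is_fresh(week, stored, stored + timedelta(days=8)))

expires = {"Expires": format_datetime(stored + timedelta(hours=1), usegmt=True)}
print(is_fresh(expires, stored, stored + timedelta(minutes=30)))
```

In this simplified model a response with neither header is always treated as stale, which is stricter than what real browsers do.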
max-age is measured in seconds, for example:
Cache-Control: max-age=645672
This specifies that the page expires after 645,672 seconds (7.47 days).
Last-Modified
To tell the browser which version of the file it has, the server sends a Last-Modified header carrying the time the file was last modified. This lets the browser know the modification time of the file it received, and on subsequent requests the browser validates its copy as follows:
1. Browser: Hey, I need the file jquery.min.js. If it has been modified since Tue, 08:26:32 GMT, please send it to me.
2. Server: (checks the file's modification time)
3. Server: Hey, this file has not been modified since that time; you already have the latest version.
4. Browser: Great, I'll show it to the user.
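The exchange above can be sketched from the server's side as a small decision function. This is only an illustration under assumptions: the function name `respond`, the sample modification date, and the response tuple layout are hypothetical, and a real server handles many more cases.

```python
from email.utils import parsedate_to_datetime


def respond(if_modified_since, last_modified, body):
    """Return (status, headers, body) for a GET that may carry If-Modified-Since.

    Both dates are HTTP-date strings; if_modified_since is None when the
    browser has no cached copy.
    """
    headers = {"Last-Modified": last_modified}
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        file_time = parsedate_to_datetime(last_modified)
        if file_time <= client_time:
            # Unchanged since the browser's copy: send headers only, no body.
            return 304, headers, b""
    return 200, headers, body


# Hypothetical modification time of jquery.min.js
modified = "Tue, 02 Jan 2024 08:26:32 GMT"

first = respond(None, modified, b"/* jquery */")        # first visit
repeat = respond(modified, modified, b"/* jquery */")   # revalidation
print(first[0], repeat[0])
```

The first visit gets the full file, while the revalidation gets an empty body, which is where the bandwidth saving comes from.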
In this case, the server returns only a 304 response header, which reduces the amount of data sent and speeds up the response. For more on 304 responses, see: http://www.cnblogs.com/ziyunfei/archive/2012/11/17/2772729.html
After you press F5 to refresh the page, the page returns a 304 response header.
ETag
In general, comparing modification times is enough to compare file versions. However, in some special cases, such as a wrong server clock, a server clock that has been changed, or server time that is not updated promptly when daylight saving time (DST) begins, comparing file versions by modification time can go wrong.
ETag can be used to solve this problem. An ETag is a unique identifier for a file. Like a hash or fingerprint, each file has its own tag, which changes as soon as the file changes.
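To make the fingerprint idea concrete, here is a minimal sketch that derives a tag from the file's bytes and compares it with the tag the browser sends back. Hashing the content with SHA-256 is just one possible scheme chosen for illustration; real servers may build ETags differently, and the names `make_etag` and `check` are hypothetical.

```python
import hashlib


def make_etag(content):
    """Derive a quoted ETag from the file bytes (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def check(if_none_match, content):
    """Return the status a server might send for a conditional GET."""
    return 304 if if_none_match == make_etag(content) else 200


old = b"console.log('v1');"
new = b"console.log('v2');"
tag = make_etag(old)       # tag the browser stored with its cached copy

print(check(tag, old))     # file unchanged on the server
print(check(tag, new))     # file replaced on the server
```

Because the tag depends only on the bytes, it is unaffected by clock changes on the server.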
The server returns the ETag header:
ETag: "39001d-1762a-50bf790757e00"
The next request then proceeds as follows:
1. Browser: Hey, I need the file jquery.min.js. Send it only if it does not match "39001d-1762a-50bf790757e00".
2. Server: (checks the ETag...)
3. Server: Hey, my version is also "39001d-1762a-50bf790757e00"; you already have the latest version.
4. Browser: OK, then I'll use the local cache.
Like Last-Modified, ETag solves the problem of comparing file versions. The difference is that ETag takes priority over Last-Modified.
Additional tags
The list of cache headers does not end there; sometimes we need more control over how content is cached.
- Cache-Control: public means the response may be cached by proxy servers and other intermediaries.
- Cache-Control: private means the file differs from user to user. Only the user's own browser may cache it; shared proxy servers may not.
- Cache-Control: no-cache means a cached copy must not be reused without first revalidating it with the server (to forbid storing a response at all, use no-store). This is useful for search or paginated results, where the same URL can return different content.
Browser cache refresh
1. Enter the URL in the address bar and press Enter, or click the Go button
The browser fetches the page with as few requests as possible, using the local cache for all content that has not expired, which reduces requests to the server. Expires and max-age therefore only take effect with this kind of navigation.
2. Press F5 or click the browser's refresh button
The browser adds the necessary cache validation headers to its requests but does not use the local cache directly. This lets Last-Modified and ETag do their work, but not Expires.
3. Press Ctrl+F5, or hold Ctrl and click the refresh button
This is a forced refresh: the browser always sends a completely new request and uses no cache at all.
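The three behaviours can be summarized as a small decision function. This is a deliberately simplified model of what the article describes, not a specification of any particular browser; the function name, the mode strings, and the returned labels are all made up for illustration.

```python
def browser_action(mode, fresh, has_validator):
    """Pick how a browser might load one cached resource.

    mode          -- "enter" (address bar), "f5" (refresh) or "ctrl_f5" (forced refresh)
    fresh         -- True if Expires / max-age say the copy has not expired
    has_validator -- True if the copy carries a Last-Modified or ETag value
    """
    if mode == "ctrl_f5":
        # Forced refresh: the cache is ignored entirely.
        return "full request, ignore cache"
    if mode == "enter" and fresh:
        # Only plain navigation trusts Expires / max-age.
        return "use local cache, no request"
    if has_validator:
        # F5, or an expired copy: ask the server whether the copy is still valid.
        return "conditional request, expect 304 or 200"
    return "full request"


for mode in ("enter", "f5", "ctrl_f5"):
    print(mode, "->", browser_action(mode, fresh=True, has_validator=True))
```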
CDN Cache
Once the browser's local cache has expired, the browser sends a request to the CDN edge node. Like the browser, the CDN edge node has its own caching mechanism.
Disadvantages of CDN Cache
By offloading traffic, the CDN reduces both the user's access latency and the load on the origin server. But its drawback is just as obvious: when the site is updated, if the data on the CDN nodes is not updated in time, users will still see stale or broken content even after pressing Ctrl+F5 to invalidate the browser-side cache, because the CDN edge node has not yet synchronized the latest data.
CDN Cache Policy
The cache policy of CDN edge nodes varies by service provider, but it generally follows the HTTP standard: the cache time for data on a CDN edge node is set through the Cache-Control: max-age field in the HTTP response header.
When a client requests data from a CDN node, the node checks whether its cached data has expired. If it has not, the node returns the cached data directly to the client. Otherwise, the node sends a back-to-source request to the origin server, pulls the latest data, updates its local cache, and returns the latest data to the client.
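The lookup logic described above can be sketched as a small in-memory cache with a time-to-live. This is a toy model under assumptions: the class name `EdgeNode`, the fixed TTL, and the injected clock are illustrative choices, and a real edge node would take the lifetime from the origin's response headers or from provider configuration.

```python
import time


class EdgeNode:
    """Toy CDN edge node: serves from cache until an entry's TTL runs out."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> body
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}            # url -> (expires_at, body)
        self.origin_fetches = 0    # number of back-to-source requests

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]        # not expired: answer from the cache
        # Missing or expired: go back to the origin and refresh the cache.
        body = self.fetch_from_origin(url)
        self.origin_fetches += 1
        self.store[url] = (now + self.ttl, body)
        return body


# A fake clock keeps the example deterministic.
fake_now = [0.0]
edge = EdgeNode(lambda url: "body of " + url, ttl_seconds=60,
                clock=lambda: fake_now[0])

edge.get("/a.css")           # first request: fetched from the origin
edge.get("/a.css")           # second request: served from the cache
fake_now[0] = 61.0
edge.get("/a.css")           # entry expired: fetched from the origin again
print(edge.origin_fetches)   # 2
```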
CDN service providers typically also offer finer-grained cache management, letting users specify CDN cache times along several dimensions such as file suffix and directory.
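As a rough idea of what such rules might look like, the sketch below maps a request path to a cache time using directory and suffix rules. The rule table, its format, the TTL values, and the first-match-wins ordering are all assumptions for illustration; each provider defines its own configuration format and precedence.

```python
import posixpath

# Hypothetical rule table: (kind, pattern, ttl_seconds); the first match wins.
RULES = [
    ("dir", "/static/", 86400),   # everything under /static/: 1 day
    ("suffix", ".jpg", 3600),     # JPEG images elsewhere: 1 hour
    ("suffix", ".html", 60),      # HTML pages: 1 minute
]
DEFAULT_TTL = 600                 # fallback when no rule matches


def ttl_for(path):
    """Return the cache time in seconds for a request path."""
    for kind, pattern, ttl in RULES:
        if kind == "dir" and path.startswith(pattern):
            return ttl
        if kind == "suffix" and posixpath.splitext(path)[1] == pattern:
            return ttl
    return DEFAULT_TTL


for path in ("/static/app.js", "/img/logo.jpg", "/index.html", "/api/data"):
    print(path, ttl_for(path))
```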
The CDN cache time has a direct impact on the back-to-source rate. If the cache time is too short, data on the CDN edge nodes expires often, causing frequent back-to-source requests that increase both the load on the origin server and the access latency. If the cache time is too long, data updates are slow to reach users. Developers need to manage cache times according to the needs of their specific business.
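The trade-off can be illustrated with a tiny simulation of one URL requested at a fixed interval. The model is an assumption made for illustration (a single edge node, evenly spaced requests, a fixed cache time); real traffic is far less regular.

```python
def back_to_source_rate(ttl, request_interval, total_requests):
    """Fraction of requests for one URL that the edge node forwards to the origin."""
    expires_at = None
    origin_requests = 0
    for i in range(total_requests):
        now = i * request_interval
        if expires_at is None or now >= expires_at:
            # Cache empty or expired: this request goes back to the source.
            origin_requests += 1
            expires_at = now + ttl
    return origin_requests / total_requests


# One request every 5 seconds, with three different cache times (in seconds).
for ttl in (10, 60, 600):
    rate = back_to_source_rate(ttl, request_interval=5, total_requests=1200)
    print(ttl, rate)
```

With a 10-second cache time half of the requests reach the origin, and the rate falls as the cache time grows. The price of the longer cache time is that an update at the origin can take up to that long to become visible.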
CDN Cache Refresh
CDN edge nodes are transparent to developers. Whereas the browser's local cache can be invalidated with a forced refresh (Ctrl+F5), the cache on CDN edge nodes has to be cleared through the cache refresh interface provided by the CDN service provider. When updating data, developers can use this refresh feature to force the cached data on the CDN nodes to expire, ensuring that clients pull the latest data on their next visit.