CDN Cache Summary

Last Update:2020-09-24 Source: Internet

Author: User

Keywords: cdn, cdn cache, cdn meaning


1. Why use CDN?
First of all, a CDN can be understood as an ordinary cache, such as a proxy cache or an edge cache. Even if you don't care where your users are located geographically, you should consider using a CDN as a proxy cache to improve the user experience.

Simply put, a proxy cache stores some of your website's pages, and serving static content from the cache is very fast.

A simple example:

Suppose you have a blog with a start page that lists all recent posts. To build this page, a PHP script retrieves the latest articles from the database, renders them into an HTML page, and returns it to the user.

So one request costs one PHP execution plus one set of database queries, and 1000 requests cost 1000 PHP executions plus 1000 sets of database queries. Each PHP execution consumes CPU, memory, and I/O, and the same is true of the database.

When CPU and memory usage reach their limits, the site slows down or even becomes unreachable. You can add hardware to push past this bottleneck, but the engineering becomes very complicated. Adding a proxy cache layer in between relieves these resource constraints.

Returning to the blog example: with a proxy cache, only the first request needs to execute the PHP script, query the database, and generate the HTML page. All subsequent requests are served from the cache, and reading from the cache is almost as fast as reading directly from memory. This removes the linear scaling bottleneck described above. Whether there are 100 users or 1000 users, there is still only 1 PHP execution, 1 set of database queries, and 1 page generation for as long as the cached page remains valid.

2. How does CDN work?
CDNs come in different types, broadly 'pull CDN' and 'push CDN'. As the names suggest, with a push CDN you upload the content to the CDN yourself, whereas a pull CDN fetches the content from your server when it is requested.

* How does a pull CDN work? *

Let's take an example. Suppose you have a public website at https://www.fooer.com. In this scenario, the domain name fooer.com points to the pull CDN server instead of your web server, and the CDN acts as a proxy for your web server.

There is also a non-public domain name that points to the actual web server. In this example, assume it is direct.fooer.com. The actual web server is called the origin (or source).

The CDN accepts all requests. If the result is already in its cache, it is returned to the user directly. Otherwise, the request is forwarded to your actual web server, and the response is cached for future requests and returned to the user.

In its simplest form, a pull CDN handles a request as follows:

A request arrives for a page, for example fooer.com/some/page.
The CDN uses /some/page as the cache key and checks whether it exists in the cache.
If it is in the cache, the result is returned to the user directly from the cache.
If it is not in the cache, the CDN requests https://direct.fooer.com/some/page from the origin, writes the response into the cache with /some/page as the key, and returns it to the user.
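The steps above can be sketched in a few lines of Python. This is only an illustration: `PullCdn`, `fake_origin`, and `origin_requests` are hypothetical names, the origin is assumed to be reachable at direct.fooer.com as in the example, and a real CDN would also honor expiry and cache headers.

```python
from typing import Callable, Dict

ORIGIN_HOST = "direct.fooer.com"  # non-public origin host from the example


class PullCdn:
    """Toy pull CDN: serve from cache, otherwise fetch from the origin."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self._fetch_origin = fetch_origin
        self._cache: Dict[str, str] = {}
        self.origin_requests = 0  # how many requests reached the origin

    def handle(self, path: str) -> str:
        # The request path is the cache key.
        if path in self._cache:
            return self._cache[path]  # cache hit
        # Cache miss: fetch from the origin, store, then return.
        self.origin_requests += 1
        body = self._fetch_origin(f"https://{ORIGIN_HOST}{path}")
        self._cache[path] = body
        return body


def fake_origin(url: str) -> str:
    """Hypothetical origin server; a real one would be an HTTP request."""
    return f"page generated for {url}"


cdn = PullCdn(fake_origin)
first = cdn.handle("/some/page")   # miss: goes to the origin
second = cdn.handle("/some/page")  # hit: served from the cache
```

After both calls, `cdn.origin_requests` is 1, because only the first request reached the origin.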

3. Cache headers
Most pull CDNs handle dynamic content by caching on a per-page basis. A simple way to control this is through HTTP response cache headers.

Among the HTTP cache headers, the most common in current use are ETag and Cache-Control. Older headers such as Expires and Pragma are still supported, mainly for backward compatibility, and the Age header, set by caches, reports how long a response has been stored.

ETag:

The ETag is an identifier for a version of the document. It is often a hash of the content, such as its MD5 value, but it can be anything that represents the version or date of the document, such as 1.0 or 2017-10-30. Note that the value must be enclosed in double quotes, for example: ETag: "d3b0756geyg42sd3edec49eaa6238ad5ff00".
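As a rough illustration of a content-based ETag, the following Python sketch hashes the response body with MD5 and wraps the digest in double quotes. The function name `make_etag` and the sample body are assumptions made for this example. Servers are free to derive ETags in other ways.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag value from the MD5 digest of the body."""
    digest = hashlib.md5(body).hexdigest()
    return f'"{digest}"'  # the double quotes are part of the header value


etag = make_etag(b"hello world\n")
print(f"ETag: {etag}")
```

The same body always yields the same ETag, and any change to the body yields a different one, which is what makes the value usable as a version identifier.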

Revalidation:

The practical use of ETag is revalidation. For now, set aside the proxy + origin architecture described earlier and consider only a simple client-server model.

Suppose the client requests http://www.fooer.com/hello.txt, and the server returns a 200 OK response with the contents of the file.

The response has two interesting headers: ETag, here the MD5 value of the content, and Last-Modified, the time when the hello.txt file was last modified.

This is where revalidation comes in: when the client accesses the same URL again a short time later, the browser sends a conditional request using the If-* request headers. For example, If-None-Match carries the ETag the client already has, so the server can check whether it has changed. If the ETag has changed, the client receives a complete new response; if it has not, the client receives only a status indicating that the content is unchanged.

If the ETag has not changed, the server's response is not 200 OK but 304 Not Modified. It omits the body and tells the client to use the copy in its own cache. In this example the body is just the contents of hello.txt, which is small, so the benefit is not obvious. But for a large response, or for complex dynamically generated content, the savings are significant.
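The exchange can be sketched on the server side as follows. This is a simplified illustration, not a full implementation: the function `respond` and the sample body are hypothetical, the ETag is assumed to be the MD5 of the body, and the sketch compares a single ETag exactly, whereas real servers also handle lists of ETags and the `*` wildcard in If-None-Match.

```python
import hashlib
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


def respond(body: bytes, request_headers: Dict[str, str]) -> Response:
    """Answer a GET, returning 304 when the client's ETag still matches."""
    etag = '"' + hashlib.md5(body).hexdigest() + '"'
    if request_headers.get("If-None-Match") == etag:
        # Unchanged: no body is sent, the client reuses its cached copy.
        return 304, {"ETag": etag}, b""
    # First request, or the content changed: send the full response.
    return 200, {"ETag": etag}, body


content = b"hello world\n"

# First request: no validator yet, so the full body comes back.
status1, headers1, body1 = respond(content, {})

# Second request: the client sends back the ETag it received.
status2, headers2, body2 = respond(content, {"If-None-Match": headers1["ETag"]})

# The file changed on the server: the old ETag no longer matches.
status3, headers3, body3 = respond(b"changed\n", {"If-None-Match": headers1["ETag"]})
```

Here `status1` is 200, `status2` is 304 with an empty body, and `status3` is 200 again with the new content.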

The content most worth caching is the content that is most expensive to generate or transfer, and that is where validation with the ETag header pays off the most.

Cache-Control header

Cache-Control can be used in both request headers and response headers. It controls two kinds of cache: the local cache (private cache) and the shared cache.

Local cache: the cache on the client's own machine. It is not fully under your control. The browser usually decides whether to store a given response, so do not rely on the local cache. The user may also clear all caches when the browser closes, and you will not be aware of it.
Shared cache: a cache that sits between the client and the server, such as a CDN.

Cache-Control has several directives:

private: the response may be stored only in the local cache.
public: the response may be stored in either a shared cache or a local cache.
no-cache: the response may be stored, but in both local and shared caches the stored copy must be revalidated with the server before it is used.
no-store: the response must not be cached at all.
max-age=: sets the cache lifetime in seconds. Applies to both local and shared caches.
s-maxage=: overrides max-age, but only in shared caches.
immutable: indicates that the document will not change while it is fresh, so the client does not need to revalidate it.
must-revalidate: once the cached copy has expired, the cache must revalidate it with the server before using it, and must not serve the stale copy.
proxy-revalidate: the same as must-revalidate, but applies only to shared caches (CDN).

Examples are as follows:

Cache-Control: public, max-age=3600
// Both the local cache and the CDN cache the response for 1 hour.
Cache-Control: private, immutable
// Not cached by the CDN, only locally. Once cached, the client treats it as unchanging and does not revalidate it while it is fresh.
Cache-Control: no-cache
// May be stored, but must be revalidated with the server before every use.
Cache-Control: public, max-age=3600, s-maxage=7200
// The local cache keeps it for 1 hour, the CDN for 2 hours.
Cache-Control: public, max-age=3600, proxy-revalidate
// Both the local cache and the CDN keep it for 1 hour. After that hour, the CDN must check with the origin whether the document has changed before serving it again.
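
To make the difference between local and shared caches concrete, here is a small Python sketch that parses a Cache-Control value and reports how long a given kind of cache may reuse the response without contacting the server. It assumes comma-separated directives, as the HTTP specification requires. The function names `parse_cache_control` and `reuse_seconds` are hypothetical, and the sketch covers only the directives listed above. As a simplification, it returns 0 both for responses that must not be stored and for responses that must be revalidated on every use.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def reuse_seconds(value: str, shared: bool) -> Optional[int]:
    """Seconds a cache may reuse the response without asking the server.

    Returns 0 when the cache must not store it or must always revalidate,
    and None when the header gives no explicit lifetime.
    """
    d = parse_cache_control(value)
    if "no-store" in d or "no-cache" in d:
        return 0
    if shared and "private" in d:
        return 0  # shared caches such as a CDN must not store it
    s_maxage = d.get("s-maxage")
    if shared and s_maxage is not None:
        return int(s_maxage)  # s-maxage overrides max-age in shared caches
    max_age = d.get("max-age")
    if max_age is not None:
        return int(max_age)
    return None


header = "public, max-age=3600, s-maxage=7200"
local_lifetime = reuse_seconds(header, shared=False)  # browser cache
cdn_lifetime = reuse_seconds(header, shared=True)     # CDN cache
```

For this header the local cache gets 3600 seconds and the CDN gets 7200 seconds, matching the fourth example above.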

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
