A CDN (content delivery network) is a widely used technology on the Internet. You often hear people say, "Our website uses a CDN." Yet many of them know little about CDNs beyond this: when you use one, your website loads faster.
In fact, the principle behind a CDN is very simple. When a browser requests a resource, the first step is DNS resolution. DNS resolution is like looking up a phone number in an address book by name: the browser sends the domain name and receives the IP address returned by the DNS server. The browser then connects to the server at that IP address and fetches the resource (DNS servers also cache heavily, but that is beyond the scope of this article).
For a small site or a personal blog, a domain name usually corresponds to a single IP address, while a large site may resolve to multiple IP addresses.
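The address-book analogy can be sketched as a plain dictionary lookup. This is a toy model, not a real resolver: the domain names and IP addresses below are hypothetical (taken from documentation-only ranges), and a real program would ask the operating system via `socket.getaddrinfo` instead. It also shows the difference between a small site with one address and a large site with several.

```python
# Toy "address book" resolver. All names and IPs are hypothetical examples
# from documentation-only ranges, not real DNS records.
RECORDS: dict[str, list[str]] = {
    "blog.example.com": ["192.0.2.10"],                    # small site: one IP
    "www.example.com": ["198.51.100.1", "198.51.100.2"],  # large site: several IPs
}


def resolve(domain: str) -> list[str]:
    """Look up a domain name the way you would look up a name in an address book."""
    try:
        return RECORDS[domain]
    except KeyError:
        raise LookupError(f"no record for {domain}") from None


if __name__ == "__main__":
    for name in RECORDS:
        print(name, "->", resolve(name))
```

Once it has an address, the browser opens a connection to that IP; the name itself plays no further part in routing.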
When you request a resource (such as a web page), the distance to the server affects connection speed; this is why websites hosted outside China are slow to access from within China. To address this, some large companies deploy servers around the world and keep their data synchronized. This is called a CDN, and the servers closest to local users are called edge servers.
DNS resolution
When a browser resolves a domain name served through a CDN, the process differs from that of a website with a single IP address. The DNS server looks for the most suitable server to handle the request, and in the simplest case this is the edge server closest to where the request originates. As shown in the figure, if I send a request from Virginia to a server in the central United States, I get the address of an edge server on the East Coast; if I send the same request from California, I get the address of an edge server on the West Coast.
In other words, the first step in handling a request is to find the server closest to where the request originates. Some companies apply further optimizations when selecting a CDN server: for example, if the nearest server is fully loaded, subsequent requests are routed to another, less busy server. In short, a CDN always tries to find the most suitable server to handle each request.
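The selection rule described above (nearest edge server first, skip it if it is fully loaded) can be sketched as follows. This is an illustration under stated assumptions, not how any particular CDN's DNS works: the server names, IPs, coordinates, the `load` metric, and the 0.9 overload threshold are all hypothetical, and distance is approximated with the haversine great-circle formula.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    ip: str
    lat: float
    lon: float
    load: float  # hypothetical metric: fraction of capacity in use, 0.0 to 1.0


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pick_edge(servers, client_lat, client_lon, max_load=0.9):
    """Closest server below max_load; if every server is busy, the closest overall."""
    by_distance = sorted(
        servers, key=lambda s: distance_km(client_lat, client_lon, s.lat, s.lon)
    )
    for server in by_distance:
        if server.load < max_load:
            return server
    return by_distance[0]


# Hypothetical edge servers (IPs from a documentation-only range).
EDGES = [
    EdgeServer("east-coast", "198.51.100.10", 39.04, -77.49, 0.35),
    EdgeServer("central", "198.51.100.20", 32.78, -96.80, 0.50),
    EdgeServer("west-coast", "198.51.100.30", 37.34, -121.89, 0.40),
]

if __name__ == "__main__":
    print(pick_edge(EDGES, 37.54, -77.44).name)   # client in Virginia
    print(pick_edge(EDGES, 34.05, -118.24).name)  # client in California
```

A request from Virginia lands on `east-coast` and one from California on `west-coast`; if `east-coast` were over the threshold, the Virginia request would fall back to `central`, the next closest.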
Get content
The edge server is a proxy cache, similar to the browser cache. When a request arrives at an edge server, the server first checks whether it holds an up-to-date copy of the content. The cache key is the full URL (just as in the browser). If the content is cached and has not expired, the cached copy is returned directly.
If it is not cached or has expired, the edge server requests the content from the origin server, caches it, and returns it.
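The cache behavior described in the two paragraphs above (full URL as the key, serve fresh entries directly, refetch from the origin on a miss or after expiry) can be sketched as a small class. It is a minimal model assuming a single fixed TTL for every entry; real proxy caches derive freshness from HTTP headers. The `fetch_from_origin` callback and the `cdn.example.com` URLs are hypothetical stand-ins, and the injectable `clock` exists only so expiry can be demonstrated without waiting.

```python
import time
from typing import Callable


class EdgeCache:
    """Minimal proxy cache: key = full URL, each entry expires after a fixed TTL."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[bytes, float]] = {}  # url -> (body, expires_at)
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]  # cached and still fresh: serve directly
        # Not cached or expired: ask the origin server, then cache the result.
        body = self._fetch(url)
        self.origin_requests += 1
        self._store[url] = (body, now + self._ttl)
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    cache = EdgeCache(lambda url: f"content of {url}".encode(), ttl_seconds=60,
                      clock=lambda: fake_now[0])
    cache.get("http://cdn.example.com/app.js")      # miss: goes to origin
    cache.get("http://cdn.example.com/app.js")      # hit
    cache.get("http://cdn.example.com/app.js?v=2")  # different URL: miss
    fake_now[0] = 61.0
    cache.get("http://cdn.example.com/app.js")      # expired: goes to origin
    print(cache.origin_requests)  # 3
```

Note that `app.js?v=2` counts as a separate entry because the whole URL, query string included, is the key.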
Yahoo created and open-sourced Apache Traffic Server, which it uses in its CDN to manage this interaction between edge servers and origin servers. If you want to learn more about how proxy caches work, read the project's documentation.
Example
In Yahoo's CDN, a tool called the combo handler combines requests for multiple files into a single request-response exchange. For example:
http://yui.yahooapis.com/combo?3.4.1/build/yui-base/yui-base-min.js&3.4.1/build/array-extras/array-extras-min.js
The domain name yui.yahooapis.com is part of Yahoo's CDN and routes your request to the nearest edge server. The request asks for two files, yui-base-min.js and array-extras-min.js, yet both are returned in a single response. The edge server does not perform this combining logic itself; it can only be done on the origin server.
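The origin-side combining step can be sketched as: split the query string of the combo URL on `&`, treat each piece as a file path, and concatenate the files into one body. This is a hypothetical illustration, not Yahoo's actual combo handler: the in-memory `FILES` store and its contents are made up, and the newline separator and error handling are assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical file store on the origin server. Paths come from the example
# URL; the contents are placeholders.
FILES = {
    "3.4.1/build/yui-base/yui-base-min.js": "/* yui-base */",
    "3.4.1/build/array-extras/array-extras-min.js": "/* array-extras */",
}


def combo(url: str) -> str:
    """Concatenate every file named in the query string into one response body."""
    paths = [p for p in urlsplit(url).query.split("&") if p]
    missing = [p for p in paths if p not in FILES]
    if missing:
        raise FileNotFoundError(", ".join(missing))
    return "\n".join(FILES[p] for p in paths)


if __name__ == "__main__":
    print(combo(
        "http://yui.yahooapis.com/combo?3.4.1/build/yui-base/yui-base-min.js"
        "&3.4.1/build/array-extras/array-extras-min.js"
    ))
```

The edge server only sees one URL and one response; it can cache that combined body like any other resource without knowing how it was built.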
What does "static" mean? When is a CDN a good fit?
Whenever I describe a system like the combo handler above, I often see confused looks. People sometimes confuse a CDN with an FTP server, because both seem to be places where you upload static files for others to access. I hope the description above makes the difference clear. An edge server is a proxy: the origin server tells the edge server what content to return. The origin server may run Java, Ruby, Node.js, .NET, and so on, so it can implement any logic. The edge server does nothing but forward requests and return content.
If a CDN is so efficient, why not serve the entire website through it to improve performance? Because a CDN is essentially a cache. If pages are dynamic and their content changes on every request, the edge server has to contact the origin server for every request, and the cache provides no benefit.
This is why JavaScript, CSS, images, Flash, audio, video, and similar files are especially well suited to a CDN: these files rarely change and every user receives the same content, so all users benefit from the CDN cache.
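The argument above comes down to how often the edge server must go back to the origin. The sketch below counts origin fetches for a single URL given request timestamps and a cache lifetime. As a simplifying assumption, a page whose content changes on every request is modeled as having a TTL of 0, and a static file as having a TTL of one day; both numbers are illustrative.

```python
def origin_fetches(request_times, ttl: float) -> int:
    """Count requests for ONE URL that must go to the origin, given a cache TTL in seconds."""
    fetches = 0
    expires_at = None
    for t in request_times:
        if expires_at is None or t >= expires_at:
            fetches += 1           # miss or expired: fetch from origin and re-cache
            expires_at = t + ttl
    return fetches


if __name__ == "__main__":
    n = 1000
    times = range(n)  # one request per second
    static_ratio = 1 - origin_fetches(times, ttl=86400) / n
    dynamic_ratio = 1 - origin_fetches(times, ttl=0) / n
    print(f"static hit ratio:  {static_ratio:.3f}")   # 0.999
    print(f"dynamic hit ratio: {dynamic_ratio:.3f}")  # 0.000
```

With a shared static file, only the first of 1,000 requests reaches the origin; with content that changes per request, every request does, so the edge server adds a hop without saving any work.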
Cache expiration
Yahoo's performance guidelines recommend that static resources be served with cache expiration information in their HTTP headers (an Expires or Cache-Control header). This has two effects: the browser caches the resource, and the CDN also caches it for a period of time. It also means you cannot reuse a file name for changed content: the old version is cached in at least two places, so users may not get the latest file.
There are several ways to solve this problem. The YUI library keeps each version of the library in its own directory. You can also add an identifier to the file name, such as an MD5 hash of the file's contents or the revision number from your version control system. This way, even though every response carries an expiration header, users still receive the latest file.
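The two halves of this approach, a content hash in the file name plus far-future expiration headers, can be sketched together. This is a minimal illustration, not YUI's or Yahoo's tooling: the `app.<hash>.js` naming scheme, the 8-character digest, and the one-year lifetime are assumptions chosen for the example. `email.utils.formatdate(..., usegmt=True)` produces the HTTP date format used by the Expires header.

```python
import hashlib
from email.utils import formatdate
from pathlib import PurePosixPath

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; an assumed "far future" lifetime


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short MD5 digest of the content before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def expiration_headers(now: float, max_age: int = ONE_YEAR) -> dict[str, str]:
    """Far-future caching headers; safe only because the file name changes with the content."""
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),  # HTTP date in GMT
    }


if __name__ == "__main__":
    import time

    body = b"console.log('v2');"
    print(fingerprinted_name("js/app.js", body))
    print(expiration_headers(time.time()))
```

When the file changes, its hash and therefore its URL change, so browsers and edge servers treat it as a brand-new resource while old copies simply age out. In HTTP/1.1, `Cache-Control: max-age` takes precedence over `Expires`; sending both also covers older HTTP/1.0 caches.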
Conclusion
CDN technology is already an important part of today's Internet, and it will only become more important over time. Some companies are already trying to move more functionality onto edge servers to give users a faster experience. Edge Side Includes (ESI), for example, is a technology for caching parts of a page.
A better understanding of how CDNs work is the key to getting the best performance out of them.
Original article: http://www.nczonline.net/blog/2011/11/29/how-content-delivery-networks-cdns-work/
Note: This is the first time I have translated an English article, and it took more time than I expected. I would be glad if you read it, and I welcome any criticism or corrections.