How CDNs work, and some of the technologies behind them

Background

CDN stands for Content Delivery Network. Its main job is to reduce the load on the origin server (the "source station") while giving clients faster responses; a CDN can also help protect the origin's security. It is the origin site that pays for a CDN, so the CDN's real customers are origin sites such as Sina Weibo, Youku, and Taobao, while clients are the users of those customers. The CDN therefore sits between the origin and the origin's users. Below, "client" always means a user of the origin site.

Working principle

Simply put, a CDN caches your origin's resources on CDN nodes spread across the country. When a user requests a resource, it is served from the nearest node that has it cached, so not every user request has to travel back to your origin. This avoids network congestion, takes load off the origin, and keeps access fast and the user experience good.
Architecture Diagram

The figure below shows how an HTTP request is processed once a CDN is in place:

Traditional website access process

To explain how a CDN works, we first need to look at how Internet resources are accessed. Traditionally, visiting a website in a browser involves these steps:

1. You type the URL www.taobao.com into the browser.
2. The browser sends a request to a DNS server to look up the IP address corresponding to www.taobao.com.
3. The browser opens a TCP connection to the server at that IP address.
4. Over the established TCP connection, the browser sends an HTTP request message to the server.
5. The server sends the page content back to the browser, and the browser displays the page.

Step 2 above, the DNS lookup, deserves a more detailed explanation, because DNS resolution is the foundation a CDN is built on.

DNS workflow

The DNS workflow is easy to overlook. Most people only know that DNS takes a domain name as input and returns an IP address, so I am summarizing it here for my own reference. DNS runs mainly over UDP, so DNS servers typically sustain very high QPS, often orders of magnitude higher than web servers (HTTP runs over TCP). A basic DNS concept is the record type; common types include A, AAAA, and CNAME. An A record maps a domain name to an IPv4 address, and an AAAA record maps a domain name to an IPv6 address. A CNAME record acts like a referral during the lookup: it says, in effect, "this name is an alias; go ask about that other name instead." With that in place, here is how a lookup proceeds:

1. You type www.taobao.com in the browser. The name actually used in the DNS protocol is www.taobao.com. with a trailing dot, which is usually hidden, perhaps for aesthetic reasons.
2. The browser checks the local cache (the hosts file or the browser cache) for a record of the domain. If one exists, it is used directly.
3. Otherwise, a DNS resolution request is sent to the ISP's DNS server, commonly called the local DNS.
4. The local DNS checks its own cache. The cache time configured on the local DNS needs care: both too long and too short cause problems. The local DNS is also run by the ISP, so its behavior is opaque and outside external control.
5. If the local DNS has no cached answer, it scans the domain name from right to left and queries the responsible server at each level in turn. For www.taobao.com., it first asks the servers responsible for ".", the root name servers: the legendary handful of servers worldwide (13 root server identities). They reply with who manages .com. The local DNS then asks the .com server (call it S1) who manages taobao.com. Because Alibaba is a large company that manages its own domains, S1 typically returns an NS record delegating taobao.com to Alibaba's own DNS servers, commonly called the authoritative servers.
6. The authoritative servers are built by Alibaba itself. Based on the company's internal configuration and tuning, they determine which server corresponds to www.taobao.com. and return an IP address.
7. The local DNS caches this IP address and replies to the browser.
8. The browser establishes a TCP connection with the server at that IP address and sends the HTTP request message.
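The iterative lookup walk described above (root, then .com, then the authoritative server, with the local DNS caching the result) can be simulated in a few lines. This is a minimal sketch that assumes a toy in-memory hierarchy. The server names, the `SERVERS` table, and the IP address are hypothetical. Real resolvers also honor TTLs, glue records, and retries, and this sketch omits all of them.

```python
from dataclasses import dataclass, field

# Hypothetical toy hierarchy; names and the IP (a documentation range) are made up.
# Each server maps a zone it knows about to a referral or a final answer.
SERVERS = {
    "root": {"com.": ("referral", "S1")},
    "S1": {"taobao.com.": ("referral", "ali-authoritative")},
    "ali-authoritative": {"www.taobao.com.": ("answer", "198.51.100.7")},
}


@dataclass
class LocalDNS:
    servers: dict
    cache: dict = field(default_factory=dict)  # name -> IP (TTLs ignored here)
    trace: list = field(default_factory=list)  # servers queried, in order

    def resolve(self, name: str) -> str:
        if not name.endswith("."):
            name += "."  # fully qualified form with the trailing root dot
        if name in self.cache:  # answered from the local DNS cache
            return self.cache[name]
        server = "root"  # no cached answer: start at the root
        while True:
            self.trace.append(server)
            table = self.servers[server]
            # Most specific zone this server knows that contains `name`.
            zone = max((z for z in table if name == z or name.endswith("." + z)),
                       key=len, default=None)
            if zone is None:
                raise LookupError(f"{server} knows nothing about {name}")
            kind, value = table[zone]
            if kind == "answer":  # the authoritative server returns the IP
                self.cache[name] = value  # cache it, then reply to the browser
                return value
            server = value  # referral: ask the next server down the hierarchy


local_dns = LocalDNS(SERVERS)
print(local_dns.resolve("www.taobao.com"), local_dns.trace)
# prints: 198.51.100.7 ['root', 'S1', 'ali-authoritative']
```

A second `resolve` call for the same name is answered from `cache` without touching any server. This is why the local DNS cache time matters so much.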

Anyone who has bought a domain name knows the routine. If you buy cstdlib.com from Wanwang (HiChina, since merged into Alibaba Cloud) and want to enable a second-level domain such as go.cstdlib.com, you go to the Wanwang console and add a resolution record pointing go.cstdlib.com at the IP you want. You have to repeat this every time you add a second-level domain. If you understand the DNS resolution process, you can do this instead: run a DNS server on your own server D1 as the authoritative DNS server for cstdlib.com, and in the Wanwang console set the name server (NS) records of cstdlib.com to point at D1. From then on, whatever IP D1 returns is the IP that clients get.
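Once D1 is the authoritative server, adding a subdomain becomes a matter of editing your own table. The sketch below is a minimal illustration that uses only the Python standard library. It answers single-question A queries from a hypothetical `ZONE` dict and ignores EDNS, name compression in incoming queries, and every other record type. The names and IPs are placeholders, and a production setup would use a real DNS server.

```python
import socket
import struct

# Hypothetical zone served by D1; names and documentation-range IPs are placeholders.
ZONE = {
    "cstdlib.com": "192.0.2.10",
    "go.cstdlib.com": "192.0.2.20",
}
TYPE_A, CLASS_IN = 1, 1


def parse_qname(packet: bytes, offset: int) -> tuple[str, int]:
    """Decode an uncompressed QNAME; return (lowercased name, offset after it)."""
    labels = []
    while packet[offset] != 0:
        length = packet[offset]
        labels.append(packet[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return ".".join(labels).lower(), offset + 1


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal A-record query (RD=1, one question), e.g. for testing."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", TYPE_A, CLASS_IN)


def answer(packet: bytes, zone: dict[str, str], ttl: int = 60) -> bytes:
    """Authoritatively answer a single-question query from `zone`."""
    query_id, flags = struct.unpack("!HH", packet[:4])
    name, end = parse_qname(packet, 12)
    qtype, qclass = struct.unpack("!HH", packet[end:end + 4])
    ip = zone.get(name)
    has_answer = ip is not None and (qtype, qclass) == (TYPE_A, CLASS_IN)
    rcode = 0 if ip is not None else 3  # 3 = NXDOMAIN (name not in zone)
    # QR=1 (response) | AA=1 (authoritative) | copy the RD bit | RCODE
    resp_flags = 0x8000 | 0x0400 | (flags & 0x0100) | rcode
    header = struct.pack("!HHHHHH", query_id, resp_flags, 1, int(has_answer), 0, 0)
    response = header + packet[12:end + 4]  # echo the question section
    if has_answer:
        # 0xC00C is a compression pointer to the question name at offset 12.
        response += struct.pack("!HHHIH", 0xC00C, TYPE_A, CLASS_IN, ttl, 4)
        response += socket.inet_aton(ip)
    return response


def serve(host: str = "0.0.0.0", port: int = 53) -> None:
    """Answer queries over UDP forever (binding port 53 usually needs root)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, addr = sock.recvfrom(512)
            sock.sendto(answer(data, ZONE), addr)
```

Calling `serve()` on D1 starts the responder. After that, `dig @<D1-address> go.cstdlib.com` should return 192.0.2.20, and a new subdomain is just a new `ZONE` entry.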

This way everything is under your control (D1 is yours, after all), and you no longer need to visit the console. That is what a self-built DNS server is.

How the CDN selects high-quality nodes

Back to the main topic: how does a CDN choose a high-quality node for the user? This time I won't use Taobao as the example. Alibaba has its own CDN, so with Taobao it would be easy to confuse the CDN provider with the origin. Instead, take Sina Weibo as the origin and suppose Weibo uses Alibaba's CDN (this is not actually hypothetical; it has been in the news). Alibaba's CDN tells Weibo: "If you want me to accelerate an image for you, point that image at my servers," either through a CNAME or by referencing an Alibaba CDN URL directly. Alibaba CDN's authoritative DNS server will then receive a resolution request along the lines of "Please tell me which node has Sina Weibo's 1.png." Answering that question is exactly the CDN system's job.
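The CNAME route above can be modeled as alias chasing. The origin's image hostname is an alias for a CDN-owned name, and the CDN's authoritative server answers that name. This is a minimal sketch that assumes a flat hypothetical `RECORDS` table with made-up `.example` names. In reality each hop may be answered by a different server, and the final A record is where the CDN's node selection happens.

```python
# Hypothetical records: the origin CNAMEs its image host to a CDN-owned name.
RECORDS = {
    "img.weibo.example": ("CNAME", "weibo.cdn-provider.example"),
    "weibo.cdn-provider.example": ("A", "203.0.113.5"),
}


def follow_cname(name: str, records: dict, max_hops: int = 8) -> tuple[str, list[str]]:
    """Chase CNAME aliases until an A record is found; return (IP, alias chain)."""
    chain = [name]
    for _ in range(max_hops):
        rtype, value = records[name]
        if rtype == "A":
            return value, chain
        name = value  # CNAME: "go ask about this other name instead"
        chain.append(name)
    raise RuntimeError("CNAME chain too long (possible loop)")


print(follow_cname("img.weibo.example", RECORDS))
```

The hop limit guards against misconfigured alias loops, which would otherwise make the lookup spin forever.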

Suppose we are Alibaba CDN's authoritative DNS server and someone asks "which node has Sina Weibo's 1.png?" I would first look at the IP address of whoever is asking. Recall the DNS resolution process: what we see is the local DNS's IP. I would then use that IP to work out where the asker is: Beijing or Guangzhou, Shanghai or Shenzhen. If it is Beijing, I return the address of a Beijing node; if it is Shanghai, I return the address of a Shanghai node. That way, clients access the nearest node.

Mapping an IP address to a location requires an IP geolocation database (an "IP library"). Alibaba CDN's IP library is quite reliable: according to the head of Alibaba CDN at the ArchSummit architect summit, they can calibrate it against Taobao's package delivery records, which is really clever.
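The two steps described in the last two paragraphs, looking up the asker's IP in the IP library and then returning the nearest node, can be sketched as a longest-prefix match. This is a minimal sketch that assumes a tiny hypothetical `IP_LIBRARY` and `NODES` table built from documentation address ranges, with made-up regions. A real system would index millions of prefixes and, as the next paragraph notes, weigh more than proximity.

```python
import ipaddress
from typing import Optional

# Hypothetical IP library: prefix -> region (placeholder ranges and regions).
IP_LIBRARY = {
    ipaddress.ip_network("198.51.100.0/24"): "Beijing",
    ipaddress.ip_network("203.0.113.0/24"): "Shanghai",
    ipaddress.ip_network("203.0.113.128/25"): "Shenzhen",
}
# Hypothetical CDN node addresses per region, plus a fallback node.
NODES = {"Beijing": "192.0.2.1", "Shanghai": "192.0.2.2", "Shenzhen": "192.0.2.3"}
DEFAULT_NODE = "192.0.2.100"


def locate(ip: str) -> Optional[str]:
    """Return the region of the most specific IP-library prefix containing `ip`."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in IP_LIBRARY if addr in net]
    if not matches:
        return None
    return IP_LIBRARY[max(matches, key=lambda net: net.prefixlen)]


def pick_node(local_dns_ip: str) -> str:
    """Choose a node by the location of the *local DNS*, not the client."""
    return NODES.get(locate(local_dns_ip), DEFAULT_NODE)


print(pick_node("198.51.100.77"))  # a Beijing local DNS gets the Beijing node
```

Longest-prefix matching lets a more specific entry (here the /25) override a broader one, which is how a calibrated IP library refines coarse allocations.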

Of course, proximity is only one consideration. Many other factors matter too, such as network cost, traffic distribution, and origin load. It is a complicated process, and I am only trying to give an intuitive picture here.

How the CDN reduces load on the origin

The previous section explained how the CDN selects high-quality nodes, which is what the CDN offers clients. Next is what it offers the origin: reduced load. If every user request went straight to the origin, the origin would be under enormous pressure. So the CDN provides an HTTP cache in front of the origin and reduces origin load by raising the cache hit rate.

For example, when the first user requests 1.png, the CDN caches the image. A cache can be thought of simply as a hash table whose key is the URL and whose value is the response. The next time someone requests 1.png, the CDN returns it directly, which reduces the number of back-to-origin requests.
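The hash-table view of the cache translates directly into code. This is a minimal sketch that assumes LRU eviction and a hypothetical `fetch_from_origin` callback. Real HTTP cache servers also honor `Cache-Control`, expiry, and revalidation, and this sketch omits all of them.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """URL -> response cache on a CDN node; LRU eviction is an assumption."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 1024):
        self.fetch_from_origin = fetch_from_origin
        self.capacity = capacity
        self.store: OrderedDict[str, bytes] = OrderedDict()
        self.hits = self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)  # mark as most recently used
            return self.store[url]
        self.misses += 1  # cache miss: one back-to-origin request
        response = self.fetch_from_origin(url)
        self.store[url] = response
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used URL
        return response

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


origin_calls = []


def fake_origin(url: str) -> bytes:
    """Stand-in for the origin server; records every back-to-origin request."""
    origin_calls.append(url)
    return b"contents of " + url.encode()


cache = EdgeCache(fake_origin)
for _ in range(100):
    cache.get("/1.png")
print(len(origin_calls), cache.hit_rate())  # 1 0.99
```

One hundred requests for 1.png cost the origin a single fetch. This is the load reduction described above, and it grows with the hit rate.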

An HTTP cache server is a very complex piece of software. Below is another slide from the talk the head of Alibaba CDN gave at the ArchSummit architect summit. On the technical side, Alibaba's HTTP cache server is called Swift, exactly the same name as Apple's programming language.

The diagram shows a single CDN node. A user's request enters through LVS, which first performs layer-4 load balancing. LVS is a layer-4 load-balancing component created by Dr. Zhang Wensong, who was a CTO at the time of writing. The request then reaches Tengine, an HTTP server Alibaba developed on top of Nginx. Tengine applies consistent hashing to select a Swift instance (the HTTP cache server Alibaba uses), and Swift goes back to the origin when the object is not cached. Next is another slide from the ArchSummit talk, showing Swift's architecture.
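The consistent-hashing step in the paragraph above is what lets several Swift instances share one node's cache without storing duplicate copies. The same URL always lands on the same Swift, and adding or removing a Swift remaps only a small share of URLs. The sketch below is a generic consistent-hash ring with virtual nodes, not Tengine's actual implementation. The server names, the SHA-256 hash, and the replica count are assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map URLs to cache servers so that membership changes move few keys."""

    def __init__(self, servers: list[str], replicas: int = 100):
        self.replicas = replicas  # virtual nodes per server, to even out load
        self.ring: list[tuple[int, str]] = []  # sorted (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def lookup(self, url: str) -> str:
        """Walk clockwise from hash(url) to the first virtual node."""
        if not self.ring:
            raise LookupError("no cache servers on the ring")
        idx = bisect.bisect(self.ring, (self._hash(url),))
        return self.ring[idx % len(self.ring)][1]


ring = ConsistentHashRing(["swift-1", "swift-2", "swift-3"])
print(ring.lookup("/weibo/1.png"))  # same server every time for this URL
```

When a fourth Swift joins, only the URLs whose arc it takes over move, and they all move to the new server. The other servers keep their cached content and hit rates.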

First, Swift is a multithreaded program in which each thread runs its own epoll loop, to make full use of multiple cores. To minimize context switches between threads, each request is handled on a single thread wherever possible. You can also see the memory cache, the SSD cache, and the SATA cache. According to the head of Alibaba CDN, Swift tiers content by hotness: the hottest files live in memory, less-hot files on SSD, and the rest on SATA disks, and a mechanism demotes and promotes files as their popularity changes.
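The hot/warm/cold tiering described above can be illustrated with a toy model, not Swift's actual algorithm. Three LRU tiers stand in for memory, SSD, and SATA. An object hit repeatedly is promoted one tier, and an object evicted from a tier is demoted to the next one down. The capacities, the `promote_after` threshold, and the rule that new objects start on SATA are all assumptions.

```python
from collections import OrderedDict
from typing import Optional

TIER_NAMES = ("memory", "SSD", "SATA")  # index 0 is the hottest tier


class TieredCache:
    def __init__(self, capacities=(2, 4, 8), promote_after: int = 2):
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in capacities]  # per-tier LRU order
        self.promote_after = promote_after  # hits needed to move up one tier
        self.hits: dict[str, int] = {}

    def put(self, key: str, value: bytes) -> None:
        """Assumption: newly fetched objects start on the slowest tier."""
        for tier in self.tiers:
            tier.pop(key, None)  # never keep two copies of one object
        self.hits[key] = 0
        self._insert(len(self.tiers) - 1, key, value)

    def get(self, key: str) -> Optional[bytes]:
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                self.hits[key] += 1
                if level > 0 and self.hits[key] >= self.promote_after:
                    del tier[key]
                    self.hits[key] = 0  # must earn the next promotion again
                    self._insert(level - 1, key, value)  # promote
                else:
                    tier.move_to_end(key)  # refresh LRU position
                return value
        return None

    def _insert(self, level: int, key: str, value: bytes) -> None:
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)
        while len(tier) > self.capacities[level]:
            cold_key, cold_value = tier.popitem(last=False)  # LRU victim
            if level + 1 < len(self.tiers):
                self._insert(level + 1, cold_key, cold_value)  # demote
            else:
                del self.hits[cold_key]  # falls off the last tier entirely

    def tier_of(self, key: str) -> Optional[str]:
        for name, tier in zip(TIER_NAMES, self.tiers):
            if key in tier:
                return name
        return None
```

Resetting the hit counter on each promotion means an object has to stay popular to climb higher. Demotion on eviction lets a cooling object slide down to cheaper storage instead of being dropped outright.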

The ArchSummit talk also mentioned that Tengine and Swift communicate over the SPDY protocol to make HTTP more efficient. CDN technology clearly goes deep: networking, I/O, multithreading, TCP/IP, HTTP, all the usual backend buzzwords show up here at their most concrete.

Odds and ends

The DNS lookup process actually has one more problem. When the authoritative server receives a request, it only sees the IP address of the local DNS, not the client's. This is a real pain, so Google proposed an EDNS extension (EDNS Client Subnet) that carries the client's IP address information in the query. It is not very practical, though: caching DNS query results then requires an extra dimension, the client IP, which turns a one-dimensional array into a two-dimensional one and is a memory disaster. This is also why we usually should not use a DNS server such as 8.8.8.8. Otherwise the CDN may think you are in the United States and serve you from a US node, which is bound to be painfully slow.

Summary

To summarize how a CDN works: the authoritative DNS server selects high-quality nodes, and caching reduces the load on the origin.

Recommended reading

Finally, I recommend the talk the head of Alibaba CDN gave at ArchSummit, which explains Alibaba CDN's architecture very clearly. Much of this article is based on that talk. The link is here.

Original link: http://www.cstdlib.com/tech/2015/08/18/what-is-cdn/
