1. What is CDN
CDN stands for content delivery network (also called a content distribution network). Its purpose is to let users get the data they request more quickly. Simply put, a CDN speeds things up by letting users fetch data from a location near them. For example, suppose the server is in Beijing. A user in Shenzhen has to fetch data across a long distance, which is obviously slower than a user in Beijing accessing the same Beijing server. Now suppose we deploy a CDN server in Shenzhen and cache some of the data on it. Shenzhen users reach the CDN server first, and if it holds the requested data, it returns the data directly, which greatly improves the speed.
2. DNS service
To understand CDN, you must first understand DNS. When we enter a domain name in the browser, the domain name first has to be converted to an IP address, and the IP address is then mapped to a MAC address so that the server can be reached over the network. Let's set aside the IP-to-MAC step and first look at how a domain name is converted to an IP address.
When we ask a DNS server (the local DNS, or LDNS) to resolve a domain name, it first checks whether the domain name is in its cache. If it is, the server returns the IP address directly. If it is not, the server resolves the name on our behalf (a recursive query) by walking the DNS hierarchy layer by layer. For example, to resolve www.baidu.com, it first asks one of the 13 root servers which name servers are responsible for com, then asks a com name server which name servers are responsible for baidu.com, and so on down the hierarchy until it finds the IP address we need.
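The lookup described above can be sketched as a toy model. This is only an illustration, not how a real resolver is implemented: the server names, the delegation tables, and the IP address (taken from a documentation range) are all made up.

```python
# Toy model of DNS resolution: a resolver cache in front of a delegation chain.
# All zone data below is hypothetical and exists only for this example.

# The root only knows which server handles each top-level domain.
ROOT = {"com": "com-nameserver"}

# Every other server maps a name either to the next server to ask or to a final IP.
SERVERS = {
    "com-nameserver": {"baidu.com": "baidu-nameserver"},
    "baidu-nameserver": {"www.baidu.com": "203.0.113.10"},
}


class Resolver:
    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0  # how many questions we sent to other servers

    def resolve(self, name):
        # Step 1: answer from the cache when possible.
        if name in self.cache:
            return self.cache[name]

        labels = name.split(".")

        # Step 2: ask the root which server is responsible for the TLD.
        self.upstream_queries += 1
        server = ROOT[labels[-1]]

        # Step 3: walk down the hierarchy one label at a time.
        for depth in range(2, len(labels) + 1):
            suffix = ".".join(labels[-depth:])
            self.upstream_queries += 1
            answer = SERVERS[server][suffix]
            if answer in SERVERS:
                server = answer  # a referral: keep asking further down
            else:
                self.cache[name] = answer  # a final IP: remember it
                return answer
        raise LookupError(name)


resolver = Resolver()
print(resolver.resolve("www.baidu.com"), resolver.upstream_queries)  # 203.0.113.10 3
print(resolver.resolve("www.baidu.com"), resolver.upstream_queries)  # 203.0.113.10 3
```

The second call does not increase the query count, because the answer comes straight from the cache.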
3. The relationship between DNS and CDN
I just mentioned that a CDN is essentially about accessing data nearby. That raises a question: how do we know where the user is, so that we can assign the best CDN node? This is where the DNS service comes in. When the user resolves our domain name, we can locate the user based on the LDNS server (local DNS server) the request comes through. For example, if our dispatch server sees that the query comes from an LDNS server of Shenzhen Telecom, we assume the user is a Shenzhen Telecom user, and the dispatch server directs the user to the Shenzhen Telecom CDN server, so that the user reaches the optimal CDN node.
Through the DNS service we can quickly determine the user's location and assign the best CDN node. But this scheduling method has a problem. For example, if I am a Beijing Unicom user but use Shenzhen Telecom's LDNS, the dispatch server will assign me to Shenzhen Telecom's CDN server, which is a wrong dispatch.
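As a rough sketch of DNS-based scheduling and its blind spot, the dispatch server can be modeled as two lookup tables. The table contents, the function name schedule_by_ldns, and all addresses (documentation ranges) are assumptions made for this example.

```python
# Hypothetical data: where each known LDNS server is, and which CDN node
# serves each (region, carrier) pair.
LDNS_LOCATION = {
    "198.51.100.53": ("shenzhen", "telecom"),
    "192.0.2.53": ("beijing", "unicom"),
}
CDN_NODES = {
    ("shenzhen", "telecom"): "203.0.113.20",
    ("beijing", "unicom"): "203.0.113.30",
}
DEFAULT_NODE = "203.0.113.1"  # used when the LDNS is unknown


def schedule_by_ldns(ldns_ip):
    """Pick a CDN node using only the address of the LDNS that asked."""
    location = LDNS_LOCATION.get(ldns_ip)
    return CDN_NODES.get(location, DEFAULT_NODE)


# A Shenzhen Telecom user with the Shenzhen Telecom LDNS: correct node.
print(schedule_by_ldns("198.51.100.53"))  # 203.0.113.20

# A Beijing Unicom user who configured the Shenzhen Telecom LDNS sends the
# same LDNS address, so the dispatcher cannot tell the difference and
# returns the Shenzhen node again.
print(schedule_by_ldns("198.51.100.53"))  # 203.0.113.20
```

The dispatcher never sees the user's own address here, which is exactly why a "wrong" LDNS leads to a wrong node.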
4. HTTP scheduling method
To address the problem above, we have another scheduling method: HTTP scheduling.
When a user visits our server, the server first looks at the user's IP address and then returns a 302 redirect whose Location header holds the address of the CDN server closest to the user. The user then requests that CDN server and so reaches the best CDN node.
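A minimal sketch of this idea, assuming a hypothetical table that maps client IP prefixes to CDN host names. The prefixes, host names, and the function redirect_for are invented for illustration. A real scheduler would sit inside an HTTP server, while here we only compute the status code and headers.

```python
import ipaddress

# Hypothetical mapping from client address ranges to CDN node host names.
CLIENT_PREFIXES = [
    (ipaddress.ip_network("198.51.100.0/24"), "sz-telecom.cdn.example.com"),
    (ipaddress.ip_network("192.0.2.0/24"), "bj-unicom.cdn.example.com"),
]
FALLBACK_NODE = "origin.example.com"  # used when no prefix matches


def redirect_for(client_ip, path):
    """Return (status, headers) pointing the client at its nearest node."""
    addr = ipaddress.ip_address(client_ip)
    node = next(
        (host for net, host in CLIENT_PREFIXES if addr in net),
        FALLBACK_NODE,
    )
    return 302, {"Location": f"http://{node}{path}"}


status, headers = redirect_for("192.0.2.7", "/video/big.mp4")
print(status, headers["Location"])
# 302 http://bj-unicom.cdn.example.com/video/big.mp4
```

Because the decision uses the client's own IP address rather than the LDNS address, a Beijing Unicom user is sent to the Beijing Unicom node no matter which DNS server they use.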
The advantage of this scheduling method is that its positioning is more accurate: a wrong LDNS no longer causes a wrong dispatch. Its disadvantage is that it needs one additional HTTP round trip, so the latency of the first access is relatively high. For a large file, HTTP scheduling is clearly the better fit, because downloading the file takes so long that the time of the extra HTTP request can be ignored. For small files, however, the extra request is not worth the cost.
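To make the trade-off concrete, here is a small calculation. The numbers are made up for illustration (a 50 ms redirect, a 10 s large download, a 20 ms small download) and are not measurements.

```python
def redirect_overhead(redirect_ms, transfer_ms):
    """Extra time added by the 302 round trip, relative to the transfer itself."""
    return redirect_ms / transfer_ms


# Assumed values, for illustration only.
large_file = redirect_overhead(redirect_ms=50, transfer_ms=10_000)
small_file = redirect_overhead(redirect_ms=50, transfer_ms=20)

print(f"large file: +{large_file:.1%}")  # large file: +0.5%
print(f"small file: +{small_file:.1%}")  # small file: +250.0%
```

Under these assumptions the redirect is barely noticeable for the large file, but it more than triples the total time for the small one.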
Of course, the two methods can also be used together: first locate the user through DNS, then correct any deviation through HTTP.
5. Two ways of caching
Some resources on the server are cached in the CDN. So how does the cache on a CDN node get updated? There are two ways. In the first, the origin server actively pushes updates and the CDN node passively accepts them. In the second, when a resource requested by the user is not on the CDN node, the node sends a request to its upstream server, updates its cache, and then returns the data to the user. Here the CDN server is active and the origin server is passive. The first method has a number of problems. For example, it easily produces 404 responses when a node is asked for a resource that has not been pushed to it yet. The second method is therefore the one generally adopted.
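The difference between the two approaches can be sketched with three small classes. The class names (Origin, PushNode, PullNode) and the file used are hypothetical, and details a real CDN needs, such as expiry and invalidation, are left out.

```python
class Origin:
    """The origin server: holds every file and counts how often it is asked."""

    def __init__(self, files):
        self.files = files
        self.hits = 0

    def fetch(self, path):
        self.hits += 1
        return self.files.get(path)


class PushNode:
    """First way: the node only knows what the origin has pushed to it."""

    def __init__(self):
        self.cache = {}

    def receive_push(self, path, body):
        self.cache[path] = body

    def get(self, path):
        if path in self.cache:
            return 200, self.cache[path]
        return 404, None  # not pushed yet, so the node cannot answer


class PullNode:
    """Second way: on a miss, the node fetches from upstream and caches."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            body = self.origin.fetch(path)
            if body is None:
                return 404, None  # the origin does not have it either
            self.cache[path] = body
        return 200, self.cache[path]


origin = Origin({"/app.js": b"console.log(1)"})

print(PushNode().get("/app.js")[0])  # 404

pull = PullNode(origin)
print(pull.get("/app.js")[0], origin.hits)  # 200 1
print(pull.get("/app.js")[0], origin.hits)  # 200 1
```

The push node answers 404 for a file the origin does have, while the pull node fills itself on the first request and serves the second one without contacting the origin.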
6. The whole working process of CDN
When a user requests a file, the working process of the CDN is as follows:
1. The user sends a DNS request to the local DNS.
2. The local DNS recursively queries the GSLB (global server load balancing) of the service.
3. The GSLB picks the best node according to the local DNS and returns its IP.
4. The user gets the IP of the best node and visits that node.
5. If the node does not have the content the user wants, it asks the next node upstream through internal routing, until the file is found or the origin server is reached.
6. The CDN node caches the data so that it can return the file directly the next time it is requested.
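Steps 5 and 6 can be sketched as a chain of nodes in which each node asks its parent on a miss and keeps a copy on the way back. The three-level layout (edge, regional, origin), the Node class, and the file are assumptions for this example. Node selection through DNS (steps 1 to 4) is not modeled here.

```python
class Node:
    """A cache node that falls back to its parent; the origin has no parent."""

    def __init__(self, name, parent=None, files=None):
        self.name = name
        self.parent = parent
        self.store = dict(files or {})

    def get(self, path, trail):
        trail.append(self.name)  # record which nodes were visited
        if path in self.store:
            return self.store[path]
        if self.parent is None:
            return None  # reached the origin and the file is not there
        body = self.parent.get(path, trail)
        if body is not None:
            self.store[path] = body  # step 6: cache for the next request
        return body


origin = Node("origin", files={"/logo.png": b"PNG"})
regional = Node("regional", parent=origin)
edge = Node("edge", parent=regional)

first, second = [], []
edge.get("/logo.png", first)
edge.get("/logo.png", second)
print(first)   # ['edge', 'regional', 'origin']
print(second)  # ['edge']
```

The first request travels all the way to the origin, and the second is answered by the edge node alone.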