CDN (Content distribution network) technology principles

Source: Internet
Author: User

Translated from: http://kb.cnblogs.com/page/121664/

1. Preface

The rapid development of the Internet has brought great convenience to work and daily life, and users expect ever higher service quality and access speed. Although bandwidth keeps growing, so does the number of users, and access is still limited by Web server load and transmission distance, so slow responses remain a frequent source of complaints. One solution is to apply caching along the network transmission path so that Web content is served from locations close to users. Caching is a very effective way to optimize data transmission and deliver a fast, high-quality experience.

The purpose of network caching is to minimize redundant data transmission and turn wide-area transfers into local or nearby access. Much of the content carried on the Internet is repeatedly transferred Web/FTP data, so cache servers and caching-capable network devices can greatly improve link performance and relieve the congestion that peak traffic causes on network nodes. A cache server stores Web page objects such as page files (HTML, HTM, PHP, and so on), image files (GIF, TIF, PNG, BMP, and so on), and files in other formats, so repeated requests do not require the object itself to be transferred again from the origin site: within the object's validity period (TTL) the local copy is served directly, and after that a simple freshness validation, exchanging a header of a few dozen bytes with the origin, is usually enough to confirm that the local copy can still be delivered to the visitor. Because cache servers are usually deployed close to clients, they can deliver near-LAN response times while substantially reducing wide-area bandwidth consumption. Statistics show that more than 80% of Internet users repeatedly access the same 20% of information resources, and this is the precondition that makes caching worthwhile. Cache servers are architected differently from Web servers and can achieve higher performance; besides improving response speed and saving bandwidth, they are very effective at accelerating Web servers and reducing the load on the origin server.
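The TTL and freshness-validation behavior described above can be sketched in Python. This is a minimal, hypothetical model: `FreshnessCache`, `fake_origin`, and `PAGES` are illustrative names, the origin is an in-memory dictionary rather than a real server, and it assumes ETag-style validation, in which a stale copy is revalidated with a conditional request and a `304 Not Modified` reply carries no body.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin content; a real origin would be a remote Web server.
PAGES: Dict[str, bytes] = {"/index.html": b"<h1>hello</h1>"}


def fake_origin(url: str, if_none_match: Optional[str]) -> Tuple[int, bytes, str]:
    """Simulated origin: returns 304 with an empty body if the client's ETag still matches."""
    body = PAGES[url]
    etag = hashlib.sha256(body).hexdigest()
    if if_none_match == etag:
        return 304, b"", etag
    return 200, body, etag


@dataclass
class Entry:
    body: bytes
    etag: str
    expires_at: float


class FreshnessCache:
    """Serves fresh copies locally and revalidates stale ones with the origin."""

    def __init__(self, origin: Callable[[str, Optional[str]], Tuple[int, bytes, str]],
                 ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin
        self.ttl = ttl
        self.clock = clock
        self.store: Dict[str, Entry] = {}
        self.stats = {"hit": 0, "revalidated": 0, "miss": 0}

    def get(self, url: str) -> bytes:
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now < entry.expires_at:
            # Within the TTL: no contact with the origin at all.
            self.stats["hit"] += 1
            return entry.body
        if entry is not None:
            # Stale: send only the validator; a 304 reply means the copy is still good.
            status, body, etag = self.origin(url, entry.etag)
            if status == 304:
                entry.expires_at = now + self.ttl
                self.stats["revalidated"] += 1
                return entry.body
        else:
            status, body, etag = self.origin(url, None)
        # Cache miss or changed content: store the full response from the origin.
        self.stats["miss"] += 1
        self.store[url] = Entry(body, etag, now + self.ttl)
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    cache = FreshnessCache(fake_origin, ttl=60, clock=lambda: fake_now[0])
    cache.get("/index.html")   # miss: full transfer from the origin
    cache.get("/index.html")   # hit: served from the local copy
    fake_now[0] = 120
    cache.get("/index.html")   # stale: revalidated with a 304, body not re-sent
    print(cache.stats)         # {'hit': 1, 'revalidated': 1, 'miss': 1}
```

Injecting the clock keeps the example deterministic; a real cache would also honor the validity period and validators sent by the origin rather than a single fixed TTL.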

A cache server is a specialized, tightly integrated hardware and software appliance dedicated to cache acceleration, and it is usually deployed at the edge of the network. Depending on what it accelerates, it works in client acceleration or server acceleration mode. In client acceleration, the cache is deployed at the network egress and stores frequently accessed content locally, improving response speed and saving bandwidth. In server acceleration, the cache is deployed in front of the Web server as its front end, improving Web server performance and speeding up access. When multiple acceleration caches are spread across different regions, an effective mechanism is needed to manage the cache network, direct each user to the nearest cache, and balance traffic globally. This is the basic idea of a CDN.

2. What is a CDN?

CDN stands for Content Delivery Network. Its goal is to add a new layer of network architecture on top of the existing Internet and publish a site's content to the network "edge" closest to users, so that users fetch content from nearby, Internet congestion is relieved, and the site responds faster. Technically, it addresses the root causes of slow site access: limited network bandwidth, heavy user traffic, and the uneven distribution of network access points.

In the narrow sense, a CDN is a new kind of network construction: an overlay network on top of the traditional IP network that is specially optimized for delivering rich media. In the broad sense, a CDN represents a network service model built on quality of service and orderly traffic. Simply put, a CDN is a strategically deployed system with four elements: distributed storage, load balancing, network request redirection, and content management, of which content management and global network traffic management are the core. By evaluating the user's proximity and server load, a CDN ensures that content serves user requests efficiently. Content is generally served by cache servers, also called proxy caches (surrogates), which sit at the edge of the network only "one hop" away from users. Each proxy cache is a transparent mirror of the content provider's origin server, and these caches are typically located in the CDN service provider's data centers. This architecture lets CDN service providers deliver the best possible experience to end users, who have little tolerance for slow responses, on behalf of their customers, the content providers. According to statistics, a CDN can handle 70% to 95% of all content requests for a site's pages, reducing server load and improving the site's performance and scalability.

Compared with existing content publishing models, a CDN emphasizes the role of the network in publishing content. By introducing an active content management layer and global load balancing, the CDN differs fundamentally from the traditional model. In the traditional model, content is published by the ICP's application servers and the network is merely a transparent data pipe: network quality assurance stays at the packet level and cannot differentiate service quality by content object. Moreover, because IP networks are "best effort," quality depends on provisioning end-to-end bandwidth between users and application servers far in excess of actual demand. This model not only consumes large amounts of valuable backbone bandwidth but also places a heavy and unpredictable load on the ICP's application servers. When a hot event causes a traffic surge, a local hotspot forms and can overload the application servers until they stop serving. Another drawback of this server-centric model is the lack of personalized services and the distortion of the broadband value chain: content providers end up doing content delivery, a job they should not have to do and are not good at.

In the broadband value chain, content providers and users sit at the two ends and rely on network service providers to connect them. As the Internet industry matures and business models change, the roles in this chain are becoming increasingly specialized: content/application operators, hosting service providers, backbone network providers, access providers, and so on. Each role does its own part to serve customers well, producing a win-win situation. In terms of how content and networks are combined, content publishing has gone through two phases: ICP application servers and IDCs (Internet data centers). The IDC boom gave rise to hosting service providers. However, IDCs do not solve the problem of publishing content efficiently: content that sits at the center of the network still consumes backbone bandwidth and does not establish orderly traffic on the IP network. Pushing content to the edge of the network and serving users from nearby, which guarantees service quality and orderly access across the network, therefore became the obvious choice. This is the CDN service model. A CDN resolves the content operator's dilemma between centralization and distribution, and it is clearly valuable and indispensable for building a healthy Internet value chain.

3. New CDN applications and customers

CDN services are currently used mainly in securities, finance and insurance, ISPs, ICPs, online trading, portals, large and medium-sized enterprises, online education, and other fields. They can also be used in industry private networks, on the Internet, and even to optimize LANs. With a CDN, these sites do not need to invest in many expensive servers or set up branch sites. This matters especially for bandwidth-intensive media such as streaming media and distance-learning courseware: a CDN pushes the content to the network edge and minimizes the distance between where content is requested and where it is delivered, which greatly improves site performance. CDNs are built by several kinds of operators: enterprises build CDNs to serve themselves; IDCs build CDNs mainly to serve IDC customers and offer value-added services; network operators build CDNs mainly to provide content push services; and CDN service providers build CDNs specifically to sell the service. In the last case, the site cooperates with the CDN provider, which is responsible for delivering the content, ensuring normal transmission, and maintaining the delivery network, while the site only maintains its content and no longer needs to worry about traffic.

A CDN helps keep the network fast, secure, stable, and scalable.

An IDC operator building a CDN generally needs several IDC centers in different regions. The service targets are the customers hosted in those centers, and the CDN uses existing network resources, so it requires little investment and is easy to build. For example, if an IDC has 10 machine rooms across the country and joins them into an IDC CDN, a Web server hosted at one node effectively gains 10 mirror servers, and customers access the nearest one. In a broadband metropolitan area network (MAN), local access is fast, so the bandwidth leaving the city generally becomes the bottleneck. To deliver the high-speed experience a MAN promises, Internet content is cached locally by deploying caches at the MAN's POPs, forming an efficient and orderly network in which users reach most content within one hop. This is a CDN application that accelerates all applications.

4. How the CDN works

Before describing how a CDN is implemented, let us first look at the traditional access process without caching, to understand how access through a CDN cache differs from uncached access.

Without a CDN cache, a user accesses a site as follows:

1) The user gives the browser the domain name to visit;

2) The browser calls the resolver library to resolve the domain name and obtain its IP address;

3) The browser uses that IP address to send a data request to the host serving the domain;

4) The browser renders the Web page from the data the host returns.
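The four steps can be modeled in a few lines of Python. This is only a simulation: the DNS table, the documentation address 192.0.2.10, and the names `resolve`, `fetch`, and `browse` are hypothetical, and no real network traffic is sent.

```python
from typing import Dict

# Hypothetical authoritative data: domain name -> A record.
DNS_A: Dict[str, str] = {"www.example.com": "192.0.2.10"}
# Hypothetical Web servers: IP address -> {path: page}.
SERVERS: Dict[str, Dict[str, str]] = {"192.0.2.10": {"/": "<h1>origin page</h1>"}}


def resolve(name: str) -> str:
    """Step 2: the resolver library maps the domain name to an IP address."""
    return DNS_A[name]


def fetch(ip: str, path: str) -> str:
    """Step 3: the browser sends its request straight to that IP address."""
    return SERVERS[ip][path]


def browse(name: str, path: str = "/") -> str:
    """Steps 1-4: user-supplied name in, page content out."""
    ip = resolve(name)
    return fetch(ip, path)
```

Every request in this model travels all the way to the origin at 192.0.2.10, however far away the user is.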

These four steps take the browser from receiving a domain name from the user to retrieving data from the host serving that domain. A CDN adds a cache layer between the user and the server. Directing the user's request to a cache, which then obtains the origin server's data, is done mainly by taking over DNS resolution. Let us now look at how a site is accessed once a CDN cache is in place.

With a CDN cache, the access process becomes:

1) The user gives the browser the domain name to visit;

2) The browser calls the resolver library to resolve the domain name. Because the CDN has adjusted how the domain is resolved, the resolver usually receives a CNAME record for the domain, and the browser must resolve that CNAME target again to obtain the actual IP address. This step uses global load-balancing DNS, which, for example, returns an IP address based on geographic location so that the user reaches the nearest cache;

3) This resolution returns the IP address of a CDN cache server, and the browser sends its request to that cache server;

4) If it does not already hold the content, the cache server uses the domain name the browser requested to resolve the domain's real IP address through the CDN's internal DNS, and then sends the request to that address;

5) The cache server receives the content from the real IP address, stores it locally for later use, and returns the data to the client, completing the service process;

6) The client receives the data returned by the cache server, which completes the whole browsing request.
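The six steps can be sketched as a small Python simulation. All names and addresses are hypothetical (documentation ranges such as 192.0.2.0/24 and 203.0.113.0/24); the "nearest cache" policy is assumed to be an ordered prefix match standing in for the global load-balancing DNS, and the CDN-internal DNS is a dictionary holding the origin's real address.

```python
import ipaddress
from typing import Dict, List, Tuple

# Step 2 data: after joining the CDN, the public record for the site is a CNAME.
PUBLIC_CNAME: Dict[str, str] = {"www.example.com": "cache.cdn.example"}
# Global load-balancing DNS policy for the CNAME target: first matching prefix wins.
GSLB: Dict[str, List[Tuple[ipaddress.IPv4Network, str]]] = {
    "cache.cdn.example": [
        (ipaddress.IPv4Network("198.51.100.0/24"), "203.0.113.1"),  # nearby node
        (ipaddress.IPv4Network("0.0.0.0/0"), "203.0.113.2"),        # default node
    ]
}
# Step 4 data: the CDN-internal DNS still knows the origin's real address.
INTERNAL_A: Dict[str, str] = {"www.example.com": "192.0.2.10"}
# Content held by the origin server, keyed by (IP address, path).
ORIGIN: Dict[Tuple[str, str], str] = {("192.0.2.10", "/"): "<h1>origin page</h1>"}


def public_resolve(name: str, client_ip: str) -> str:
    """Steps 2-3: follow the CNAME, then pick a cache IP based on the client address."""
    target = PUBLIC_CNAME[name]
    client = ipaddress.IPv4Address(client_ip)
    for network, cache_ip in GSLB[target]:
        if client in network:
            return cache_ip
    raise LookupError(f"no cache configured for {client_ip}")


class CacheServer:
    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], str] = {}
        self.origin_fetches = 0

    def handle(self, host: str, path: str) -> str:
        key = (host, path)
        if key not in self.store:
            origin_ip = INTERNAL_A[host]                  # step 4: internal DNS lookup
            self.store[key] = ORIGIN[(origin_ip, path)]   # step 5: fetch and keep a copy
            self.origin_fetches += 1
        return self.store[key]                            # steps 5-6: answer the client


CACHES: Dict[str, CacheServer] = {ip: CacheServer() for _, ip in GSLB["cache.cdn.example"]}


def browse(name: str, path: str, client_ip: str) -> str:
    cache_ip = public_resolve(name, client_ip)
    return CACHES[cache_ip].handle(name, path)   # Host header = original domain name
```

Two clients in 198.51.100.0/24 that request the same page both land on 203.0.113.1, and only the first request reaches the origin; a client elsewhere is sent to 203.0.113.2, whose cache fills independently.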

This analysis shows that, to make acceleration transparent to ordinary users (after the cache is added, clients need no configuration and still reach the accelerated site by its original domain name) while serving designated sites and minimizing the impact on the ICP, it is enough to modify the domain name resolution part of the access process. The specific steps for implementing a CDN are as follows.

1) The ICP only needs to delegate resolution authority for its domain name to the CDN operator; nothing else has to change. In operation, the ICP modifies its own domain's resolution record, usually using a CNAME that points to the address of the CDN's cache servers.

2) The CDN operator first provides public resolution for the ICP's domain name. To implement sortlist, the resolution result for the ICP's domain usually points to a CNAME record.

3) When sortlist is needed, the CDN operator's DNS applies special processing to the resolution of the name the CNAME points to, so that the DNS server returns different IP addresses for the same domain name depending on the client's IP address.

4) The client reaches the cache at the IP address obtained through the CNAME, and its request still carries the original hostname, so the cache must find the IP address of the origin server. The CDN operator therefore maintains an internal DNS server that resolves the real IP address of the domain the user is accessing.

5) Alongside the internal DNS server, the operator also maintains an authorization server that controls which domains may be cached and which may not, to prevent the cache from becoming an open proxy.
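The authorization check in step 5 amounts to a whitelist lookup on the requested hostname. A minimal sketch follows; `AUTHORIZED_DOMAINS`, `normalize_host`, and `authorize` are hypothetical names, the domain list reuses the example site from section 7, and refusing with HTTP 403 is an assumed policy rather than something the article specifies.

```python
from typing import FrozenSet, Tuple

# Hypothetical list of customer domains the CDN has agreed to accelerate.
AUTHORIZED_DOMAINS: FrozenSet[str] = frozenset({"www.linuxaid.com.cn"})


def normalize_host(host_header: str) -> str:
    """Strip an optional :port and trailing dot, and lowercase (DNS names are
    case-insensitive). Bracketed IPv6 literals are not handled in this sketch."""
    return host_header.split(":", 1)[0].rstrip(".").lower()


def authorize(host_header: str) -> Tuple[int, str]:
    """Only serve or cache domains on the list; anything else is refused,
    so the cache cannot be used as an open proxy to arbitrary sites."""
    host = normalize_host(host_header)
    if host in AUTHORIZED_DOMAINS:
        return 200, f"serve {host} from cache or origin"
    return 403, f"{host} is not a CDN customer"
```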

5. Technical means of a CDN

The main techniques for implementing a CDN are caching and mirror servers. Requests can be directed by DNS resolution or by HTTP redirection, and content is transferred and kept synchronized through cache servers or remote mirror sites. With DNS redirection, the user's location is determined with over 85% accuracy; with HTTP redirection, accuracy exceeds 99%. In general, for each cache server farm, the ratio of the data users fetch from the cache servers to the data the cache servers fetch from the origin site is between 2:1 and 3:1, meaning the caches absorb 50% to 70% of the repeated traffic to the origin site (mainly images, streaming media files, and similar content). With mirroring, apart from data synchronization, all traffic is served locally without touching the origin server.
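The 2:1 to 3:1 figure converts to an offload percentage with simple arithmetic: if users pull R units of data from a cache farm for every 1 unit the farm pulls from the origin, the origin supplies 1/R of the traffic. The helper below is an illustrative sketch (the function name is hypothetical); it shows that 2:1 gives exactly 50% and 3:1 gives about 67%, so the upper bound of 70% is a rounded figure.

```python
def origin_offload(user_to_origin_ratio: float) -> float:
    """Fraction of user traffic that the cache farm serves without the origin.

    user_to_origin_ratio is R in "R:1": bytes delivered to users per byte
    fetched from the origin. The origin's share is 1/R, so the cache's is 1 - 1/R.
    """
    if user_to_origin_ratio < 1:
        raise ValueError("in this model R must be at least 1")
    return 1 - 1 / user_to_origin_ratio


if __name__ == "__main__":
    for r in (2, 3):
        print(f"{r}:1 -> {origin_offload(r):.0%} served from cache")
    # 2:1 -> 50% served from cache
    # 3:1 -> 67% served from cache
```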

Mirror sites are familiar to most of us. They distribute content in a straightforward way and suit synchronization of static and quasi-dynamic data. However, buying and maintaining new servers is expensive, each region needs its own mirror server managed and maintained by skilled staff, and large sites that keep all servers updated simultaneously need considerably more bandwidth. For these reasons, most Internet companies do not build many mirror servers.

Caching costs less and suits static content. Internet statistics show that more than 80% of users frequently visit 20% of a site's content. Under this pattern, cache servers can handle most static client requests, and the origin WWW server only needs to process the roughly 20% of requests that are uncached or dynamic. This greatly shortens response times for client requests and reduces the load on the origin WWW server. According to IDC (International Data Corporation, a US market research firm), the cache market, an important indicator for CDNs, was growing at nearly 100% a year, with global revenue expected to reach $4.5 billion in 2004. The growth of streaming media on the network will further stimulate demand in this market.
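Which requests the cache can take over, and which still go to the origin, is typically decided per request. The sketch below is a hypothetical heuristic, not a rule from the article: it treats GET/HEAD requests without a query string for the static HTML and image types listed in section 1 as cacheable, and sends everything else (POSTs, query strings, script pages) to the origin.

```python
from urllib.parse import urlsplit

# Assumed static file types, taken from the examples in section 1 of this article.
STATIC_SUFFIXES = (".html", ".htm", ".gif", ".tif", ".png", ".bmp")


def served_by_cache(method: str, url: str) -> bool:
    """Heuristic split: static GET/HEAD requests go to the cache, the rest to the origin."""
    parts = urlsplit(url)
    if method.upper() not in ("GET", "HEAD"):
        return False   # writes and form posts always reach the origin
    if parts.query:
        return False   # a query string usually means dynamically generated content
    return parts.path.lower().endswith(STATIC_SUFFIXES)
```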

6. Network architecture of a CDN

A CDN's network architecture consists of two parts: the center and the edge. The center is the CDN management center and the DNS redirection and resolution center, which is responsible for global load balancing; its equipment is installed in the management center's machine room. The edge consists of the remote nodes that carry CDN distribution, made up mainly of caches and load balancers.

When a user visits a site that uses a CDN service, the domain name resolution request is ultimately handled by the global load-balancing DNS. Using a predefined set of policies, this DNS gives the user the address of the node closest to them, so the user is served quickly. It also stays in contact with all CDN nodes deployed around the world and collects each node's status, so that user requests are never assigned to an unavailable node. In effect, it performs global load balancing through DNS.
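The global load-balancing decision can be sketched as "nearest usable node, else least-loaded usable node." In this Python sketch the `Node` fields, the 90% load ceiling, and the prefix-based notion of "nearest" are all assumptions standing in for the predefined policies the article mentions; the node IPs reuse the cache addresses from the BIND example in section 7.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    ip: str
    near_prefixes: List[str] = field(default_factory=list)  # client networks "close" to this node
    healthy: bool = True   # updated from the status reports each node sends
    load: float = 0.0      # fraction of capacity in use, 0.0-1.0


def pick_node(nodes: List[Node], client_ip: str, max_load: float = 0.9) -> Node:
    """Return the node the DNS answer should point this client at."""
    client = ipaddress.ip_address(client_ip)
    # Never hand out a node that is down or overloaded.
    usable = [n for n in nodes if n.healthy and n.load < max_load]
    if not usable:
        raise RuntimeError("no CDN node available")
    near = [n for n in usable
            if any(client in ipaddress.ip_network(p) for p in n.near_prefixes)]
    # Prefer nodes near the client; fall back to any usable node.
    return min(near or usable, key=lambda n: n.load)
```

A real global load-balancing DNS would refresh `healthy` and `load` continuously from node reports and keep TTLs short so that clients pick up changes quickly.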

To ordinary Internet users, each CDN node is equivalent to a Web server placed near them. Under the control of the global load-balancing DNS, a user's request is transparently directed to the nearest node, and the CDN server in that node responds as if it were the site's origin server. Because it is closer to the user, the response time is generally shorter.

Each CDN node consists of two parts: a load balancer and cache servers.

The load balancer distributes load across the caches within the node, keeping the node efficient. It also collects information about the node and its surroundings and stays in contact with the global load-balancing DNS, enabling load balancing across the whole system.

The cache server stores a large amount of a customer's website content and responds to local users' requests as if it were a Web server close to them.
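Inside a node, the balancer's job can be sketched as least-connections selection over the healthy caches, plus a status summary it reports upward to the global DNS. The article does not say which balancing algorithm is used; least-connections, and the names `Cache`, `NodeBalancer`, and `status`, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cache:
    name: str
    healthy: bool = True
    active: int = 0   # connections currently being served


class NodeBalancer:
    """Spreads requests across the caches of one CDN node."""

    def __init__(self, caches: List[Cache]) -> None:
        self.caches = caches

    def acquire(self) -> Cache:
        """Pick the healthy cache with the fewest active connections."""
        live = [c for c in self.caches if c.healthy]
        if not live:
            raise RuntimeError("no healthy cache in this node")
        chosen = min(live, key=lambda c: c.active)   # ties go to the first listed cache
        chosen.active += 1
        return chosen

    def release(self, cache: Cache) -> None:
        """Call when a request finishes."""
        cache.active -= 1

    def status(self) -> Dict[str, int]:
        """Summary the balancer could report to the global load-balancing DNS."""
        live = [c for c in self.caches if c.healthy]
        return {"healthy_caches": len(live),
                "active_connections": sum(c.active for c in live)}
```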

The CDN management system keeps the whole system running normally. It monitors subsystems and devices in real time, raises alarms for faults, and monitors total traffic and per-node traffic in real time, storing the data in the system database so that network administrators can analyze it further. Through a full-featured network management system, administrators can also modify the system configuration.

In theory, the simplest CDN consists of one DNS server responsible for global load balancing and one cache at each node. The DNS server resolves the same name to different IP addresses depending on the user's source IP address, so that users access the nearest node. For high availability, each node's traffic, health status, and so on must be monitored. When a single cache cannot carry a node's load, more caches are needed, and once a node runs multiple caches, a load balancer is required so that the cache group works together.

7. CDN Example

Commercial CDNs are held to very high standards for service quality and availability, and there are dedicated products and CDN solutions for them. This article approaches the topic mainly from a theoretical angle: to understand how a CDN is implemented, it builds a working configuration from an existing network environment and open source software, giving a deeper understanding of how a CDN actually works.

Linux is a free, open source operating system that has been applied successfully in many critical fields. BIND is a well-known DNS server for Unix-like platforms such as Unix, FreeBSD, and Linux, and more than 60% of the DNS servers on the Internet run BIND. The latest version of BIND is 9.x, while 8.x is still widely used. BIND 9 has many new features, one of which is resolving the same domain name to different IP addresses based on the client's source address; with this feature, users accessing the same domain name can be directed to servers at nodes in different regions. Squid is a well-known cache engine for Linux. Compared with commercial cache engines, Squid's performance is relatively low, but its basic functions match those of commercial cache products, and for testing it is easy to configure and run. The following briefly introduces the CDN configuration process.

1. A site joining the CDN service must delegate resolution authority for its domain name (for example, www.linuxaid.com.cn at address 202.99.11.120) to the CDN operator. In the linuxaid domain's resolution records, the A record of the www host only needs to be changed to a CNAME pointing to cache.cdn.com, the name the CDN uses to identify its cache servers. In the zone file /var/named/linuxaid.com.cn, change

www             IN      A       202.99.11.120

to

www             IN      CNAME   cache.cdn.com.

2. Once the CDN operator has resolution authority, the domain's CNAME record points to the domain name of a cache server in the CDN, such as cache.cdn.com. The CDN's global load-balancing DNS must resolve this CNAME target to an IP address according to its policy, usually the address of the nearest cache.

The basic functionality of BIND 9 can return different IP addresses for different source address ranges, giving clients nearby access and balancing the load. This is usually done with the BIND 9 sortlist option, which returns the nearest node's IP address first according to the client's IP address. The procedure is:

1) Set up multiple A records for cache.cdn.com. The contents of /var/named/cdn.com are as follows:

$TTL 3600
@       IN      SOA     ns.cdn.com. root.ns.cdn.com. (
                        2002090201      ; Serial number
                        10800           ; Refresh after 3 hours
                        3600            ; Retry after 1 hour
                        604800          ; Expire after 1 week
                        1800            ; Minimum (negative-caching) TTL
                        )
@       IN      NS      ns
www     IN      A       210.33.21.168
ns      IN      A       202.96.128.68
cache   IN      A       202.93.22.13    ; add one A record for each cache server
cache   IN      A       210.21.30.90
cache   IN      A       211.99.13.47

2) The contents of /etc/named.conf are:

options {
        directory "/var/named";
        sortlist {
                # Addresses inside an inner { } list have equal preference.
                # For queries made from the local host, addresses are returned
                # in the order 202.93.22.13, 210.21.30.90, 211.99.13.47
                { localhost;
                        { localnets;
                                202.93.22.13;
                                { 210.21.30.90; 211.99.13.47; };
                        };
                };
                # For DNS queries from the 202/8 address block, addresses are
                # returned in the order 202.93.22.13, 210.21.30.90, 211.99.13.47
                { 202/8;
                        { 202.93.22.13;
                                { 210.21.30.90; 211.99.13.47; };
                        };
                };
                # For DNS queries from the 211/8 address block, addresses are
                # returned in the order 211.99.13.47, 202.93.22.13, 210.21.30.90,
                # i.e. 211.99.13.47 is the node closest to the querying client
                { 211/8;
                        { 211.99.13.47;
                                { 202.93.22.13; 210.21.30.90; };
                        };
                };
                { 61/8;
                        { 202.93.22.13;
                                { 210.21.30.90; 211.99.13.47; };
                        };
                };
        };
};
zone "." {
        type hint;
        file "root.cache";
};
zone "localhost" {
        type master;
        file "localhost";
};
zone "cdn.com" {
        type master;
        file "cdn.com";
};
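To see what these sortlist rules do to an answer, the Python sketch below reorders the three cache.cdn.com A records for a given client address. It is a simplified model, assumed rather than taken from BIND's source: the first rule whose client prefix matches is used, each address is ranked by the first preference tier that contains it, addresses in the same inner { } list share a rank, and unmatched addresses go last. The localhost rule is omitted because `localnets` depends on the server's interfaces. Note that sortlist only reorders the answer: all three addresses are still returned, and the scheme relies on most clients using the first address.

```python
import ipaddress
from typing import List, Sequence, Tuple

Net = ipaddress.IPv4Network
Rule = Tuple[List[Net], List[List[Net]]]


def _rule(client_prefix: str, *tiers: List[str]) -> Rule:
    """One sortlist statement: client prefix, then preference tiers (best first)."""
    return ([ipaddress.IPv4Network(client_prefix)],
            [[ipaddress.IPv4Network(a) for a in tier] for tier in tiers])


# The 202/8, 211/8 and 61/8 statements from the named.conf example.
SORTLIST: List[Rule] = [
    _rule("202.0.0.0/8", ["202.93.22.13"], ["210.21.30.90", "211.99.13.47"]),
    _rule("211.0.0.0/8", ["211.99.13.47"], ["202.93.22.13", "210.21.30.90"]),
    _rule("61.0.0.0/8", ["202.93.22.13"], ["210.21.30.90", "211.99.13.47"]),
]


def _rank(address: str, tiers: List[List[Net]]) -> int:
    """Index of the first tier containing the address; unmatched addresses rank last."""
    ip = ipaddress.IPv4Address(address)
    for i, tier in enumerate(tiers):
        if any(ip in net for net in tier):
            return i
    return len(tiers)


def sort_answers(client_ip: str, answers: Sequence[str]) -> List[str]:
    """Reorder (never drop) the A records according to the first matching rule."""
    client = ipaddress.IPv4Address(client_ip)
    for client_nets, tiers in SORTLIST:
        if any(client in net for net in client_nets):
            # sorted() is stable, so equally ranked addresses keep their incoming order.
            return sorted(answers, key=lambda a: _rank(a, tiers))
    return list(answers)   # no rule matched: order left unchanged
```

For a client at 211.5.6.7, `sort_answers("211.5.6.7", ["202.93.22.13", "210.21.30.90", "211.99.13.47"])` puts 211.99.13.47 first. Within a tier, a real server may order the addresses differently, so only the first position should be relied on.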

3. If the caches in the CDN work in server acceleration mode, the configuration already specifies the URL of the accelerated server, so a cache simply matches the user's request, fetches the content from the origin server, and caches it for later use. If a cache works in client acceleration mode, it needs to know the origin server's IP address, so the CDN maintains and runs a DNS server for the caches' use. This server resolves each domain's real IP address, such as 202.99.11.120, keeping each domain's resolution records exactly as they were before it joined the CDN.

4. Cache servers in the CDN must work in transparent mode. For Squid 2.5 and earlier (Squid 2.6 replaced these directives with options on http_port such as accel and vhost), set the following parameters:

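# Accelerate any site: take the target from the request, not a fixed host
httpd_accel_host virtual
# Port of the origin (accelerated) Web servers
httpd_accel_port 80
# Keep normal proxy service available alongside acceleration
httpd_accel_with_proxy on
# Use the Host: header to decide which site a request is for
httpd_accel_uses_host_header on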
