The Technical Principles of CDN (Content Delivery Network)
Source: IT World Network. Published: 2011-11-15 14:25
The rapid growth of the Internet has brought great convenience to work and daily life, and with it higher expectations for service quality and access speed. Although bandwidth keeps increasing, so does the number of users; constrained by Web server load, transmission distance and other factors, slow responses remain a frequent source of complaints. One solution is to use caching so that Web content is served from a point close to the user. Caching is an effective way to optimize data transmission over the network, giving users a fast experience with assured quality.
The purpose of network caching is to minimize redundant data transmission and to turn wide-area transfers into local or nearby access. Most content carried over the Internet is repeated Web/FTP data, so cache servers and network devices that apply caching can greatly improve link performance and relieve the congestion that peak access causes at network nodes. A cache server stores most Web page objects, such as HTML, HTM and PHP page files, GIF, TIF, PNG and BMP image files, and files in other formats. While an object is within its validity period (TTL), a repeated request does not require the file itself to be transferred again from the origin Web site: a simple freshness validation, which exchanges a header of only a few dozen bytes, is enough for the local copy to be delivered directly to the visitor. Because cache servers are usually deployed close to the client, they can deliver response times approaching those of a LAN while effectively reducing wide-area bandwidth consumption. According to statistics, more than 80% of Internet users repeatedly access the same 20% of information resources, which is the premise that makes caching worthwhile. A cache server is architected differently from a Web server and can achieve higher performance; it not only speeds up responses and saves bandwidth but, when placed in front of a Web server, also effectively reduces the load on the origin server.
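The caching behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular cache product works. It assumes one common arrangement: an object inside its TTL is served from the local copy, and once the TTL lapses a header-only validation decides whether the copy can be reused or a full transfer is needed. `OriginStub`, `TtlCache` and the `/logo.png` object are hypothetical names invented for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    body: bytes
    etag: str          # version tag used for freshness validation
    expires_at: float  # time after which the copy must be revalidated


class OriginStub:
    """Stand-in for the origin Web server (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {"/logo.png": (b"PNGDATA", "v1")}
        self.full_transfers = 0
        self.validations = 0

    def fetch(self, path):
        # Full transfer of the file body.
        self.full_transfers += 1
        return self.objects[path]

    def validate(self, path, etag):
        # Header-only exchange: True means the cached copy is still current.
        self.validations += 1
        return self.objects[path][1] == etag


class TtlCache:
    def __init__(self, origin, ttl=60.0, clock=time.monotonic):
        self.origin, self.ttl, self.clock = origin, ttl, clock
        self.store = {}

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry and now < entry.expires_at:
            return entry.body                      # fresh: serve the local copy
        if entry and self.origin.validate(path, entry.etag):
            entry.expires_at = now + self.ttl      # revalidated: extend the TTL
            return entry.body
        body, etag = self.origin.fetch(path)       # miss or changed: full transfer
        self.store[path] = Entry(body, etag, now + self.ttl)
        return body
```

Passing a fake `clock` makes the expiry logic easy to exercise without waiting for real time to pass.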
A cache server is a dedicated appliance with tightly integrated hardware and software. Its main job is cache acceleration, and it is generally deployed at the edge of the network. Depending on what is being accelerated, there are two modes: client acceleration and server acceleration. For client acceleration, the cache is deployed at the network egress and stores frequently accessed content locally, improving response speed and saving bandwidth. For server acceleration, the cache is deployed in front of the Web server as its front end, improving the Web server's performance and speeding up access. When multiple cache servers are distributed across different regions, an effective mechanism is needed to manage the cache network, direct users to the nearest cache, and balance traffic globally. This is the basic idea behind a CDN, or content delivery network.
2. What is a CDN?
CDN stands for Content Delivery Network, also called a content distribution network. The goal is to add a new layer of network architecture on top of the existing Internet and publish a Web site's content to the network "edge" closest to users, so that users can obtain the content they need from nearby. This relieves Internet congestion and improves the response speed of the site. Technically, it is a comprehensive answer to the root causes of slow Web site responses: limited network bandwidth, large numbers of user requests, and unevenly distributed points of presence.
In a narrow sense, a content delivery network (CDN) is a new kind of network build-out, specially optimized for delivering broadband rich media over traditional IP networks. In a broad sense, CDN stands for a network service model built on quality and order. Put simply, a CDN is a strategically deployed overall system made up of four elements: distributed storage, load balancing, redirection of network requests, and content management. Of these, content management and global traffic management are the core. By judging user proximity and server load, the CDN ensures that content is served in response to user requests as efficiently as possible. In general, content is served by cache servers, also known as surrogates (proxy caches), located at the edge of the network only a single hop away from the user. Each surrogate is a transparent mirror of the content provider's origin server (which is usually located in the CDN service provider's data center). This architecture lets CDN service providers act on behalf of their customers, the content providers, to give the best possible experience to end users who will not tolerate any delay in response time. According to statistics, a CDN can handle 70% to 95% of the content requests for an entire site, reducing pressure on the origin server and improving the site's performance and scalability.
Compared with existing content publishing models, a CDN emphasizes the role of the network in content publishing. By introducing active content management and global load balancing, it differs fundamentally from the traditional model. In the traditional model, content is published by the ICP's application server, and the network is merely a transparent data channel. This transparency means that the network's quality assurance stops at the packet level and cannot differentiate service quality by content object. Moreover, because IP networks are "best effort", quality can only be assured by provisioning far more end-to-end bandwidth between the user and the application server than is actually needed. Under this model, large amounts of valuable backbone bandwidth are consumed, and the load on the ICP's application server becomes heavy and unpredictable. When a hot event triggers a traffic surge, a local hotspot effect can overload the application server and take it out of service. Another drawback of this centralized, application-server-based publishing model is the lack of personalized services and a distortion of the broadband value chain: content providers end up taking on content publishing work that they should not have to do, or cannot do well.
In the broadband value chain, content providers and users sit at the two ends, connected by network service providers in the middle. As the Internet industry matures and business models change, the roles in this chain are being subdivided further and further: content/application operators, hosting service providers, backbone network service providers, access service providers, and so on. Each role should specialize and do its own part to serve customers well, producing a win for all parties. In terms of how content and network are combined, content publishing has gone through two stages: the ICP's content (application) server and the IDC. The IDC boom also gave rise to the hosting service provider role. However, the IDC does not solve the problem of publishing content effectively. Keeping content at the center of the network neither relieves backbone bandwidth consumption nor establishes orderly traffic on the IP network. Pushing content to the edge of the network and serving users from nearby, thereby assuring service quality and orderly access across the whole network, therefore becomes the obvious choice. This is the CDN service model. Building a CDN resolves the content operator's dilemma between "centralization and decentralization", and is undoubtedly valuable and indispensable for building a healthy Internet value chain.
3. New CDN applications and customers
CDN services are currently used mainly in securities, finance and insurance, ISPs, ICPs, online trading, portals, large and medium-sized enterprises, online education and similar fields. They can also be used on industry private networks and the Internet, and even to optimize a LAN. With a CDN, these sites do not need to invest in expensive servers of various kinds or set up additional sites. This matters especially for the widely used streaming media and distance-learning courseware, which consume a great deal of bandwidth: a CDN replicates the content to the edge of the network so that the distance between where content is requested and where it is delivered is minimized, which significantly improves Web site performance. CDNs are built mainly by four kinds of organization: enterprises, whose CDNs serve the enterprise itself; IDCs, whose CDNs mainly serve IDC customers and value-added services; network operators, whose CDNs mainly provide content push services; and dedicated CDN service providers, who build CDNs specifically to offer them as a service. In the last case the customer partners with the CDN provider, which takes responsibility for delivering the content, ensuring that delivery works normally and maintaining the delivery network, while the Web site only has to maintain its content and no longer needs to worry about traffic.
A CDN helps keep a network fast, secure, stable and scalable.
When an IDC builds a CDN, the operator generally needs IDC centers in several locations. The service is aimed at customers hosted in those centers, and because it uses existing network resources it requires little investment and is easy to build. For example, if an IDC has 10 data centers in its CDN, a Web server hosted at one node effectively has 10 mirror servers, and each visitor is served by the nearest one. In a broadband metropolitan area network (MAN), traffic within the network is fast but the outbound bandwidth is generally a bottleneck. To deliver the high-speed experience a MAN promises, the solution is to cache Internet content locally by deploying caches at the MAN's POPs, forming an efficient and orderly network in which users reach most content in a single hop. This is a CDN application that accelerates all Web sites.
4. How a CDN works
Before describing how a CDN is implemented, let us first look at the traditional access process without a caching service, to make clear how access through CDN caches differs from uncached access.
As shown in the figure, a user accesses a Web site that does not use CDN caching as follows:
1) The user gives the browser the domain name to visit.
2) The browser calls the resolver library to resolve the domain name and obtains the IP address that corresponds to it.
3) The browser uses that IP address to send a data request to the host serving the domain.
4) The browser displays the Web page based on the data returned by that host.
With these four steps, the browser completes the whole process, from receiving the domain name from the user to obtaining the data from the site's host. A CDN adds a cache layer between users and servers. Directing the user's request to a cache, which in turn obtains the data from the origin server, is achieved mainly by taking over DNS resolution. Let us look at the access process for a Web site that uses CDN caching.
As the figure shows, the access process for a Web site that uses CDN caching becomes:
1) The user gives the browser the domain name to visit.
2) The browser calls the resolver library to resolve the domain name. Because the CDN has adjusted the resolution process, the resolver generally obtains the CNAME record for the domain name; to obtain the actual IP address, the CNAME target must be resolved in turn. This step uses DNS resolution with global load balancing, for example resolving to an IP address based on geographic information, so that the user is served from nearby.
3) Resolution yields the IP address of a CDN cache server. Having obtained this address, the browser sends its request to the cache server.
4) Using the domain name supplied by the browser, the cache server obtains the actual IP address of that domain through the CDN's internal private DNS, and then submits the request to that address.
5) The cache server fetches the content from the actual IP address. It saves a copy locally for later use and returns the data to the client, completing the data service.
6) The client receives the data returned by the cache server, which completes the browsing request.
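The six steps can be traced with a small in-memory simulation. This is only a sketch under stated assumptions: the domain names, the documentation-range IP addresses, the lookup tables and the `CacheServer` class are all hypothetical, and real DNS and HTTP traffic is replaced by dictionary lookups.

```python
import ipaddress

# Hypothetical records for illustration only (documentation address ranges).
PUBLIC_DNS = {"www.example.com": ("CNAME", "cache.cdn.example")}
# The CDN's global load balancing DNS: client network -> nearest cache address.
GSLB = {
    "cache.cdn.example": [
        (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.10"),
        (ipaddress.ip_network("203.0.113.0/24"), "192.0.2.20"),
    ]
}
DEFAULT_CACHE = "192.0.2.10"
# The CDN's private DNS: site name -> real origin address.
PRIVATE_DNS = {"www.example.com": "192.0.2.200"}
ORIGIN_CONTENT = {("192.0.2.200", "/index.html"): b"<html>hello</html>"}


def resolve(name, client_ip):
    """Steps 2-3: follow the CNAME, then pick a cache by client address."""
    rtype, target = PUBLIC_DNS[name]
    if rtype != "CNAME":
        raise ValueError("expected a CNAME record for " + name)
    addr = ipaddress.ip_address(client_ip)
    for network, cache_ip in GSLB[target]:
        if addr in network:
            return cache_ip
    return DEFAULT_CACHE


class CacheServer:
    def __init__(self):
        self.store = {}
        self.origin_fetches = 0

    def handle(self, host, path):
        """Steps 4-5: on a miss, find the origin via private DNS, fetch, keep a copy."""
        key = (host, path)
        if key not in self.store:
            origin_ip = PRIVATE_DNS[host]
            self.store[key] = ORIGIN_CONTENT[(origin_ip, path)]
            self.origin_fetches += 1
        return self.store[key]
```

A client in `203.0.113.0/24` would be steered to the cache at `192.0.2.20`; the first request there triggers one origin fetch and later requests for the same page are answered from the stored copy.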
From this analysis we can see that two goals must be met: the service must be transparent to ordinary users (after caches are added, the user's client needs no configuration and gets accelerated access to the site through the original domain name), and acceleration must be offered to specified sites with minimal impact on the ICP. Both are achieved simply by changing the domain name resolution part of the access process. The specific steps for implementing a CDN are as follows.
1) The ICP only needs to hand resolution of its domain name over to the CDN operator; nothing else has to change. In practice, the ICP modifies its own DNS records, generally using a CNAME that points to the address of the CDN's cache servers.
2) The CDN operator first has to provide public resolution for the ICP's domain name. To make sortlist possible, the ICP's domain name is generally resolved to a CNAME record.
3) When sortlist is needed, the CDN operator can apply special handling in DNS to the resolution of the domain name that the CNAME points to, so that when the DNS server receives a client request it can return different IP addresses for the same domain name depending on the client's IP address.
4) A request that reaches the cache through the IP address obtained from the CNAME carries the hostname, and the cache must find the origin server's IP address. The CDN operator therefore maintains an internal DNS server that resolves the domain name the user requested to its real IP address.
5) Alongside the internal DNS server, an authorization server also has to be maintained to control which domains may be cached and which may not, so that the caches do not become open proxies.
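The control described in the last step, which keeps a cache from acting as an open proxy, can be pictured as an allowlist check on the requested hostname. The following is a minimal sketch; the domain names and the `may_cache` helper are hypothetical and do not correspond to any specific product's configuration.

```python
# Hypothetical list of customer domains the CDN has agreed to accelerate.
ACCELERATED_DOMAINS = {"www.example.com", "img.example.com"}


def may_cache(host_header, allowed=ACCELERATED_DOMAINS):
    """Return True only if the requested host is a domain the CDN serves.

    The Host value may carry a port and a trailing dot, so normalize first.
    """
    host = host_header.strip().lower().split(":")[0].rstrip(".")
    return host in allowed
```

A cache would run a check like this before doing the private DNS lookup and refuse the request when it returns `False`, so requests for arbitrary third-party sites are never proxied.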
5. The technical means of implementing a CDN
The main technical means of implementing a CDN are caches and mirror servers. A CDN can work in two ways, DNS resolution or HTTP redirection, delivering content and keeping it synchronized through cache servers or off-site mirror sites. With the DNS method, users are located with an accuracy above 85%; with the HTTP method, accuracy is above 99%. In general, the ratio between the volume of data that users request from a cache server group and the volume the caches fetch from the origin site is between 2:1 and 3:1, which means the caches absorb 50% to 70% of the repeated traffic that would otherwise go to the origin site (mainly images, streaming media files and the like). With mirroring, everything except data synchronization is done locally, without accessing the origin server.
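The link between the 2:1 to 3:1 ratio and the quoted percentages is simple arithmetic: if users pull `r` units from the caches for every unit the caches pull from the origin, the share of traffic kept away from the origin is `1 - 1/r`. A short sketch (the function name is made up for illustration):

```python
def origin_offload(ratio):
    """Fraction of user traffic served without going back to the origin.

    ratio = (data served to users) / (data fetched from the origin site).
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


for r in (2, 3):
    print(f"{r}:1 -> {origin_offload(r):.0%} offloaded")
```

By this formula a 2:1 ratio corresponds to 50% and a 3:1 ratio to about 67%, which is in line with the roughly 50% to 70% range given in the text.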
The mirror site server is something we see often. It lets content be distributed directly and is suited to synchronizing static and quasi-dynamic data. However, buying and maintaining new servers is expensive, and a mirror server must be set up in each region and staffed with professional technicians to manage and maintain it. A large Web site has to update servers everywhere whenever content changes, and its bandwidth requirements rise significantly, so Internet companies generally do not set up many mirror servers.
Caching costs less and is suited to static content. Internet statistics show that more than 80% of users regularly visit the same 20% of Web content. Under this pattern, cache servers can handle most static requests from clients, while the origin WWW server only needs to handle the roughly 20% of requests that are uncached or dynamic. This greatly shortens response times and lowers the load on the origin WWW server. According to IDC, the caching market, an important indicator for CDN, is growing at nearly 100% a year, and global revenue is expected to reach $4.5 billion in 2004. The growth of streaming media will further stimulate demand in this market.
6. CDN network architecture
A CDN's network architecture consists of two parts, the center and the edge. The center comprises the CDN network management center and the DNS redirection and resolution center, which is responsible for global load balancing; its equipment is installed in the management center's machine room. The edge consists of the remote nodes, which carry out CDN distribution and are made up mainly of caches and load balancers.
When a user visits a Web site that has joined the CDN service, the domain name resolution request is ultimately handled by the global load balancing DNS. Using a predefined set of policies, the global load balancing DNS gives the user the address of the node closest to them, so that the user gets fast service. It also stays in communication with all CDN nodes distributed around the world and collects each node's status, ensuring that user requests are never assigned to an unavailable node. In effect, it performs global load balancing through DNS.
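One way to picture the policy the global load balancing DNS applies is a selection function over the node status it has collected. The sketch below is an assumption-laden illustration: the `Node` fields, the region labels, the `max_load` threshold and the "same region first, then least loaded" rule are all invented for the example, since the article does not specify the policy set.

```python
from dataclasses import dataclass


@dataclass
class Node:
    address: str
    region: str
    healthy: bool = True
    load: float = 0.0  # 0.0 = idle, 1.0 = saturated


def pick_node(nodes, client_region, max_load=0.9):
    """Choose a node for a client: never an unhealthy or overloaded one,
    preferring the client's own region and then the lowest load."""
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (n.region != client_region, n.load))
```

If the node in the client's region is marked unhealthy, the function falls back to the least loaded node elsewhere, which mirrors the requirement that requests are never assigned to an unavailable node.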
For an ordinary Internet user, each CDN node is equivalent to a Web server placed nearby. Under the control of the global load balancing DNS, the user's request is transparently directed to the nearest node, and the CDN server in that node responds just as the site's origin server would. Because the node is closer to the user, the response time is necessarily shorter.
Each CDN node consists of two parts: a load balancing device and cache servers.
The load balancing device balances load across the caches in the node to keep the node working efficiently. It also collects information about the node and its surroundings and stays in communication with the global load balancing DNS, so that load is balanced across the whole system.
The cache servers store large amounts of content from customers' Web sites and respond to local users' requests just as a Web server close to the user would.
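Inside a node, the load balancing device has to decide which cache handles each request. The article does not say which algorithm is used; one simple possibility, shown here purely as an assumption, is to hash the requested URL so that the same object always lands on the same cache and is stored only once per node. The cache names are hypothetical.

```python
import hashlib


def pick_cache(url, caches):
    """Map a URL to one cache in the node by hashing (illustrative policy only)."""
    if not caches:
        raise ValueError("the node has no caches")
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]


NODE_CACHES = ["cache-a", "cache-b", "cache-c"]  # hypothetical cache names
print(pick_cache("http://www.example.com/logo.png", NODE_CACHES))
```

The trade-off with plain modulo hashing is that adding or removing a cache remaps most URLs; real load balancers may use round robin, least connections or consistent hashing instead.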
The CDN's management system is what keeps the whole system running normally. It monitors every subsystem and device in real time and raises alarms for faults; it also monitors total system traffic and per-node traffic in real time and stores the figures in the system database for network administrators to analyze later. Through the network management system, users can also modify the system configuration.
In theory, the simplest CDN needs only one DNS server responsible for global load balancing and one cache at each node. The DNS server resolves a name to different IP addresses depending on the user's source IP address, so that users are served from nearby. To ensure high availability, the traffic and health of each node must be monitored. When a single cache cannot carry a node's load, more caches are needed, and once several caches work at the same time a load balancer is needed so that the cache group works together.
7. A CDN example
Commercial CDNs are run as services with very demanding requirements, such as high availability, and are built with professional products and CDN solutions. This article approaches CDN implementation mainly from the theoretical side, then uses an existing network environment and open source software to carry out an actual configuration, to give a deeper understanding of how a CDN works in practice.
Linux is a free, open source operating system that has been applied successfully in many key areas. BIND is a well-known DNS server for Unix-like systems such as Unix, FreeBSD and Linux; more than 60% of the DNS servers running on the Internet use BIND. At the time of writing, the latest version of BIND is 9.x. Compared with 8.x, BIND 9 has many new features, one of which is resolving the same domain name to different IP addresses depending on the user's source address. With this feature, users accessing the same domain name can be directed to servers at nodes in different regions. Squid is a well-known cache engine on Linux and other operating systems. Its performance is lower than that of commercial cache engines, but its basic principles are the same, and for testing it is easy to configure and run. The CDN configuration process is described briefly below.
1) A Web site joining the CDN service hands resolution of its domain name (for example www.linuxaid.com.cn, address 188.8.131.52) over to the CDN operator. In Linuxaid's zone, the only change is to replace the A record for the www host with a CNAME record pointing to cache.cdn.com, where cache.cdn.com is the name chosen to identify the CDN's cache servers. In the zone file /var/named/linuxaid.com.cn, the record
www IN A 184.108.40.206
is changed to:
www IN CNAME cache.cdn.com.
2) Once the CDN operator has resolution rights for the domain name, it holds the domain's CNAME record, which points to the domain name of the CDN's cache servers, such as cache.cdn.com. The CDN's global load balancing DNS must then resolve this CNAME to an IP address according to policy, generally returning the address of the nearest cache.
A basic capability of BIND 9 is returning an IP address that depends on the source IP address range, which provides load balancing based on proximity. This can generally be implemented with BIND 9's sortlist option, which returns the nearest node's IP address first based on the client's IP address. The procedure is:
a) Set multiple A records for cache.cdn.com. The contents of /var/named/cdn.com are as follows:
@ IN SOA ns.cdn.com. root.ns.cdn.com. (
    2002090201 ; Serial number
    10800 ; Refresh after 3 hours
    3600 ; Retry
    604800 ; Expire
    1800 ) ; Time to live
  IN NS ns
www IN A 220.127.116.11
ns IN A 18.104.22.168
cache IN A 22.214.171.124 ; as many A records for cache
cache IN A 126.96.36.199 ; as there are cache addresses
cache IN A 188.8.131.52
b) The contents of /etc/named.conf are:
directory "/var/named";
# Addresses are returned in the order 184.108.40.206, 220.127.116.11, 18.104.22.168
# This block applies when the DNS query comes from the 202/8 address range:
# addresses are returned in the order 22.214.171.124, 126.96.36.199, 188.8.131.52
# This block applies when the DNS query comes from the 211/8 address range:
# addresses are returned in the order 184.108.40.206, 220.127.116.11, 18.104.22.168,
# that is, 22.214.171.124 is the node closest to the query location
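The effect that the comments above describe, returning the same set of A records in a different order depending on the client's source network, can be imitated in Python to make the idea concrete. This is not BIND code and does not reproduce the article's named.conf: the cache addresses are placeholders from the documentation ranges, the preference lists are invented, and only the 202/8 and 211/8 client ranges are taken from the text.

```python
import ipaddress

# Placeholder cache addresses (documentation ranges, not the article's values).
CACHE_ADDRESSES = ["192.0.2.1", "198.51.100.1", "203.0.113.1"]

# (client network, preferred order) pairs, checked from first to last.
SORT_RULES = [
    (ipaddress.ip_network("202.0.0.0/8"),
     ["198.51.100.1", "203.0.113.1", "192.0.2.1"]),
    (ipaddress.ip_network("211.0.0.0/8"),
     ["203.0.113.1", "192.0.2.1", "198.51.100.1"]),
]


def sorted_answer(client_ip, addresses=CACHE_ADDRESSES, rules=SORT_RULES):
    """Return all addresses, ordered by the first rule matching the client."""
    client = ipaddress.ip_address(client_ip)
    for network, preferred in rules:
        if client in network:
            rank = {addr: i for i, addr in enumerate(preferred)}
            # Addresses missing from the preference list keep their order at the end.
            return sorted(addresses, key=lambda a: rank.get(a, len(rank)))
    return list(addresses)  # no rule matched: keep the zone file order
```

Because most clients use the first address in a DNS answer, putting the nearest cache first is usually enough to steer the user to it while still leaving the other addresses available as fallbacks.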