The rapid development of the Internet has brought great convenience to people's work and lives. Demands on Internet service quality and access speed keep rising, and the number of users keeps growing, yet because of factors such as web server load and transmission distance, slow response times remain a frequent source of complaints. The solution is to apply cache technology to network transmission so that Web service data can be accessed close to the user. It is a very effective way to optimize network data transmission and thereby deliver a fast experience with assured quality.
The purpose of network cache technology is to minimize redundant data transmission in the network by converting wide-area transfers into local or nearby access. Most of the content transmitted on the Internet is repeated Web/FTP data; cache servers and network devices that use caching can greatly improve the performance of data links and eliminate the congestion that peak access causes at node devices. Because the cache server stores most web page objects, such as HTML, HTM, and PHP page files, GIF, TIF, PNG, and BMP image files, and files in other formats, repeated requests during an object's validity period (TTL) do not have to fetch the file entity again from the origin website. A simple freshness validation, exchanging only a few dozen bytes of headers, lets the cache deliver its local copy directly to the visitor. Because the cache server is usually deployed near the user, responses arrive at LAN speed and bandwidth consumption drops sharply. According to statistics, over 80% of Internet users repeatedly access 20% of the information resources, which is the precondition that makes cache technology effective. The architecture of a cache server also differs from that of a Web server, so a cache server delivers higher performance. It not only improves response speed and saves bandwidth but also effectively reduces the load on the origin server.
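The TTL-based freshness check described above can be sketched as a small function. The names and the fixed clock values below are illustrative assumptions; real caches follow the HTTP caching rules (`Expires`/`Cache-Control`) and revalidate stale copies with conditional requests such as `If-Modified-Since`.

```python
from dataclasses import dataclass

@dataclass
class CachedObject:
    """A stored copy of a web page object (HTML page, image, ...)."""
    body: bytes
    fetched_at: float   # when the copy was retrieved (seconds)
    ttl: float          # validity period assigned to the object

def is_fresh(obj: CachedObject, now: float) -> bool:
    """Within the TTL the local copy may be served directly; once it
    expires, the cache must revalidate with the origin (a few dozen
    bytes of headers) or re-fetch the full entity."""
    return now - obj.fetched_at < obj.ttl

page = CachedObject(b"<html>...</html>", fetched_at=1000.0, ttl=300.0)
print(is_fresh(page, now=1200.0))  # True: still within the 300 s TTL
print(is_fresh(page, now=1400.0))  # False: expired, revalidate first
```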
A high-speed cache server is a purpose-built server that tightly integrates software and hardware. It is mainly used for cache acceleration services and is generally deployed at the network edge. Depending on what is accelerated, deployments fall into client-side acceleration and server-side acceleration. For client-side acceleration, the cache is deployed at the network egress and frequently accessed content is cached locally to increase response speed and save bandwidth. For server-side acceleration, the cache is deployed in front of the Web server, acting as its front end to improve the Web server's performance and speed up access. If multiple cache acceleration servers are distributed across different regions, the cache network must be managed so that users are directed to the nearest node, with global load balancing across nodes. This is the basic idea behind the CDN, the content delivery network.
2. What is CDN?
CDN stands for content delivery network. Its purpose is to add a new layer of network architecture to the existing Internet that publishes website content to the "edge" closest to the user, so that users can obtain the content they want nearby. This relieves Internet congestion and increases the response speed when users access websites. Technically, it comprehensively addresses the root causes of slow response: low network bandwidth, heavy user traffic, and unevenly distributed points of presence.
In a narrow sense, a CDN is a new mode of network construction: an overlay network on top of the traditional IP network, specifically optimized for publishing broadband rich media. In a broad sense, CDN represents a network service model based on quality and order. Put simply, a CDN is a strategically deployed end-to-end system encompassing four elements: distributed storage, load balancing, redirection of network requests, and content management. Content management and global traffic management are its core. By judging users' proximity and server load, a CDN ensures that content serves user requests in an extremely efficient manner. In general, the content service is based on cache servers, also known as surrogates (proxy caches), located at the edge of the network, only "one hop" away from the user. The surrogate is at the same time a transparent mirror of the content provider's origin server (usually located in the data center of the CDN service provider). This architecture lets CDN providers deliver the best possible experience to end users on behalf of their customers, the content providers, whose users cannot tolerate latency in request response time. According to statistics, CDN technology can handle 70%~95% of a website's content access traffic, reducing the load on the server and improving the website's performance and scalability.
Compared with the existing content publishing mode, a CDN emphasizes the network's role in content publishing. By introducing active content management and global load balancing, it differs fundamentally from the traditional model. Traditionally, content publishing is handled entirely by the ICP's application servers, while the network is merely a transparent data transmission channel; its quality assurance stops at the packet level and cannot differentiate service quality by content object. In addition, because of the "best effort" nature of the IP network, quality can only be assured by provisioning end-to-end bandwidth between user and application server far greater than actual needs. Under this model, not only is a large amount of valuable backbone bandwidth occupied, but the load on the ICP's application servers also becomes heavy and unpredictable. Hot events and traffic surges create local hot spots that overload the application server and force it out of service. Another drawback of this centralized, application-server-based publishing model is the lack of personalized service and the distortion of the broadband service value chain: content providers end up running content publishing services that they should not run, or do not run well.
Across the broadband service value chain, content providers and users sit at the two ends, connected in the middle by network service providers. As the Internet industry matures and business models transform, the roles in this chain are becoming increasingly specialized: content/application operators, hosting service providers, backbone network providers, access providers, and so on. Each role must do its part in this division of labor so that customers receive good service and everyone wins. From the perspective of combining content with the network, content publishing has gone through two stages: the content (application) server and the IDC. The IDC boom also gave birth to the hosting service provider, but IDCs cannot solve the problem of effective content publishing: content sitting at the center of the network does nothing about backbone bandwidth consumption or traffic order on the IP network. Therefore, pushing content to the edge of the network, serving users nearby, and thereby assuring both service quality and access order across the whole network becomes the obvious choice. This is the CDN service model. CDNs resolve the "centralize or decentralize" dilemma for content operators and are undoubtedly a valuable, indispensable link in a healthy Internet value chain.
3. New CDN applications and customers
Today's CDN services are mainly used in securities, finance and insurance, ISP, ICP, online trading, portal websites, large and medium-sized companies, online education, and similar fields. They can also be applied to industry private networks, the Internet, and even LAN optimization. With a CDN, these websites do not need to invest in expensive servers or set up mirror sites. This matters especially for streaming media, distance-learning courseware, and other content that consumes large amounts of bandwidth: the CDN copies content to the edge of the network, minimizing the distance between the point of request and the point of delivery and thereby improving website performance. CDN networks built by enterprises mainly serve those enterprises; CDN networks built by IDCs serve the IDCs and their value-added services; CDN networks built by network operators mainly provide content push services; and dedicated CDNs built by CDN service providers are offered as a service. Website operators cooperate with the CDN provider to hand off content delivery and ensure normal transmission; the website only needs to maintain its content and no longer worries about traffic.
A CDN improves the speed, security, stability, and scalability of network access.
When an IDC builds a CDN, the operator generally needs multiple data centers located in different regions, and the service targets customers hosted in those IDCs. Because existing network resources are used, the investment is small and construction is easy. For example, if an IDC has 10 data centers in China and joins them into its CDN, a Web server hosted on one node effectively gains 10 replicas, each accessible to nearby customers. In a broadband metropolitan area network (MAN), the intra-city network is very fast, while out-of-town bandwidth is generally the bottleneck. To deliver the high-speed experience the MAN promises, the solution is to cache Internet content on fast local devices: deploy caches at the various POPs of the MAN to form an efficient, orderly network in which users can reach most content within one hop. This is CDN acceleration applied to all websites.
4. How CDN works
Before describing how a CDN is implemented, let's first look at how a user accesses a website without any CDN cache, so that the differences introduced by CDN cached access become clear. The process is as follows:
1) The user enters the domain name to be accessed into the browser;
2) The browser calls the domain name resolution library to resolve the domain name and obtain its corresponding IP address;
3) The browser sends a data access request to the domain's serving host at that IP address;
4) The browser renders the web page from the data returned by the host.
Through these four steps, the browser goes from receiving the user's domain name to obtaining data from the domain's serving host. A CDN inserts a cache layer between the user and the server; steering users' requests to the cache rather than straight to the origin server is achieved mainly by taking over DNS resolution. With CDN caching, the access process changes as follows:
1) The user enters the domain name to be accessed into the browser;
2) The browser calls the domain name resolution library to resolve the domain name. Because the CDN has adjusted the resolution process, the library generally obtains a CNAME record for the domain. To get an actual IP address, the browser must resolve that CNAME again. In this step a global load balancing DNS performs the resolution, for example returning an IP address chosen by the client's geographic location, so that the user accesses a nearby node;
3) This resolution yields the IP address of a CDN cache server, and the browser sends its access request to that cache server;
4) Based on the domain name the browser asked for, the cache server looks up the origin's actual IP address through the CDN's private internal DNS, and then submits the access request to that address itself;
5) After the cache server obtains the content from the origin, it saves a local copy for future use and returns the data to the client, completing the data service;
6) The client receives the data from the cache server, displays it, and the browsing request is complete.
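The redirection in steps 2) and 3) can be sketched in a few lines of Python. All names here (the record tables, `resolve`, the sample addresses) are illustrative assumptions, not any real CDN's API; the sketch only shows how a CNAME plus a location-aware resolver steers different clients to different cache nodes for the same domain name.

```python
# Hypothetical record tables standing in for the ICP's DNS and the
# CDN's global load-balancing DNS. All names and IPs are made up.
CNAME_RECORDS = {"www.example.com": "cache.cdn.example"}

# The GSLB DNS answers the same CNAME with different node IPs
# depending on which region the client's address falls into.
GSLB_NODES = {"cn-north": "10.1.0.10", "cn-south": "10.2.0.10"}

def region_of(client_ip: str) -> str:
    # Toy geo lookup: a real GSLB DNS uses IP-prefix databases
    # (or BIND's sortlist) to judge proximity.
    return "cn-north" if client_ip.startswith("10.1.") else "cn-south"

def resolve(domain: str, client_ip: str) -> str:
    """Steps 2)-3): follow the CNAME, then let the GSLB pick a node."""
    cname = CNAME_RECORDS.get(domain)
    if cname is None:
        raise KeyError(f"no CNAME for {domain}")
    return GSLB_NODES[region_of(client_ip)]

# Two clients in different regions resolve the same domain name
# to two different cache nodes:
print(resolve("www.example.com", "10.1.5.7"))   # northern client
print(resolve("www.example.com", "10.2.9.3"))   # southern client
```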
From this analysis we can see that the CDN is transparent to ordinary users: after the cache layer is added, the client needs no configuration and continues to access the accelerated website by its original domain name. Moreover, to accelerate a given website while minimizing the impact on the ICP, only the domain name resolution step of the whole access process has to change. The following describes how a CDN network accomplishes this.
1) The ICP only needs to delegate the resolution rights for its domain name to the CDN operator; nothing else has to change. In practice, the ICP modifies its domain's resolution record, generally using a CNAME pointing to an address of the CDN network's cache servers;
2) The CDN operator must first provide public resolution for the ICP's domain name. To allow the special processing that follows, the ICP's domain is generally resolved to a CNAME record;
3) The CDN operator then applies special handling to the resolution of the domain that the CNAME points to, so that its DNS server, on receiving a client request, returns different IP addresses for the same domain name according to the client's IP address;
4) Because the request that arrives at the cache carries only the Host name obtained via the CNAME, the cache still has to learn the origin server's IP address. The CDN operator therefore maintains an internal DNS server that resolves the real IP addresses of the domains users access;
5) Alongside the internal DNS server, an authorization list must also be maintained to control which domain names may be cached and which may not, so that the cache does not become an open proxy.
5. CDN technical means
The main technical means of implementing a CDN are high-speed caching and mirrored storage. User requests can be redirected at DNS resolution time or by HTTP redirection, and content is transferred and kept in sync through cache servers or remote mirror sites. DNS-based user location is accurate more than 85% of the time; HTTP-based location, more than 99%. In a typical cache server group, 50% to 70% of the repeatedly accessed data volume (mainly images and streaming media files) is served locally rather than fetched again from the origin website. For mirror sites, everything except data synchronization traffic is completed locally, without touching the origin server at all.
Mirror site servers are a familiar approach: content is distributed directly, which suits static and quasi-static data synchronization. However, buying and maintaining new servers is expensive, mirror servers must be set up in every region, and dedicated technicians are needed to manage and maintain them. Large websites update their servers constantly, and their bandwidth demand grows accordingly, so Internet companies generally avoid building too many mirror storage servers.
High-speed caching is cheaper and suits static content. Internet statistics show that more than 80% of users repeatedly access 20% of a website's content. Under this rule, cache servers can handle most customers' static requests, leaving the origin WWW server only the remaining roughly 20% of non-cacheable and dynamic requests, which greatly speeds up responses to customer requests and lightens the origin's load. According to an IDC survey, the cache market, an important indicator for CDN, is growing nearly 100% per year, with global turnover expected to reach $4.5 billion by 2004. The growth of streaming media will further stimulate this demand.
6. CDN network architecture
A CDN's architecture consists of two parts: the center and the edge. The center is the CDN network management center and the DNS redirection and resolution center, responsible for global load balancing, with its equipment installed in the management center's data center. The edge consists of the remote nodes that actually deliver content for the CDN, composed mainly of caches and load balancers.
When a user accesses a website that has joined the CDN service, the domain name resolution request is ultimately handled by the global load balancing DNS. Following a set of predefined policies, it returns the address of the node closest to the user, so that the user gets fast service. At the same time, it keeps in contact with all the CDN nodes distributed around the network, collecting each node's status so that user requests are never directed to an unavailable node. Global load balancing is, in effect, implemented through DNS.
To Internet users, each CDN node is equivalent to a copy of the website placed near them. Under the control of the global load balancing DNS, users' requests are transparently directed to the nearest node, where the CDN server responds just as the website's origin server would. Being closer to the user, it necessarily responds faster.
Each CDN node consists of two parts: load balancing equipment and high-speed cache servers.
The load balancing equipment distributes load across the caches within a node to keep the node efficient. It also collects information about the node and its surroundings and keeps in contact with the global load balancing DNS, enabling load balancing across the whole system.
The high-speed cache servers store large amounts of the customer websites' content and, like a website server placed close to the user, respond to local users' access requests.
The CDN management system keeps the whole system running properly. It monitors every subsystem and device in real time and raises alarms for faults; it also tracks total system traffic and per-node traffic in real time and stores the figures in the system database for the network administrator to analyze further. With a complete management system, administrators can also modify the system configuration.
In theory, the simplest CDN needs only a DNS responsible for global load balancing and one cache per node. The DNS must support returning different IP addresses according to the user's source IP, for nearby access; to ensure high availability, it must also monitor each node's traffic and health. When a single cache at a node has insufficient capacity, multiple caches are deployed, and when several caches work together, a load balancer is needed so that the cache group cooperates properly.
7. CDN example
Commercial CDN networks serve customers with strict availability and other requirements, offering professional products and complete CDN solutions. This article has so far analyzed how a CDN is implemented from a theoretical perspective; what follows uses an ordinary network environment and open-source software to configure one in practice, to make the CDN's working process concrete.
Linux is a free, open-source operating system that has been applied successfully in many critical fields. BIND is the best-known DNS server on UNIX, FreeBSD, Linux, and other UNIX-like platforms; more than 60% of the DNS servers on the Internet run BIND. The current major version of BIND is 9.x, while 8.x remains widely deployed. BIND 9 adds many new features, one of which is resolving the same domain name to different IP addresses according to the user's source address, so that access to one domain name can be directed to servers in different regions. Squid is a well-known cache engine on Linux and other operating systems. Its performance is lower than that of commercial cache engines, but it works the same way as commercial cache products and is very easy to configure and run. The CDN configuration process follows.
1. A website joining the CDN service (for example www.linuxaid.com.cn, with address 22.214.171.124) must delegate its domain name resolution to the CDN service provider. In the resolution records for linuxaid.com.cn, only the record for the host WWW needs to change: it becomes a CNAME pointing to cache.cdn.com, the name the CDN network has designated for its cache servers, in the zone file /var/named/linuxaid.com.cn.
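A minimal zone-file fragment consistent with that description might read as follows; the SOA serial, timers, and name-server names are illustrative assumptions, not taken from the original article:

```
; /var/named/linuxaid.com.cn -- illustrative fragment only
$TTL 86400
@    IN  SOA  ns.linuxaid.com.cn. root.linuxaid.com.cn. (
               2004010101 ; serial (placeholder)
               3600       ; refresh
               900        ; retry
               604800     ; expire
               86400 )    ; minimum TTL
     IN  NS   ns.linuxaid.com.cn.
; the WWW host no longer has an A record; it is handed to the CDN:
www  IN  CNAME  cache.cdn.com.
```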
2. Having obtained resolution rights for the domain, the CDN operator resolves it to the CNAME record pointing at a cache-server domain under the CDN network, such as cache.cdn.com. The CDN's global load balancing DNS then resolves that CNAME to an IP address according to policy, generally returning the IP address of the nearest node.
BIND 9's basic features can resolve different IP addresses for different source IP ranges, achieving region-based, proximity-driven load balancing. Typically the sortlist option of BIND 9 is used to return the nearest node's IP address first, based on the client's IP address. The procedure is as follows:
1) Set multiple A records for cache.cdn.com in the zone file /var/named/cdn.com, one per node.
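A sketch of such a zone fragment, with made-up node addresses:

```
; /var/named/cdn.com -- illustrative fragment; node IPs are made up
$TTL 3600
cache  IN  A  10.1.0.10   ; node serving the 10.1.0.0/16 region
cache  IN  A  10.2.0.10   ; node serving the 10.2.0.0/16 region
cache  IN  A  10.3.0.10   ; node serving the 10.3.0.0/16 region
```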
2) Add a matching sortlist option to /etc/named.conf.
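A minimal sketch of that configuration, with the same made-up address blocks: with sortlist, BIND reorders the A records in its answer so that the record sharing a network with the querying client is listed first.

```
// /etc/named.conf -- illustrative fragment only
options {
    directory "/var/named";
    sortlist {
        // clients in 10.1.0.0/16 see the 10.1.0.10 record first
        { 10.1.0.0/16; { 10.1.0.0/16; }; };
        { 10.2.0.0/16; { 10.2.0.0/16; }; };
        { 10.3.0.0/16; { 10.3.0.0/16; }; };
    };
};
```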
3. If the cache works in server acceleration mode within the CDN, the origin server is specified directly in its configuration, so the cache matches the user's request against that configuration, obtains the content from the origin, and caches it for next time. If the cache works in client acceleration mode, it has to discover the origin's IP address itself; the CDN network therefore maintains an internal DNS server for the caches that resolves domains to their real IP addresses, such as 126.96.36.199. Its resolution records for each domain are the same as they were before the domain joined the CDN.
4. A cache server working in a CDN must operate transparently. For Squid, parameters along the following lines are needed.
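One plausible set of directives for the Squid 2.x releases of the era this article describes (the original article's exact parameters are not preserved; modern Squid replaces these accelerator directives with `http_port ... accel` or `intercept`):

```
# squid.conf -- illustrative fragment for older (2.x) Squid releases
http_port 80
httpd_accel_host virtual            # act as an accelerator for any host
httpd_accel_port 80
httpd_accel_with_proxy on           # still honour normal proxy requests
httpd_accel_uses_host_header on     # pick the origin from the Host: header
```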
Original article: http://www.it.com.cn/f/server/076/21/433995.htm