Self-built CDN defense against DDoS (3): subsequent improvements to the architecture


In the first article in this series, we described the DDoS attacks against our customer service system and explained why we decided to solve the problem with a self-built CDN.

In the second article, we presented the concrete construction plan for the self-built CDN, covering hardware cost, bandwidth cost, architecture design, and actual deployment.

This article is the third part of the "self-built CDN anti-DDoS" series and introduces the subsequent improvements to the CDN architecture: intelligent DNS resolution + round robin + survival monitoring, centralized log analysis + attack defense, and rapid deployment and graphical management of multi-node CDN.

1. Intelligent DNS resolution + round robin + survival monitoring

A. Deploy intelligent DNS to match visitors to the nearest CDN node

Another purpose of the self-built CDN is to optimize the access path. Because the acceleration nodes are deployed only after careful selection, indicators such as bandwidth quality, data center environment, and security risk are reliable and controllable.

Therefore, once multiple CDN nodes are deployed, you can make them work together and optimize the user access path by configuring Bind views: each visitor IP range is mapped to the corresponding CDN node, so that visitors fetch page content from a nearby node according to their region and line type.
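As a minimal sketch (the ACL file names, zone file names and domain below are placeholders, not our production configuration), a named.conf view setup along these lines sends Telecom and Unicom visitors to zone files whose A records point at different CDN nodes:

acl "telecom" { include "/etc/named/telecom.acl"; };   # hypothetical file holding China Telecom IP ranges
acl "unicom"  { include "/etc/named/unicom.acl"; };    # hypothetical file holding China Unicom IP ranges

view "telecom" {
    match-clients { telecom; };
    zone "example.com" { type master; file "example.com.telecom.zone"; };   # A records point to the Telecom node
};
view "unicom" {
    match-clients { unicom; };
    zone "example.com" { type master; file "example.com.unicom.zone"; };    # A records point to the Unicom node
};
view "default" {
    match-clients { any; };
    zone "example.com" { type master; file "example.com.default.zone"; };   # fallback for all other visitors
};

Views are evaluated in order, so the catch-all view must come last.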

B. Automatic DNS round robin + fault monitoring

We can use DNS round robin to distribute load across nodes. If resources allow, redundant CDN nodes can be deployed in each region. This not only relieves the load on any single node in a region but also provides mutual backup: when a node fails, the scheduling mechanism moves its traffic to the remaining available nodes in the shortest possible time and dynamically removes the faulty node, so normal visitor requests are not affected.

To implement DNS round robin, you only need to add multiple A records for the same domain name in Bind. The Bind view function and node survival checks are mature technologies with plenty of documentation; for details, refer to "Using Bind to build a highly available intelligent DNS server". We will not go into detail here.
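As an illustration (the host name, TTL and addresses are placeholders), round robin is simply several A records with the same name in the zone file; Bind rotates the order of the answers among them:

; zone file fragment -- two CDN nodes in the same region answer for the same name
www    300    IN    A    203.0.113.10    ; CDN node 1 (placeholder address)
www    300    IN    A    203.0.113.20    ; CDN node 2 (placeholder address)

A short TTL such as 300 seconds lets the survival check remove a failed node from resolution quickly.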

C. Bind View IP sorting script

The script we have written can help you quickly sort out the IP ranges of China Telecom and China Unicom lines by region (East, South, North, and West China). If you are interested, try it out.

# This script downloads the list of China IP addresses from APNIC and classifies
# them into China Unicom, China Telecom, and other IP ranges.
# NOTE: $TIME, the region patterns $HD_STR/$HN_STR/$XI_STR/$HB_STR and the matching
# $HD_FILE/$HN_FILE/$XI_FILE/$HB_FILE names are defined in the full script (see the link below).
get_apnic() {
    FILE=$PWD/ip_apnic
    CNC_FILE=$PWD/CNC
    CTC_FILE=$PWD/ctc
    TMP=/dev/shm/ip.tmp
    rm -f $FILE
    wget http://ftp.apnic.net/apnic/stats/apnic/delegated-apnic-latest -O $FILE
    grep 'apnic|CN|ipv4|' $FILE | cut -f 4,5 -d'|' | sed -e 's/|/ /g' | while read ip cnt
    do
        echo $ip:$cnt
        # convert the address count into a prefix length with bc
        mask=$(cat << EOF | bc | tail -1
pow = 32;
define log2(x) {
    if (x <= 1) return (pow);
    pow--;
    return (log2(x/2));
}
log2($cnt)
EOF
)
        whois $ip@whois.apnic.net > $TMP.tmp
        sed -n '/^inetnum/,/source/p' $TMP.tmp | awk '(/mnt-/ || /netname/)' > $TMP
        NETNAME=`grep ^netname $TMP | sed -e 's/.*:\(.*\)/\1/g' -e 's/-.*//g' -e 's/ //g'`
        if egrep -qi "(CNC|UNICOM|WASU|NBIP|CERNET|CHINAGBN|CHINACOMM|FibrLINK|BGCTVNET|DXTNET|CRTC)" $TMP; then
            echo $ip/$mask >> $CNC_FILE
        elif egrep -qi "(CHINATELECOM|CHINANET)" $TMP; then
            echo $ip/$mask >> $CTC_FILE
        else
            sed -n '/^route/,/source/p' $TMP.tmp | awk '(/mnt-/ || /netname/)' > $TMP
            if egrep -qi "(CNC|UNICOM|WASU|NBIP|CERNET|CHINAGBN|CHINACOMM|FibrLINK|BGCTVNET|DXTNET|CRTC)" $TMP; then
                echo $ip/$mask >> $CNC_FILE
            elif egrep -qi "(CHINATELECOM|CHINANET)" $TMP; then
                echo $ip/$mask >> $CTC_FILE
            else
                echo "$ip/$mask $NETNAME" >> $PWD/OTHER
            fi
        fi
    done
    rm -rf $TMP.tmp
}

# Extract the registrant address from the whois information to determine the province.
gen_zone() {
    FILE=$2
    [ ! -s $FILE ] && echo "$FILE file not found." && exit 0
    rm -rf $FILE.zone
    while read LINE; do
        LINE=`echo "$LINE" | awk '{print $1}'`
        echo "$LINE@"
        echo -n "$LINE@" >> $FILE.zone
        whois $LINE | egrep "address" | xargs echo >> $FILE.zone
        sleep $TIME
    done < $FILE
}

# Split the IP list into regional files (East/South/West/North China).
gen_area() {
    FILE=$2
    [ ! -s $FILE.zone ] && echo "$FILE.zone file not found." && exit 0
    STRING="none"
    echo $FILE | egrep -i -q "cnc" && STRING="cnc"
    echo $FILE | egrep -i -q "ctc" && STRING="ctc"
    echo $FILE | egrep -i -q "other" && STRING="other"
    [ $STRING = "none" ] && echo "Not cnc or ctc file" && exit 0
    cp -a $FILE.zone $FILE.tmp
    egrep -i "$HD_STR" $FILE.tmp > $HD_FILE.$STRING
    egrep -i -v "$HD_STR" $FILE.tmp > aaa && mv aaa $FILE.tmp
    egrep -i "$HN_STR" $FILE.tmp > $HN_FILE.$STRING
    egrep -i -v "$HN_STR" $FILE.tmp > aaa && mv aaa $FILE.tmp
    egrep -i "$XI_STR" $FILE.tmp > $XI_FILE.$STRING
    egrep -i -v "$XI_STR" $FILE.tmp > aaa && mv aaa $FILE.tmp
    egrep -i "$HB_STR" $FILE.tmp > $HB_FILE.$STRING
    egrep -i -v "$HB_STR" $FILE.tmp > aaa && mv aaa $FILE.tmp
    grep ^[0-9] $FILE.tmp | awk '{print $1}' > $HD_FILE.$STRING
    sed -r -i 's#@.*##g' *.$STRING
    rm -rf $FILE.tmp
}

You can download the specific script at https://github.com/shaohaiyang/easymydns.
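As a rough usage sketch (the file name below is hypothetical, and helper variables such as $TIME and the $HD_STR/$HN_STR/$XI_STR/$HB_STR region patterns are defined in the full script on GitHub):

. ./ip_sort.sh              # hypothetical file containing the functions above
get_apnic                   # fetch delegated-apnic-latest and split it into ./CNC, ./ctc and ./OTHER
gen_zone zone ./CNC         # whois each prefix and record its registrant address in ./CNC.zone ($2 is the input file)
gen_area area ./CNC         # split ./CNC.zone into the regional files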

2. Centralized log analysis + attack defense

As the front end of the website, the CDN records the access behavior of all visitors in real time, so the logs contain a wealth of information. In practice, most websites do not make good use of their access logs and only archive and back them up. If you analyze and mine these logs in depth, they are of great help in understanding how the site is running and in detecting abnormal activity at the business layer. In particular, when facing a DDoS attack, they provide sufficient evidence for identifying malicious IP addresses.

The main patterns used to identify malicious traffic are as follows:

  • A single IP address initiating a large number of concurrent requests
  • A large number of consecutive IP segments initiating requests
  • Requests initiated from a large number of scattered IP addresses with no obvious pattern

Currently, our HAProxy log analysis works only on a single node. In practice, we truncate the log per unit of time, write it to /dev/shm (memory), and analyze the behavior with ordinary shell, awk, and sed. This avoids the bottleneck of disk I/O overhead. The disadvantage is that the analysis is rough and its efficiency still needs improvement.
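A minimal sketch of this kind of single-node analysis (the log path, field position and threshold are assumptions; in HAProxy's syslog-style HTTP log the client address is the sixth field, so adjust the column for your own format):

#!/bin/bash
# Illustrative only: count requests per source IP in the latest log slice and
# print the IPs whose request count exceeds a threshold.
LOG=/dev/shm/haproxy_last_minute.log    # hypothetical per-minute log slice
THRESHOLD=200                           # hypothetical request-count limit per source IP

# $6 is client_ip:port in HAProxy's syslog-style HTTP log; adjust for your format
awk '{print $6}' "$LOG" | cut -d: -f1 | sort | uniq -c | sort -rn |
    awk -v t="$THRESHOLD" '$1 > t {print $2}' > /dev/shm/bad_ips.txt

wc -l < /dev/shm/bad_ips.txt            # number of suspicious source IPs found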

A. Multi-node CDN centralized log analysis + attack blocking architecture

A log analysis architecture that acts on a single node has obvious limitations:

  • Logs are scattered across the nodes; analysis on one node ignores the data of the others, so the global picture is unknown.
  • When a protection rule is enabled, it acts only on a single node; the other nodes still face attacks with the same signature.
  • Real-time analysis on a single node consumes significant system resources while the node is under attack.

Therefore, in a multi-node CDN architecture, if we want to detect and block DDoS attacks in time while consuming as few node resources as possible, attack behavior must be analyzed centrally at the global level, and the resulting defense and blocking rules must be pushed to all nodes so that they respond to the attack in coordination.

After sorting out the difficulties, we found that the following three problems should be solved:

  1. Collecting and storing the massive logs produced by multiple CDN nodes
  2. Centralized risk analysis of these massive logs
  3. A coordinated attack-blocking mechanism

The specific architecture:
  • Nginx/HAProxy on each node acts as the carrier of the attack-defense rules
  • Access logs generated by the nodes are sent to a dedicated LogServer through syslog for collection
  • The dedicated LogServer stores the logs, performs the risk analysis, and pushes the blocking rules

A. HAProxy/Nginx as the carrier of attack defense

As mentioned in the previous article, we recommend using HAProxy or Nginx as the defensive reverse proxy on the CDN node, so that ACL filtering rules for attack defense can be written flexibly and take effect in real time through hot reloading.
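As an illustrative sketch (the file path, ACL name and rule below are assumptions, not our production rules), an HAProxy frontend can reject requests whose source IP is on a maintained blacklist:

# haproxy.cfg fragment (illustrative)
frontend www
    bind :80
    # deny requests whose source IP appears in the blacklist file maintained by the analysis side
    acl blocked_src src -f /etc/haproxy/blocked_ips.lst
    http-request deny if blocked_src

After the blacklist or the rules change, HAProxy can be hot reloaded without dropping the service, for example with: haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid).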

B. Solutions for log storage

This step consists of two parts: transmitting the logs from the nodes to the LogServer, and storing them centrally on the LogServer. Logs generated by a CDN node can be written locally to a pipe and forwarded to the dedicated LogServer over UDP by rsyslog; after the LogServer receives them, the logs are stored together, classified by domain name.
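A minimal configuration sketch of the transmission side, with an assumed facility, port and placeholder addresses (HAProxy speaks syslog natively, so the local pipe is mainly needed for servers that only write log files):

# On the CDN node: haproxy.cfg sends access logs to the local rsyslog on facility local0
global
    log 127.0.0.1 local0
defaults
    log global

# On the CDN node: /etc/rsyslog.conf forwards that facility to the LogServer over UDP ("@" = UDP)
local0.*    @192.0.2.10:514

# On the LogServer: /etc/rsyslog.conf listens on UDP 514 for the collected logs
$ModLoad imudp
$UDPServerRun 514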

Hadoop can be used as the carrier for storing massive logs, and the Map/Reduce model can be used to decompose the analysis work and improve filtering efficiency. For more information, see the comparison of open-source log systems.

C. Coordinated attack blocking mechanism

Here is the most critical link: the focus of the entire architecture is "anti-attack". From the analysis above, the most efficient approach for defending a multi-node CDN is for the dedicated LogServer to perform centralized analysis and computation, generate protection policies from the results, and push them to each CDN node in real time so that defense and blocking rules take effect in coordination against the DDoS attack. This raises the following questions:

  1. What scripts and rules are used to analyze the logs?
  2. How are the analysis results turned into ACL policies for HAProxy/Iptables?
  3. How are the generated ACL policies applied to all CDN nodes so that they act in concert?

Our design philosophy is as follows:

After the logs are fully stored on the LogServer, an analysis script performs feature matching, extracts the source IP addresses of malicious attacks, generates the corresponding HAProxy/Iptables blocking rules for those addresses, and distributes them to all CDN nodes. This can be done in two ways (a minimal sketch follows the list below):

  1. Associate with Iptables and Nginx/HAProxy through a dedicated interface
  2. Push through the unified configuration management tool Puppet: the LogServer acts as the message-pushing and command-issuing master, and each CDN node acts as the policy receiver and executor; after receiving a protection policy, a node automatically appends it to the ACL list and runs the hot-reload command
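A minimal sketch of the first approach, run on the LogServer (the file paths, node names and the SSH/SCP distribution below are illustrative assumptions; the distribution could equally be handled by Puppet as described in item 2):

#!/bin/bash
# Illustrative only: turn a list of malicious IPs into iptables commands and an
# HAProxy blacklist, then push both to every CDN node.
BAD_IPS=/data/analysis/bad_ips.txt            # produced by the log-analysis script
NODES="node1.example.com node2.example.com"   # placeholder CDN node list

# build an iptables drop script from the IP list
awk '{print "iptables -I INPUT -s " $1 " -j DROP"}' "$BAD_IPS" > /tmp/block.sh

for node in $NODES; do
    scp "$BAD_IPS" "$node":/etc/haproxy/blocked_ips.lst    # blacklist read by the ACL shown earlier
    scp /tmp/block.sh "$node":/tmp/block.sh
    ssh "$node" 'sh /tmp/block.sh; haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)'
done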

B. Advantages of the architecture
  • Once this architecture is in place, horizontal scaling becomes very easy: CDN nodes can be added or removed dynamically according to node traffic and resource load, without any change to the origin site.
  • DDoS attacks can be handled comfortably: attack traffic is spread across the nodes while attack sources are blocked automatically.
  • If an anomaly is detected on one website, new protection rules can be developed quickly and the blocking measures applied to every site behind the CDN, providing global protection.
  • Collecting and analyzing the logs of every CDN node yields the detailed access behavior of all users and records all illegal access; combined with business security rules, this supports both early warning and after-the-fact tracing.

3. Fast deployment and graphical management of multi-node CDN

Managing and maintaining a CDN system is a big challenge for any team, especially when deploying multi-region, multi-line CDN nodes. You need to keep track of the list of accelerated nodes, define which web elements can be cached, and decide which ACL policies are required. All of this normally requires professional system operations staff to configure and implement.

The usual mature approach is to prepare the CDN rules on a master machine and push the configuration files to each CDN node with Rsync. This is efficient, but it sets a certain threshold for whoever deploys the CDN, and the strict permission control required on the servers makes it hard to hand over to other engineers.
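As a one-line illustration (the directory, node name and reload command are placeholders), the master pushes a prepared configuration directory to a node and then triggers a reload:

rsync -az --delete /etc/cdn-conf/ node1.example.com:/etc/haproxy/ \
    && ssh node1.example.com 'haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)'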

By chance, we first encountered OpenCDN when it won an award at a hackathon. Integrating the two complementary projects made up for the missing front-end management in our CDN, so our solution can be deeply integrated with the OpenCDN project to lower the operations and management threshold and benefit more IT operations users.

A. What problems does OpenCDN mainly solve?

OpenCDN is a tool for rapidly deploying CDN acceleration. It provides a convenient management platform for enterprises that offer CDN acceleration services or that need multi-node CDN acceleration, monitoring the status and system load of each node in real time. OpenCDN ships with several sets of common cache rules, supporting a variety of complex CDN cache scenarios. As its name suggests, OpenCDN is free and open source.

B. How is OpenCDN currently implemented?

The main architecture of OpenCDN is divided into a CDN management center and CDN acceleration nodes; there is no limit on the number of acceleration nodes. Users can quickly deploy multiple CDN acceleration nodes with OpenCDN and manage them centrally through the management center.

OpenCDN therefore implements two main parts: one-click integration of the CDN node deployment process, and centralized management of these nodes through the WebConsole tool.

C. What will OpenCDN do in the future? What is the effect?

OpenCDN is designed to accelerate websites with multiple CDN nodes. It provides a convenient CDN acceleration management platform that lets users build CDN nodes on demand, control costs flexibly, improve website response speed, and respond quickly to traffic spikes.

In the future, we will integrate the CDN-based defense described above to resist high-volume DDoS attacks. We have open-sourced the platform so that more people who need it can obtain it at the lowest possible cost, and we hope more developers will join us to complete it: everyone for me, and I for everyone.

D. Advantages of OpenCDN for self-built CDN
  1. It reduces the cost of obtaining a CDN while improving the performance of the CDN nodes. Compared with renting a commercial CDN, there is no need to pay for purchased traffic or to accept the fixed overhead of a leasing model.
  2. It is not limited to a particular node medium: physical servers or VPSes can be used, and VPSes from different providers can be combined into a low-cost CDN acceleration cluster covering the whole country.
  3. Commercial CDN nodes are shared by many sites at the same time, which means a node's limited resources (concurrency) are shared as well; users with high bandwidth/traffic requirements are better served by a self-built architecture.

Which users is OpenCDN suitable for?

Currently, OpenCDN is best suited to websites in highly competitive industries:

Game sites, vertical e-commerce sites, community forums, online video, and chat services.

These websites share common characteristics: medium traffic, fierce competition, frequent attacks, a profitable industry, and a willingness to spend money.

Summary

With this article, the "self-built CDN anti-DDoS" series comes to an end. If you have any questions, please contact us.

Author Profile

Hu Haiyang, from the Hangzhou Linux User Group, goes by the handle "Heart of the Ocean" online. He is a system architect and amateur open-source contributor, devoted to researching and exploring open-source software and cutting-edge technologies.

Zhang Lei, from the Hangzhou Google Developer Community, focuses on the information security field. He has led a number of website security testing and intrusion forensics projects in the banking and securities industries, providing security protection support for the four major banks. He is currently an entrepreneur working on Internet security protection.
