Self-built CDN defense against DDoS (2): architecture design, cost and deployment details
In the first article in this series, we described the DDoS attacks against our customer service system and why we decided to build our own CDN to deal with them.
This article covers the concrete construction plan for the self-built CDN from four angles: hardware cost, bandwidth cost, architecture design, and actual deployment.
Hardware cost
In terms of hardware, our selection criteria were strong performance in a 1U form factor and good cost-effectiveness.
We chose the "strong oxygen" Twin Star server: a 1U chassis with dual Xeon CPUs, up to 48 GB of memory, and two dual-Gigabit network ports, plus an H3C S1208 eight-port Gigabit switch and three years of warranty service, for a total of about RMB 15,000.
Bandwidth cost
The data-center and bandwidth resources for these single-line facilities are purchased directly from the carrier or its agent, with no third-party reseller in between, so there is plenty of room to choose and the pricing is good. For example, we rent single-line China Telecom and China Unicom resources; each line comes with a contracted amount of bandwidth and 8 IP addresses, and some data centers also provide hardware protection that can absorb 5-10 Gbps of attack traffic.
On average, the bandwidth cost per node comes to roughly RMB 16,000-25,000 per year.
Architecture Design
The CDN architecture has to embody both attack resistance and the ability to respond flexibly, so we break each CDN node down into three functional layers: reverse proxy + cache acceleration + attack defense.
Reverse proxy (role: route acceleration, hiding the master node, load balancing)
Cache acceleration (role: serving static content from the edge, saving bandwidth on the backend master node)
Attack defense (role: fast parsing, matching, and filtering of malicious requests)
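Putting the three layers together, a request that reaches a CDN node conceptually flows as follows (a rough sketch of the roles above, independent of any particular software):

client request → attack defense (parse, match, filter) → reverse proxy (routing, load balancing) → cache (static content) → backend master node (only for cache misses and dynamic requests)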
The open source world offers plenty of software that can act as a reverse proxy and cache, each with its own strengths and weaknesses. As architects we had to pick a model, so we compared and filtered the candidates on performance, functionality, and how their filter rules are configured.
Software / Performance / Functionality / Filter rule configuration:

Squid
Performance: cannot make good use of multiple cores; its large disk cache capacity is an advantage; moderate performance overall.
Functionality: feature-rich; supports ACL-based access control; also supports the ICP cache protocol.
Filter rule configuration: reads rules from external files; supports hot loading; supports hot restart.

Varnish
Performance: multi-core support; memory cache; high performance.
Functionality: adequate feature set; no cluster support; supports backend health checks.
Filter rule configuration: cannot read rules from external files; keywords must be escaped; supports hot restart.

Nginx
Performance: multi-core support; supports proxy plug-ins; strong performance.
Functionality: feature-rich; with plug-ins it can serve in many different roles.
Filter rule configuration: cannot read rules from external files; keywords must be escaped; supports hot restart.

ATS
Performance: multi-core support; disk/memory cache; high performance.
Functionality: adequate feature set; supports plug-in development; also supports the ICP protocol.
Filter rule configuration: reads rules from external files; supports hot loading; supports hot restart; documentation is sparse.

HAProxy
Performance: multi-core support; no cache; supports operations on HTTP headers; high performance.
Functionality: fewer features, focused on HTTP header parsing and forwarding; supports ACL-based access control and backend health checks; supports session stickiness and persistent connections.
Filter rule configuration: reads rules from external files; supports hot loading; supports hot restart.
We benchmarked, tuned, and production-tested candidates for the three functional layers and evaluated them on the following aspects:
HTTP defense performance: when absorbing high-traffic CC attacks, HAProxy spends only 10%-20% of CPU on regular-expression matching and header filtering, while the other software consumes more than 90% of CPU, which easily leaves the whole system unresponsive.
Reverse proxy performance: Varnish with its memory cache has the highest forwarding efficiency, followed by ATS and Nginx. For large cache capacities ATS is also a good choice, but its thin documentation means it needs continued attention. Nginx was built for the C10K problem, performs well, and is highly extensible thanks to its many plug-ins.
Configurable filter rules: HAProxy, ATS, and Squid all support reading rules from files, custom ACLs, hot loading, and hot restart. Nginx cannot load regular-expression rules from external files, which is slightly less convenient here, but it remains very malleable.
Based on the considerations above, our final architecture is HAProxy combined with Varnish/ATS/Nginx, i.e. a defensive reverse-proxy-plus-cache solution, with the roles divided as follows:
Frontend: HAProxy handles the separation of dynamic and static requests, session stickiness, node load balancing, failover, and, when the situation calls for it, defense against HTTP-based CC attacks.
Backend: a pluggable, replaceable reverse-proxy cache engine. Depending on the production scenario and the volume of objects to cache, this is either the memory-based Varnish or the disk-based ATS; if a highly customizable reverse proxy is needed (for example, for anti-leeching), Nginx plus plug-ins can be used instead.
The main strengths of this combination are:
It reads filter rules from external files; in particular, key strings can be appended to those files directly, with no escaping required (illustrative file contents follow this list).
Configuration files support hot loading and reload.
The pluggable cache component adapts flexibly to different business needs.
Deployment is simple, and a failed node is easy to swap out for a working one.
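To illustrate the external rule files: the file paths below are the ones referenced in the HAProxy configuration later in this article, but the entries themselves are made-up examples, not our production rules. Each file is simply one pattern per line, appended as new attack signatures are identified:

/opt/haproxy/etc/bad_url.conf (sample contents)
phpmyadmin
/wp-login.php
.bak

/opt/haproxy/etc/bad_method.conf (sample contents)
TRACE
DELETE

Nothing needs to be escaped, because HAProxy treats each line as a plain value or substring to match; a new line takes effect after the next hot reload.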
A note on LVS: why is LVS not part of this design? LVS is a heavyweight, efficient, and stable layer-4 forwarder, but it cannot understand the layer-7 HTTP protocol, so it can only sit in front of the layer-7 components. Adding it would not change the network structure, so it can still be introduced later, provided its own single point of failure is taken into account.
Actual deployment
In the end we deployed eight CDN nodes around the master node (the number of nodes should be adjusted to the company's resources and the actual production environment; this figure is for reference only). Geographically they fall into four regions: North China (mainly Shandong and Hebei), Southwest (mainly Sichuan), East China (mainly Ningbo and Jiaxing), and South China (mainly Fujian and Hunan).
Overall Cost
Eight single-line acceleration nodes plus eight Twin Star servers added up to a total initial investment of about RMB __ (subsequent costs are for bandwidth only, about RMB __ per year); on top of that we keep an emergency fund of RMB __ and a monthly CDN budget of RMB __.
Project schedule:
Months 1-4, ramp-up: get nodes online quickly. One tip: in the early stage, sign monthly or quarterly contracts with each IDC and keep checking node quality through monitoring. If a node turns out to be poor and the provider has to be changed, the loss stays small; if the quality is good, switch to half-yearly or yearly payment. This gives the best balance of quality and price;
Months 5-8, consolidation: add bandwidth at a steady pace within the budget and ensure bandwidth redundancy;
After month 8, stable operation: keep node availability as high as circumstances allow, while overall defensive capability continues to improve.
How to implement protection policies
Enable HAProxy's httplog option so that every request is logged.
HAProxy Configuration Policy:
global
    nbproc 24
    pidfile /var/run/haproxy.pid
    daemon
    quiet
    user nobody
    group nobody
    chroot /opt/haproxy
    spread-checks 2

defaults
    log 127.0.0.1 local5
    mode http
    option forwardfor
    option httplog
    option dontlognull
    option nolinger          # reduce FIN_WAIT1
    option redispatch
    retries 3
    option http-pretend-keepalive
    option http-server-close
    option accept-invalid-http-request
    timeout client 15s
    timeout connect 15s
    timeout server 15s
    timeout http-keep-alive 15s
    timeout http-request 15s
    stats enable
    stats uri /stats
    stats realm 53KF\ Proxy\ Status
    stats refresh 60s
    stats auth admin:adminxxx

listen Web_FB 0.0.0.0:80
    option httpchk GET /alive.php HTTP/1.0
    acl invalid_referer hdr_sub(referer) -i -f /opt/haproxy/etc/bad_ref.conf
    acl invalid_url url_sub -i -f /opt/haproxy/etc/bad_url.conf
    acl invalid_methods method -i -f /opt/haproxy/etc/bad_method.conf
    block if invalid_referer || invalid_url || invalid_methods
    acl dyn_host hdr(host) -i -f /opt/haproxy/etc/notcache_host.conf
    acl static_req path_end -i -f /opt/haproxy/etc/allow_cache_file.conf
    use_backend img_srv if static_req !dyn_host
    # acl shaohy
    acl geek hdr_dom(host) -i 17geek.com
    use_backend geek if geek
    # backend shaohy

backend geek
    mode http
    balance source
    cookie SESSION_COOKIE insert indirect nocache
    option tcpka
    server geek_1 <node_ip>:81 cookie geek_1 maxconn 10000 weight 8

backend img_srv
    mode http
    option tcpka
    server img_srv 127.0.0.1:88 maxconn 30000 weight 8
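When the filter-rule files or the configuration change, the node does not need a hard restart. HAProxy's usual graceful reload looks roughly like this (the configuration file path is an assumption; the pid file matches the pidfile directive above):

haproxy -f /opt/haproxy/etc/haproxy.cfg -sf $(cat /var/run/haproxy.pid)

The -sf option starts a new process and tells the old ones to finish serving their current connections before exiting, which is the hot-start behavior relied on above.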
Varnish Configuration Policy:
backend h_17geek_com_1 {
    .host = "127.0.0.1";
    .port = "81";
    .connect_timeout = 300s;
    .first_byte_timeout = 300s;
    .between_bytes_timeout = 300s;
}

director geek_srv random {
    { .backend = h_17geek_com_1; .weight = 3; }
}

sub vcl_recv {
    if (req.http.host ~ "^(www.)?17geek.com$") {
        set req.backend = geek_srv;
        if (req.request != "GET" && req.request != "HEAD") {
            return (pipe);
        }
        if (req.url ~ "\.(php|jsp)($|\?)") {
            return (pass);
        } else {
            return (lookup);
        }
    }
}
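To check that a node is actually caching, one can request a static object through the node and inspect the headers Varnish adds. This is just a quick sanity check; <node_ip> and the test path are placeholders, and it assumes the request is routed to the Varnish-backed pool rather than the local image server:

curl -sI -H "Host: www.17geek.com" http://<node_ip>/static/logo.png | grep -iE "^(age|x-varnish|via)"

A non-zero, growing Age value and an X-Varnish header carrying two transaction IDs indicate the object came from the node's cache rather than from the master node.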
For CC-type DDoS attacks, the abnormal-traffic monitoring approach described in the first article still applies, and its advantages are even more apparent here, because:
Each node carries its own logging and log-analysis overhead, and once abnormal requests are detected they are filtered by ACL rules at the HAProxy frontend, so the attack pressure is never passed on to the backend servers and the backend stays safe.
If the attack traffic hitting a node is too large, the data center can blackhole the offending IPs or divert the traffic, and the intelligent DNS behind it automatically takes that node out of rotation, so subsequent requests no longer pass through it.
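Following the same external-file pattern, a blacklist of attacking source IPs produced by log analysis can also be filtered at the HAProxy frontend. This is only a sketch: the bad_ip.conf file is an assumption and does not appear in the configuration above; the two lines would go into the listen Web_FB section:

acl from_bad_ip src -f /opt/haproxy/etc/bad_ip.conf   # hypothetical file, one IP or CIDR per line
block if from_bad_ip

Appending newly identified attacker addresses to that file and hot-reloading HAProxy then cuts them off at the edge node.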
In the next article in this series, we will cover subsequent improvements to the CDN architecture, including intelligent DNS, large-scale log analysis, and using OpenCDN to improve backend management.