DNS is a foundational Internet service, and the rest of the Internet depends on it to operate normally. Attackers know this too, and they use a variety of methods to try to disrupt DNS resolution. Since April this year, DNSPod has successfully defended against more than 400 attacks on domain names while continuing to provide users with high-quality resolution service.
Every engineer at DNSPod has been thinking about how to operate DNS services comprehensively, at every level, so that they run securely and efficiently. We believe the work should start from the following aspects:
Status Monitoring
DNS has strict real-time requirements, so an accurate and comprehensive monitoring system is the foundation for operating the whole service. We have therefore designed a complete monitoring system that covers network traffic monitoring, a server kernel monitoring module, resolution monitoring, and server cluster monitoring. Watching the DNS resolution service from these different perspectives ensures that engineers learn of its running status immediately. In terms of technology selection, we combine mature SNMP-based Nagios/Cacti monitoring with monitoring modules developed in-house and tightly integrated with the DNS resolution service, so that the needs of different monitored objects are met.
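As an illustration of the resolution monitoring described above, here is a minimal sketch of a probe that sends one DNS query and maps the result to the exit codes used by Nagios plugins (0 OK, 1 WARNING, 2 CRITICAL). It is not DNSPod's actual module: the function names, the latency thresholds, and the choice to treat any non-zero response code as critical are assumptions made for the example.

```python
import socket
import struct
import time

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2
QUERY_ID = 0x1234  # arbitrary fixed ID, fine for a single-shot probe


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", QUERY_ID, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(packet: bytes) -> dict:
    """Extract the header fields a health check cares about."""
    if len(packet) < 12:
        raise ValueError("packet shorter than a DNS header")
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id": qid,
        "is_response": bool(flags & 0x8000),
        "rcode": flags & 0x000F,
        "answers": ancount,
    }


def classify(header, latency_ms, warn_ms=100.0, crit_ms=500.0) -> int:
    """Map a parsed header (or None for no usable reply) to a status code."""
    if header is None or not header["is_response"]:
        return CRITICAL
    if header["rcode"] != 0 or header["answers"] == 0:
        return CRITICAL  # assumption: the probed name must always resolve
    if latency_ms >= crit_ms:
        return CRITICAL
    if latency_ms >= warn_ms:
        return WARNING
    return OK


def check(server: str, name: str, timeout: float = 2.0) -> int:
    """Send one UDP query to `server` and return a Nagios-style status."""
    start = time.monotonic()
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name), (server, 53))
            packet, _addr = sock.recvfrom(512)
        header = parse_header(packet)
    except (OSError, ValueError):  # timeout, network error, short packet
        return CRITICAL
    if header["id"] != QUERY_ID:
        header = None  # reply does not belong to our query
    return classify(header, (time.monotonic() - start) * 1000.0)
```

A wrapper script could call `check()` with the address of the name server under test and pass the result to `sys.exit()`, which is how a Nagios-style scheduler would consume it.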
Information Alerts
All kinds of situations arise while the DNS service is running. The same event has to be reported to different owners, and each of them needs different information. For example, once a domain name attack is detected, an alert is immediately sent to the O&M engineers with traffic data at every level. A summary of the attack and the extent of its impact goes to the technical support staff, so that users who ask about the situation get the latest information. For VIP customers, attack-related data and handling information are also sent to the relevant sales staff, who contact the customer directly. A particularly serious attack is additionally reported to marketing staff, developers, technical directors, and even general managers, so that the information is delivered and the event is handled promptly. To meet these varied requirements, we have built a dedicated notification platform that exposes a consistent API for other programs to call and supports email, SMS, voice, and other notification methods.
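The role-based routing described above can be sketched as a small function that turns one event into a list of per-recipient notifications. This is only an illustration: the `AttackEvent` fields, the role names, and the channel chosen for each role are assumptions, not the actual API of the notification platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackEvent:
    domain: str
    peak_qps: int
    is_vip: bool          # does the domain belong to a VIP customer?
    is_major: bool        # is this a particularly serious attack?


def route(event: AttackEvent) -> list:
    """Return (role, channel, message) tuples for one attack event."""
    notes = [
        # O&M engineers get traffic figures right away.
        ("ops", "sms",
         f"Attack on {event.domain}: peak {event.peak_qps} qps"),
        # Support gets a summary to relay to users who ask.
        ("support", "email",
         f"Summary: {event.domain} is under attack; mitigation in progress"),
    ]
    if event.is_vip:
        # Sales contacts VIP customers directly.
        notes.append(("sales", "email",
                      f"VIP domain {event.domain} attacked; please contact customer"))
    if event.is_major:
        # Serious events escalate to a wider audience.
        for role in ("marketing", "developers", "tech_director", "general_manager"):
            notes.append((role, "voice", f"Major attack on {event.domain}"))
    return notes
```

Keeping the routing rules in one place means each calling program only has to describe the event, and the platform decides who hears about it and how.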
Event Processing
To respond to incidents promptly and keep providing quality service to users, we run a 24-hour duty system, with experienced technicians ready to handle emergencies at any time. To improve response efficiency further, automated O&M processing is essential. For example, after long-term research on DNS attacks we have developed a range of protection methods, such as domain name blocking/unblocking, protection algorithms, and traffic guidance. These are enabled automatically according to the actual attack conditions and can bring a high-traffic DNS attack under control in a short time, minimizing its impact.
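To make the idea of automatically enabled blocking and unblocking concrete, here is a minimal sketch of a per-domain controller with hysteresis: it blocks a domain when the query rate crosses a high threshold and unblocks it only after the rate has stayed low for several consecutive samples. The class name, thresholds, and sample count are hypothetical, and the real protection algorithms are certainly more elaborate.

```python
class DomainGuard:
    """Decide when to block or unblock one domain from QPS samples."""

    def __init__(self, block_qps=50_000, unblock_qps=5_000, calm_samples=3):
        self.block_qps = block_qps        # block at or above this rate
        self.unblock_qps = unblock_qps    # "calm" means at or below this rate
        self.calm_samples = calm_samples  # calm samples needed to unblock
        self.blocked = False
        self._calm = 0

    def observe(self, qps: int) -> str:
        """Feed one QPS sample; return 'block', 'unblock', or 'none'."""
        if not self.blocked:
            if qps >= self.block_qps:
                self.blocked = True
                self._calm = 0
                return "block"
            return "none"
        if qps <= self.unblock_qps:
            self._calm += 1
            if self._calm >= self.calm_samples:
                self.blocked = False
                self._calm = 0
                return "unblock"
        else:
            self._calm = 0  # traffic rose again, restart the calm count
        return "none"
```

Using two thresholds and a calm period avoids flapping between blocked and unblocked states when attack traffic fluctuates around a single limit.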
Data Records
Finishing the handling of an event does not mean the process is over. The details must be recorded so that the event can be reviewed and analyzed later. The basic data includes switch traffic data, NIC packet capture data, and event processing records. We record, back up, sort, and archive all of this data, so that every problem can be traced and the data is ready for further statistical analysis. Because the data is both large in volume and varied in type, we use Redis and MongoDB, whose NoSQL storage features are especially well suited to this situation.
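As a rough illustration of what an event processing record might look like before it is written to a document store, the sketch below converts a record into a JSON-compatible dictionary. The field names are invented for the example and the storage call itself is left out; with MongoDB such a dictionary would typically be passed to a collection's insert method.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class EventRecord:
    domain: str
    started_at: datetime          # timezone-aware start time of the attack
    peak_qps: int
    actions: list = field(default_factory=list)  # e.g. ["block", "unblock"]
    capture_file: str = ""        # hypothetical path to the packet capture

    def to_document(self) -> dict:
        """Return a JSON-compatible dict suitable for a document store."""
        doc = asdict(self)
        # Store timestamps as UTC ISO 8601 strings so they sort and compare.
        doc["started_at"] = self.started_at.astimezone(timezone.utc).isoformat()
        return doc


def to_json(record: EventRecord) -> str:
    """Serialize a record with stable key order, e.g. for archiving."""
    return json.dumps(record.to_document(), sort_keys=True)
```

Because records of different kinds can carry different fields, a schema-less document format like this copes with varied data more easily than a fixed table layout.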
Comprehensive Operational Data Analysis
Beyond the short-term response to a single event, operations also require long-term data recording and analysis. We present our daily operations as reports that track and analyze the domain name renewal volume, the number of users, attack conditions, and other data over the long term. The reports support decisions such as analyzing attack trends to judge whether to increase investment in attack protection, or having sales staff follow up with users based on their domain transfer activity. We use Graphite for plotting, and D3.js also performs well for drawing reports.
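For readers who have not fed data to Graphite before, the sketch below formats data points in Graphite's plaintext protocol (one `path value timestamp` line per point, sent to the Carbon listener, port 2003 by default). The metric paths and the host name are placeholders, not DNSPod's real metric names.

```python
import socket


def format_metric(path: str, value, timestamp: float) -> str:
    """Format one data point as a Graphite plaintext protocol line."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003, timeout=2.0) -> None:
    """Send already formatted lines to a Carbon plaintext listener."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A daily report job might call `format_metric("ops.attacks.count", 3, time.time())` for each figure it tracks and pass the resulting lines to `send_metrics()`, after which the series can be plotted in Graphite.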
In short, DNS services have their own complexity and particular requirements. DNSPod has long focused on DNS resolution and has accumulated rich experience in this field. We hope that what we have shared here is useful to everyone who cares about DNS, and that together we can create a better Internet environment.
[Author] Han Jian is a senior engineer in the platform O&M department at DNSPod, Inc. He currently works on O&M for the company's core business and on operation system development, focusing on Linux and database cluster O&M and Python development.