Some of my earlier network-security articles have been reproduced around the web, and I recently found time to write up the core technology behind a project I completed a while ago. Anyone interested in network security or vulnerability scanners is welcome to explore this material with me.
PS: When I finished designing this scanner, I realized I had become a hacker who can actually write code. No, make that a white hat, because I won't do bad things, heh...
The essential prerequisite for designing a website vulnerability scanner is deep familiarity with the HTTP protocol and its related libraries (such as urllib and urllib2; the system is developed in pure Python, and its networking relies mainly on these two). It is worth noting that I drew on several open-source and commercial projects for the design and did a great deal of optimization, so the overall performance now compares well with similar products. Here I will outline the core ideas behind the main technologies involved and also point out some weak spots in the design. Since my real interest is architecture rather than vulnerability mining or hacking, the vulnerabilities themselves get only brief treatment; if there is interest, I will write a separate article on the detection and exploitation methods for each vulnerability type. The scanner's development cycle was long and all design and coding were done by me alone, so some parts may be imperfect; I would love to discuss them with anyone who shares this interest.
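Since so much of what follows hinges on HTTP handling in urllib2, here is a minimal request sketch for orientation (Python 2, my own illustration rather than the scanner's code; the header values are just examples):

```python
# A minimal urllib2 request with custom headers and a timeout (Python 2,
# matching the urllib2 this post relies on); header values are illustrative.
import urllib2

req = urllib2.Request('http://example.com/',
                      headers={'User-Agent': 'Mozilla/5.0 (scanner)',
                               'Accept-Encoding': 'gzip'})
resp = urllib2.urlopen(req, timeout=10)
# When the server honors Accept-Encoding: gzip, the crawler must
# decompress resp.read() itself (see the gzip note later in this post).
print('%s %s' % (resp.getcode(), resp.headers.get('Content-Encoding')))
```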
Website scanning uses web-scanning technology to conduct a complete security assessment of a site. The design mainly comprises the scanning engine, scanning strategies, the engine server, and the UI. The main modules are as follows:
1. The scanning engine adopts a dual-process model: a crawler process and a scanning process. The crawler process crawls links and requests on the target website according to user-configured parameters and submits them to the scanning process; the scanning process scans the fetched requests according to the configured policies to uncover security risks in each request (a minimal sketch of this model follows the list).
2. Scanning strategies: each scan policy corresponds to a script; one policy may detect several vulnerability types and may be backed by a scan rule base. There are currently more than 50 policies, covering mainstream web vulnerabilities, server vulnerabilities, CMS vulnerabilities, and web page trojans (drive-by malware), with common examples including SQL injection, XSS, Struts2 remote code execution, and information disclosure.
3. The scan engine is built on the cross-platform Boost.Asio library; scan tasks are dispatched by the engine server and execute concurrently.
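As a concrete picture of item 1, here is a minimal sketch of the dual-process model using Python's standard multiprocessing module; the queue protocol and field names are my assumptions, not the engine's actual inter-process interface:

```python
# Minimal sketch of the dual-process model: a crawler process feeding a
# scanner process through a queue. The message format is illustrative.
from multiprocessing import Process, Queue

def crawler(start_urls, out_queue):
    """Crawl links per the configured parameters and hand each
    discovered request to the scanning process."""
    for url in start_urls:          # stand-in for the real crawl loop
        out_queue.put({'method': 'GET', 'url': url, 'params': {}})
    out_queue.put(None)             # sentinel: crawl finished

def scanner(in_queue):
    """Consume crawled requests and run the configured policies on each."""
    while True:
        request = in_queue.get()
        if request is None:
            break
        # the policy scheduler would run here
        print('scanning %s %s' % (request['method'], request['url']))

if __name__ == '__main__':
    q = Queue()
    crawl = Process(target=crawler, args=(['http://example.com/'], q))
    scan = Process(target=scanner, args=(q,))
    crawl.start(); scan.start()
    crawl.join(); scan.join()
```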
Core technology of the web crawler:
As the front end of the scanning engine, the crawler crawls the URLs on each page according to policy, covering both GET and POST requests. The core points include: filtering of domains and sub-domains; detection of custom 404 pages; design of the crawl algorithm; in-memory and on-disk handling of stored data; handling of assorted network errors and timeouts; handling of page redirects; parsing and link extraction across page types; deduplication of pages and parameters; and handling of pages that require a login. Additional considerations include HTTPS certificate handling, HTTP authentication such as Basic and NTLM, gzip in the HTTP protocol, keep-alive (persistent) connections, DNS caching, multipart POST uploads, HTTP proxies, local caching of requests as an optimization, and so on. A more complete design would also include deep JavaScript parsing, which could handle pages generated dynamically by JS or Ajax.
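Of those points, custom-404 detection deserves a concrete example: many servers answer missing resources with a 200 status and an error page, so status codes alone are unreliable. A minimal sketch, assuming urllib2 and a random-path probe heuristic (my illustration, not the scanner's implementation):

```python
# Minimal sketch of custom-404 detection (Python 2 / urllib2). The helper
# names and the random-path heuristic are illustrative assumptions.
import urllib2
import uuid
import difflib

def fetch(url, timeout=10):
    """Return (status_code, body); HTTP errors are returned, not raised."""
    try:
        resp = urllib2.urlopen(url, timeout=timeout)
        return resp.getcode(), resp.read()
    except urllib2.HTTPError as e:
        return e.code, e.read()

def has_custom_404(base_url):
    """Probe two random, surely-nonexistent paths. If the server answers
    200 with near-identical bodies, it serves a custom 404 page and the
    scanner must compare page content instead of trusting status codes."""
    probe1 = fetch(base_url.rstrip('/') + '/' + uuid.uuid4().hex + '.html')
    probe2 = fetch(base_url.rstrip('/') + '/' + uuid.uuid4().hex + '.html')
    if probe1[0] == 404:
        return False  # server uses real 404 status codes
    similarity = difflib.SequenceMatcher(None, probe1[1], probe2[1]).ratio()
    return similarity > 0.9  # two random misses look alike => custom 404
```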
The scanning process, the other core of the engine, is mainly responsible for scheduling and executing all policies. Execution varies by policy: some policies need to run only once or a few times, others must run once per request or per special request, and there may be dependencies between policies. When a policy discovers a vulnerability, it submits the full details through an interface to the scanning process, which in turn communicates with the engine server; the protocol defines various packet types such as scan status, scan progress, and vulnerability information. Scanning strategies follow a plug-in model, so detection of a new vulnerability class can be added by writing a custom plug-in. The strategies broadly include: cross-site scripting (XSS), SQL injection, blind SQL injection, remote command execution, insecure WebDAV configuration, WebDAV remote code execution, arbitrary file upload, directory traversal, Struts2 remote code execution, form bypass, file inclusion, OpenSSL Heartbleed, Spring remote code execution, Spring expression / JSP property information disclosure, IIS short file and folder name disclosure, open ports, information leaks, CMS fingerprinting, and more.
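To make the plug-in model concrete, here is a minimal sketch of a policy base class and scheduler. The class names, the run_per_request flag, and the request.send_with helper are all hypothetical, and the SQL check is a toy error-based probe, not the scanner's real rule base:

```python
# Minimal sketch of the plug-in model for scan policies; all names here
# are illustrative assumptions, not the scanner's real API.
class ScanPlugin(object):
    """Base class: one plug-in per scan policy."""
    name = 'base'
    run_per_request = True  # some policies run once, some per request

    def scan(self, request):
        """Return a list of finding dicts for one crawled request."""
        raise NotImplementedError

class SqlErrorPlugin(ScanPlugin):
    """Toy error-based SQL injection check: append a quote to each
    parameter and look for database error strings in the response."""
    name = 'sql_error'
    ERRORS = ('You have an error in your SQL syntax', 'ORA-01756',
              'Unclosed quotation mark')

    def scan(self, request):
        findings = []
        for param in request.params:
            # send_with() is a hypothetical helper that resends the
            # request with one parameter value replaced
            body = request.send_with(param, request.params[param] + "'")
            if any(err in body for err in self.ERRORS):
                findings.append({'plugin': self.name, 'url': request.url,
                                 'param': param, 'risk': 'high'})
        return findings

def run_policies(plugins, requests):
    """Scheduler: per-request plug-ins run on every crawled request;
    one-shot plug-ins run only against the first."""
    results = []
    for plugin in plugins:
        targets = requests if plugin.run_per_request else requests[:1]
        for req in targets:
            results.extend(plugin.scan(req))
    return results
```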
The engine server encapsulates the user-layer interfaces, including engine startup and various callback interfaces (scan status callbacks, vulnerability callbacks, command-line callbacks, error callbacks, and so on). All server-side processing is packaged in a single DLL; the design is modular and easy to scale, and the server can later run as a standalone scanning service behind a B/S (browser/server) architecture. The UI is implemented with DuiLib and will not be covered here.
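The real interfaces live in that C++ DLL, but the callback pattern itself translates to a short Python sketch. The event names mirror the list above; the API shape is my assumption:

```python
# Minimal callback-registration sketch of the engine-server interface;
# a Python transliteration for illustration only.
class EngineServer(object):
    def __init__(self):
        self.callbacks = {'status': [], 'progress': [],
                          'vulnerability': [], 'error': []}

    def register(self, event, func):
        """User layer subscribes a callback for one engine event."""
        self.callbacks[event].append(func)

    def emit(self, event, payload):
        """Engine side: fan an event out to every registered callback."""
        for func in self.callbacks[event]:
            func(payload)

def on_vulnerability(info):
    print('found %s at %s' % (info['type'], info['url']))

server = EngineServer()
server.register('vulnerability', on_vulnerability)
server.emit('vulnerability', {'type': 'xss', 'url': 'http://example.com/'})
```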
Because every strategy is a plug-in, data-acquisition software could also be built on top of this scanner. A good vulnerability scanner needs complete logging for troubleshooting: the network is complex enough that errors you never anticipated will occur, and the logs should let you find the cause quickly and keep improving, holding false positives and false negatives within an acceptable range. The one real weakness is dynamic JavaScript parsing; see the various dynamic page-generation methods at http://demo.aisec.cn/demo/aisec/ for concrete cases. Fetching a URL through a virtual (headless) browser takes significantly longer than a urllib2 request, which directly slows the crawl. Requests embedded in JS-generated pages are captured without trouble, but Ajax requests remain hard to catch. The next version of the scanner will consider adding efficient Web 2.0 crawling.
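On the logging point above, here is a minimal sketch of the kind of structured error logging that makes crawl failures traceable, using the standard logging module; the wrapper and field layout are my choices, not the scanner's format:

```python
# Minimal crawl-error logging sketch; fields and wrapper are illustrative.
import logging

logging.basicConfig(filename='scanner.log', level=logging.DEBUG,
                    format='%(asctime)s %(levelname)s %(name)s %(message)s')
log = logging.getLogger('crawler')

def logged_fetch(fetch, url):
    """Wrap any fetch function so timeouts and protocol errors leave a
    trace that can be matched back to the offending URL."""
    try:
        return fetch(url)
    except Exception as exc:  # broad on purpose: record, then re-raise
        log.error('fetch failed url=%s error=%r', url, exc)
        raise
```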
The following shows the results of scanning one site:
Figure 1
Figure 2
Figure 3
Website Vulnerability Scanner Core Technology Research (Part One)