About crawling tools

Source: Internet
Author: User

From 5up3rh3i's blog

Many people at xKungfoo were very interested in crawling tools, mainly because crawlers are at the core of today's black-box testing tools. As far as I know, all of Chuangyu's Webscan tools depend on a spider. However, current crawlers cannot pick up URLs that only appear in AJAX traffic (XML and JSON) or inside Flash. Here are some of my ideas:

1. For AJAX, XML, and JSON, syntax analysis is still possible, but JavaScript is written in complex ways and is easily obfuscated, so static analysis quickly becomes difficult.
In this case, because the AJAX requests are generally issued by the JavaScript itself, we can sniff the traffic and capture the URLs that AJAX submits.

2. For Flash, you can download the file with wget, decompile it, and then run syntax analysis on the result to extract the URLs. This works, but don't forget efficiency :). You can also grade your crawler: for ordinary crawling, only fetch Flash files with wget, and for advanced crawling, go on to decompile and analyze them. That is just the traditional idea.

Alternatively, we can build our own proxy, browser plug-in, or sniffer. The user then accesses the Flash content normally and interacts with it, while the proxy, plug-in, or sniffer silently collects the requested URLs in the background. Those URLs can be submitted to a unified detection platform, or the proxy, plug-in, or sniffer can black-box test the URLs directly and report the results to the platform.
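As a rough illustration of the sniffing idea in items 1 and 2, the sketch below rebuilds URLs from raw HTTP/1.x requests that a sniffer or proxy has already captured. How the bytes are captured is out of scope, and the function names `url_from_raw_request` and `collect_urls` are hypothetical. It assumes plain HTTP with one complete request per captured blob.

```python
def url_from_raw_request(raw: bytes, scheme: str = "http"):
    """Return (method, URL) for one captured HTTP/1.x request, or None."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP request line
    method, target, _version = parts
    if target.startswith(("http://", "https://")):
        return method, target  # absolute form, as a proxy would see it
    host = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    if host is None or not target.startswith("/"):
        return None
    return method, f"{scheme}://{host}{target}"


def collect_urls(captured):
    """Return unique (method, URL) pairs in first-seen order."""
    seen = []
    for raw in captured:
        item = url_from_raw_request(raw)
        if item is not None and item not in seen:
            seen.append(item)
    return seen
```

The resulting list is what would be handed to the detection platform. Because the URLs come from real traffic, it does not matter whether the request was triggered by JavaScript or by Flash.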

[An attacker only needs to set this up on their own PC. For product testing, the plug-in can be installed on the testers' work machines, so that security is tested at the same time as functionality :)]
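For the traditional approach in item 2, a much cheaper stand-in for full decompilation is to scan the Flash file for URL-like strings. The sketch below is not a decompiler: it only inflates a zlib-compressed SWF (signature `CWS`) and searches the body for absolute `http://` or `https://` strings, so relative URLs and URLs assembled at runtime in ActionScript will be missed. The names `swf_body` and `extract_urls` are hypothetical.

```python
import re
import zlib

# Absolute URLs made of printable ASCII, stopping at quotes and angle brackets.
URL_RE = re.compile(rb"https?://[^\x00-\x20\"'<>\x7f-\xff]+")


def swf_body(data: bytes) -> bytes:
    """Return the SWF content after the 8-byte header, inflating it if needed."""
    signature = data[:3]
    if signature == b"FWS":  # uncompressed SWF
        return data[8:]
    if signature == b"CWS":  # zlib-compressed after the 8-byte header
        return zlib.decompress(data[8:])
    raise ValueError("not an FWS/CWS file (LZMA 'ZWS' is not handled here)")


def extract_urls(data: bytes):
    """Return the sorted, de-duplicated absolute URLs found as plain strings."""
    body = swf_body(data)
    return sorted({m.group().decode("ascii") for m in URL_RE.finditer(body)})
```

A graded crawler could run this quick scan on every Flash file it downloads and reserve real decompilation for the advanced mode.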

What can webmasters do to keep such crawling tools out? In an issue of xKungfoo [I do not remember which one], I mentioned a method: serve the client a piece of JavaScript and check whether the client actually executed it, because crawlers generally do not include a JavaScript parser. In fact this method has been in use for a long time; for example, http://www.sirdarckcat.net/youhavenoscript.html uses the same idea. Here is a simple PoC:

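<html>
<body>
<?php
// Without y=1 in the query string, the client has not yet followed the JS redirect.
if (!(isset($_GET['y']) && $_GET['y'] == 1)) {
    print '<script>document.location = "?y=1";</script>';
    die('Your crawling tool? Or are you using NoScript?');
} else {
    print 'You are legal';
}
?>
</body>
</html>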

Here we use document.location, which is easy to spot. If AJAX or another method is used instead, the check is much better hidden :)
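One possible shape for the hidden variant, sketched in Python rather than PHP: the server embeds a random token in a small script that reports back with a background request, and records which tokens were confirmed. This is only an assumption about how such a check might be wired up. The class `JsChallenge` and the `/js-check` endpoint are hypothetical, and tying a token to a particular visitor (for example through a session cookie) is left out.

```python
import secrets


class JsChallenge:
    """Track which issued tokens were reported back by executed JavaScript."""

    def __init__(self):
        self._pending = set()
        self._verified = set()

    def issue(self):
        """Return a fresh token and the script snippet that reports it back."""
        token = secrets.token_hex(16)
        self._pending.add(token)
        # "/js-check" is a hypothetical endpoint handled by the same site.
        script = (
            "<script>var r = new XMLHttpRequest();"
            f"r.open('GET', '/js-check?t={token}');r.send();</script>"
        )
        return token, script

    def confirm(self, token):
        """Called by the /js-check handler; each token is accepted only once."""
        if token in self._pending:
            self._pending.remove(token)
            self._verified.add(token)
            return True
        return False

    def is_verified(self, token):
        """True once the client's script has reported this token."""
        return token in self._verified
```

A client that never calls back with its token, such as a crawler without a JavaScript engine, stays unverified, and nothing in the visible page flow reveals that a check took place.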
