This recognition program is part of my professional training project this semester: build something similar to ZoomEye, then integrate the results with Elasticsearch so that it looks and behaves like a search engine. The first requirement is being able to identify web components online; for example, when a user searches for "Discuz X3.0", the engine should return the sites running that version. Here I will only share the identification subroutine, that is, the idea behind recognizing web components.
I found the idea in the article "A Brief Talk on Web Fingerprint Recognition Technology". For a Discuz site, the first thing that comes to mind is identifying the footer, but the problem is that well-maintained sites often modify the "Powered by" text. So, alongside the footer keywords, I also use robots.txt and the less conspicuous meta tags for identification. The rough version information comes from robots.txt.
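To make the robots.txt part concrete, here is a tiny self-contained example. The banner text in the sample string is an assumed illustration of the kind of header the detector's regular expressions look for, not output captured from a real site:

# coding=utf-8
import re

# Assumed sample of a Discuz-style robots.txt banner; real banners vary by release.
sample = "#\n# robots.txt for Discuz! X3\n# Version 3.2\n"

print re.findall(r'# robots\.txt for (.+)', sample)   # -> ['Discuz! X3']
print re.findall(r'# Version (.+)', sample)           # -> ['3.2']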
All fingerprints are kept together in one place for easier management, so new fingerprints can be added later:
discuz_feature.py:
This file contains a single dictionary that holds the fingerprint information. I could not make it very fine-grained (time did not allow), so there are only three types of fingerprints: footer text, robots.txt features, and meta information.
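Judging from how the main program indexes the dictionary, a minimal discuz_feature.py would look roughly like the sketch below; the key names are the ones the main program expects, while the fingerprint strings themselves are placeholder examples only:

# coding=utf-8
# discuz_feature.py -- minimal sketch; the strings are placeholder examples.
matches = {
    # keywords expected inside <meta name="..." content="..."> tags
    'meta': ['Discuz', 'Comsenz'],
    # footer keywords ("Powered by ..." text)
    'intext': ['Powered by Discuz!', 'Discuz!'],
    # characteristic robots.txt rule lines of the X series
    'robots_for_xx': ['Disallow: /api/', 'Disallow: /uc_server/'],
}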
The main program loads this fingerprint library directly. Its input is a newline-separated list of domain names and its output is a results file. The identification code is as follows:
# coding=utf-8
import re

import requests

from discuz_feature import matches

"""
Discuz fingerprint recognition:
1. meta element recognition
2. in-text (footer) recognition
3. robots.txt recognition
"""


class DiscuzDetector(object):

    def __init__(self, url):
        """Fetch the page once and cache the response."""
        if url.startswith("http"):
            self.url = url
        else:
            self.url = "http://%s" % url
        try:
            self.r = requests.get(self.url, timeout=8)
            self.page_content = self.r.content
        except Exception, e:
            print e
            self.r = None
            self.page_content = None

    def meta_detect(self):
        """Look for Discuz keywords inside meta tags."""
        if not self.r:
            return False
        pattern = re.compile(r'<meta name=".*?" content="(.+?)" />')
        infos = pattern.findall(self.page_content)
        for info in infos:
            for feature in matches['meta']:
                if info.count(feature):
                    return True
        return False

    def robots_dz_xx_detect(self):
        """Discuz detection and rough version recognition via robots.txt."""
        if not self.r:
            return (False, None)
        robots_url = "%s%s" % (self.url, "/robots.txt")
        try:
            robots_content = requests.get(robots_url, timeout=8).content
        except Exception, e:
            print e
            return (False, None)
        if not robots_content:
            return (False, None)
        robots_feature_xx = matches['robots_for_xx']
        robots_list = robots_content.split("\r\n")
        banner_pattern = re.compile(r'# robots\.txt for (.+)')
        version_pattern = re.compile(r'# Version (.+)')
        for line in robots_list:
            # If robots.txt still carries a "# robots.txt for Discuz! X3"
            # banner, read the version information directly.
            version_info = banner_pattern.findall(line)
            if version_info and robots_content.count("Discuz!"):
                version_number = version_pattern.findall(robots_content)
                if version_number:
                    version_info.append(version_number[0])
                return (True, version_info)
            # Banner removed: fall back to characteristic rule lines of the
            # X series and report a rough "Discuz XX" verdict.
            if line in robots_feature_xx:
                return (True, ['Discuz XX'])
        # not Discuz
        return (False, None)

    def detect_intext(self):
        """Detect "Powered by Discuz" style footer text in the page."""
        if not self.r:
            return False
        for text_feature in matches['intext']:
            if self.page_content.count(text_feature):
                return True
        return False

    def get_result(self):
        """Combine the three checks into the final verdict."""
        if not self.r:
            return (False, 'not discuz!')
        is_meta = self.meta_detect()
        is_dz_robots, version_info = self.robots_dz_xx_detect()
        print version_info
        is_intext = self.detect_intext()
        if is_meta or is_dz_robots or is_intext:
            if version_info:
                return (True, '%s' % version_info[0])
            return (True, 'Version:unknown')
        return (False, 'not discuz!')


if __name__ == '__main__':
    # read the domain list and write one "domain<TAB>version" line per hit
    f = open('discuz.txt', 'r')
    wf = open('results.txt', 'a')
    dz_url_list = f.read().split('\n')
    for url in dz_url_list:
        print url
        detector = DiscuzDetector(url)
        ret = detector.get_result()
        print ret
        if ret[0]:
            wf.write("%s\t%s\n" % (url, ret[1]))
    wf.close()
    f.close()
Here discuz.txt is the file with the list of domain names to identify, and the output is written to results.txt. A run of the program looks like this:
It seems the X3.x versions are in very wide use.
In some cases batch exploitation is needed; with a slight modification the script can pick out the Discuz sites in a domain name database, and you only need to attach the exploit code as a follow-up module.
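As a rough illustration of such a batch run (a sketch only, assuming the DiscuzDetector class above lives in a module named discuz_detect), a process pool keeps the per-host timeouts from piling up one after another:

# coding=utf-8
# Batch-scan sketch; discuz_detect is an assumed module name that holds
# the DiscuzDetector class from the script above.
from multiprocessing import Pool

from discuz_detect import DiscuzDetector


def check(url):
    # run one detection in a worker process
    return (url, DiscuzDetector(url).get_result())


if __name__ == '__main__':
    urls = [u.strip() for u in open('discuz.txt') if u.strip()]
    pool = Pool(20)
    wf = open('results.txt', 'a')
    for url, (is_dz, info) in pool.imap_unordered(check, urls):
        if is_dz:
            wf.write("%s\t%s\n" % (url, info))
    wf.close()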
Of course, while full web fingerprinting used in bulk is accurate, it is time-consuming and not well suited to large-scale scanning; in that case the usual approach is to fuzz by running a path dictionary instead.
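A dictionary run of that kind only probes a few characteristic paths per host instead of fetching and parsing pages. The sketch below shows the idea; the candidate paths are assumed examples rather than a tuned dictionary:

# coding=utf-8
# Quick path-dictionary sketch; the candidate paths are assumed examples
# and would need tuning before any real large-scale scan.
import requests

CANDIDATE_PATHS = ['/forum.php', '/uc_server/', '/static/image/common/logo.png']


def quick_check(domain):
    for path in CANDIDATE_PATHS:
        try:
            r = requests.get("http://%s%s" % (domain, path), timeout=3)
        except Exception:
            continue
        if r.status_code == 200:
            return True
    return False


print quick_check('www.example.com')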
The effect after integrating everything with Elasticsearch looks like this:
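For reference, the indexing side of that integration can be as simple as the sketch below, using the official elasticsearch-py client; the index, doc_type, and field names are illustrative only, and doc_type assumes a pre-7.x Elasticsearch cluster:

# coding=utf-8
# Indexing sketch with elasticsearch-py; names and schema are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch()


def index_result(domain, version_info):
    doc = {'domain': domain, 'component': 'Discuz', 'version': version_info}
    es.index(index='web_fingerprints', doc_type='component', body=doc)


# a keyword search such as "Discuz X3.0" then becomes a simple match query
res = es.search(index='web_fingerprints',
                body={'query': {'match': {'version': 'X3.0'}}})
print res['hits']['hits']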
To make this take proper shape, the monitoring and vulnerability-exploitation modules still need to be added at the back end, and exposing everything through a RESTful API is the best and most flexible choice. I will keep improving it and try to grow it into an embryonic ZoomEye :-)
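As one possible shape for that API (a sketch only: Flask is just my example framework choice here, and discuz_detect is the assumed module name holding the DiscuzDetector class):

# coding=utf-8
# RESTful lookup sketch; Flask is an example choice and discuz_detect is an
# assumed module name for the DiscuzDetector class above.
from flask import Flask, jsonify, request

from discuz_detect import DiscuzDetector

app = Flask(__name__)


@app.route('/api/detect')
def detect():
    # e.g. GET /api/detect?url=www.example.com
    url = request.args.get('url', '')
    is_dz, info = DiscuzDetector(url).get_result()
    return jsonify({'url': url, 'discuz': is_dz, 'info': info})


if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000)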
Also, brothers, if you repost this, please credit the source!!