Discuz web fingerprint recognition with rough version detection


This recognition program is part of my practical-training project this semester: build something like ZoomEye, then integrate the results with Elasticsearch to put a search-engine front end on top. The first requirement is the ability to identify web components online, so that when a user enters a keyword such as "Discuz X3.0", the sites running that version are shown. Here I want to share the idea behind the component-identification subroutine.

I got the idea from an article giving a brief overview of web fingerprinting techniques. For a Discuz site, the first thing that comes to mind is to check the footer, but the problem is that well-maintained sites often alter or remove the "Powered by" text. So rather than relying on footer matching alone, I also use robots.txt and the more covert meta tags for identification. The rough version information is obtained from robots.txt.
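To make the robots.txt idea concrete: a stock Discuz robots.txt opens with comment lines naming the product and version, so two regular expressions are enough to recover the rough version. The sample text below is illustrative, not taken from any particular site:

```python
import re

# Illustrative header in the style a stock Discuz robots.txt ships with;
# real sites may have edited or deleted these comment lines.
robots_txt = """#
# robots.txt for Discuz! X3
# Version X3.2
#
User-agent: *
Disallow: /api/
"""

product = re.findall(r'#\s*robots\.txt for (.+)', robots_txt)
version = re.findall(r'#\s*Version (.+)', robots_txt)
print(product, version)  # ['Discuz! X3'] ['X3.2']
```

If the site owner has stripped these comments, the Disallow rules themselves can still serve as a fingerprint, which is the fallback used later in the main program.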

All fingerprints are kept in one place so that new ones are easy to add later:

discuz_feature.py:

This file contains a single dictionary that stores the fingerprint data. I could not make it very fine-grained (time did not allow), so there are only three kinds of fingerprints: footer text, robots.txt lines, and meta-tag content.
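The original post does not reproduce the file, but based on the description it is a single dictionary along these lines. The key names and sample values here are my illustration, not the author's exact fingerprints:

```python
# discuz_feature.py -- fingerprint library (illustrative sketch).
# Each key maps to a list of strings the detector searches for.
matches = {
    # Fragments found in meta tags such as
    # <meta name="generator" content="Discuz! X3.2" />
    'meta': ['Discuz', 'Comsenz'],
    # Footer text, e.g. "Powered by Discuz!"
    'intext': ['Powered by Discuz', 'discuz_uid'],
    # robots.txt Disallow rules specific to each major branch,
    # used when the "# robots.txt for ..." header was deleted.
    'robots_for_x2': ['Disallow: /uc_server/', 'Disallow: /source/'],
    'robots_for_x3': ['Disallow: /api/', 'Disallow: /static/'],
}
```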

The main program loads this fingerprint library directly. Its input is a newline-separated list of domain names, and its output is a results file. The code is as follows:

# coding: utf-8
import re

import requests

from discuz_feature import matches  # the fingerprint dictionary


class DiscuzDetector(object):
    """Discuz fingerprint identification:
    1. meta tag recognition
    2. in-text ("Powered by Discuz!") recognition
    3. robots.txt recognition (also yields rough version info)
    """

    def __init__(self, url):
        if not url.startswith("http://"):
            url = "http://%s" % url
        self.url = url
        try:
            self.r = requests.get(self.url, timeout=8)
            self.page_content = self.r.text
        except Exception as e:
            print(e)
            self.r = None
            self.page_content = None

    def meta_detect(self):
        """Match fingerprints against meta-tag content attributes."""
        if not self.r:
            return False
        pattern = re.compile(r'<meta name=".*?" content="(.+?)"\s*/?>')
        for content in pattern.findall(self.page_content):
            if any(feature in content for feature in matches['meta']):
                return True
        return False

    def robots_dz_detect(self):
        """Fetch /robots.txt; return (is_discuz, version_info)."""
        if not self.r:
            return (False, None)
        try:
            robots_content = requests.get(self.url + "/robots.txt", timeout=8).text
        except Exception:
            return (False, None)
        if not robots_content:
            return (False, None)
        # If robots.txt still carries the stock header, e.g.
        #   # robots.txt for Discuz! X3
        #   # Version X3.2
        # the rough version can be read off directly.
        if robots_content.count("Discuz"):
            version_info = re.findall(r'#\s*robots\.txt for (.+)', robots_content)
            version_info += re.findall(r'#\s*Version (.+)', robots_content)
            if version_info:
                return (True, version_info)
        # If the header was deleted, fall back to matching the Disallow
        # rules themselves; X2.x and X3.x ship different rule sets.
        # (The key names must match those defined in discuz_feature.py.)
        for line in robots_content.splitlines():
            if line.strip() in matches['robots_for_x3']:
                return (True, ['Discuz X3.x'])
            if line.strip() in matches['robots_for_x2']:
                return (True, ['Discuz X2.x'])
        return (False, None)  # not Discuz, as far as robots.txt can tell

    def detect_intext(self):
        """Look for Discuz-related text (footer) in the page body."""
        if not self.r:
            return False
        return any(feature in self.page_content for feature in matches['intext'])

    def get_result(self):
        """Combine the three checks into (is_discuz, description)."""
        if not self.r:
            return (False, 'not discuz!')
        is_meta = self.meta_detect()
        is_dz_robots, version_info = self.robots_dz_detect()
        is_intext = self.detect_intext()
        if is_meta or is_dz_robots or is_intext:
            if version_info:
                return (True, '%s' % version_info[0])
            return (True, 'version unknown')
        return (False, 'not discuz!')


if __name__ == '__main__':
    # Read domains from discuz.txt, one per line; append hits to results.txt.
    with open('discuz.txt') as f, open('results.txt', 'a') as wf:
        for url in f.read().splitlines():
            url = url.strip()
            if not url:
                continue
            ret = DiscuzDetector(url).get_result()
            print(url, ret)
            if ret[0]:
                wf.write("%s\t%s\n" % (url, ret[1]))

Here discuz.txt is the list of domain names to identify, and the results are written to results.txt.

Judging by the results, the X3.x branch is used quite a bit.

In some bulk-use scenarios, the script can be slightly modified to pick out the Discuz sites in a domain-name database; a vulnerability-exploitation module can then be attached as a follow-up stage.

Of course, for bulk use, web fingerprinting of this kind is accurate but time-consuming, so it is not well suited to large-scale scanning; in that situation, people generally just run a dictionary (fuzzing) instead.
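If fingerprinting does have to run over a large list, the per-host network latency can at least be hidden with a thread pool. This generic sketch accepts any check function; the dummy `fake_check` below is a stand-in for `DiscuzDetector(url).get_result()`, which blocks on network I/O for up to 8 seconds per host:

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(domains, check, workers=20):
    """Run check(domain) concurrently; return {domain: result}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs correctly.
        results = pool.map(check, domains)
        return dict(zip(domains, results))


# Dummy check standing in for DiscuzDetector(url).get_result().
def fake_check(domain):
    return (domain.endswith('.bbs'), 'Discuz X3.x')


print(scan_all(['a.bbs', 'b.com'], fake_check))
```

Threads (rather than processes) are the right fit here because the work is I/O-bound; twenty workers cut the wall-clock time roughly twentyfold at negligible cost.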

The results are then consolidated into Elasticsearch so they can be searched.
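A minimal sketch of how recognition results could be fed into Elasticsearch via its bulk API. The index name and document fields are my assumptions, and the actual client call is shown commented out so the snippet stays self-contained:

```python
def to_bulk_actions(results, index='discuz_sites'):
    """Convert (url, version) pairs into Elasticsearch bulk actions."""
    return [
        {
            '_index': index,
            '_source': {'url': url, 'component': 'Discuz', 'version': version},
        }
        for url, version in results
    ]


actions = to_bulk_actions([('example.com', 'Discuz! X3')])

# With a running cluster (pip package: elasticsearch), one would then do:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch('http://localhost:9200'), actions)
print(actions[0]['_source'])
```

Once indexed this way, a keyword query like "Discuz X3.0" against the `component` and `version` fields gives exactly the search-engine behavior described at the start of the post.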

To turn this into a real product, monitoring and vulnerability-exploitation modules still need to be added on top, and exposing everything through a RESTful API is the best and most flexible choice. I will keep improving it, aiming for an embryonic ZoomEye :-)


In addition, if you reprint this, please credit the source.
 
