Discuz web fingerprint recognition with rough version detection


This recognition program is part of my practical-training project this semester: build something like ZoomEye, then integrate the results with Elasticsearch to put a search-engine front end on top. The first requirement is the ability to identify web components online, so that when a user enters a keyword such as "Discuz X3.0", the sites running that version are shown. Here I want to share the idea behind the component-identification subroutine.

I got the idea from an article giving a brief overview of web fingerprinting techniques. For a Discuz site, the first thing that comes to mind is to check the footer, but the problem is that well-maintained sites often alter or remove the "Powered by" text. So rather than relying on footer matching alone, I also use robots.txt and the more covert meta tags for identification. The rough version information is obtained from robots.txt.
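To make the robots.txt idea concrete: a stock Discuz robots.txt opens with comment lines naming the product and version, so two regular expressions are enough to recover the rough version. The sample text below is illustrative, not taken from any particular site:

```python
import re

# Illustrative header in the style a stock Discuz robots.txt ships with;
# real sites may have edited or deleted these comment lines.
robots_txt = """#
# robots.txt for Discuz! X3
# Version X3.2
#
User-agent: *
Disallow: /api/
"""

product = re.findall(r'#\s*robots\.txt for (.+)', robots_txt)
version = re.findall(r'#\s*Version (.+)', robots_txt)
print(product, version)  # ['Discuz! X3'] ['X3.2']
```

If the site owner has stripped these comments, the Disallow rules themselves can still serve as a fingerprint, which is the fallback used later in the main program.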

All fingerprints are kept in one place so that new ones are easy to add later:

discuz_feature.py:

This file contains a single dictionary that stores the fingerprint data. I could not make it very fine-grained (time did not allow), so there are only three kinds of fingerprints: footer text, robots.txt lines, and meta-tag content.
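The original post does not reproduce the file, but based on the description it is a single dictionary along these lines. The key names and sample values here are my illustration, not the author's exact fingerprints:

```python
# discuz_feature.py -- fingerprint library (illustrative sketch).
# Each key maps to a list of strings the detector searches for.
matches = {
    # Fragments found in meta tags such as
    # <meta name="generator" content="Discuz! X3.2" />
    'meta': ['Discuz', 'Comsenz'],
    # Footer text, e.g. "Powered by Discuz!"
    'intext': ['Powered by Discuz', 'discuz_uid'],
    # robots.txt Disallow rules specific to each major branch,
    # used when the "# robots.txt for ..." header was deleted.
    'robots_for_x2': ['Disallow: /uc_server/', 'Disallow: /source/'],
    'robots_for_x3': ['Disallow: /api/', 'Disallow: /static/'],
}
```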

The main program loads this fingerprint library directly. Its input is a newline-separated list of domain names, and its output is a results file. The code is as follows:

# coding: utf-8
import re

import requests

from discuz_feature import matches  # the fingerprint dictionary


class DiscuzDetector(object):
    """Discuz fingerprint identification:
    1. meta tag recognition
    2. in-text ("Powered by Discuz!") recognition
    3. robots.txt recognition (also yields rough version info)
    """

    def __init__(self, url):
        if not url.startswith("http://"):
            url = "http://%s" % url
        self.url = url
        try:
            self.r = requests.get(self.url, timeout=8)
            self.page_content = self.r.text
        except Exception as e:
            print(e)
            self.r = None
            self.page_content = None

    def meta_detect(self):
        """Match fingerprints against meta-tag content attributes."""
        if not self.r:
            return False
        pattern = re.compile(r'<meta name=".*?" content="(.+?)"\s*/?>')
        for content in pattern.findall(self.page_content):
            if any(feature in content for feature in matches['meta']):
                return True
        return False

    def robots_dz_detect(self):
        """Fetch /robots.txt; return (is_discuz, version_info)."""
        if not self.r:
            return (False, None)
        try:
            robots_content = requests.get(self.url + "/robots.txt", timeout=8).text
        except Exception:
            return (False, None)
        if not robots_content:
            return (False, None)
        # If robots.txt still carries the stock header, e.g.
        #   # robots.txt for Discuz! X3
        #   # Version X3.2
        # the rough version can be read off directly.
        if robots_content.count("Discuz"):
            version_info = re.findall(r'#\s*robots\.txt for (.+)', robots_content)
            version_info += re.findall(r'#\s*Version (.+)', robots_content)
            if version_info:
                return (True, version_info)
        # If the header was deleted, fall back to matching the Disallow
        # rules themselves; X2.x and X3.x ship different rule sets.
        # (The key names must match those defined in discuz_feature.py.)
        for line in robots_content.splitlines():
            if line.strip() in matches['robots_for_x3']:
                return (True, ['Discuz X3.x'])
            if line.strip() in matches['robots_for_x2']:
                return (True, ['Discuz X2.x'])
        return (False, None)  # not Discuz, as far as robots.txt can tell

    def detect_intext(self):
        """Look for Discuz-related text (footer) in the page body."""
        if not self.r:
            return False
        return any(feature in self.page_content for feature in matches['intext'])

    def get_result(self):
        """Combine the three checks into (is_discuz, description)."""
        if not self.r:
            return (False, 'not discuz!')
        is_meta = self.meta_detect()
        is_dz_robots, version_info = self.robots_dz_detect()
        is_intext = self.detect_intext()
        if is_meta or is_dz_robots or is_intext:
            if version_info:
                return (True, '%s' % version_info[0])
            return (True, 'version unknown')
        return (False, 'not discuz!')


if __name__ == '__main__':
    # Read domains from discuz.txt, one per line; append hits to results.txt.
    with open('discuz.txt') as f, open('results.txt', 'a') as wf:
        for url in f.read().splitlines():
            url = url.strip()
            if not url:
                continue
            ret = DiscuzDetector(url).get_result()
            print(url, ret)
            if ret[0]:
                wf.write("%s\t%s\n" % (url, ret[1]))

Here discuz.txt is the list of domain names to identify, and the results are written to results.txt.

Judging by the results, the X3.x branch is used quite a bit.

In some bulk-use scenarios, the script can be slightly modified to pick out the Discuz sites in a domain-name database; a vulnerability-exploitation module can then be attached as a follow-up stage.

Of course, for bulk use, web fingerprinting of this kind is accurate but time-consuming, so it is not well suited to large-scale scanning; in that situation, people generally just run a dictionary (fuzzing) instead.
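If fingerprinting does have to run over a large list, the per-host network latency can at least be hidden with a thread pool. This generic sketch accepts any check function; the dummy `fake_check` below is a stand-in for `DiscuzDetector(url).get_result()`, which blocks on network I/O for up to 8 seconds per host:

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(domains, check, workers=20):
    """Run check(domain) concurrently; return {domain: result}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs correctly.
        results = pool.map(check, domains)
        return dict(zip(domains, results))


# Dummy check standing in for DiscuzDetector(url).get_result().
def fake_check(domain):
    return (domain.endswith('.bbs'), 'Discuz X3.x')


print(scan_all(['a.bbs', 'b.com'], fake_check))
```

Threads (rather than processes) are the right fit here because the work is I/O-bound; twenty workers cut the wall-clock time roughly twentyfold at negligible cost.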

The results are then consolidated into Elasticsearch so they can be searched.
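A minimal sketch of how recognition results could be fed into Elasticsearch via its bulk API. The index name and document fields are my assumptions, and the actual client call is shown commented out so the snippet stays self-contained:

```python
def to_bulk_actions(results, index='discuz_sites'):
    """Convert (url, version) pairs into Elasticsearch bulk actions."""
    return [
        {
            '_index': index,
            '_source': {'url': url, 'component': 'Discuz', 'version': version},
        }
        for url, version in results
    ]


actions = to_bulk_actions([('example.com', 'Discuz! X3')])

# With a running cluster (pip package: elasticsearch), one would then do:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch('http://localhost:9200'), actions)
print(actions[0]['_source'])
```

Once indexed this way, a keyword query like "Discuz X3.0" against the `component` and `version` fields gives exactly the search-engine behavior described at the start of the post.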

To turn this into a real product, monitoring and vulnerability-exploitation modules still need to be added on top, and exposing everything through a RESTful API is the best and most flexible choice. I will keep improving it, aiming for an embryonic ZoomEye :-)


In addition, if you reprint this, please credit the source.
 
