Description: This program queries http://s.tool.chinaz.com/same, a site that lists the websites hosted on the same server (same IP) as a given domain, and uses a short Python script to scrape the results.
First run a search on the site and capture the traffic to see what request the page sends,
then use Python to mimic the POST form submission and match the results with a regular expression.
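The matching step can be tried on its own, without touching the network. The sketch below applies the same regular expression as the script to a small, made-up HTML fragment. The sample_html value is an assumption modeled on the one line of markup quoted in the script's comment; the real result page may be laid out differently.

```python
import re

# Hypothetical fragment of the result page. Only the "</span> <a href='...'"
# part is taken from the script's comment; the rest is invented for illustration.
sample_html = (
    "<li><span>1.</span> <a href='http://www.31hzp.com' target=_blank>www.31hzp.com</a></li>"
    "<li><span>2.</span> <a href='http://100ec.cn' target=_blank>100ec.cn</a></li>"
)

# Same pattern as in the script: capture the href value that follows "</span> ".
# The lazy (.+?) stops at the first closing single quote.
pattern = re.compile(r"</span> <a href='(.+?)'")

sites = pattern.findall(sample_html)
for site in sites:
    print(site)
```

Because the pattern depends on the exact spacing and quoting of the markup, it will stop matching if the site changes its HTML.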
The code is as follows:
# -*- coding: utf-8 -*-
# Python 2 script (uses urllib2 and the print statement)
import urllib
import urllib2
import re
import sys


# get the URLs hosted on the same IP
def get_url(url):
    # set header info
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36',
        'Referer': 'http://s.tool.chinaz.com/same',
    }
    # the form has a single field, 's', holding the domain to look up
    postdata = urllib.urlencode({'s': url})
    req = urllib2.Request('http://s.tool.chinaz.com/same', postdata, headers)
    try:
        result = urllib2.urlopen(req)
    except Exception:
        print 'Failed to open url, you can try again...'
        return
    fweb = result.read()

    # each result looks like: </span> <a href='http://www.31hzp.com'
    pattern = re.compile(r'</span> <a href=\'(.+?)\'')
    match = pattern.findall(fweb)

    # strip characters that are not allowed in a file name
    filename = str(url).replace(':', '').replace('\\', '').replace('/', '')
    with open(filename + '.txt', 'w') as fp:
        if match:
            for m in match:
                fp.write(m)
                fp.write('\n')
                print m
        else:
            print 'Find nothing...'


# usage
def usage(name):
    # example: www.31jmw.com
    print '%s www.xxx.com' % name
    sys.exit(1)


# entry point
if __name__ == '__main__':
    if len(sys.argv) != 2:
        usage(sys.argv[0])
    print 'start...'
    url = sys.argv[1]  # the domain passed on the command line
    # print url
    get_url(url)
    print 'End...'
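The listing above targets Python 2, where urllib2 exists. On Python 3 the same POST request could be built roughly as follows. This is only a sketch under assumptions: the helper name build_request and the constant SAME_IP_URL are made up for illustration, the User-Agent string is shortened, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint used by the script; the constant name is hypothetical.
SAME_IP_URL = 'http://s.tool.chinaz.com/same'


def build_request(domain):
    """Build (but do not send) the POST request that mimics the search form."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)',
        'Referer': SAME_IP_URL,
    }
    # In Python 3 the request body must be bytes, so the encoded form is
    # converted explicitly.
    postdata = urlencode({'s': domain}).encode('ascii')
    # Passing data makes urllib issue a POST instead of a GET.
    return Request(SAME_IP_URL, data=postdata, headers=headers)


req = build_request('www.31jmw.com')
print(req.get_method())
print(req.data)
```

Sending it would then be a matter of passing req to urllib.request.urlopen and decoding the bytes returned by read() before applying the regular expression.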
The test results are as follows:
f:\mycode\python\pytest\src>ipsamescan.py www.31jmw.com
start...
http://www.31hzp.com
http://100ec.cn
http://ec100.cn
http://toocle.cn
http://www.31jmw.com
http://www.31expo.com
http://www.toocle.cn
http://561288.com
http://www.toocle.com.cn
http://www.31metals.com
http://31expo.com
http://www.100ec.cn
End...
Using Python to get the addresses of websites hosted on the same server