# coding: utf-8
# author: @in2
# After crawling, adjust the page encoding to UTF-8 :)
import urllib2, bs4
from bs4 import BeautifulSoup  # import the required modules

h = open('cve.html', 'w')  # open cve.html; it is created automatically if it does not exist

for pages in range(1, 30246):  # page numbers 1 to 30245
    strpage = str(pages)
    print "Currently on vulnerability No. " + strpage
    url = "http://www.nsfocus.net/vulndb/" + strpage  # build the URL
    r = urllib2.Request(url)       # instantiate the Request object
    page = urllib2.urlopen(r)      # open it
    s = BeautifulSoup(page)        # parse
    text = s.findAll(attrs={'align': ['center']})  # find tags with align='center'
    for each in text:  # traverse the matches
        if each.name == 'div':
            print str(each.b).decode('utf-8')
            if each.b:  # not empty: write it out
                h.write('<div align="center"><b>Vulnerability name:</b></div>' + str(each)
                        + '<div align="center"><b>URL: ' + url + '</b></div>')
            elif each == "No vulnerability record":
                h.write("No vulnerability record")
            else:
                print "The text came out garbled, QAQ"
                pass

h.close()  # close the file and release the resource
A crawler script that fetches the title and URL of every NSFOCUS (Green League) vulnerability report.
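The script above targets Python 2 (`urllib2`) and the third-party BeautifulSoup library. As a rough sketch of the same extraction step in Python 3 using only the standard library's `html.parser` (the class name and helper function here are my own, not from the original), the "find `<b>` text inside `<div align="center">`" logic could look like this:

```python
from html.parser import HTMLParser

class CenteredDivParser(HTMLParser):
    """Collects the text of <b> tags that sit inside <div align="center"> blocks."""
    def __init__(self):
        super().__init__()
        self.in_div = False  # currently inside a centered div?
        self.in_b = False    # currently inside a <b> within such a div?
        self.titles = []     # collected vulnerability titles

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and dict(attrs).get('align') == 'center':
            self.in_div = True
        elif tag == 'b' and self.in_div:
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.in_div = False
        elif tag == 'b':
            self.in_b = False

    def handle_data(self, data):
        if self.in_b and data.strip():
            self.titles.append(data.strip())

def extract_titles(html):
    """Return the <b> texts found inside centered divs of an HTML page."""
    parser = CenteredDivParser()
    parser.feed(html)
    return parser.titles
```

Fetching each page would then use `urllib.request.urlopen` instead of `urllib2.urlopen`; the parsing itself needs no network access and can be tested on a static HTML string.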