This crawler mainly uses two libraries, urllib and BeautifulSoup. Its job is to parse the queried keyword and the corresponding dream interpretation out of the fetched HTML.
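The extraction step can be sketched in Python 3 with only the standard library, as a simplified stand-in for BeautifulSoup. The class names `art_title` and `dream_detail` come from the script below; the sample HTML is invented for illustration.

```python
from html.parser import HTMLParser

class DreamPageParser(HTMLParser):
    """Pulls the title (h1.art_title) and body (div.dream_detail)
    out of a page shaped like the ones the crawler targets.
    Simplified: it does not handle nested tags inside the target elements."""
    def __init__(self):
        super().__init__()
        self._capture = None  # which field, if any, we are currently inside
        self.title = ''
        self.content = ''

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h1' and attrs.get('class') == 'art_title':
            self._capture = 'title'
        elif tag == 'div' and attrs.get('class') == 'dream_detail':
            self._capture = 'content'

    def handle_endtag(self, tag):
        if tag in ('h1', 'div'):
            self._capture = None

    def handle_data(self, data):
        if self._capture == 'title':
            self.title += data
        elif self._capture == 'content':
            self.content += data

# Invented sample page for illustration
sample = ('<html><body><h1 class="art_title">Dream of water</h1>'
          '<div class="dream_detail">It may mean rain.</div></body></html>')
p = DreamPageParser()
p.feed(sample)
```

After `feed`, `p.title` and `p.content` hold the extracted strings, mirroring what the script gets from BeautifulSoup's `find(...).text`.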
# -*- coding: utf-8 -*-
# Python 2 script (urllib2, xrange, print >> do not exist in Python 3)
import urllib, urllib2
import time, random
from BeautifulSoup import BeautifulSoup

def fetch_url(str_url):
    user_agent = ('Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko)')
    headers = {'User-Agent': user_agent}

    content = ''
    try:
        request = urllib2.Request(str_url, headers=headers)
        response = urllib2.urlopen(request)
        # the site serves gb2312-encoded pages
        html = response.read().decode('gb2312')
        content = parse_content_page(html)
    except:
        content = None

    return content

def parse_content_page(html):
    parsed_html = BeautifulSoup(html)
    try:
        title = parsed_html.body.find('h1', attrs={'class': 'art_title'}).text
        content = parsed_html.body.find('div', attrs={'class': 'dream_detail'}).text
    except:
        return None

    return [title, content]

if __name__ == '__main__':
    foutput = 'jiemeng.txt'
    with open(foutput, 'w') as fout:
        for i in xrange(1, 10):
            request_url = 'http://tools.2345.com/zhgjm/%s.htm' % str(i)
            x = fetch_url(request_url)
            if x != None:
                print >> fout, x[0].encode('utf8')[3:-3]
                print >> fout, x[1].encode('utf8')

            # sleep for a while between HTTP requests
            seconds = random.random() * 10 + 2
            time.sleep(seconds)
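The script above is Python 2. As a rough sketch, the fetch side translates to Python 3's urllib.request as follows; the User-Agent string, URL pattern, and gb2312 decoding are carried over from the original, and no request is actually sent here:

```python
import urllib.request

# User-Agent carried over from the original script
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.1; WOW64) '
              'AppleWebKit/537.36 (KHTML, like Gecko)')

def build_request(url):
    # urllib2.Request became urllib.request.Request in Python 3
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})

def fetch_url(url):
    # Network call; the target site serves gb2312-encoded pages
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode('gb2312')

# Build (but do not send) a request for the first page
req = build_request('http://tools.2345.com/zhgjm/1.htm')
```

Note that `urllib.urlencode` moved to `urllib.parse.urlencode`, and `xrange` and `print >> fout` would become `range` and `print(..., file=fout)` in the main loop.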
An example of a web crawler written in Python