It took me one evening to finish my first crawler. I have been learning Python on and off for two months and kept giving up, but today I put together a small project and finally felt the joy of a real result. Without further ado, here is all my code.
# -*- coding: utf-8 -*-
__author__ = 'Young'

import re, urllib  # urllib (Python 2): fetches a web page and returns its content

def my_get(id):  # wrapped in a function so it is easy to call once per page
    # urllib.urlopen() opens the Douban Read page; str(id) makes switching pages easy
    html = urllib.urlopen("https://read.douban.com/ebooks/tag/%E5%B0%8F%E8%AF%B4/?cat=book&sort=top&start=" + str(id))
    html = html.read()  # read the returned page content
    reg = r'<span class="price-tag">(.*?)元</span><a href=".*?" target="_blank" class="btn btn-icon">试读</a></div><a data-target-dialog="login" href="#" class="require-login btn btn-info btn-cart"><i class="icon-cart"></i><span class="btn-text">加入购物车</span></a></div><div class="title"><a href=".*?" onclick="moreurl\(this, {&#39;aid&#39;: &#39;.*?&#39;, &#39;src&#39;: &#39;tag&#39;}, true, \'read.douban.com\'\)">(.*?)</a>'
    reg = re.compile(reg)
    rel = re.findall(reg, html)  # rel is a list of (price, title) tuples
    return rel

id = 0
price = 0
fn = open(r'G:\13_Python-Files\douban.txt', "a")  # file that stores the data; mode "a" appends
while id <= 80:  # from analyzing the URL pattern; here we crawl the first 4 pages
    my_list = my_get(id)  # my_list stores the returned results
    for i in my_list:
        fn.write("Title:%s-----------Price:%s\n" % (i[1], i[0]))
        price += float(i[0])  # price is a float
        id += 1  # book counter (also serves as the start= offset for the next page)
        print i[0], i[1]
    print id
fn.write("Quantity:%s\tTotal price:%s\tAverage price:%s\n" % (id, price, "%.2f" % (price / id)))
fn.close()  # finally, don't forget to close the file
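The script above is Python 2 (`urllib.urlopen`, `print` statements). Below is a minimal Python 3 sketch of the same extract-and-aggregate idea, run against a small hand-written HTML snippet instead of the live site; the sample markup and class names are illustrative, not Douban's real page.

```python
# -*- coding: utf-8 -*-
import re

# Hand-written sample markup (illustrative, not Douban's real HTML).
SAMPLE_HTML = """
<span class="price-tag">12.00</span><div class="title"><a href="#">Book A</a></div>
<span class="price-tag">8.00</span><div class="title"><a href="#">Book B</a></div>
"""

# Group 1 captures the price, group 2 the title; re.S lets .*? span newlines.
PATTERN = re.compile(
    r'<span class="price-tag">(.*?)</span>'
    r'.*?<div class="title"><a href=".*?">(.*?)</a>',
    re.S,
)

def parse_books(html):
    """Return a list of (title, price) tuples extracted from html."""
    return [(title, float(price)) for price, title in PATTERN.findall(html)]

books = parse_books(SAMPLE_HTML)
total = sum(price for _, price in books)
print(books)                          # [('Book A', 12.0), ('Book B', 8.0)]
print("%.2f" % (total / len(books)))  # 10.00
```

To fetch real pages in Python 3, `urllib.request.urlopen(url).read().decode('utf-8')` replaces the Python 2 `urllib.urlopen(url).read()`.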
The results are as follows:
Flaws: some of the data is missing; I will keep digging into why.
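A likely cause of the missing rows: a single long regex requires every book's markup to match exactly, so any book whose HTML deviates even slightly (for example, a free book with no cart button) is silently dropped. A sketch of a more tolerant approach using the standard-library `html.parser`, again on illustrative markup and class names, not Douban's real page:

```python
from html.parser import HTMLParser

class BookParser(HTMLParser):
    """Collect (title, price) pairs; tag and class names are illustrative."""
    def __init__(self):
        super().__init__()
        self.books = []
        self._field = None       # which text node we are currently inside
        self._in_title = False   # inside a <div class="title">
        self._pending_price = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "price-tag" in attrs.get("class", ""):
            self._field = "price"
        elif tag == "div" and "title" in attrs.get("class", ""):
            self._in_title = True
        elif tag == "a" and self._in_title:
            self._field = "title"

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_title = False

    def handle_data(self, data):
        if self._field == "price":
            self._pending_price = float(data)
        elif self._field == "title":
            # A book with no price span is still kept (price is None)
            # instead of being dropped entirely.
            self.books.append((data, self._pending_price))
            self._pending_price = None
        self._field = None

SAMPLE = ('<span class="price-tag">5.00</span>'
          '<div class="title"><a href="#">Book A</a></div>'
          '<div class="title"><a href="#">Free book, no price span</a></div>')

parser = BookParser()
parser.feed(SAMPLE)
print(parser.books)  # [('Book A', 5.0), ('Free book, no price span', None)]
```

Each field is captured independently, so a malformed or absent price no longer discards the whole book.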
My first Python project crawler: crawl Douban Read and count the books