How the crawler works:
The crawler in three steps
- Crawler step two: use BeautifulSoup4 to parse the data:
1. Import bs4
2. Parse the web page data
3. Find the data
4. Print it with a for loop
from bs4 import BeautifulSoup
# r holds the page HTML fetched in step one
soup = BeautifulSoup(r, 'lxml')
# find every <p> tag whose class is comment-content
pattern = soup.find_all('p', 'comment-content')
for item in pattern:
    print(item.string)
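One caveat with the parsing step: `item.string` returns `None` when the matched tag contains more than one child node (for example, text mixed with a nested tag). The sketch below illustrates this on a small hand-written HTML snippet, which is an assumption standing in for the real page, and compares it with `get_text()`. It uses Python's built-in `html.parser` so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = """
<div>
  <p class="comment-content">A classic.</p>
  <p class="comment-content">Great <b>book</b>!</p>
</div>
"""

# 'html.parser' ships with Python; 'lxml' also works if it is installed.
soup = BeautifulSoup(html, "html.parser")
items = soup.find_all("p", class_="comment-content")

# .string is None for the second <p> because it has several child nodes.
strings = [item.string for item in items]

# get_text() concatenates all nested text, so nothing is lost.
texts = [item.get_text().strip() for item in items]

print(strings)
print(texts)
```

If the comments on the target page can contain nested markup, `get_text()` is probably the safer choice. For plain-text paragraphs the two give the same result.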
- Crawler step three: use pandas to save the data:
1. Import pandas
2. Create a new list object
3. Write it out using to_csv
import pandas
# collect the comment text into a list
comments = []
for item in pattern:
    comments.append(item.string)
# wrap the list in a DataFrame and write it to a CSV file
df = pandas.DataFrame(comments)
df.to_csv('comments.csv')
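By default, a DataFrame built from a plain list gets a column named `0`, and `to_csv` also writes the row index as an extra column. The following is a minimal sketch of a slightly tidier save step. The helper name `save_comments`, the column name `comment`, the sample data, and the `utf-8-sig` encoding (chosen so that spreadsheet programs detect UTF-8) are all assumptions made for illustration, not part of the original notes.

```python
import os
import tempfile

import pandas


def save_comments(comments, path):
    """Write a list of comment strings to a one-column CSV file."""
    # Skip None entries, which item.string can produce for nested markup.
    cleaned = [c for c in comments if c is not None]
    df = pandas.DataFrame(cleaned, columns=["comment"])
    # index=False leaves out the row-number column.
    df.to_csv(path, index=False, encoding="utf-8-sig")
    return df


# Hypothetical sample data; in the crawler this would be the scraped list.
sample = ["A classic.", None, "Worth rereading."]
out_path = os.path.join(tempfile.mkdtemp(), "comments.csv")
saved = save_comments(sample, out_path)
print(saved)
```

The file is written to a temporary directory here only to keep the example self-contained. Passing `'comments.csv'` as the path would match the step above.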
The complete crawler
# Step one: download the page
import requests
r = requests.get('https://book.douban.com/subject/1084336/comments/').text

# Step two: parse the HTML and find the comments
from bs4 import BeautifulSoup
soup = BeautifulSoup(r, 'lxml')
pattern = soup.find_all('p', 'comment-content')
for item in pattern:
    print(item.string)

# Step three: save the comments to a CSV file
import pandas
comments = []
for item in pattern:
    comments.append(item.string)
df = pandas.DataFrame(comments)
df.to_csv('comments.csv')
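The script above assumes that the request always succeeds. As a hedged sketch, the fetching and parsing steps could be wrapped in small functions that set a timeout, send a browser-like `User-Agent` (some sites reject the default one), and raise an error on a failed response. The function names, the header value, and the sample HTML below are hypothetical. The live request is deliberately not executed here, so only the offline parsing check runs.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical header value; adjust to whatever the target site accepts.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; learning-crawler)"}


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def extract_comments(html):
    """Return the text of every <p class="comment-content"> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text().strip() for p in soup.find_all("p", "comment-content")]


def crawl(url):
    """Fetch a page and extract its comments."""
    return extract_comments(fetch_html(url))


# Offline check on hand-written markup (no network access needed).
sample_html = (
    '<div><p class="comment-content">A classic.</p>'
    '<p class="other">skip me</p></div>'
)
sample_comments = extract_comments(sample_html)
print(sample_comments)
```

Calling `crawl('https://book.douban.com/subject/1084336/comments/')` would perform the real request. Whether the page still uses the `comment-content` class is not something this sketch can confirm.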
Result of running the code:
Python learning notes: crawler 1