From the blog: Shanghai-yoyo
Original address: http://www.cnblogs.com/yoyoketang/tag/beautifulsoup4/
Objective
This article shows how to crawl the images on a site and save them to the local computer.
First, the target site
1. Open a landscape-image page on the site: http://699pic.com/sousuo-218808-13-1.html
2. Use Firebug with the FirePath panel to locate the target images via a CSS selector
3. As you can see, all of the images are img tags whose class attribute is "lazy"
Second, use find_all to find all the tags
1. find_all(class_="lazy") gets all of the image tags
2. Extract the jpg URL address and title from each tag
# coding:utf-8
from bs4 import BeautifulSoup
import requests
import os

r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# find all the image tags
images = soup.find_all(class_="lazy")
# print(images)  # returns a list object
for i in images:
    jpg_rl = i["data-original"]  # get the URL address
    title = i["title"]           # get the title name
    print(title)
    print(jpg_rl)
    print("")
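The snippet above needs network access to 699pic.com. The same find_all(class_="lazy") lookup can be sketched offline against a hand-written HTML fragment; the markup below is a made-up stand-in that only mirrors the attribute names (class, data-original, title) the article reports, not the real page.

```python
# Offline sketch of find_all(class_="lazy") on a hand-written fragment.
from bs4 import BeautifulSoup

html = """
<div>
  <img class="lazy" data-original="http://example.com/a.jpg" title="mountain">
  <img class="lazy" data-original="http://example.com/b.jpg" title="lake">
  <img class="logo" src="http://example.com/logo.png">
</div>
"""

soup = BeautifulSoup(html, "html.parser")
images = soup.find_all(class_="lazy")  # matches any tag with class "lazy"
for i in images:
    print(i["data-original"], i["title"])
```

Note that class_ (with the trailing underscore) is how BeautifulSoup spells the class filter, because class is a Python keyword.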
Third, save the pictures
1. Create a jpg subfolder under the current script's folder
2. Import the os module; os.getcwd() returns the path of the current working directory
3. Open a local file path for writing with open(), named: os.getcwd() + "\\jpg\\" + title + '.jpg' (duplicate names will be overwritten)
4. requests.get() opens the URL of the picture; its .content attribute returns the binary stream, which can be written directly to the local file
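The steps above can be sketched without touching the network: a temporary directory stands in for os.getcwd(), and a bytes literal stands in for requests.get(jpg_rl).content, so only the folder creation and the binary write are exercised.

```python
# Runnable sketch of steps 1-4: create the jpg subfolder, build the
# file path, and write a binary stream with open(..., "wb").
import os
import tempfile

script_dir = tempfile.mkdtemp()      # the article uses os.getcwd() here
jpg_dir = os.path.join(script_dir, "jpg")
os.makedirs(jpg_dir, exist_ok=True)  # step 1: the jpg subfolder

title = "demo"
data = b"\xff\xd8\xff\xe0 fake jpeg bytes"  # stand-in for the downloaded stream

path = os.path.join(jpg_dir, title + ".jpg")
with open(path, "wb") as f:          # "wb": write in binary mode
    f.write(data)
```

os.path.join builds the same path the article spells as os.getcwd() + "\\jpg\\" + title + '.jpg', but works on both Windows and Unix separators.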
Fourth, reference code
# coding:utf-8
from bs4 import BeautifulSoup
import requests
import os

r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# find all the image tags
images = soup.find_all(class_="lazy")
# print(images)  # returns a list object
for i in images:
    jpg_rl = i["data-original"]  # get the URL address
    title = i["title"]           # get the title name
    print(title)
    print(jpg_rl)
    print("")
    with open(os.getcwd() + "\\jpg\\" + title + ".jpg", "wb") as f:
        f.write(requests.get(jpg_rl).content)
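As noted above, duplicate titles overwrite each other because the file name is built only from the title. One possible workaround (not in the original post; the helper name unique_path is made up here) is to append a counter when a name is already taken:

```python
# Append _1, _2, ... to a file name until it no longer clashes.
import os
import tempfile

def unique_path(folder, title, ext=".jpg"):
    """Return folder/title+ext, adding a numeric suffix while the name is taken."""
    path = os.path.join(folder, title + ext)
    n = 1
    while os.path.exists(path):
        path = os.path.join(folder, "%s_%d%s" % (title, n, ext))
        n += 1
    return path

folder = tempfile.mkdtemp()
first = unique_path(folder, "scene")   # no clash yet -> scene.jpg
open(first, "wb").close()              # simulate a saved image
second = unique_path(folder, "scene")  # clash -> scene_1.jpg
```

In the reference code, replacing the open(...) path with unique_path(os.getcwd() + "\\jpg", title) would keep every downloaded image instead of silently overwriting.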
Python crawler BeautifulSoup4 series 3 (reprint)