While browsing the web, we often come across good pictures on a site, sometimes spread over many pages, and we would like to download and save them for use as desktop wallpaper or as design material.
The most common approach is to right-click and choose Save As. But some pictures offer no Save As option on right-click. A screenshot tool can still capture them, but that reduces the sharpness of the picture. And even when saving works, if the pictures we need are spread over thousands of pages, downloading them one at a time will wear your hands out. There is a better way: right-click and view the page source.
We can use Python to implement a simple crawler that fetches the content we want and saves it locally. Here is how to implement such a feature in Python.
First, get the entire page data
import urllib

def get_content(url):
    """Get the web page source"""
    html = urllib.urlopen(url)  # Python 2 API
    content = html.read()
    html.close()
    return content
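For readers on Python 3, where urllib.urlopen no longer exists, the same step might be sketched with urllib.request. The function name fetch_page, the timeout value, and the User-Agent string are assumptions for illustration and are not part of the original script.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Return the raw bytes of the page at `url` (Python 3 sketch)."""
    # Hypothetical User-Agent; some sites reject the default Python one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The context manager closes the connection even if read() fails.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The result is bytes, so it can be passed to BeautifulSoup directly or decoded first if a string is needed.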
Second, grab the picture file name
When grabbing the file name, symbols such as "*" and "/" are removed, because these special symbols are not allowed in Windows file names.
from bs4 import BeautifulSoup

pic_name = u""  # cleaned title, reused when saving the images

def get_name(name, file):
    """Grab picture file name"""
    global pic_name
    pic_name = name.decode('utf-8')
    # Remove symbols that are not allowed in file names
    pic_name = pic_name.replace("*", "").replace("/", "")
    print pic_name

def get_file(info):
    """Get img file"""
    soup = BeautifulSoup(info, "html.parser")
    # Find all modules for free download
    all_files = soup.find_all('a', title="free Download")
    # Find all the h1 titles
    titles = soup.find_all('h1')
    # Capture the desired title
    for title in titles:
        name = str(title)[4:-5]  # strip the <h1> and </h1> tags
    # Get file name
    for file in all_files:
        get_name(name, file)
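The file-name cleanup described above can also be written as a single pass. The sketch below is written for Python 3. The helper name clean_filename and the wider character set (every character Windows forbids, not only the two symbols the article mentions) are assumptions.

```python
import re

# Characters that Windows does not allow in file names.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def clean_filename(name):
    """Strip forbidden characters and surrounding whitespace from `name`."""
    return _FORBIDDEN.sub("", name).strip()


print(clean_filename("a/b*c"))  # abc
```

Handling all forbidden characters in one pattern avoids adding another branch each time a page title contains a new symbol.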
Third, download pictures
Images whose suffix is "gif" or "jpg" are downloaded and stored in the E:\googledownload\cssmuban directory.
import re
import urllib
from bs4 import BeautifulSoup

def pic_category(info, str_images):
    """Download images whose src matches the given pattern"""
    soup = BeautifulSoup(info, "html.parser")
    all_image = soup.find_all('div', class_="large-imgs")
    pat = re.compile(str_images)
    image_code = re.findall(pat, str(all_image))
    for i in image_code:
        # Keep the extension of the matched image (gif or jpg)
        if str(i)[-3:] == 'gif':
            suffix = '.gif'
        else:
            suffix = '.jpg'
        # pic_name is the cleaned file name set in get_name
        urllib.urlretrieve('http://www.cssmoban.com' + str(i),
                           u'E:\\googledownload\\cssmuban\\' + pic_name + suffix)

def pic_download(info):
    """Download Image"""
    pic_category(info, r'src="(.+?\.gif)"')
    pic_category(info, r'src="(.+?\.jpg)"')
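The matching step can be tried out separately from the download itself. The following Python 3 sketch only collects the image addresses and does not fetch anything. The function name find_image_urls, the combined gif/jpg pattern, and the use of urljoin to build absolute addresses are assumptions rather than the article's own code.

```python
import re
from urllib.parse import urljoin

# One pattern for both extensions; [^"]+ keeps a match inside one attribute.
IMG_SRC = re.compile(r'src="([^"]+\.(?:gif|jpg))"')


def find_image_urls(html, base_url):
    """Return (absolute_url, extension) pairs for gif/jpg images in `html`."""
    results = []
    for src in IMG_SRC.findall(html):
        extension = src.rsplit(".", 1)[1]
        # urljoin leaves absolute addresses alone and resolves relative ones.
        results.append((urljoin(base_url, src), extension))
    return results
```

Each returned pair could then be handed to a download call, with the extension used as the suffix of the saved file.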
Fourth, traverse all URLs and download the desired pictures, with their file names, from each page
num = 1
# Download files
for i in range(6000):
    url = 'http://www.cssmoban.com/cssthemes/' + str(num) + '.shtml'
    info = get_content(url)
    get_file(info)
    pic_download(info)
    num = num + 1
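A loop over thousands of pages will usually hit a few that are missing or slow. One possible way to keep the run going is sketched below for Python 3. The names page_urls and crawl, the fetch and handle callables, and the one-second delay are all assumptions added for illustration.

```python
import time


def page_urls(first=1, count=6000):
    """Yield the listing URLs for pages first .. first + count - 1."""
    for number in range(first, first + count):
        yield "http://www.cssmoban.com/cssthemes/%d.shtml" % number


def crawl(urls, fetch, handle, delay=1.0):
    """Fetch each URL, pass the result to `handle`, and return failed URLs."""
    failed = []
    for url in urls:
        try:
            handle(fetch(url))
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses,
            # so a missing page is recorded instead of ending the run.
            failed.append(url)
        time.sleep(delay)  # pause between requests to go easy on the server
    return failed
```

Here fetch would be the function that returns the page source and handle the function that extracts and saves the pictures. The returned list shows which pages to retry.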
The results of the operation are as follows:
This article is an original work by the Baby bus SD. Team. Reproductions must clearly credit the source (the author's official website: Baby bus).
Reprinted from "Baby bus Superdo Team". Original link: http://www.cnblogs.com/superdo/p/4927574.html
[Python crawler] simple crawler function