[Python crawler] simple crawler function

Source: Internet
Author: User

When browsing the web, we often come across good pictures on a site, sometimes spread across many pages, that we would like to download and save, whether to use as desktop wallpaper or as design material.

The most common approach is to right-click and choose Save As. But some pictures offer no Save As option in the right-click menu. You can capture them with a screenshot tool instead, but that reduces the sharpness of the picture. And even when saving works, if you need the pictures from thousands of pages, downloading them one by one will wear your hand out. In fact, there is a far more powerful option: right-click and view the page source.

We can use Python to implement a simple crawler that fetches the content we want and saves it locally. Here is how to implement such a feature.

First, get the entire page data

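    import urllib.request

    def get_content(url):
        """Get the web page source"""
        # The with block closes the response once the page has been read
        with urllib.request.urlopen(url) as html:
            content = html.read()
        return content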
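The function above has no timeout and lets any network error stop the whole run. As a minimal sketch, assuming Python 3, a more defensive variant might look like the following. The name `fetch_page`, the 10-second timeout, and the User-Agent string are illustrative assumptions and are not part of the original post.

```python
import urllib.request

# Hypothetical User-Agent string, chosen only for illustration.
USER_AGENT = "simple-crawler-example/0.1"


def fetch_page(url, timeout=10.0):
    """Return the page at url as text, or None if the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()
            # Use the charset declared in the response, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        print("failed to fetch", url, "-", err)
        return None
    return raw.decode(charset, errors="replace")
```

Returning None on failure lets the calling loop skip a missing page and carry on with the next one.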

Second, grab the picture file name

When building the file name, symbols such as "*" and "/" are removed, because special symbols cause problems when the name is displayed and saved.

    from bs4 import BeautifulSoup

    pic_name = ""  # file name for the current page, shared with the download step

    def get_name(name, file):
        """Grab picture file name"""
        global pic_name
        # Remove the symbols that must not appear in a file name
        pic_name = name.replace("*", "").replace("/", "")
        print(pic_name)

    def get_file(info):
        """Get img file"""
        soup = BeautifulSoup(info, "html.parser")
        # Find all modules for free download; the title value must match
        # the link's title attribute exactly as it appears on the site
        all_files = soup.find_all('a', title="free Download")
        # Find all the h1 titles
        titles = soup.find_all('h1')
        # Capture the desired title by stripping the <h1> and </h1> tags
        name = ""
        for title in titles:
            name = str(title)[4:-5]
        # Get file name
        for file in all_files:
            get_name(name, file)
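Only "*" and "/" are handled above, but Windows rejects several other characters in file names as well. A minimal sketch of a more general cleanup, assuming Python 3, is shown below. The helper name `clean_filename` and the "untitled" fallback are hypothetical choices for illustration.

```python
import re

# Characters that Windows does not allow in file names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def clean_filename(name, replacement=""):
    """Remove characters that are invalid in Windows file names."""
    cleaned = _INVALID_CHARS.sub(replacement, name).strip()
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"


print(clean_filename("Blue*Sky/Theme"))
```

Using one character class keeps the rule in a single place, so further symbols can be added without another branch.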

Third, download pictures

Images with a "gif" or "jpg" suffix are downloaded and stored in the E:\googledownload\cssmuban directory.

    import re
    import urllib.request
    from bs4 import BeautifulSoup

    def pic_category(info, str_images):
        """Download every image whose src matches the pattern str_images"""
        soup = BeautifulSoup(info, "html.parser")
        # Class name as given in the source; it must match the page markup exactly
        all_image = soup.find_all('div', class_="Large-imgs")
        pat = re.compile(str_images)
        image_code = re.findall(pat, str(all_image))
        for i in image_code:
            # Keep the extension of the image being downloaded
            suffix = '.gif' if str(i)[-3:] == 'gif' else '.jpg'
            urllib.request.urlretrieve('http://www.cssmoban.com' + str(i),
                                       'e:\\googledownload\\cssmuban\\' + pic_name + suffix)

    def pic_download(info):
        """Download image"""
        pic_category(info, r'src="(.+?\.gif)"')
        pic_category(info, r'src="(.+?\.jpg)"')
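The regular expressions above assume that every image path is site-relative and that the src attribute is always written the same way. As an alternative sketch, assuming Python 3 and only the standard library, the image URLs can be collected with `html.parser` and resolved with `urljoin`. This swaps in a different technique from the regex approach of the original, and it scans every img tag rather than only those inside the large-image div. The names `ImageSrcParser` and `find_image_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_urls(html, base_url, extensions=(".gif", ".jpg")):
    """Return absolute URLs of images whose path ends with one of extensions."""
    parser = ImageSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src)
            for src in parser.sources
            if src.lower().endswith(extensions)]


sample = '<div><img src="/upload/a.jpg"><img src="/upload/b.gif"/><img src="/upload/c.png"></div>'
print(find_image_urls(sample, "http://www.cssmoban.com"))
```

Each returned URL can then be passed to `urllib.request.urlretrieve` together with the local file path.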

Fourth, traverse all URLs and download the desired pictures, with their file names, from each page

    # Download the files from every page
    for num in range(1, 6001):
        url = 'http://www.cssmoban.com/cssthemes/' + str(num) + '.shtml'
        info = get_content(url)
        get_file(info)
        pic_download(info)
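A loop over 6000 pages will sooner or later hit a missing page or a network error, and it sends requests as fast as the site can answer. The following is a minimal sketch, assuming Python 3, of a loop that skips failed pages and pauses between requests. The names `page_urls` and `crawl`, the injected `fetch` and `handle` callables, and the one-second delay are assumptions made for illustration.

```python
import time

BASE_URL = "http://www.cssmoban.com/cssthemes/"


def page_urls(first, last):
    """Yield the page URLs numbered first through last, inclusive."""
    for num in range(first, last + 1):
        yield "{}{}.shtml".format(BASE_URL, num)


def crawl(urls, fetch, handle, delay=1.0):
    """Fetch each URL and pass the page to handle; return the number handled."""
    handled = 0
    for url in urls:
        try:
            page = fetch(url)
        except OSError as err:
            # Network errors raised by urllib are subclasses of OSError.
            print("skipping", url, "-", err)
            continue
        handle(page)
        handled += 1
        # Pause between requests so the site is not flooded.
        time.sleep(delay)
    return handled
```

Here `fetch` would be a function such as `get_content`, and `handle` a function that calls `get_file` and `pic_download` on the page.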

The results of the operation are as follows:

This article is an original work of the Baby bus SD team. Reprints must clearly credit the source (the author's official website: Baby bus).
Reprinted from the "Baby bus Superdo Team"; original link: http://www.cnblogs.com/superdo/p/4927574.html
