[Python crawler] simple crawler function

Last Update:2015-11-01 Source: Internet

Author: User


When browsing the web, we often come across good pictures on a site. They may be spread across many pages, and we would like to download and save them, whether to use as desktop wallpaper or as design material.

The most common approach is to right-click and choose Save As. But for some pictures the right-click menu has no Save As option. A screenshot tool can capture them instead, but that reduces the sharpness of the picture. And even when saving works, if we need the pictures from thousands of pages, downloading them one at a time would wear out your hands. In fact, you already have a powerful option: right-click and view the page source.

We can use Python to implement a simple crawler that fetches the content we want and saves it locally. Here is how to implement such a feature in Python.

First, get the entire page data

    import urllib  # Python 2; in Python 3, urlopen lives in urllib.request

    def get_content(url):
        """Get the web page source"""
        html = urllib.urlopen(url)
        content = html.read()
        html.close()
        return content
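For readers on Python 3, where `urllib.urlopen` has moved to `urllib.request.urlopen`, the same step might look like the sketch below. The timeout value, the UTF-8 fallback, and the `errors="replace"` policy are assumptions for illustration and are not part of the original code.

```python
import urllib.request


def get_content(url, timeout=10):
    """Return the page source at `url` as text (Python 3 sketch)."""
    # The context manager closes the connection even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset declared in the response headers;
        # fall back to UTF-8 when none is declared (assumption).
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Returning text rather than bytes means later steps can run regular expressions on the page without decoding it again.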

Second, grab the picture file name

When grabbing the file name, symbols such as "*" and "/" are removed, because these special symbols cause problems when the name is displayed or used as a file name.

    from bs4 import BeautifulSoup

    pic_name = u""  # cleaned title of the current page, shared with the download step

    def get_name(name, file):
        """Grab picture file name"""
        global pic_name
        pic_name = name.decode('utf-8')
        # Two separate ifs (not if/elif) so that both symbols are removed
        if "*" in pic_name:
            pic_name = pic_name.replace("*", "")
        if "/" in pic_name:
            pic_name = pic_name.replace("/", "")
        print pic_name

    def get_file(info):
        """Get img file"""
        soup = BeautifulSoup(info, "html.parser")
        # Find all modules for free download
        all_files = soup.find_all('a', title="free Download")
        # Find all the h1 titles
        titles = soup.find_all('h1')
        # Capture the desired title
        for title in titles:
            name = str(title)[4:-5]
        # Get file name
        for file in all_files:
            get_name(name, file)
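Slicing `str(title)[4:-5]` assumes the `<h1>` tag carries no attributes. A sketch that reads the title text directly and strips unwanted symbols in one pass is shown below, using only the Python 3 standard library rather than BeautifulSoup. The names `TitleParser`, `get_titles`, and `clean_name` are hypothetical, and the set of stripped characters (those Windows rejects in file names) is an assumption that goes beyond the two symbols named in the text.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <h1> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_h1:
            self.titles[-1] += data


def get_titles(html):
    """Return the stripped text of each <h1> in the page."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


def clean_name(name):
    """Remove characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "", name).strip()
```

A call such as `clean_name(get_titles(page)[0])` would then give a name that is safe to use when saving the picture.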

Third, download pictures

Images with the suffix "gif" or "jpg" are downloaded and stored in the E:\googledownload\cssmuban directory.

    import re
    import urllib
    from bs4 import BeautifulSoup

    def pic_category(info, str_images):
        """Download image"""
        soup = BeautifulSoup(info, "html.parser")
        all_image = soup.find_all('div', class_="Large-imgs")
        images = str_images
        pat = re.compile(images)
        image_code = re.findall(pat, str(all_image))
        for i in image_code:
            # pic_name is the cleaned page title set by get_name()
            if str(i)[-3:] == 'gif':
                image = urllib.urlretrieve(
                    'http://www.cssmoban.com' + str(i),
                    u'e:\\googledownload\\cssmuban\\' + pic_name + u'.gif')
            else:
                image = urllib.urlretrieve(
                    'http://www.cssmoban.com' + str(i),
                    u'e:\\googledownload\\cssmuban\\' + pic_name + u'.jpg')

    def pic_download(info):
        """Download image"""
        # The page source is passed in explicitly so pic_category can parse it
        pic_category(info, r'src="(.+?\.gif)"')
        pic_category(info, r'src="(.+?\.jpg)"')
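The two download branches differ only in the file extension, so the URL matching and the target path can each be computed in one place. The sketch below (Python 3) shows one way to do that. `find_image_urls`, `target_path`, and `download_images` are hypothetical names, the base URL is the one used in the code above, and the sketch scans the whole page rather than a single `div`. Only `download_images` touches the network.

```python
import os
import posixpath
import re
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve

BASE_URL = "http://www.cssmoban.com"
# One pattern covers both suffixes mentioned in the article.
IMG_SRC = re.compile(r'src="([^"]+?\.(?:gif|jpg))"')


def find_image_urls(html, base_url=BASE_URL):
    """Return absolute URLs of every .gif/.jpg named in a src attribute."""
    return [urljoin(base_url, src) for src in IMG_SRC.findall(html)]


def target_path(directory, name, image_url):
    """Build '<directory>/<name><ext>', keeping the image's own extension."""
    ext = posixpath.splitext(urlparse(image_url).path)[1]
    return os.path.join(directory, name + ext)


def download_images(html, directory, name):
    """Download each matched image into `directory` under `name`.

    Images with the same extension share one target path, so a later
    image overwrites an earlier one.
    """
    os.makedirs(directory, exist_ok=True)
    for url in find_image_urls(html):
        urlretrieve(url, target_path(directory, name, url))
```

Using `urljoin` instead of string concatenation also handles `src` values that are already absolute URLs.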

Fourth, traverse all URLs and download the desired pictures and file names from each page

    num = 1
    # Download files
    for i in range(6000):
        url = 'http://www.cssmoban.com/cssthemes/' + str(num) + '.shtml'
        info = get_content(url)
        get_file(info)
        pic_download(info)
        num = num + 1
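Over 6000 requests, some pages are likely to be missing or to time out, and a single uncaught exception would stop the whole run. A sketch of the loop that skips failed pages follows (Python 3). `page_urls` and `crawl` are hypothetical names, `fetch` and `handle` stand in for `get_content` and for the `get_file` / `pic_download` steps, and the one-second default delay is an assumption.

```python
import time

BASE = "http://www.cssmoban.com/cssthemes/"


def page_urls(first=1, count=6000):
    """Yield the numbered page URLs: .../1.shtml, .../2.shtml, and so on."""
    for num in range(first, first + count):
        yield BASE + str(num) + ".shtml"


def crawl(urls, fetch, handle, delay=1.0):
    """Fetch each URL and pass the page to `handle`.

    Returns the list of URLs that could not be fetched or handled.
    """
    failed = []
    for url in urls:
        try:
            handle(fetch(url))
        except OSError:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            failed.append(url)
        time.sleep(delay)  # pause between requests to go easy on the server
    return failed
```

Passing `fetch` and `handle` as arguments keeps the loop independent of how pages are downloaded and processed, so it can be tried out with stand-in functions before hitting the real site.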

The results of the operation are as follows:

This article is an original work of the Baby bus SD team. Reproductions must clearly credit the source (the author's official website: Baby bus).
Reprinted from "Baby bus Superdo Team". Original link: http://www.cnblogs.com/superdo/p/4927574.html

This article is an English version of an article originally published in Chinese on aliyun.com.
