Crawling Beauty Pictures with Python

Source: Internet
Author: User

Requirements: I recently got interested in Python crawlers, so, following the examples I had seen, I tried to crawl the beauty pictures from a favorite site of mine: http://www.mm131.com/xinggan. There, every picture in a set sits on its own page, so saving a set by hand means clicking through dozens of pages. With a crawler it becomes very convenient: just enter the ID of the picture set and the whole set is saved to the hard drive.

As the great ones say: Talk is cheap, show me the code!

First, the general workflow of a web crawler:

1. View the source code of the target site's pages and find the content you want to crawl
2. Extract that content with regular expressions or other tools such as XPath/BS4
3. Write complete Python code to implement the crawl (see the minimal sketch right after this list)
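
Before diving into the details, here is a minimal, generic sketch of those three steps. It is only an illustration: httpbin.org/html is a placeholder page, not part of this article's target site.

import re
import requests

# 1. fetch the page source
html = requests.get('http://httpbin.org/html').text
# 2. extract the content you need (here, the <h1> text) with a regex
title = re.search(r'<h1>(.*?)</h1>', html, re.S)
# 3. save the result to disk
if title:
    with open('title.txt', 'w') as fp:
        fp.write(title.group(1))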

1. Target URL

URL: http://www.mm131.com/xinggan/2373.html

Beautiful, right?

2. Analyze the source code

Press F12 and you will find the following two lines in the page source:

Src= "Http://img1.mm131.com/pic/2373/1.jpg"
span class= "Page-ch" > Total 56 pages
From this we learn the following:

- The URL of the first page is http://www.mm131.com/xinggan/2373.html.
- The first line above is the URL of the first picture, where 2373 is the ID of the picture set.
- The second line tells us the set has 56 pictures.

Clicking through to the second and third pages and checking the source again, the page URLs become http://www.mm131.com/xinggan/2373_2.html and 2373_3.html, while the picture URLs look just like the first page's, with 1.jpg becoming 2.jpg and 3.jpg.

3. Crawl the pictures

Let's try crawling the first picture of the set. Straight to the code:
import requests
import re

url = 'http://www.mm131.com/xinggan/2373.html'
html = requests.get(url).text                            # read the whole page as text
a = re.search(r'img alt=.* src="(.*?)" /', html, re.S)   # match the picture URL
print(a.group(1))
This prints:
http://img1.mm131.com/pic/2373/1.jpg
Next we need to save the pictures locally:
pic = requests.get(a.group(1), timeout=2)  # set a timeout so the program doesn't hang forever
fp = open('1.jpg', 'wb')                   # create a new file in binary write mode
fp.write(pic.content)                      # write the image bytes into the file
fp.close()
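
As an aside (not from the original article), a slightly more defensive version of this save step would check the HTTP status code and use a with block, so a failed request doesn't silently save an error page and the file gets closed even if the write fails:

pic = requests.get(a.group(1), timeout=2)
pic.raise_for_status()            # raise an exception on 404/500 instead of saving an error page
with open('1.jpg', 'wb') as fp:   # the with block closes the file automatically
    fp.write(pic.content)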

In this way, the first beauty picture lands on your local disk. Since the first one is saved, let's not let the rest get away either. On with the code:

4. Crawl the whole set

First, import the required modules and set up the picture storage directory:

#coding: utf-8
import requests
import re
import os
from bs4 import BeautifulSoup

pic_id = raw_input('Input pic ID: ')
os.chdir("G:\\pic")
homedir = os.getcwd()
print("Current directory: %s" % homedir)
fulldir = unicode(os.path.join(homedir, pic_id), encoding='utf-8')  # save the pictures in a subdirectory named after the set ID
if not os.path.isdir(fulldir):
    os.makedirs(fulldir)
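
Note that raw_input and unicode mark this as Python 2 code. If you are on Python 3, the equivalent setup would look roughly like this sketch (str is already Unicode there):

import os

pic_id = input('Input pic ID: ')           # Python 3: input() replaces raw_input()
fulldir = os.path.join("G:\\pic", pic_id)  # no unicode() needed; str is Unicode in Python 3
os.makedirs(fulldir, exist_ok=True)        # create the directory if it does not exist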
Because we need to keep paging through the set to fetch all the pictures, we first get the total number of pages:
url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
html = requests.get(url).text
#soup = BeautifulSoup(html)                 # without a parser argument this raises "UserWarning: No parser was explicitly specified"
soup = BeautifulSoup(html, 'html.parser')   # use soup to pull out the key element
ye = soup.span.string                       # the first <span> on the page holds the page count
ye_count = re.search(r'\d+', ye)
print('Pages: %d pages in total' % int(ye_count.group()))
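
If you would rather avoid the BeautifulSoup dependency entirely, the page count can also be pulled straight from the HTML with a regex. This is just a sketch that assumes the page-ch span looks like the snippet in section 2:

m = re.search(r'class="page-ch">\D*(\d+)', html)   # grab the first number inside the page-ch span
if m:
    print('Pages: %d pages in total' % int(m.group(1)))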
The main function:
def downpic(pic_id):
    n = 1
    url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
    while n <= int(ye_count.group()):   # stop once every page has been visited
        # download the picture on the current page
        try:
            if not n == 1:
                url = 'http://www.mm131.com/xinggan/%s_%s.html' % (pic_id, n)  # the URL changes with n
            html = requests.get(url).text
            pic_url = re.search(r'img alt=.* src="(.*?)" /', html, re.S)   # extract the picture URL with a regex
            pic_s = pic_url.group(1)
            print(pic_s)
            pic = requests.get(pic_s, timeout=2)
            pic_cun = fulldir + '\\' + str(n) + '.jpg'
            fp = open(pic_cun, 'wb')
            fp.write(pic.content)
            fp.close()
            n += 1
        except requests.exceptions.ConnectionError:
            print('[Error] the current picture cannot be downloaded')
            continue

if __name__ == '__main__':
    downpic(pic_id)
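
One caveat about the except branch: on a connection error, continue retries the same picture forever, because n is never incremented there. A bounded-retry variant (a sketch that reuses ye_count and fulldir from the code above) might look like this:

import os

def downpic_retry(pic_id, max_retries=3):
    # give each picture a few attempts, then skip it and move on
    n, fails = 1, 0
    while n <= int(ye_count.group()):
        try:
            if n == 1:
                url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
            else:
                url = 'http://www.mm131.com/xinggan/%s_%s.html' % (pic_id, n)
            html = requests.get(url, timeout=2).text
            pic_url = re.search(r'img alt=.* src="(.*?)" /', html, re.S)
            pic = requests.get(pic_url.group(1), timeout=2)
            with open(os.path.join(fulldir, '%d.jpg' % n), 'wb') as fp:
                fp.write(pic.content)
            n, fails = n + 1, 0            # success: move on to the next page
        except requests.exceptions.ConnectionError:
            fails += 1
            if fails >= max_retries:       # too many failures: skip this picture
                print('[Error] skipping picture %d after %d failed attempts' % (n, fails))
                n, fails = n + 1, 0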
With that, the program is up and running.

5. All right, let's wrap it up.

Seeing the pictures pile up on your hard drive is pretty satisfying, isn't it? Of course, crawlers can do much more than download pictures: they can fetch 12306 train information, job postings from recruitment sites, and plenty else. In short, pick this skill up quickly and make good use of it.

Reference: http://www.jianshu.com/p/19c846daccb3
