Crawling Beauty Pictures with Python

Source: Internet
Author: User

Requirements: I recently got interested in Python crawlers, so, following the examples I had seen, I tried to crawl the beauty pictures from a favorite site of mine: http://www.mm131.com/xinggan. There, every picture in a set sits on its own page, so saving a set by hand means clicking through dozens of pages. With a crawler it becomes very convenient: just enter the ID of the picture set and the whole set is saved to the hard drive.

As the great ones say: Talk is cheap, show me the code!

First, the general workflow of a web crawler:

1. View the source code of the target site's pages and find the content you want to crawl
2. Extract that content with regular expressions or other tools such as XPath/BS4
3. Write complete Python code to implement the crawl (see the minimal sketch right after this list)
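
Before diving into the details, here is a minimal, generic sketch of those three steps. It is only an illustration: httpbin.org/html is a placeholder page, not part of this article's target site.

import re
import requests

# 1. fetch the page source
html = requests.get('http://httpbin.org/html').text
# 2. extract the content you need (here, the <h1> text) with a regex
title = re.search(r'<h1>(.*?)</h1>', html, re.S)
# 3. save the result to disk
if title:
    with open('title.txt', 'w') as fp:
        fp.write(title.group(1))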

1. Target URL

URL: http://www.mm131.com/xinggan/2373.html

Beautiful, right?

2. Analyze the source code

Press F12 and you will find the following two lines in the page source:

Src= "Http://img1.mm131.com/pic/2373/1.jpg"
span class= "Page-ch" > Total 56 pages
From this we learn the following:

- The URL of the first page is http://www.mm131.com/xinggan/2373.html.
- The first line above is the URL of the first picture, where 2373 is the ID of the picture set.
- The second line tells us the set has 56 pictures.

Clicking through to the second and third pages and checking the source again, the page URLs become http://www.mm131.com/xinggan/2373_2.html and 2373_3.html, while the picture URLs look just like the first page's, with 1.jpg becoming 2.jpg and 3.jpg.

3. Crawl the pictures

Let's try crawling the first picture of the set. Straight to the code:
import requests
import re

url = 'http://www.mm131.com/xinggan/2373.html'
html = requests.get(url).text                            # read the whole page as text
a = re.search(r'img alt=.* src="(.*?)" /', html, re.S)   # match the picture URL
print(a.group(1))
This prints:
http://img1.mm131.com/pic/2373/1.jpg
Next we need to save the pictures locally:
pic = requests.get(a.group(1), timeout=2)  # set a timeout so the program doesn't hang forever
fp = open('1.jpg', 'wb')                   # create a new file in binary write mode
fp.write(pic.content)                      # write the image bytes into the file
fp.close()
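
As an aside (not from the original article), a slightly more defensive version of this save step would check the HTTP status code and use a with block, so a failed request doesn't silently save an error page and the file gets closed even if the write fails:

pic = requests.get(a.group(1), timeout=2)
pic.raise_for_status()            # raise an exception on 404/500 instead of saving an error page
with open('1.jpg', 'wb') as fp:   # the with block closes the file automatically
    fp.write(pic.content)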

In this way, the first beauty picture lands on your local disk. Since the first one is saved, let's not let the rest get away either. On with the code:

4. Crawl the whole set

First, import the required modules and set up the picture storage directory:

#coding: utf-8
import requests
import re
import os
from bs4 import BeautifulSoup

pic_id = raw_input('Input pic ID: ')
os.chdir("G:\\pic")
homedir = os.getcwd()
print("Current directory: %s" % homedir)
fulldir = unicode(os.path.join(homedir, pic_id), encoding='utf-8')  # save the pictures in a subdirectory named after the set ID
if not os.path.isdir(fulldir):
    os.makedirs(fulldir)
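
Note that raw_input and unicode mark this as Python 2 code. If you are on Python 3, the equivalent setup would look roughly like this sketch (str is already Unicode there):

import os

pic_id = input('Input pic ID: ')           # Python 3: input() replaces raw_input()
fulldir = os.path.join("G:\\pic", pic_id)  # no unicode() needed; str is Unicode in Python 3
os.makedirs(fulldir, exist_ok=True)        # create the directory if it does not exist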
Because we need to keep paging through the set to fetch all the pictures, we first get the total number of pages:
url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
html = requests.get(url).text
#soup = BeautifulSoup(html)                 # without a parser argument this raises "UserWarning: No parser was explicitly specified"
soup = BeautifulSoup(html, 'html.parser')   # use soup to pull out the key element
ye = soup.span.string                       # the first <span> on the page holds the page count
ye_count = re.search(r'\d+', ye)
print('Pages: %d pages in total' % int(ye_count.group()))
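
If you would rather avoid the BeautifulSoup dependency entirely, the page count can also be pulled straight from the HTML with a regex. This is just a sketch that assumes the page-ch span looks like the snippet in section 2:

m = re.search(r'class="page-ch">\D*(\d+)', html)   # grab the first number inside the page-ch span
if m:
    print('Pages: %d pages in total' % int(m.group(1)))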
The main function:
def downpic(pic_id):
    n = 1
    url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
    while n <= int(ye_count.group()):   # stop once every page has been visited
        # download the picture on the current page
        try:
            if not n == 1:
                url = 'http://www.mm131.com/xinggan/%s_%s.html' % (pic_id, n)  # the URL changes with n
            html = requests.get(url).text
            pic_url = re.search(r'img alt=.* src="(.*?)" /', html, re.S)   # extract the picture URL with a regex
            pic_s = pic_url.group(1)
            print(pic_s)
            pic = requests.get(pic_s, timeout=2)
            pic_cun = fulldir + '\\' + str(n) + '.jpg'
            fp = open(pic_cun, 'wb')
            fp.write(pic.content)
            fp.close()
            n += 1
        except requests.exceptions.ConnectionError:
            print('[Error] the current picture cannot be downloaded')
            continue

if __name__ == '__main__':
    downpic(pic_id)
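
One caveat about the except branch: on a connection error, continue retries the same picture forever, because n is never incremented there. A bounded-retry variant (a sketch that reuses ye_count and fulldir from the code above) might look like this:

import os

def downpic_retry(pic_id, max_retries=3):
    # give each picture a few attempts, then skip it and move on
    n, fails = 1, 0
    while n <= int(ye_count.group()):
        try:
            if n == 1:
                url = 'http://www.mm131.com/xinggan/%s.html' % pic_id
            else:
                url = 'http://www.mm131.com/xinggan/%s_%s.html' % (pic_id, n)
            html = requests.get(url, timeout=2).text
            pic_url = re.search(r'img alt=.* src="(.*?)" /', html, re.S)
            pic = requests.get(pic_url.group(1), timeout=2)
            with open(os.path.join(fulldir, '%d.jpg' % n), 'wb') as fp:
                fp.write(pic.content)
            n, fails = n + 1, 0            # success: move on to the next page
        except requests.exceptions.ConnectionError:
            fails += 1
            if fails >= max_retries:       # too many failures: skip this picture
                print('[Error] skipping picture %d after %d failed attempts' % (n, fails))
                n, fails = n + 1, 0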
With that, the program is up and running.

5. All right, let's wrap it up.

Seeing the pictures pile up on your hard drive is pretty satisfying, isn't it? Of course, crawlers can do much more than download pictures: they can fetch 12306 train information, job postings from recruitment sites, and plenty else. In short, pick this skill up quickly and make good use of it.

Reference: http://www.jianshu.com/p/19c846daccb3
