Scraping gallery images with a Python crawler


Target page: http://www.axlcg.com/wmxz/1.html

  1. First, get the URL of each gallery on the first page

    You can see that each gallery URL sits in an a tag inside an li under ul class="homeboy-ul clearfix line-dot", so we have to work our way down to the target layer by layer.

            allsoup = BeautifulSoup(allurldigit, 'html.parser')  # the parsed HTML
            allpage = allsoup.find('ul', attrs={'class': 'homeboy-ul clearfix line-dot'})
            allpage2 = allpage.find_all('a')  # find all the a tags in one step
            for allpage2index in allpage2:
                allpage3 = allpage2index['href']  # take the URL
                if allpage3 not in allurl:  # add it only if it is not already in the container
                    allurl.append(allpage3)  # store it in the allurl list
  2. Get the URL for each page
    Fetching a single page hardly counts as a crawler; we want to fetch every list page.

    You can see that the next-page URL sits in an li under ul class="information-page-ul clearfix", but this time all the li tags look the same. So how do we find the one holding the next-page URL?

    The next-page link's label text literally says "next page", so we can test the text content of each li: if it matches, we follow that link, jump to the next page, and so crawl every list page, as sketched below.
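
    A minimal sketch of that test, assuming the pager markup above; on the Chinese site the label text would be "下一页", which the translation renders as "next page":

            # sketch: find the "next page" link among otherwise identical li tags
            # soup is a BeautifulSoup document of one list page
            def find_next_page(soup):
                pager = soup.find('ul', attrs={'class': 'information-page-ul clearfix'})
                for li in pager.find_all('li'):
                    a = li.find('a')
                    # only the next-page entry carries this label text
                    if a is not None and a.get_text() == '下一页' and a.get('href'):
                        return a['href']
                return None  # no such entry: this was the last page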

  3. Get the img address we actually want

    Click into a gallery and we can see the address of the image.

    Copy it and verify that it is correct.

    It turns out to be exactly what we want.

    In the same way, we grab each image URL and put it in a collection. Within a gallery we also follow the next-page URL to grab that page's image URL, because each page holds only one image.
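
    A minimal sketch of that extraction, assuming the div class "slidebox-detail" container that also appears in the full code below (the helper name is illustrative):

            # sketch: pull the single image URL out of one gallery page
            from bs4 import BeautifulSoup

            def extract_img_src(html):
                soup = BeautifulSoup(html, 'html.parser')
                detail = soup.find('div', attrs={'class': 'slidebox-detail'})
                li = detail.find('li')  # the page shows exactly one image
                return li.find('img')['src']  # its address sits in the src attribute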

  4. Download the images locally

            urllib.request.urlretrieve(m, "D:/Desktop/image/" + str(count) + ".jpg")

    The first parameter is the image URL; the second is the destination path plus the file name.
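
    A slightly more defensive sketch (the folder layout and helper name are illustrative, not from the original): urlretrieve fails if the target folder does not exist, so create it first.

            import os
            import urllib.request

            def save_image(url, folder, count):
                os.makedirs(folder, exist_ok=True)  # urlretrieve does not create folders
                path = os.path.join(folder, str(count) + '.jpg')
                urllib.request.urlretrieve(url, path)  # image URL in, file on disk out
                return path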

  5. Results

  6. Code

            #!/usr/bin/env python
            # encoding=utf-8
            # crawls the galleries at http://www.axlcg.com/wmxz/
            import requests
            from bs4 import BeautifulSoup
            import urllib.request

            allurl = []  # URLs of all galleries
            img = []     # image URLs collected from the current gallery
            count = 0    # running counter used for the file names

            # masquerade as a browser
            def download_page(url):
                return requests.get(url, headers={
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                                  'Chrome/63.0.3236.0 Safari/537.36'}).content

            # crawl the URL of every gallery and store it in allurl
            def get_all_url():
                firsturl = "http://www.axlcg.com/wmxz/"
                pageindex = 0
                while pageindex < 10:  # list-page cap; the exact limit was garbled in the source, 10 is assumed
                    allurldigit = download_page(firsturl)
                    allsoup = BeautifulSoup(allurldigit, 'html.parser')  # the parsed HTML
                    allpage = allsoup.find('ul', attrs={'class': 'homeboy-ul clearfix line-dot'})
                    allpage2 = allpage.find_all('a')
                    for allpage2index in allpage2:
                        allpage3 = allpage2index['href']
                        if allpage3 not in allurl:
                            allurl.append(allpage3)
                    # find the URL of the next list page
                    next_page1 = allsoup.find('ul', attrs={'class': 'information-page-ul clearfix'})
                    next_page2 = next_page1.find_all('li')
                    found = False
                    for next_page2_index in next_page2:
                        next_page3 = next_page2_index.find('a')
                        if (next_page3 is not None and next_page3.get_text() == "下一页"  # "next page"
                                and next_page3.get("href") is not None):
                            firsturl = next_page3.get("href")
                            pageindex = pageindex + 1
                            print("Next page " + firsturl)
                            found = True
                            break
                    if not found:
                        break  # no next page: stop instead of looping forever
                print(allurl)
                print(len(allurl))

            # walk every gallery page by page, collect the image URLs, download them
            def main():
                get_all_url()
                i = 0
                pagecount = 0  # a gallery has at most eight pages
                url = download_page(allurl[i])
                soup = BeautifulSoup(url, 'html.parser')
                i = i + 1
                while i <= len(allurl):
                    page0 = soup.find("div", attrs={'class': 'slidebox-detail'})
                    page = page0.find_all("li")
                    for pageindex in page:
                        page2 = pageindex.find("img")
                        img.append(page2['src'])  # one image per page
                    nextul = soup.find('ul', attrs={'class': 'information-page-ul clearfix'})
                    next2 = nextul.find_all('li')
                    advanced = False
                    for next_url in next2:
                        next_page = next_url.find("a")
                        if (pagecount < 7 and next_page is not None
                                and next_page.get_text() == "下一页"  # "next page"
                                and next_page.get("href") is not None):
                            url = download_page(next_page.get('href'))
                            soup = BeautifulSoup(url, 'html.parser')
                            pagecount = pagecount + 1
                            advanced = True
                            break
                    if not advanced:
                        # gallery finished (page cap hit or no next-page link):
                        # flush its images to disk, then move on to the next gallery
                        print(len(img))
                        download()
                        if i >= len(allurl):
                            break
                        print("New gallery " + allurl[i])
                        url = download_page(allurl[i])
                        soup = BeautifulSoup(url, 'html.parser')
                        pagecount = 0
                        i = i + 1

            def download():
                global img, count
                print("Start downloading pictures")
                for m in img:
                    urllib.request.urlretrieve(m, "D:/Desktop/632/" + str(count) + ".jpg")
                    count = count + 1
                    print("Downloading image " + str(count))
                img = []  # clear the buffer for the next gallery
                print("Download complete")

            if __name__ == '__main__':
                main()

