The (not-quite-)correct way to grab your Pixiv ("P station") favorites collection with Python

Source: Internet
Author: User
Preface

I'm just a rookie, so please go easy on me. First, some background.

Well, there isn't really any background; I'm forcing one in.

Anyway, on to the point.
The computer lab is open 24/7, so I figured I'd use it to run a few things.
Then, while browsing Pixiv, it occurred to me that my three-digit-sized favorites collection could be scraped down.
I wrote it over two homework-free nights. PS:

Pixiv reports frequent abnormal logins to your mailbox, so be careful or your inbox will be flooded with 999+ login notifications.

The login request can be watched in the browser's F12 developer tools; to keep the redirect from clearing the record, check "Preserve log".

Here you can see that, besides the account and password, the form also sends a verification-token-like field called post_key.
Flip through the page source and you can find it:

<input type="hidden" name="post_key" value="cb02a4460fd7a41fb46d0129e4ca5ece">

Requests to Pixiv also need a Referer header; as I understand it, it tells the server which URL you jumped from.
This header really matters: without it you can't access the original images or the favorites list.
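As a minimal sketch of this step, here is the token extraction in today's Python 3. The hidden-input markup is the one quoted from the page source above; the Referer value and the other form-field names are illustrative assumptions, not confirmed parts of Pixiv's API:

```python
import re

# login-page fragment containing the hidden post_key input (from the page source above)
login_page = '<input type="hidden" name="post_key" value="cb02a4460fd7a41fb46d0129e4ca5ece">'

# pull the token out of the hidden input with a regex
post_key = re.findall(r'name="post_key" value="(.*?)"', login_page)[0]
print(post_key)  # cb02a4460fd7a41fb46d0129e4ca5ece

# the token then goes into the login POST data, and every request
# carries a Referer header naming the page we jumped from
# (field names here are assumptions for illustration)
headers = {'Referer': 'https://www.pixiv.net/'}
data = {'pixiv_id': 'your_account', 'password': 'your_password', 'post_key': post_key}
```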

First, the favorites URL.

To make the crawler behave like a crawler, on each page we look for the tag whose class is "next":

<span class="next"><a href="rest=show&amp;p=2" rel="next" class="_button" title="next page">

The <a> inside it points to the next page.
If no such link can be found, we've reached the last page of pictures.
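A sketch of this next-page check in Python 3, using the snippet quoted above as test input (the regex is my guess at a pattern matching that markup):

```python
import re

# favorites-page fragment with the "next" tag, as quoted above
page = ('<span class="next"><a href="rest=show&amp;p=2" rel="next" '
        'class="_button" title="next page">')

match = re.findall(r'class="next"><a href="(.+?)"', page)
if match:
    # un-escape the HTML entity before reusing the link
    next_url = match[0].replace('&amp;', '&')
    print(next_url)  # rest=show&p=2
else:
    print('no next page: this is the end of the pictures')
```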

Each illustration page carries addresses at several resolutions. Take the picture with ID 59345668: its original image is on this line of the source

and that's the one we want.

Then match it with a regular expression. One caveat: manga (multi-page) posts don't have this line.

There are two ways to save a picture. Way 1:

urllib.urlretrieve(url, path, rollfunc)  # url is the image address, path is the local file path, rollfunc is a home-made download-progress hook

That's what I used back when scraping Baidu Tieba. Way 2:

with open(path, 'wb') as f:
    f.write(requests.get(url).content)  # url and path as above; 'w' means write, 'b' means binary mode

Here only the second way gets used, because... multithreading!

Impressive, right? Exciting, right?
At this order of magnitude we can obviously run the downloads in parallel, and the speedup is just as obvious.
To keep things simple I skip the low-level thread module, because threading is nicer to use.
Look at the code:

def cal(a, b):
    print a + b

jobs = []
for p in xrange(m):  # m is how many threads to spawn
    jobs.append(threading.Thread(target=cal, args=(1, 2)))

for job in jobs:
    job.start()

for job in jobs:
    job.join()

Here we first define a list to keep track of all the threads, then wrap each task in a Thread object.
target is the function to call (obviously) and args holds its arguments (also obviously); note that args must be a tuple.
start() launches a thread and join() waits for it, which ensures all threads have finished.
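The same pattern as a self-contained Python 3 sketch, collecting sums in a list instead of printing so the main thread can check them afterwards (the function and variable names are illustrative):

```python
import threading

results = []
lock = threading.Lock()

def cal(a, b):
    # record the sum instead of printing it, guarded by a lock
    with lock:
        results.append(a + b)

jobs = []
for p in range(4):
    # args must be a tuple, even for a single argument
    jobs.append(threading.Thread(target=cal, args=(p, p)))

for job in jobs:
    job.start()   # launch every worker

for job in jobs:
    job.join()    # wait for all of them to finish

print(sorted(results))  # [0, 2, 4, 6]
```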

Then the overall flow:
log in → 1. find the pictures on the current favorites page, loading each download into its own thread → run the threads → find the next-page hyperlink → back to step 1.

Coming from Pascal and C++, I used to write strA = strB + strC directly, and only later learned that in Python this is inefficient.
The reason is that strings are immutable in Python, so every + operation allocates a brand-new string.
After searching the web, I found the fix:

strA = ''.join([strB, strC])
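Note that join takes a single iterable argument, so the call looks like this in practice (a minimal Python 3 sketch; the variable names are illustrative):

```python
# joining two pieces: one allocation for the whole result
str_b, str_c = 'hello, ', 'world'
str_a = ''.join([str_b, str_c])
print(str_a)  # hello, world

# the same trick scales to many pieces with still just one allocation,
# whereas repeated `+` would rebuild the string at every step
digits = ''.join(str(i) for i in range(5))
print(digits)  # 01234
```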
The full code:

This is the first time I've written something this long; it took a few nights, writing on and off. I still feel a sense of accomplishment (even if the code is ugly).

import threading
import requests
import os
import re

def getpage(html, url, headers):
    while 1:
        try:
            page = html.get(url, headers=headers, timeout=3).content
            break
        except Exception, e:
            print e
            pass
    return page

def logIn(html, url, headers, data):
    while 1:
        try:
            html.post(url, headers=headers, data=data, timeout=2)
            return
        except Exception, e:
            print e
            pass

def get(html, url, headers, index, filepath):
    headers['Referer'] = url
    page = ''
    while 1:
        try:
            page = html.get(url, headers=headers, timeout=3).content
            break
        except Exception, e:
            print e
            pass
    # (the rest of this function, which extracts the original-image URL
    #  and writes the file, was lost when the code was pasted)

# (the setup of html, loginurl, prefavurl, preurl, headers, data and
#  filepath was also lost when the code was pasted)

loginpage = getpage(html, loginurl, headers)
reg = re.compile(r'name="post_key" value="(.*?)"')  # regex reconstructed from the post_key hidden input shown earlier
postkey = re.findall(reg, loginpage)[0]
data['post_key'] = postkey
logIn(html, loginurl, headers, data)

favurl = prefavurl
index = 0
while 1:
    headers['Referer'] = favurl
    favpage = getpage(html, favurl, headers)
    reg = re.compile(r'"data-type="illust" data-id="(\d+)" data-tags="')
    lis = re.findall(reg, favpage)
    jobs = []
    for p in lis:
        url = ''.join([preurl, p])
        index += 1
        jobs.append(threading.Thread(target=get, args=(html, url, headers, index, filepath)))
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    reg = re.compile(r'class="next"><a href="(.+?)"')
    bacurl = re.findall(reg, favpage)
    if bacurl:
        p = re.sub('&amp;', '&', bacurl[0])
        favurl = prefavurl + p
        print favurl
    else:
        break
 
