Written up front
I'm just a rookie, so please go easy on me and don't flame.
There's no real backstory, but let me force one in anyway.
Now, on to the actual topic.
The computer lab is open 24/7, and I'd been thinking about using it to get something done.
Then, while browsing Pixiv, it suddenly hit me that my three-digit collection of bookmarked illustrations could be crawled down.
I wrote it over two homework-free evenings. PS:
Pixiv reports frequent abnormal logins to your email, so be careful or your inbox will get flooded with 999+ login notifications.
This is what the browser's F12 developer tools show during login; to keep the requests from being wiped out after the redirect, we need to check "Preserve log".
You can see that besides the account name and password, the login request carries something that looks like a verification token: post_key.
Digging through the page source turns up:
<input type="hidden" name="post_key" value="cb02a4460fd7a41fb46d0129e4ca5ece">
Requests to Pixiv also need to carry a Referer header. As I understand it, it tells the server which URL you jumped from.
This header really matters; without it you can't get at the original images or the bookmarks list.
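Putting these pieces together, here is a minimal sketch of the login step. Hedged heavily: the two URLs and the pixiv_id/password field names are my assumptions based on the F12 capture described above; only post_key is confirmed by the page source shown.

# -*- coding: utf-8 -*-
import re
import requests

html = requests.Session()  # a session keeps the login cookies for later requests
headers = {'User-Agent': 'Mozilla/5.0'}

login_page_url = 'https://accounts.pixiv.net/login'      # assumed login page
login_post_url = 'https://accounts.pixiv.net/api/login'  # assumed form target

# Fetch the login page and pull the hidden post_key out of its HTML
login_page = html.get(login_page_url, headers=headers, timeout=3).content
post_key = re.findall(r'name="post_key" value="(.+?)"', login_page)[0]

data = {
    'pixiv_id': 'your_account',   # assumed field name for the account
    'password': 'your_password',  # assumed field name for the password
    'post_key': post_key,
}
headers['Referer'] = login_page_url  # Pixiv rejects requests without a Referer
html.post(login_post_url, headers=headers, data=data, timeout=3)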
First, the bookmarks URL.
To make this crawler actually crawl, we look on every page for the tag whose class is next:

<span class="next"><a href="rest=show&p=2" rel="next" class="_button" title="Next page">

That kind of thing; the <a> inside it points to the next page.
If no such tag can be found, we've hit the last page of images.
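As a minimal sketch of that pagination loop (favurl, prefavurl, and the getpage helper are names taken from the full code at the end of this post):

import re

favurl = prefavurl                 # first page of the bookmarks list
while 1:
    favpage = getpage(html, favurl, headers)  # fetch the current page
    # ...queue up this page's downloads here...
    nxt = re.findall(r'class="next"><a href="(.+?)"', favpage)
    if not nxt:
        break                      # no "next" tag: last page reached
    # the href comes back HTML-escaped, so turn &amp; back into &
    favurl = prefavurl + re.sub('&amp;', '&', nxt[0])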
Each illustration's page carries image addresses at several resolutions. Taking the illustration with ID 59345668 as an example, the original image sits on one particular line of the page source.
That's the one we want.
Then it's just a regex match. One caveat: manga (multi-page) posts don't have such a line.
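The HTML line itself didn't survive the formatting of this post, so the pattern below is only a guess at its shape: on illustration pages of that era the original image URL showed up in a data-src attribute containing img-original. Treat both the attribute name and the URL shape as assumptions and adjust to whatever the real page source shows.

import re

# Assumed shape of the original-image line; not confirmed by this post.
reg = re.compile(r'data-src="([^"]*img-original[^"]*)"')
hit = re.findall(reg, page)        # page holds the illustration page's HTML
if hit:
    original_url = hit[0]
else:
    print 'no original-image line; probably a manga (multi-page) post'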
There are two ways to save an image.
1. urllib.urlretrieve(url, path, rollfunc)  # url is the image address, path is the local file path, and rollfunc is a progress callback you dream up yourself (a sketch follows below this list)
That's what I used back when crawling Baidu Tieba.
2. with open(path, 'wb') as f:
       f.write(requests.get(url).content)  # url and path as above; 'w' means write, 'b' means binary mode
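For the record, urlretrieve calls that third argument after each block with the number of blocks fetched so far, the block size, and the total size, so a minimal rollfunc (my own sketch, not from the original post) might look like:

import urllib

def rollfunc(blocknum, blocksize, totalsize):
    # Called by urlretrieve after each block is fetched.
    if totalsize > 0:
        percent = min(100.0 * blocknum * blocksize / totalsize, 100.0)
        print 'downloaded %.1f%%' % percent

# placeholder URL and path, just to show the call
urllib.urlretrieve('http://example.com/pic.jpg', 'pic.jpg', rollfunc)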
Only the second way gets used here though, because... here it comes: multithreading.
The collection is big, right? Exciting, right?
At this order of magnitude we can obviously run the downloads in parallel, and the speedup is just as obvious.
To keep things simple I skipped the low-level thread module, because threading is nicer.
Look at the code:
import threading

def cal(a, b):
    print a + b

jobs = []
for p in xrange(m):  # m is however many threads you want to spawn
    jobs.append(threading.Thread(target=cal, args=(1, 2)))
for job in jobs:
    job.start()
for job in jobs:
    job.join()
Here we first define a list to keep track of all the threads, and then wrap each task up as a Thread.
target is the function to be called (obviously), args is its arguments (equally obviously), and args must be a tuple.
start() and join() are launch and wait respectively, which makes sure every thread finishes.
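One pitfall with args needing to be a tuple: a single argument still needs the trailing comma, otherwise the parentheses are read as plain grouping (work here is just a stand-in function):

import threading

def work(n):
    print n

threading.Thread(target=work, args=(42))   # wrong: (42) is just the int 42
threading.Thread(target=work, args=(42,))  # right: a one-element tuple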
Then there's the overall flow:
Log in -> (1) fetch a page of the bookmarks list and put each image's download task into a thread -> run them -> find the next-page hyperlink -> back to (1)

Strings
In the old days with Pascal and C++ I'd just write stra = strb + strc directly; doing the same in Python turns out to be inefficient.
The reason is that strings are immutable in Python, so every + operation has to allocate and build a brand-new string.
Another round of searching the web turned up the fix:
stra = ''.join([strb, strc])
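To see the difference yourself, a throwaway micro-benchmark (mine, not from the original post) that builds one string out of many pieces both ways:

import timeit

setup = "parts = ['x'] * 10000"
plus = timeit.timeit("s = ''\nfor p in parts: s += p", setup=setup, number=100)
join = timeit.timeit("s = ''.join(parts)", setup=setup, number=100)
print 'plus: %.3fs  join: %.3fs' % (plus, join)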
Code
This is the first time I've written anything this long; it took a few evenings of writing on and off. Still, there's a real sense of accomplishment (even if the code is ugly).
# -*- coding: utf-8 -*-
import threading
import requests
import os
import re

def getpage(html, url, headers):
    # Fetch a page, retrying forever on network errors.
    while 1:
        try:
            page = html.get(url, headers=headers, timeout=3).content
            break
        except Exception, e:
            print e
    return page

def logIn(html, url, headers, data):
    # Post the login form, retrying on failure.
    while 1:
        try:
            html.post(url, headers=headers, data=data, timeout=2)
            return
        except Exception, e:
            print e

def get(html, url, headers, index, filepath):
    # Download one illustration page and save its original image.
    # (Note: every thread shares this headers dict, so the Referer they set
    # races; copying headers per thread would be safer.)
    headers['Referer'] = url
    page = ''
    while 1:
        try:
            page = html.get(url, headers=headers, timeout=3).content
            break
        except Exception, e:
            print e
    # (The regex that pulled the original-image URL out of page, and the
    # open(..., 'wb')/write that saved it under filepath with index in the
    # file name, were eaten by the formatting of this post; it used saving
    # method 2 from above.)

# --- main flow ---
# (The setup lines defining html as a requests.Session, headers, the login
# form dict data, loginurl, loginpage, preurl, prefavurl and filepath were
# also lost; the login sketch earlier shows the idea.)
reg = re.compile(r'name="post_key" value="(.+?)"')  # reconstructed from the hidden input shown above
postkey = re.findall(reg, loginpage)[0]
data['post_key'] = postkey
logIn(html, loginurl, headers, data)

favurl = prefavurl
index = 0
while 1:
    headers['Referer'] = favurl
    favpage = getpage(html, favurl, headers)
    reg = re.compile(r'data-type="illust" data-id="(\d+)" data-tags="')
    lis = re.findall(reg, favpage)
    jobs = []
    for p in lis:
        url = ''.join([preurl, p])  # illustration page URL built from its ID
        index += 1
        jobs.append(threading.Thread(target=get, args=(html, url, headers, index, filepath)))
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    # Follow the class="next" link; stop when there is none.
    reg = re.compile(r'class="next"><a href="(.+?)"')
    bacurl = re.findall(reg, favpage)
    if bacurl:
        p = re.sub('&amp;', '&', bacurl[0])
        favurl = prefavurl + p
        print favurl
    else:
        break