First, install the Douyu app (you can skip this step; the URL it requests is given below).
Capture the app's network traffic, find the request that comes back as a JSON packet, and note the address it hits (it appears in the code below).
A quick test shows that changing the value of the offset parameter is equivalent to paging through the app.
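For instance, with limit=20 a reasonable guess is that each page of the app advances offset by 20. Here is a minimal sketch of how the page URLs would be built; the step of 20 is my inference from limit=20, not something confirmed beyond the capture:

# Assumption: offset counts rooms rather than pages, so with limit=20
# consecutive app pages correspond to offset = 0, 20, 40, ...
base_url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset='
page_urls = [base_url + str(page * 20) for page in range(3)]
print(page_urls)   # offsets 0, 20 and 40 -> the first three pages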
Requesting this URL returns a large dictionary with two keys, error and data. data is an array of length 20, and each element of the array is itself a dictionary containing, among other things, a key named vertical_src.
That field is what we are after!
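To make that structure concrete, here is a sketch of what the returned JSON is assumed to look like; only the error, data, nickname and vertical_src fields come from the capture, and every sample value is invented:

# Illustrative layout of the response (sample values are made up)
example_response = {
    "error": 0,
    "data": [
        {
            "nickname": "some_streamer",
            "vertical_src": "http://example.com/some_streamer.jpg",
        },
        # ...19 more room dictionaries, 20 per page
    ],
}
# The photo address of the first room on the page:
print(example_response["data"][0]["vertical_src"])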
import urllib.parse
import urllib.request
import json

# Form data sent with the request (copied from the captured packet)
data_info = {}
data_info['type'] = 'AUTO'
data_info['doctype'] = 'json'
data_info['xmlVersion'] = '1.6'
data_info['ue'] = 'UTF-8'
data_info['typoResult'] = 'true'

# Headers taken from the captured app request
head_info = {}
head_info['User-Agent'] = 'dyzb/2.271 (iPhone; iOS 9.3.2; Scale/3.00)'

url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset=20'
data_info = urllib.parse.urlencode(data_info).encode('utf-8')
print(data_info)

requ = urllib.request.Request(url, data_info)
requ.add_header('Referer', 'http://capi.douyucdn.cn')
requ.add_header('User-Agent', head_info['User-Agent'])
response = urllib.request.urlopen(requ)
print(response)
html = response.read().decode('utf-8')
These twenty-odd lines of code fetch the JSON data. Next, the JSON is picked apart so that the photo URL of each streamer can be pulled out.
Then the photos on this page are downloaded:
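If you prefer to keep that parsing step separate, it can live in a small helper like the sketch below; the function name extract_photo_urls is my own, and it assumes the html string produced by the code above:

import json

def extract_photo_urls(html):
    """Pull every streamer's photo address out of the decoded JSON body."""
    dictionary = json.loads(html)      # top-level dict: "error" and "data"
    data_arr = dictionary["data"]      # list of 20 room dictionaries
    return [room["vertical_src"] for room in data_arr]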
import json
import urllib.parse
import urllib.request

data_info = {}
data_info['type'] = 'AUTO'
data_info['doctype'] = 'json'
data_info['xmlVersion'] = '1.6'
data_info['ue'] = 'UTF-8'
data_info['typoResult'] = 'true'

# offset=0 fetches the first page; changing it pages through the rooms
url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset=0'
data_info = urllib.parse.urlencode(data_info).encode('utf-8')
print(data_info)

requ = urllib.request.Request(url, data_info)
requ.add_header('Referer', 'http://capi.douyucdn.cn')
requ.add_header('User-Agent', 'dyzb/2.271 (iPhone; iOS 9.3.2; Scale/3.00)')
response = urllib.request.urlopen(requ)
print(response)
html = response.read().decode('utf-8')

dictionary = json.loads(html)
data_arr = dictionary["data"]
# print(type(dictionary))
# print(type(dictionary["data"]))
for room in data_arr:                  # 20 rooms per page
    name = room["nickname"]
    img_url = room["vertical_src"]
    print(type(img_url))
    respon_tem = urllib.request.urlopen(img_url)
    anchor_img = respon_tem.read()
    with open('../photos/' + name + '.jpg', 'wb') as f:
        f.write(anchor_img)
Then modify the script so that it can turn pages:
import urllib.parse
import json
import urllib.request

data_info = {}
data_info['type'] = 'AUTO'
data_info['doctype'] = 'json'
data_info['xmlVersion'] = '1.6'
data_info['ue'] = 'UTF-8'
data_info['typoResult'] = 'true'
data_info = urllib.parse.urlencode(data_info).encode('utf-8')

for x in range(0, 195):                # walk through the pages
    # limit=20 rooms per page, so the offset advances by 20 each time
    url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset=' + str(x * 20)
    print(data_info)
    requ = urllib.request.Request(url, data_info)
    requ.add_header('Referer', 'http://capi.douyucdn.cn')
    requ.add_header('User-Agent', 'dyzb/2.271 (iPhone; iOS 9.3.2; Scale/3.00)')
    response = urllib.request.urlopen(requ)
    print(response)
    html = response.read().decode('utf-8')

    dictionary = json.loads(html)
    data_arr = dictionary["data"]
    for room in data_arr:              # 20 rooms per page
        name = room["nickname"]
        img_url = room["vertical_src"]
        print(type(img_url))
        respon_tem = urllib.request.urlopen(img_url)
        anchor_img = respon_tem.read()
        with open('../photos/' + name + '.jpg', 'wb') as f:
            f.write(anchor_img)
And then just wait.
It's best to throttle the crawl, pausing for a while between requests, or to change your IP every so often; do that and you should be fine.
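A minimal sketch of the pacing idea, assuming a random pause of a few seconds between page requests is enough (the 2-5 second window is an arbitrary choice, and rotating the IP would need a proxy set up separately):

import random
import time

def polite_pause(min_s=2.0, max_s=5.0):
    """Sleep a random number of seconds so requests are not fired back to back."""
    time.sleep(random.uniform(min_s, max_s))

# Call polite_pause() once at the end of each iteration of the paging loop above.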