First, install the Douyu app (you can skip this step; the URL it requests is given below).
Capture the app's network traffic, find the request that comes back as a JSON packet, and note the address it hits (it appears in the code below).
A quick test shows that changing the value of the offset parameter is equivalent to paging through the app.
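For instance, with limit=20 a reasonable guess is that each page of the app advances offset by 20. Here is a minimal sketch of how the page URLs would be built; the step of 20 is my inference from limit=20, not something confirmed beyond the capture:

# Assumption: offset counts rooms rather than pages, so with limit=20
# consecutive app pages correspond to offset = 0, 20, 40, ...
base_url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset='
page_urls = [base_url + str(page * 20) for page in range(3)]
print(page_urls)   # offsets 0, 20 and 40 -> the first three pages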
Requesting this URL returns a large dictionary with two keys, error and data. data is an array of length 20, and each element of the array is itself a dictionary containing, among other things, a key named vertical_src.
That field is what we are after!
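To make that structure concrete, here is a sketch of what the returned JSON is assumed to look like; only the error, data, nickname and vertical_src fields come from the capture, and every sample value is invented:

# Illustrative layout of the response (sample values are made up)
example_response = {
    "error": 0,
    "data": [
        {
            "nickname": "some_streamer",
            "vertical_src": "http://example.com/some_streamer.jpg",
        },
        # ...19 more room dictionaries, 20 per page
    ],
}
# The photo address of the first room on the page:
print(example_response["data"][0]["vertical_src"])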
import urllib.parse
import urllib.request
import json

# Form data sent with the request (copied from the captured packet)
data_info = {}
data_info['type'] = 'AUTO'
data_info['doctype'] = 'json'
data_info['xmlVersion'] = '1.6'
data_info['ue'] = 'UTF-8'
data_info['typoResult'] = 'true'

# Headers taken from the captured app request
head_info = {}
head_info['User-Agent'] = 'dyzb/2.271 (iPhone; iOS 9.3.2; Scale/3.00)'

url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset=20'
data_info = urllib.parse.urlencode(data_info).encode('utf-8')
print(data_info)

requ = urllib.request.Request(url, data_info)
requ.add_header('Referer', 'http://capi.douyucdn.cn')
requ.add_header('User-Agent', head_info['User-Agent'])
response = urllib.request.urlopen(requ)
print(response)
html = response.read().decode('utf-8')
These twenty-odd lines of code fetch the JSON data. Next, the JSON is picked apart so that the photo URL of each streamer can be pulled out.
Then the photos on this page are downloaded:
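If you prefer to keep that parsing step separate, it can live in a small helper like the sketch below; the function name extract_photo_urls is my own, and it assumes the html string produced by the code above:

import json

def extract_photo_urls(html):
    """Pull every streamer's photo address out of the decoded JSON body."""
    dictionary = json.loads(html)      # top-level dict: "error" and "data"
    data_arr = dictionary["data"]      # list of 20 room dictionaries
    return [room["vertical_src"] for room in data_arr]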
import json
import urllib.parse
import urllib.request

data_info = {}
data_info['type'] = 'AUTO'
data_info['doctype'] = 'json'
data_info['xmlVersion'] = '1.6'
data_info['ue'] = 'UTF-8'
data_info['typoResult'] = 'true'

# offset=0 fetches the first page; changing it pages through the rooms
url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset=0'
data_info = urllib.parse.urlencode(data_info).encode('utf-8')
print(data_info)

requ = urllib.request.Request(url, data_info)
requ.add_header('Referer', 'http://capi.douyucdn.cn')
requ.add_header('User-Agent', 'dyzb/2.271 (iPhone; iOS 9.3.2; Scale/3.00)')
response = urllib.request.urlopen(requ)
print(response)
html = response.read().decode('utf-8')

dictionary = json.loads(html)
data_arr = dictionary["data"]
# print(type(dictionary))
# print(type(dictionary["data"]))
for room in data_arr:                  # 20 rooms per page
    name = room["nickname"]
    img_url = room["vertical_src"]
    print(type(img_url))
    respon_tem = urllib.request.urlopen(img_url)
    anchor_img = respon_tem.read()
    with open('../photos/' + name + '.jpg', 'wb') as f:
        f.write(anchor_img)
Then modify the script so that it can turn pages:
import urllib.parse
import json
import urllib.request

data_info = {}
data_info['type'] = 'AUTO'
data_info['doctype'] = 'json'
data_info['xmlVersion'] = '1.6'
data_info['ue'] = 'UTF-8'
data_info['typoResult'] = 'true'
data_info = urllib.parse.urlencode(data_info).encode('utf-8')

for x in range(0, 195):                # walk through the pages
    # limit=20 rooms per page, so the offset advances by 20 each time
    url = 'http://capi.douyucdn.cn/api/v1/getVerticalRoom?aid=ios&client_sys=ios&limit=20&offset=' + str(x * 20)
    print(data_info)
    requ = urllib.request.Request(url, data_info)
    requ.add_header('Referer', 'http://capi.douyucdn.cn')
    requ.add_header('User-Agent', 'dyzb/2.271 (iPhone; iOS 9.3.2; Scale/3.00)')
    response = urllib.request.urlopen(requ)
    print(response)
    html = response.read().decode('utf-8')

    dictionary = json.loads(html)
    data_arr = dictionary["data"]
    for room in data_arr:              # 20 rooms per page
        name = room["nickname"]
        img_url = room["vertical_src"]
        print(type(img_url))
        respon_tem = urllib.request.urlopen(img_url)
        anchor_img = respon_tem.read()
        with open('../photos/' + name + '.jpg', 'wb') as f:
            f.write(anchor_img)
And then just wait.
It's best to throttle the crawl, pausing for a while between requests, or to change your IP every so often; do that and you should be fine.
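A minimal sketch of the pacing idea, assuming a random pause of a few seconds between page requests is enough (the 2-5 second window is an arbitrary choice, and rotating the IP would need a proxy set up separately):

import random
import time

def polite_pause(min_s=2.0, max_s=5.0):
    """Sleep a random number of seconds so requests are not fired back to back."""
    time.sleep(random.uniform(min_s, max_s))

# Call polite_pause() once at the end of each iteration of the paging loop above.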