Python crawler: multi-threaded download of Kuaishou videos

Source: Internet
Author: User


Environment: Python 2.7 + Windows 10

Tools: Fiddler, Postman, an Android emulator

First, open Fiddler. Fiddler is a well-known HTTP/HTTPS packet-capture tool, so its basics are not covered here.

Allow HTTPS capture.

Then allow remote connections, i.e. let Fiddler act as an HTTP proxy for other devices.

 

The computer's IP address: 192.168.1.110

Next, make sure the phone and the computer are on the same LAN and can reach each other. Since I don't have an Android phone at hand, I use an Android emulator instead; the effect is the same.

Open the phone's browser and go to 192.168.1.110:8888, the proxy address configured above. Once the Fiddler certificate is installed, packets can be captured.

 

After installing the certificate, manually set the HTTP proxy in the Wi-Fi settings (Modify Network).
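As a side note, Python itself can be pointed at the same Fiddler proxy, which is handy for inspecting your own script's traffic. A Python 3 sketch (the proxy address is the one configured above):

```python
from urllib.request import ProxyHandler, build_opener

# Route HTTP traffic through the Fiddler instance listening on the LAN.
proxy = ProxyHandler({"http": "http://192.168.1.110:8888"})
opener = build_opener(proxy)

# opener.open("http://example.com/") would now pass through Fiddler;
# not executed here since it requires the proxy to be running.
print(any(isinstance(h, ProxyHandler) for h in opener.handlers))  # True
```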

 

After saving, Fiddler can capture the app's traffic. Refresh the app and many HTTP requests come in; the API endpoints are easy to spot, and the data type is JSON.

 

One HTTP POST request returns JSON. Expanding it shows 20 videos in total. The fields check out, and each entry contains a video link.
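The shape of that JSON can be illustrated with a trimmed-down sample (the field names `feeds`, `caption`, `photo_id`, and `main_mv_urls` match the captured response; the values here are made up):

```python
import json

# A minimal sample shaped like the captured feed response.
sample = """
{
  "feeds": [
    {"photo_id": 123,
     "caption": "sample title\\n",
     "main_mv_urls": [{"url": "http://example.com/v/123.mp4"}]}
  ]
}
"""

result = json.loads(sample)
for feed in result["feeds"]:
    title = feed["caption"].strip()          # video title
    link = feed["main_mv_urls"][0]["url"]    # direct video link
    print(title, link)
```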

 

OK — the video plays cleanly and carries no watermark.

Now open Postman and replay this POST to see whether any parameters are validated.

 

There are quite a few parameters in total. I assumed client_key and sign would be validated... but it turned out the server verifies nothing, so I simply submitted the request as-is...

Submitting as form-data returns an error.

Submitting as raw:

The error message is different now. Add the headers from the captured request.

 

Nice — data comes back. Trying it several times, each response is different but always contains 20 videos. In the POST parameters, page=1 may always return the first page, as if the app had just been opened and pulled to refresh without scrolling down. It doesn't matter, as long as no duplicate data comes back.
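That replay can be reconstructed in code as well. A sketch in Python 3's urllib (the article's script uses Python 2's urllib2), with only a subset of the captured parameters; the request is built but not actually sent here:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and values as captured with Fiddler; at the time of writing
# the server did not appear to validate client_key or sig.
url = "http://101.251.217.210/rest/n/feed/hot"
data = {"type": 7, "page": 1, "count": 20, "os": "android",
        "client_key": "3c2cd3f3"}
body = urlencode(data).encode("utf-8")

req = Request(url, data=body, headers={
    "User-Agent": "kwai-android",
    "Content-Type": "application/x-www-form-urlencoded",
})
# urllib.request.urlopen(req) would send it; omitted to stay offline.
print(req.get_method())  # POST, because a body is attached
```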

Now for the code:

```python
# -*- coding: utf-8 -*-
# author: Corleone
import urllib2, urllib
import json, os, re, socket, time, sys
import Queue
import threading
import logging

# logging setup
logger = logging.getLogger("AppName")
formatter = logging.Formatter('%(asctime)s %(levelname)-5s: %(message)s')
console_handler = logging.StreamHandler(sys.stdout)
console_handler.formatter = formatter
logger.addHandler(console_handler)
logger.setLevel(logging.INFO)

video_q = Queue.Queue()   # video queue

def get_video():
    url = ("http://101.251.217.210/rest/n/feed/hot?app=0&lon=121.372027"
           "&c=BOYA_BAIDU_PINZHUAN&sys=ANDROID_4.1.2&mod=HUAWEI(HUAWEI%20C8813Q)"
           "&did=login&ver=5.4&net=WIFI&country_code=cn&iuid=&appver=5.4.7.5559"
           "&max_memory=128&oc=BOYA_BAIDU_PINZHUAN&ftt=&ud=0&language=zh-cn"
           "&lat=31.319303")
    data = {
        'type': 7,
        'page': 2,
        'coldStart': 'false',
        'count': 20,
        'pv': 'false',
        'id': 5,
        'refreshTimes': 4,
        'pcursor': 1,
        'os': 'android',
        'client_key': '3c2cd3f3',
        'sig': 'shanghai'   # value as captured; the server does not validate it
    }
    req = urllib2.Request(url)
    req.add_header("User-Agent", "kwai-android")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    params = urllib.urlencode(data)
    try:
        html = urllib2.urlopen(req, params).read()
    except urllib2.URLError:
        logger.warning(u"network unstable, retrying")
        html = urllib2.urlopen(req, params).read()
    result = json.loads(html)
    reg = re.compile(u"[\u4e00-\u9fa5]+")   # keep only Chinese characters
    for x in result['feeds']:
        try:
            title = x['caption'].replace("\n", "")
            name = "".join(reg.findall(title))
            video_q.put([name, x['photo_id'], x['main_mv_urls'][0]['url']])
        except KeyError:
            pass

def download(video_q):
    path = u"D:\\kuaishou"
    while True:
        data = video_q.get()
        name = data[0].replace("\n", "")
        id = data[1]
        url = data[2]
        file = os.path.join(path, name + ".mp4")
        logger.info(u"downloading: %s" % name)
        try:
            urllib.urlretrieve(url, file)
        except IOError:
            # the title made an invalid filename -- fall back to the photo id
            file = os.path.join(path, u"%s.mp4" % id)
            try:
                urllib.urlretrieve(url, file)
            except (socket.error, urllib.ContentTooShortError):
                logger.warning(u"connection dropped, sleeping 2 seconds")
                time.sleep(2)
                urllib.urlretrieve(url, file)
        logger.info(u"finished: %s" % name)
        video_q.task_done()

def main():
    # usage help
    try:
        threads = int(sys.argv[1])
    except (IndexError, ValueError):
        print u"\nUsage: " + sys.argv[0] + u" [threads, e.g. 10]\n"
        print u"Example: " + sys.argv[0] + " 10"
        return False
    # make sure the download directory exists
    if os.path.exists(u"D:\\kuaishou") == False:
        os.makedirs(u"D:\\kuaishou")
    # fetch the feed
    logger.info(u"crawling the feed")
    for x in range(1, 100):
        logger.info(u"request %s" % x)
        get_video()
    num = video_q.qsize()
    logger.info(u"%s videos queued" % num)
    # multi-threaded download
    for y in range(threads):
        t = threading.Thread(target=download, args=(video_q,))
        t.setDaemon(True)
        t.start()
    video_q.join()
    logger.info(u"----------- all done ---------------")

main()
```

Test

 

Multi-threaded download: by default it fetches about 2000 videos into D:\kuaishou.
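The queue-plus-daemon-threads pattern the script relies on can be shown in isolation (Python 3's `queue` module here; the actual download is replaced by a list append):

```python
import queue
import threading

video_q = queue.Queue()
results = []
lock = threading.Lock()

def download(q):
    # Worker loop: take an item, "download" it, mark it done.
    while True:
        name, url = q.get()
        with lock:
            results.append(name)   # stand-in for urlretrieve(url, path)
        q.task_done()

# Producer: enqueue some fake work items.
for i in range(5):
    video_q.put(("video%d" % i, "http://example.com/%d.mp4" % i))

# Start daemon workers so the process can exit once the queue drains.
for _ in range(3):
    t = threading.Thread(target=download, args=(video_q,))
    t.daemon = True
    t.start()

video_q.join()   # blocks until every item got a matching task_done()
print(sorted(results))  # ['video0', 'video1', 'video2', 'video3', 'video4']
```

Because the workers are daemon threads blocked on `q.get()`, they die with the main thread; `join()` on the queue, not the threads, is what guarantees all work finished.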

 

That's the end. It's actually quite simple: Kuaishou doesn't encrypt anything... Douyin, on the other hand, did give me trouble.

Summary

The above is this site's introduction to multi-threaded downloading of Kuaishou videos with a Python crawler. I hope it helps; if you have any questions, leave a message and I will reply promptly. Thank you for supporting the site!
