Python crawler: crawling kuaishou videos and downloading them with multiple threads
Environment: Python 2.7 + Win10
Tools: Fiddler, Postman, an Android emulator
First, open Fiddler. Fiddler is a well-known HTTP/HTTPS packet-capture tool, so its basics are not covered here.
Allow HTTPS capture and decryption in Fiddler's options.
Allow remote computers to connect, i.e. let Fiddler act as an HTTP proxy (it listens on port 8888 by default).
Computer IP address: 192.168.1.110
Then make sure the phone and the computer are on the same LAN and can reach each other. Since I don't have an Android phone at hand, I use an Android emulator instead; the effect is the same.
Open the phone's browser and visit 192.168.1.110:8888, i.e. the proxy address configured above, and install the Fiddler certificate offered there; packets can only be captured once the certificate is installed.
After the certificate is installed, manually specify the HTTP proxy in the phone's Wi-Fi settings (Modify network).
After saving, Fiddler can capture the app's traffic. Refresh the kuaishou app and many HTTP requests come in; the API endpoints are fairly obvious, and the payloads are JSON.
One HTTP POST request returns JSON. Expanding it shows 20 videos per response. Checking the fields confirms the information is correct, including a direct video link.
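To sanity-check the captured response before scripting everything, here is a minimal sketch of mine (not from the original post) that loads one response body saved from Fiddler and prints the fields the full script below relies on; the filename response.json is just an assumption:

# -*- coding: utf-8 -*-
# Minimal sketch: inspect one captured response body saved from Fiddler.
# Assumes the layout the full script below relies on: result['feeds'] is a
# list whose items carry 'caption', 'photo_id' and 'main_mv_urls'.
import json

with open("response.json") as f:       # hypothetical dump of the captured body
    result = json.load(f)

print len(result['feeds'])             # should be 20
for feed in result['feeds']:
    print feed['caption'].replace("\n", " ")
    print feed['main_mv_urls'][0]['url']    # the direct video link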
OK, the link plays directly, in clean quality and without a watermark.
Now open Postman and replay this POST to see whether any of the parameters are validated.
There are quite a few parameters in total. I assumed client_key and sign would be checked server-side... it turns out I was wrong and nothing is verified, so I just submitted them as captured...
Submitting the body as form-data returns an error.
Switching the body to raw:
The error message changes. Add the headers from the captured request.
Nice, it returns data. Trying it several times, each response is different but always 20 videos, even though the POST parameters keep page=1, as if it were always the first page; it behaves like pulling to refresh on the phone without ever scrolling down. It doesn't matter anyway, as long as no duplicate data comes back.
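Boiled down, the Postman finding is just this: a raw url-encoded body plus two headers is all the endpoint checks. The sketch below is mine, not the author's; the trimmed field set is an assumption and was not tested here (the full script afterwards posts the complete parameter list):

# -*- coding: utf-8 -*-
# Sketch of the finding above: raw url-encoded body + these two headers.
# The trimmed field set is an assumption; the full script posts everything.
import urllib, urllib2

url = "http://101.251.217.210/rest/n/feed/hot"    # query string omitted here
body = urllib.urlencode({'type': 7, 'page': 1, 'count': 20})
req = urllib2.Request(url, body)                  # a body makes this a POST
req.add_header("User-Agent", "kwai-android")
req.add_header("Content-Type", "application/x-www-form-urlencoded")  # raw, not form-data
print urllib2.urlopen(req).read()[:200]           # start of the JSON reply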
Here is the full code:
# -*- coding: utf-8 -*-
# author: Corleone
import urllib2, urllib
import json, os, re, socket, time, sys
import Queue
import threading
import logging

# logging setup
logger = logging.getLogger("AppName")
formatter = logging.Formatter('%(asctime)s %(levelname)-5s: %(message)s')
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
logger.setLevel(logging.INFO)

video_q = Queue.Queue()   # video queue

def get_video():
    url = ("http://101.251.217.210/rest/n/feed/hot?app=0&lon=121.372027"
           "&c=BOYA_BAIDU_PINZHUAN&sys=ANDROID_4.1.2&mod=HUAWEI(HUAWEI%20C8813Q)"
           "&did=login&ver=5.4&net=WIFI&country_code=cn&iuid=&appver=5.4.7.5559"
           "&max_memory=128&oc=BOYA_BAIDU_PINZHUAN&ftt=&ud=0&language=zh-cn"
           "&lat=31.319303")
    data = {
        'type': 7,
        'page': 2,
        'coldStart': 'false',
        'count': 20,
        'pv': 'false',
        'id': 5,
        'refreshTimes': 4,
        'pcursor': 1,
        'os': 'android',
        'client_key': '3c2cd3f3',
        'sig': 'shanghai',   # not validated by the server, per the postman test above
    }
    req = urllib2.Request(url)
    req.add_header("User-Agent", "kwai-android")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    params = urllib.urlencode(data)
    try:
        html = urllib2.urlopen(req, params).read()
    except urllib2.URLError:
        logger.warning(u"Network unstable, retrying")
        html = urllib2.urlopen(req, params).read()
    result = json.loads(html)
    reg = re.compile(u"[\u4e00-\u9fa5]+")   # keep only the Chinese characters
    for x in result['feeds']:
        try:
            title = x['caption'].replace("\n", "")
            name = "".join(reg.findall(title))
            video_q.put([name, x['photo_id'], x['main_mv_urls'][0]['url']])
        except KeyError:
            pass

def download(video_q):
    path = u"D:\\kuaishou"
    while True:
        data = video_q.get()
        name = data[0].replace("\n", "")
        id = data[1]
        url = data[2]
        file = os.path.join(path, name + ".mp4")
        logger.info(u"Downloading: %s" % name)
        try:
            urllib.urlretrieve(url, file)
        except IOError:
            # the caption made an invalid filename; fall back to the photo id
            file = os.path.join(path, u"%s.mp4" % id)
            try:
                urllib.urlretrieve(url, file)
            except (socket.error, urllib.ContentTooShortError):
                logger.warning(u"Request dropped, sleeping 2 seconds")
                time.sleep(2)
                urllib.urlretrieve(url, file)
        logger.info(u"Downloaded: %s" % name)
        video_q.task_done()

def main():
    # usage help
    try:
        threads = int(sys.argv[1])
    except (IndexError, ValueError):
        print u"\nUsage: " + sys.argv[0] + u" [number of threads, e.g. 10]\n"
        print u"Example: " + sys.argv[0] + " 10" + u"  (10 threads crawl about 2000 videos per run; arguments separated by spaces)"
        return False
    # make sure the target directory exists
    if not os.path.exists(u"D:\\kuaishou"):
        os.makedirs(u"D:\\kuaishou")
    # hit the api repeatedly to fill the queue
    logger.info(u"Crawling")
    for x in range(1, 100):
        logger.info(u"Request %s" % x)
        get_video()
    num = video_q.qsize()
    logger.info(u"%s videos queued" % num)
    # multi-threaded download
    for y in range(threads):
        t = threading.Thread(target=download, args=(video_q,))
        t.setDaemon(True)
        t.start()
    video_q.join()
    logger.info(u"----------- all done ---------------")

main()
Test
Multi-threaded download: by default it downloads about 2000 videos (roughly 100 requests of 20 videos each) to D:\kuaishou.
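For example, assuming the script was saved as kuaishou.py (the post does not give a filename), a run with 10 download threads looks like:

python kuaishou.py 10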
That's the end. It's actually quite simple, since kuaishou doesn't encrypt or sign its parameters... unlike Douyin, which gave me trouble when I worked on it earlier.
Summary
The above is the python crawler for crawling kuaishou videos with multi-threaded download. I hope it helps you. If you have any questions, please leave a message and I will reply in a timely manner. Thank you very much for your support of the site!