Capturing data transmitted by mobile apps with a Python crawler
Most apps return either JSON data or a blob of encrypted data. This post uses the Super Curriculum app as an example and captures the topics that users post in it.
1. Capture the app's data packets
For the capture method, see the blog post "How does Fiddler capture mobile APP data packets?"
The capture reveals the Super Curriculum login address: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action
Form:
The form contains the username and password, both already encrypted by the app, plus some device information. The captured form can be posted as is.
The request headers must be sent as well. My first attempt without them returned a login error, so the header information has to be included.
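Since the captured form body is an ordinary URL-encoded string, it can be taken apart and rebuilt in code rather than pasted by hand. The following is a minimal sketch under a few assumptions: it is written for Python 3 (`urllib.request` in place of `urllib2`), `build_login_request` is a hypothetical helper name, and the field values are the ones from the captured login form. It only builds the request object; nothing is sent.

```python
from urllib.parse import urlencode, parse_qsl
from urllib.request import Request

LOGIN_URL = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'

# Form body as captured by the proxy. The account and password were already
# encrypted by the app, so they are replayed unchanged.
CAPTURED_BODY = ('phoneBrand=Meizu&platform=1&deviceCode=868033014919494'
                 '&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16'
                 '&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket'
                 '&phoneModel=M040&versionNumber=7.2.1&')


def build_login_request(captured_body):
    """Hypothetical helper: rebuild the POST request from a captured body."""
    fields = parse_qsl(captured_body)         # (name, value) pairs, order kept
    body = urlencode(fields).encode('ascii')  # urllib expects bytes for POST data
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
        # Derive Content-Length from the body instead of hard-coding it.
        'Content-Length': str(len(body)),
    }
    return Request(LOGIN_URL, data=body, headers=headers)


request = build_login_request(CAPTURED_BODY)
print(request.get_method())                  # a Request carrying data is a POST
print(request.get_header('Content-length'))  # urllib stores header names capitalized
```

Deriving `Content-Length` from the body keeps the header correct when a field changes, which a hard-coded number does not.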
2. Log in
Login code:
import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'

# The cookie jar keeps the session cookie so later requests stay logged in.
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult
If the login succeeds, the server returns a string of JSON data containing the account information.
It matches the data returned during packet capture, which confirms that the login succeeded.
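The success check can also be done in code instead of by eye. The sketch below is written for Python 3 and makes two assumptions: a login counts as successful when the JSON carries a non-empty `data` field, and the body may arrive gzip-compressed because the request sends `Accept-Encoding: gzip`. The function names and the sample payload (including the `nickName` field) are made up for illustration.

```python
import gzip
import json


def decode_body(raw):
    """Return the response text, gunzipping it first if it is gzip-compressed."""
    if raw[:2] == b'\x1f\x8b':          # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode('utf-8')


def login_succeeded(raw):
    """Hypothetical check: a non-empty 'data' field means the login worked."""
    try:
        payload = json.loads(decode_body(raw))
    except ValueError:                  # not JSON, e.g. an HTML error page
        return False
    return isinstance(payload, dict) and bool(payload.get('data'))


# Made-up sample bodies standing in for real server responses.
sample = json.dumps({'status': 1, 'data': {'nickName': 'demo'}}).encode('utf-8')
print(login_succeeded(sample))                 # plain JSON body
print(login_succeeded(gzip.compress(sample)))  # same body, gzip-compressed
print(login_succeeded(b'{"status": 0}'))       # JSON without account data
```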
3. Capture Data
Obtain the topic URL and its POST parameters in the same way, by packet capture.
The procedure is the same as simulating a website login. For details, see "Python crawler simulated logon website with verification code".
The final code is below. It covers fetching the first page of topics and the pull-down "load more" refresh, so topic content can be loaded without limit.
#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""Super curriculum topic capture"""
import urllib2
from cookielib import CookieJar
import json


def fetch_data(json_data):
    """Read the JSON response; return the next timestamp and the topic list."""
    # JSON keys are case-sensitive; check them against the captured response.
    data = json_data['data']
    timestampLong = data['timestampLong']
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        topicDict = {}
        if each.get('content', False):
            topicDict['content'] = each['content']
            topicDict['schoolName'] = each['schoolName']
            topicDict['messageId'] = each['messageId']
            topicDict['gender'] = each['studentBO']['gender']
            topicDict['time'] = each['issueTime']
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    """Load more: request the page that follows the given timestamp."""
    loadData = 'timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&' % timestamp
    # Content-Length must match the body, so compute it rather than hard-code it.
    headers['Content-Length'] = str(len(loadData))
    req = urllib2.Request(url, loadData, headers)
    loadResult = opener.open(req).read()
    loadJson = json.loads(loadResult)
    loadStatus = loadJson.get('status', False)
    if loadStatus == 1:
        print 'Load successful!'
        timestamp, topicList = fetch_data(loadJson)
        # Keep loading with the timestamp returned by the previous page.
        load(timestamp, headers, url)
    else:
        print 'Load failed'
        print loadResult
        return False


loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/comment'
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',
}

# --- Log in ---
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:
    print 'Login successful!'
else:
    print 'Login failed'
    print loginResult

# --- Fetch topics ---
topicData = 'timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
headers['Content-Length'] = str(len(topicData))
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get('status', False)
print topicJson
if topicStatus == 1:
    print 'Fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)
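Because each "load more" call invokes itself again, a long crawl will eventually reach Python's recursion limit. One alternative is to follow the timestamp cursor in a loop. The sketch below is written for Python 3 and rests on stated assumptions: `iter_topics` and `fetch_page` are hypothetical names, the JSON keys `status`, `data`, `timestampLong` and `messageBOs` are assumed spellings that should be checked against a real capture, and the pages are fake data so that the example runs without network access.

```python
def iter_topics(fetch_page, start_timestamp=0, max_pages=None):
    """Yield topic messages page by page, following the timestamp cursor.

    fetch_page is a hypothetical callable: it takes a timestamp and returns
    the decoded JSON dict for that page (for example by POSTing the topic
    form with that timestamp).
    """
    timestamp = start_timestamp
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(timestamp)
        if not page or page.get('status') != 1:
            break                      # request failed or server reported an error
        data = page['data']
        for message in data['messageBOs']:
            if message.get('content'):
                yield message          # skip entries without text content
        next_timestamp = data['timestampLong']
        if not data['messageBOs'] or next_timestamp == timestamp:
            break                      # nothing new: stop instead of looping forever
        timestamp = next_timestamp
        pages += 1


# Fake pages keyed by the timestamp that requests them (illustrative only).
FAKE_PAGES = {
    0: {'status': 1, 'data': {'timestampLong': 200,
                              'messageBOs': [{'content': 'first'}, {'content': ''}]}},
    200: {'status': 1, 'data': {'timestampLong': 100,
                                'messageBOs': [{'content': 'second'}]}},
    100: {'status': 1, 'data': {'timestampLong': 100, 'messageBOs': []}},
}

topics = [m['content'] for m in iter_topics(FAKE_PAGES.get)]
print(topics)
```

The `max_pages` argument gives the crawl an explicit upper bound, which is also kinder to the server than loading without limit.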
Result: