Most apps return their data as JSON, or as a blob of encrypted data. This post uses the Super Curriculum app as an example and scrapes the topics users post in it.
1. Capture the app's network packets
For how to set this up, see this blog post: Fiddler: how to capture packets from a mobile app
The captured traffic gives the Super Curriculum login URL: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action
Form:
The form contains the username and password (both already encrypted by the app, of course) plus some device information, so it can be POSTed to the server as-is.
The request headers must be sent as well. My first attempt without them got a login error, so copy the captured header information into the request.
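Rather than retyping the captured headers by hand, you can paste Fiddler's raw request into a small helper. This is a minimal Python 3 sketch; `headers_from_capture` is a hypothetical helper, and the choice of which headers to drop is my own: `Host`, `Content-Length` and `Connection` are set by `urllib.request` itself, and a copied `Content-Length` only fits the one body it was measured on. It assumes the pasted text starts with the request line.

```python
# Headers urllib.request sets on its own: Host comes from the URL,
# Content-Length from the body, and Connection is forced to "close".
COMPUTED_HEADERS = {"host", "content-length", "connection"}


def headers_from_capture(raw: str) -> dict:
    """Turn a request copied from Fiddler's raw view into a header dict."""
    headers = {}
    # Assumption: the first line is the request line ("POST http://... HTTP/1.1").
    for line in raw.strip().splitlines()[1:]:
        if not line.strip():
            break  # a blank line separates the headers from the form body
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() not in COMPUTED_HEADERS:
            headers[name.strip()] = value.strip()
    return headers


captured = """\
POST http://120.55.151.61/V2/StudentSkip/loginCheckV4.action HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=utf-8
User-Agent: Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)
Host: 120.55.151.61
Connection: keep-alive
Accept-Encoding: gzip
Content-Length: 207
"""
headers = headers_from_capture(captured)
```

The resulting dict keeps only `Content-Type`, `User-Agent` and `Accept-Encoding`, which is the part of the capture the server actually needs to see from you.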
2. Log in
Login code:
```python
import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
# Headers copied from the captured request. Content-Length is left out:
# urllib2 computes it from the request body.
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip',
}
# Form body as captured; account and password are already encrypted by the app
loginData = ('phonebrand=meizu&platform=1&devicecode=868033014919494'
             '&account=fcf030e1f2f6341c1c93be5bbc422a3d&phoneversion=16'
             '&password=a55b48bb75c79200379d82a18c5f47d6&channel=mxmarket'
             '&phonemodel=m040&versionnumber=7.2.1&')
# The cookie jar keeps the session cookie so later requests stay logged in
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult
```
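The login code above is Python 2: in Python 3, `urllib2` became `urllib.request` and `cookielib` became `http.cookiejar`. A minimal Python 3 sketch of the same setup follows; the function names are hypothetical, and the request is only built here, not sent, so it runs without network access. Note that Python 3 expects the form body as `bytes`.

```python
import http.cookiejar
import urllib.request

LOGIN_URL = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"


def make_opener():
    """Build an opener whose cookie jar keeps the session cookie set at login."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_login_request(body: bytes, headers: dict) -> urllib.request.Request:
    """Build (but do not send) the login POST with the captured headers."""
    return urllib.request.Request(LOGIN_URL, data=body, headers=headers, method="POST")


# Sending it is a network call, so it is only shown as a comment
# (login_body and captured_headers are placeholders for your own capture):
# opener, jar = make_opener()
# with opener.open(make_login_request(login_body, captured_headers), timeout=10) as resp:
#     raw = resp.read()
```

Reusing the same `opener` for every later request is what keeps the session alive, exactly as the Python 2 version does with its single global opener.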
A successful login returns a JSON string with the account information, identical to the response in the captured packet, which confirms that the login worked.
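Two details matter when checking that response in code. The captured headers ask for `Accept-Encoding: gzip`, and neither `urllib2` nor `urllib.request` decompresses responses for you, so a gzipped body would make `json.loads` fail. The final script also reads the `data` field of the login response. The Python 3 sketch below handles both; treating a non-empty `data` object as success is an assumption based on that script, and the function names are hypothetical.

```python
import gzip
import json


def decode_json_body(raw: bytes, content_encoding: str = "") -> dict:
    """Parse a JSON response body, un-gzipping it first when needed.

    Checks the Content-Encoding header and, as a fallback, the gzip
    magic bytes 1f 8b at the start of the body.
    """
    if content_encoding.lower() == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def login_succeeded(payload: dict) -> bool:
    """Assumption: a successful login carries a non-empty "data" object."""
    return bool(payload.get("data"))


# With a real response object `resp` from urllib.request:
# payload = decode_json_body(resp.read(), resp.headers.get("Content-Encoding", ""))
# if not login_succeeded(payload): ...
```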
3. Fetch the data
Find the URL and POST parameters of the topic request the same way, from the captured packets.
From here on it works just like simulating a login on a website. See: Python crawler demo: logging in to a website with a CAPTCHA
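The topic request is a form POST like the login, with one field that changes: `timestamp`, which is `0` for the first page and is then set from the previous response. A minimal Python 3 sketch of building that body with `urlencode` instead of a hand-written string; the field names and values are copied from the final script, and `build_topic_body` is a hypothetical helper. One difference: `urlencode` does not emit the trailing `&` the app sends. Standard form parsers ignore it, but that is an assumption about this server.

```python
from urllib.parse import urlencode

# Form fields copied from the captured topic request (values as captured).
TOPIC_FIELDS = [
    ("phonebrand", "meizu"),
    ("platform", "1"),
    ("gendertype", "-1"),
    ("topicid", "19"),
    ("phoneversion", "16"),
    ("selecttype", "3"),
    ("channel", "mxmarket"),
    ("phonemodel", "m040"),
    ("versionnumber", "7.2.1"),
]


def build_topic_body(timestamp: int) -> bytes:
    """Encode one topic-page request; timestamp=0 requests the first page."""
    fields = [("timestamp", str(timestamp))] + TOPIC_FIELDS
    return urlencode(fields).encode("utf-8")
```

Because the timestamp is part of the body, the body length changes from page to page, which is why a hard-coded `Content-Length` is fragile here.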
See below for the final code. It fetches the first page of topics and then emulates the app's pull-to-load-more to request older pages, so topic content keeps loading page after page.
```python
#!/usr/local/bin/python2.7
# -*- coding: utf8 -*-
"""Super Curriculum topic crawler"""
import json
import urllib2
from cookielib import CookieJar


def fetch_data(json_data):
    """Read one page of JSON: return the paging timestamp and the topics on it."""
    # Key names follow the captured response; check them against your own capture
    data = json_data['data']
    timestampLong = data['timestamplong']
    messageBos = data['messagebos']
    topicList = []
    for each in messageBos:
        # Skip messages without text content
        if each.get('content', False):
            topicDict = {
                'content': each['content'],
                'schoolname': each['schoolname'],
                'messageId': each['messageId'],
                'gender': each['studentbo']['gender'],
                'time': each['issuetime'],
            }
            print each['schoolname'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    """Load more: keep requesting older pages, passing back the returned timestamp.

    A loop rather than a self-call per page, so long crawls don't hit
    Python's recursion limit.
    """
    while True:
        loadData = ('timestamp=%s&phonebrand=meizu&platform=1&gendertype=-1'
                    '&topicid=19&phoneversion=16&selecttype=3&channel=mxmarket'
                    '&phonemodel=m040&versionnumber=7.2.1&' % timestamp)
        req = urllib2.Request(url, loadData, headers)
        loadResult = opener.open(req).read()
        loadJson = json.loads(loadResult)
        if loadJson.get('status', False) != 1:
            print 'load fail'
            print loadResult
            return False
        print 'load successful!'
        newTimestamp, topicList = fetch_data(loadJson)
        if newTimestamp == timestamp:
            # The cursor stopped moving, so there are no older pages
            return True
        timestamp = newTimestamp


loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/v2/treehole/message/getmessagebytopicidv3.action'
# No hard-coded Content-Length: urllib2 computes it for each body, which matters
# because the load-more body grows with the length of the timestamp
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'gzip',
}

# --- login ---
loginData = ('phonebrand=meizu&platform=1&devicecode=868033014919494'
             '&account=fcf030e1f2f6341c1c93be5bbc422a3d&phoneversion=16'
             '&password=a55b48bb75c79200379d82a18c5f47d6&channel=mxmarket'
             '&phonemodel=m040&versionnumber=7.2.1&')
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:
    print 'login successful!'
else:
    print 'login fail'
    print loginResult
    raise SystemExit(1)

# --- get the topic: first page (timestamp=0), then load more ---
topicData = ('timestamp=0&phonebrand=meizu&platform=1&gendertype=-1&topicid=19'
             '&phoneversion=16&selecttype=3&channel=mxmarket&phonemodel=m040'
             '&versionnumber=7.2.1&')
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get('status', False)
print topicJson
if topicStatus == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)
```
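The paging pattern the script relies on, sending back the `timestamplong` value from each page to get the next one, can be isolated into a small Python 3 sketch that takes the page fetcher as a parameter, so it can be tested against fake data. The function names are hypothetical, the key spellings follow the response keys as printed in this post (adjust them to your own capture), and stopping when the cursor stops moving or after `max_pages` is my own safeguard, not something the app defines.

```python
from typing import Callable, Dict, Iterator


def parse_topic(message: dict) -> Dict[str, object]:
    """Pick the same fields as fetch_data() in the script."""
    return {
        "content": message["content"],
        "schoolname": message["schoolname"],
        "messageId": message["messageId"],
        "gender": message["studentbo"]["gender"],
        "time": message["issuetime"],
    }


def crawl_topics(fetch_page: Callable[[int], dict], max_pages: int = 50) -> Iterator[dict]:
    """Yield parsed topics page by page, following the timestamp cursor.

    fetch_page(timestamp) returns the decoded JSON for one page. The first
    call uses timestamp=0; each later call passes back data["timestamplong"].
    """
    timestamp = 0
    for _ in range(max_pages):
        page = fetch_page(timestamp)
        if page.get("status") != 1:
            break  # the server reported an error or has nothing more
        data = page["data"]
        for message in data.get("messagebos", []):
            if message.get("content"):  # skip messages without text
                yield parse_topic(message)
        next_timestamp = data["timestamplong"]
        if next_timestamp == timestamp:
            break  # the cursor stopped moving; avoid refetching the same page
        timestamp = next_timestamp
```

In a real crawl, `fetch_page` would wrap the POST to the topic URL through the logged-in opener plus the JSON decoding. A loop with a page cap, rather than a function that calls itself for every page, keeps long crawls clear of Python's default recursion limit of 1000 frames.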
Results: