Python crawler captures mobile app transfer data

Source: Internet
Author: User
Most apps return their data as JSON, or as an encrypted blob. This article uses the Super Curriculum app as an example and grabs the topics that users post in it.

1. Capture the app's packets

For how to do this, see the blog post "How to capture mobile app packets with Fiddler".

The capture shows the Super Curriculum login address: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action

Form:

The form contains the user name and password (both encrypted, of course) along with some device information. It can simply be POSTed as captured.
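As a rough illustration of what such a form body looks like on the wire, the sketch below builds and parses an application/x-www-form-urlencoded string. It is written for Python 3 (`urllib.parse`), whereas the listings in this article use Python 2. The field names are taken from the login request used in this article; the values `ENCRYPTED_ACCOUNT` and `ENCRYPTED_PASSWORD` are placeholders, and the helper name `build_form_body` is made up for this example.

```python
from urllib.parse import urlencode, parse_qs

def build_form_body(fields):
    """Encode a dict of form fields as an application/x-www-form-urlencoded string."""
    return urlencode(fields)

# Placeholder values: the real account and password are the encrypted
# strings copied from the captured request.
fields = {
    "platform": "1",
    "account": "ENCRYPTED_ACCOUNT",
    "password": "ENCRYPTED_PASSWORD",
    "versionnumber": "7.2.1",
}
body = build_form_body(fields)
print(body)

# parse_qs reverses the encoding, which is handy for reading a captured body.
parsed = parse_qs(body)
print(parsed["versionnumber"])
```

Building the body from a dict makes it easier to see which fields are device information and which are credentials than editing one long string by hand.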

The request headers must be sent as well. At first I left them out and got a login error, so include the header information from the capture.
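One possible way to attach the captured headers is sketched below with Python 3's `urllib.request.Request` (the Python 3 counterpart of `urllib2.Request`). The request object is only built and inspected here, never sent. The helper name `build_post` and the shortened form body are assumptions made for the example.

```python
from urllib.request import Request

LOGIN_URL = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"

def build_post(url, body, headers):
    """Create a POST request object; nothing is sent until it is opened."""
    return Request(url, data=body.encode("utf-8"), headers=headers)

headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
}
# Shortened body, for illustration only.
req = build_post(LOGIN_URL, "platform=1&versionnumber=7.2.1", headers)

print(req.get_method())               # a request that carries a body is sent as POST
print(req.get_header("User-agent"))   # urllib stores header names capitalized
```

Inspecting the request this way is a cheap check that the headers match the capture before anything goes over the network.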

2. Login

Login code:

    import urllib2
    from cookielib import CookieJar

    loginurl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'

    # Headers copied from the captured request. Content-Length is left out
    # because urllib2 computes it from the request body.
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
        'Host': '120.55.151.61',
        'Connection': 'Keep-Alive',
        'Accept-Encoding': 'gzip',
    }

    # Form body as given in this article; account and password are already
    # encrypted. Copy the exact spelling and letter case from your own capture.
    logindata = ('phonebrand=meizu&platform=1&devicecode=868033014919494'
                 '&account=Fcf030e1f2f6341c1c93be5bbc422a3d&phoneversion=16'
                 '&password=a55b48bb75c79200379d82a18c5f47d6&channel=mxmarket'
                 '&phonemodel=m040&versionnumber=7.2.1&')

    # The cookie jar keeps the session cookie for the requests that follow.
    cookiejar = CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    req = urllib2.Request(loginurl, logindata, headers)
    loginresult = opener.open(req).read()
    print loginresult
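The headers in this listing advertise `Accept-Encoding: gzip`. Neither `urllib2` nor Python 3's `urllib.request` decompresses responses automatically, so if the server does compress its reply, `read()` returns gzip bytes rather than JSON text. A minimal sketch of a guard against that, in Python 3, follows; the helper name `decode_body` is made up, and the responses are simulated locally.

```python
import gzip
import json

def decode_body(raw):
    """Return the response body as text, gunzipping it first if needed."""
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")

# Simulated responses: one compressed, one plain.
payload = json.dumps({"status": 1}).encode("utf-8")
print(decode_body(gzip.compress(payload)))
print(decode_body(payload))
```

Another option is simply to drop the `Accept-Encoding` header, if the server accepts the request without it.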

A successful login returns a JSON string containing the account information.

It matches the response seen in the packet capture, which confirms that the login succeeded.
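A success check along these lines could look like the sketch below (Python 3). It assumes the response is a JSON object whose `data` field holds the account information, which is the field the final script in this article reads. The function name and the sample payloads are invented for illustration.

```python
import json

def login_succeeded(response_text):
    """Return True if the response is a JSON object with a non-empty 'data' field."""
    try:
        result = json.loads(response_text)
    except ValueError:
        # Not JSON at all, for example an HTML error page.
        return False
    return isinstance(result, dict) and bool(result.get("data"))

# Invented sample payloads, for illustration only.
print(login_succeeded('{"status": 1, "data": {"name": "demo"}}'))
print(login_succeeded('{"status": 0}'))
print(login_succeeded("<html>error</html>"))
```

Testing the parsed field rather than the raw response string matters, because even an error reply is a non-empty string.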

3. Crawl Data

Get the URL and POST parameters for the topic list in the same way.

The process is the same as simulating a login to a website. See: "Python crawler: simulating login to a website with a CAPTCHA".

The final code is below. It fetches the first page and then keeps loading more, the way pull-to-load works in the app, so topic content can be loaded indefinitely.

    #!/usr/local/bin/python2.7
    # -*- coding: utf8 -*-
    """
    Super Curriculum topic crawl
    """
    import json
    import urllib2
    from cookielib import CookieJar

    # Form fields, JSON keys and URL paths are written as they appear in this
    # article; copy the exact spelling and letter case from your own capture.


    def fetch_data(json_data):
        """Read the JSON data; return the next timestamp and the topic list."""
        data = json_data['data']
        timestamplong = data['timestamplong']
        messagebo = data['messagebos']
        topiclist = []
        for each in messagebo:
            topicdict = {}
            if each.get('content', False):
                topicdict['content'] = each['content']
                topicdict['schoolname'] = each['schoolname']
                topicdict['messageId'] = each['messageId']
                topicdict['Gender'] = each['Studentbo']['Gender']
                topicdict['time'] = each['issuetime']
                print each['schoolname'], each['content']
                topiclist.append(topicdict)
        return timestamplong, topiclist


    def load(timestamp, headers, url):
        """Load more: keep requesting the next page until a request fails."""
        # Loop instead of recursing, so long runs do not hit the recursion limit.
        while True:
            loaddata = ('timestamp=%s&phonebrand=meizu&platform=1&gendertype=-1'
                        '&topicid=19&phoneversion=16&selecttype=3&channel=mxmarket'
                        '&phonemodel=m040&versionnumber=7.2.1&' % timestamp)
            req = urllib2.Request(url, loaddata, headers)
            loadresult = opener.open(req).read()
            loadjson = json.loads(loadresult)
            if loadjson.get('status', False) == 1:
                print 'load successful!'
                timestamp, topiclist = fetch_data(loadjson)
            else:
                print 'load fail'
                print loadresult
                return False


    loginurl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
    topicurl = 'http://120.55.151.61/v2/treehole/message/getmessagebytopicidv3.action'

    # Content-Length is left out because urllib2 computes it for each request.
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
        'Host': '120.55.151.61',
        'Connection': 'Keep-Alive',
        'Accept-Encoding': 'gzip',
    }

    # --- login section ---
    logindata = ('phonebrand=meizu&platform=1&devicecode=868033014919494'
                 '&account=Fcf030e1f2f6341c1c93be5bbc422a3d&phoneversion=16'
                 '&password=a55b48bb75c79200379d82a18c5f47d6&channel=mxmarket'
                 '&phonemodel=m040&versionnumber=7.2.1&')
    cookiejar = CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    req = urllib2.Request(loginurl, logindata, headers)
    loginresult = opener.open(req).read()
    # Check the parsed 'data' field, not the raw response string.
    loginstatus = json.loads(loginresult).get('data', False)
    if loginstatus:
        print 'login successful!'
    else:
        print 'login fail'
        print loginresult

    # --- get the topic ---
    topicdata = ('timestamp=0&phonebrand=meizu&platform=1&gendertype=-1'
                 '&topicid=19&phoneversion=16&selecttype=3&channel=mxmarket'
                 '&phonemodel=m040&versionnumber=7.2.1&')
    topicrequest = urllib2.Request(topicurl, topicdata, headers)
    topichtml = opener.open(topicrequest).read()
    topicjson = json.loads(topichtml)
    topicstatus = topicjson.get('status', False)
    print topicjson
    if topicstatus == 1:
        print 'fetch topic success!'
        timestamp, topiclist = fetch_data(topicjson)
        load(timestamp, headers, topicurl)
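The load-more step in the script is a form of cursor pagination: each response carries a timestamp, and sending it back as the `timestamp` parameter requests the next page (the first request uses `timestamp=0`). The Python 3 sketch below isolates that idea so it can be run offline. The `fetch_page` callable, the `max_pages` safety limit, and the fake pages are all assumptions for illustration; the keys `status`, `data`, `timestamplong` and `messagebos` follow the listing in this article.

```python
def iter_topics(fetch_page, max_pages=3):
    """Yield messages page by page, following the timestamp cursor."""
    timestamp = 0  # the first request uses timestamp=0
    for _ in range(max_pages):
        page = fetch_page(timestamp)
        if page.get("status") != 1:
            break  # the server reported a failure
        data = page["data"]
        for message in data["messagebos"]:
            yield message
        timestamp = data["timestamplong"]  # cursor for the next page

# Fake server: two pages of invented data, then a failure status.
FAKE_PAGES = {
    0: {"status": 1, "data": {"timestamplong": 100, "messagebos": [{"content": "a"}]}},
    100: {"status": 1, "data": {"timestamplong": 200, "messagebos": [{"content": "b"}]}},
}

def fake_fetch(timestamp):
    """Stand-in for the real HTTP request."""
    return FAKE_PAGES.get(timestamp, {"status": 0})

print([m["content"] for m in iter_topics(fake_fetch)])
```

Passing the fetch function in as a parameter keeps the paging logic separate from the HTTP code, and a page limit stops the crawl from running forever.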
