Python crawler: capturing the data sent by a mobile app

Source: Internet
Author: User
Most apps return either JSON data or a blob of encrypted data. This post uses the Super Curriculum app as an example and captures the topics that users post in it.

1. Capture the app's data packets

For the capture method, see the blog post "How does Fiddler capture mobile APP data packets?"

The captured Super Curriculum login URL is: http://120.55.151.61/V2/StudentSkip/loginCheckV4.action

Form:

The form contains the account and password, both already encrypted by the app, plus some device information. These values can be posted exactly as captured.
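To see which fields the captured body holds, you can split it into name/value pairs with the standard library. The following is a minimal Python 3 sketch. `CAPTURED_LOGIN_BODY` is just an illustrative name for the string copied from the capture. The sketch also checks that the body length equals the `Content-Length` of 207 used in the login code.

```python
from urllib.parse import parse_qsl

# Login body exactly as captured by Fiddler (posted verbatim later on).
CAPTURED_LOGIN_BODY = (
    "phoneBrand=Meizu&platform=1&deviceCode=868033014919494"
    "&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16"
    "&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket"
    "&phoneModel=M040&versionNumber=7.2.1&"
)

# Split the body into (name, value) pairs; the empty pair after the
# trailing "&" is skipped by parse_qsl.
fields = dict(parse_qsl(CAPTURED_LOGIN_BODY))
for name, value in fields.items():
    print(f"{name:14} {value}")

# A hard-coded Content-Length header must match the body length in bytes.
print(len(CAPTURED_LOGIN_BODY.encode("utf-8")))  # 207
```

The `account` and `password` values are opaque ciphertext produced by the app. Replaying them unchanged is what makes this approach work without knowing the encryption.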

The request headers must be sent as well. My first attempt without them returned a login error, so the header information has to be included.
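The following Python 3 sketch attaches the captured headers to a `urllib.request.Request`. The request is only built and inspected here; nothing is sent. Only the two headers that describe the body and the client are shown. Which headers the server actually checks is an assumption, so copy the rest from your capture if the login still fails. `Content-Length` is left out because urllib computes it from the body when the request is sent, and a stale hard-coded value can break the request. `Accept-Encoding: gzip` is left out because urllib does not decompress responses. `CAPTURED_HEADERS` and `make_request` are hypothetical names.

```python
import urllib.request

LOGIN_URL = "http://120.55.151.61/V2/StudentSkip/loginCheckV4.action"

# Headers copied from the captured request. Content-Length is omitted on
# purpose: urllib adds it from the body length when the request is sent.
CAPTURED_HEADERS = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "User-Agent": "Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)",
}


def make_request(url, body):
    """Build (but do not send) a POST request carrying the captured headers."""
    return urllib.request.Request(url, data=body, headers=dict(CAPTURED_HEADERS))


req = make_request(LOGIN_URL, b"phoneBrand=Meizu&platform=1")
# urllib stores header names capitalized, e.g. "User-agent".
print(req.get_method(), req.get_header("User-agent"))
```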

2. Log in

Login code:

```python
import urllib2
from cookielib import CookieJar

loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
# Headers copied from the captured request; without them the login fails.
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
    'Content-Length': '207',  # equals len(loginData) below
}
# Form body copied verbatim from the capture (account and password are already encrypted).
loginData = 'phoneBrand=Meizu&platform=1&deviceCode=868033014919494&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket&phoneModel=M040&versionNumber=7.2.1&'
# The cookie jar keeps the session cookie for later requests.
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
print loginResult
```
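The login code sends `Accept-Encoding: gzip`, copied from the capture. Neither urllib2 nor Python 3's `urllib.request` decompresses responses automatically. If the server honours that header, the response is compressed bytes rather than readable JSON. The following Python 3 sketch handles both cases. `decode_json_body` is a hypothetical helper. It assumes you pass it the raw body and, if available, the response's `Content-Encoding` header.

```python
import gzip
import json


def decode_json_body(body, content_encoding=None):
    """Parse a JSON response body, gunzipping it first if it is compressed.

    content_encoding is the response's Content-Encoding header value, if known;
    the gzip magic bytes (1f 8b) are also checked in case the header is missing.
    """
    if content_encoding == "gzip" or body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


print(decode_json_body(gzip.compress(b'{"status": 1}'), "gzip"))  # {'status': 1}
print(decode_json_body(b'{"status": 1}'))                         # {'status': 1}
```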

If the login succeeds, the server returns a JSON string containing the account information.

This matches the data seen during packet capture, so the login worked.
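Instead of comparing the output by eye, the check can be automated. The following Python 3 sketch assumes that a successful response carries the account information under a non-empty `data` key, which mirrors the check in this post's final script. `login_succeeded` is a hypothetical name, and the sample responses are made up for illustration.

```python
import json


def login_succeeded(raw):
    """Return True if the login response is JSON with a non-empty "data" field.

    Assumption: success is signalled by account information under "data".
    """
    try:
        payload = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(payload, dict) and bool(payload.get("data"))


# Made-up sample responses, for illustration only.
print(login_succeeded('{"data": {"account": "demo"}}'))  # True
print(login_succeeded('{"status": 0}'))                  # False
```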

3. Capture Data

Capture the topic URL and its POST parameters the same way.
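In the captured topic request, only `timestamp` changes between pages: it is 0 for the first page. The other values (`topicId=19`, `selectType=3`, and so on) are fixed. The following Python 3 sketch builds that body with `urlencode`. `topic_form` is a hypothetical helper, and the field values are the ones captured for this app.

```python
from urllib.parse import urlencode


def topic_form(timestamp):
    """Hypothetical helper: build the POST body for one page of topics."""
    fields = {
        "timestamp": str(timestamp),  # 0 for the first page
        "phoneBrand": "Meizu",
        "platform": "1",
        "genderType": "-1",
        "topicId": "19",
        "phoneVersion": "16",
        "selectType": "3",
        "channel": "MXMarket",
        "phoneModel": "M040",
        "versionNumber": "7.2.1",
    }
    # Dicts keep insertion order, so the fields appear in the captured order.
    return urlencode(fields).encode("utf-8")


print(topic_form(0))
```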

Everything else works the same as simulating a login to a website. For details, see the post "Python crawler: simulating a website login with a verification code".
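As with a website login, the key is to reuse one opener for every request. Its `HTTPCookieProcessor` stores any cookie that the login response sets and sends it back with the topic requests. The following is a Python 3 equivalent of the `CookieJar` setup. It assumes, as the original code's use of a cookie jar suggests, that the session is tracked with cookies. `make_session_opener` is a hypothetical name.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar); requests made through the opener share the jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_session_opener()
# opener.open(login_request) and later opener.open(topic_request) share `jar`.
print(len(jar))  # 0 until a response sets a cookie
```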

The final code is below. It fetches the first page of topics and then keeps loading more pages, as pulling down to load does in the app, so topic content can be loaded without limit.
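The "load more" step is a loop over a server-supplied cursor. The first request sends `timestamp=0`, and each response returns the timestamp to send with the next request. The following Python 3 sketch abstracts the HTTP call into a caller-supplied function so that it can run offline. The names `paginate` and `fake_fetch` are hypothetical. The response keys (`status`, `data`, `timestampLong`, `messageBOs`) are assumptions taken from the final script, not a documented API. Writing this as a loop rather than a recursive call means a long session cannot hit Python's recursion limit.

```python
def paginate(fetch_page, max_pages=None):
    """Yield one list of topic messages per page.

    fetch_page(timestamp) should POST the topic form with that timestamp and
    return the parsed JSON response as a dict. Stops on a status other than 1
    or after max_pages pages.
    """
    timestamp = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        response = fetch_page(timestamp)
        if response.get("status") != 1:
            break
        data = response["data"]
        yield data["messageBOs"]
        timestamp = data["timestampLong"]  # cursor for the next page
        pages += 1


def fake_fetch(timestamp):
    """Offline stand-in for the HTTP call: three pages, then a failure status."""
    if timestamp >= 3:
        return {"status": 0}
    return {"status": 1,
            "data": {"timestampLong": timestamp + 1,
                     "messageBOs": [{"content": "topic %d" % timestamp}]}}


for page in paginate(fake_fetch):
    print(page)
```

With a real server, the cursor values come from the responses rather than from a simple counter. The `max_pages` argument guards against a server that keeps returning the same timestamp.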

```python
#!/usr/local/bin/python2.7
# -*- coding: utf-8 -*-
'''Super Curriculum topic crawler'''
import json
import urllib2
from cookielib import CookieJar


def fetch_data(json_data):
    '''Read the topic JSON: return the next-page timestamp and the topics.'''
    data = json_data['data']
    timestampLong = data['timestampLong']
    messageBO = data['messageBOs']
    topicList = []
    for each in messageBO:
        topicDict = {}
        if each.get('content', False):
            topicDict['content'] = each['content']
            topicDict['schoolName'] = each['schoolName']
            topicDict['messageId'] = each['messageId']
            topicDict['gender'] = each['studentBO']['gender']
            topicDict['time'] = each['issueTime']
            print each['schoolName'], each['content']
            topicList.append(topicDict)
    return timestampLong, topicList


def load(timestamp, headers, url):
    '''Load more: request the next page with the last timestamp until a load fails.'''
    while True:
        loadData = ('timestamp=%s&phoneBrand=Meizu&platform=1&genderType=-1'
                    '&topicId=19&phoneVersion=16&selectType=3&channel=MXMarket'
                    '&phoneModel=M040&versionNumber=7.2.1&' % timestamp)
        req = urllib2.Request(url, loadData, headers)
        loadResult = opener.open(req).read()
        loadJson = json.loads(loadResult)
        if loadJson.get('status', False) == 1:
            print 'Load successful!'
            timestamp, topicList = fetch_data(loadJson)
        else:
            print 'Load fail'
            print loadResult
            return False


loginUrl = 'http://120.55.151.61/V2/StudentSkip/loginCheckV4.action'
topicUrl = 'http://120.55.151.61/V2/Treehole/Message/getMessageByTopicIdV3.action'
# Content-Length is omitted: urllib2 computes it from each request body,
# so the same headers work for the login and the topic requests.
headers = {
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'User-Agent': 'Dalvik/1.6.0 (Linux; U; Android 4.1.1; M040 Build/JRO03H)',
    'Host': '120.55.151.61',
    'Connection': 'Keep-Alive',
    'Accept-Encoding': 'gzip',
}

# --- log in ---
loginData = ('phoneBrand=Meizu&platform=1&deviceCode=868033014919494'
             '&account=FCF030E1F2F6341C1C93BE5BBC422A3D&phoneVersion=16'
             '&password=A55B48BB75C79200379D82A18C5F47D6&channel=MXMarket'
             '&phoneModel=M040&versionNumber=7.2.1&')
cookieJar = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
req = urllib2.Request(loginUrl, loginData, headers)
loginResult = opener.open(req).read()
loginStatus = json.loads(loginResult).get('data', False)
if loginStatus:
    print 'login successful!'
else:
    print 'login fail'
    print loginResult

# --- fetch the first page of topics ---
topicData = ('timestamp=0&phoneBrand=Meizu&platform=1&genderType=-1&topicId=19'
             '&phoneVersion=16&selectType=3&channel=MXMarket&phoneModel=M040'
             '&versionNumber=7.2.1&')
topicRequest = urllib2.Request(topicUrl, topicData, headers)
topicHtml = opener.open(topicRequest).read()
topicJson = json.loads(topicHtml)
topicStatus = topicJson.get('status', False)
print topicJson
if topicStatus == 1:
    print 'fetch topic success!'
    timestamp, topicList = fetch_data(topicJson)
    load(timestamp, headers, topicUrl)
```

Result:
