Login via Data Packet Analysis
Tools: Python urllib2 | Firefox + Firebug or Chrome. After opening the login page in the browser, press F12 to open the developer tools (or start Firebug) and click the Network panel to monitor packets. The iTunes login is used as the example below.
1. Enter the iTunes login address in the browser: https://itunesconnect.apple.com/itc/static/login?view=1&path=%2FWebObjects%2FiTunesConnect.woa, and press F12 to start Firebug for network monitoring.
Click "Network" > "All" to see all of the loaded page data, including HTML, images, CSS, JS and so on.
The "All" view shows too much information, so click "HTML" to look at the HTML requests only. The URLs listed there are usually the pages we actually need to request with urllib2.
Here is the page shown in Firefox + Firebug.
And this is the page shown by Chrome's developer tools.
Different tools display the captured packets somewhat differently. Comparing the two views above, we can estimate that only 2 URLs actually need to be fetched.
2. Click one of them and focus on its parameters and header information. (The Cookie field in the header information matches the content shown in the Cookies tab.)
The header information is divided into request headers and response headers. The headers that need to be submitted with urlopen may be only a subset of the request headers. Fields in the response headers may be needed when accessing the next page.
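As a rough illustration of the request-header side, the sketch below builds a request that carries only a subset of the headers seen in the browser and inspects it without sending anything. It assumes Python 3, where the urllib2 API lives in urllib.request. The helper name build_request and the sample header values are hypothetical.

```python
# Python 3 sketch: urllib2's Request lives in urllib.request here.
from urllib.request import Request

# Headers copied from the browser's network panel (illustrative subset).
CAPTURED_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64)",
    "Accept": "text/html",
    "Accept-Language": "en-US",
}


def build_request(url, captured, keep):
    """Hypothetical helper: build a Request with only the headers in `keep`."""
    chosen = {name: value for name, value in captured.items() if name in keep}
    return Request(url, headers=chosen)


req = build_request(
    "https://idmsa.apple.com/appleauth/auth/signin",
    CAPTURED_HEADERS,
    keep={"User-Agent", "Accept"},
)
# urllib stores header names in capitalized form, e.g. "User-agent".
print(sorted(req.header_items()))
```

Nothing is sent until the request is passed to urlopen. The response object returned by urlopen exposes the response headers through its info() method, which is where fields needed for the next page can be read.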
3. Entering the account and password usually involves a POST request, so we can try it with a wrong account and password.
(Entering the correct user information makes the page redirect, which refreshes the console and loses the previously recorded requests.)
Click the request to see the parameters that need to be submitted (account, password, etc.), the additional headers, cookies, and so on.
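The parameters observed for the failed login can be reproduced as a POST body. This is a minimal sketch, assuming Python 3's urllib.request and urllib.parse in place of urllib2 and urllib. The helper name build_login_post, the field names and the credential values are placeholders for whatever the network panel actually shows.

```python
# Python 3 sketch: urlencode and Request come from urllib.parse / urllib.request.
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_login_post(url, params, headers=None):
    """Hypothetical helper: encode params as a form body and attach it.

    Supplying a body is what makes urllib issue a POST instead of a GET.
    """
    body = urlencode(params).encode("utf-8")
    return Request(url, data=body, headers=headers or {})


# Deliberately wrong placeholder credentials, as suggested in the text.
params = {"account": "nobody@example.com", "password": "wrong-password"}
req = build_login_post("https://idmsa.apple.com/appleauth/auth/signin", params)

print(req.get_method())             # method urllib will use for this request
print(parse_qs(req.data.decode()))  # decoded body, to compare with the browser
```

Comparing the decoded body with the parameters shown in the browser is a cheap check that the crawler submits the same fields before any request is sent.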
So the iTunes login can be roughly divided into three steps:
First GET: https://itunesconnect.apple.com/itc/static/login?view=1&path=%2FWebObjects%2FiTunesConnect.woa
Then GET: https://idmsa.apple.com/appleauth/auth/signin?widgetKey=22d448248055bab0dc197c6271d738c3
Finally POST: https://idmsa.apple.com/appleauth/auth/signin
The Python code for these 3 steps is attached below. In the early days, this approach was enough to log in to iTunes and to most simple websites.
    import cookielib
    import sys
    import urllib
    import urllib2

    # Placeholder credentials; replace with your own
    account = 'your_account'
    password = 'your_password'

    # Cookie jar that records cookies for the whole process
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    urllib2.install_opener(opener)

    # Steps 1 and 2: GET
    response1 = urllib2.urlopen('https://itunesconnect.apple.com/itc/static/login?view=1&path=%2FWebObjects%2FiTunesConnect.woa').read()
    response2 = urllib2.urlopen('https://idmsa.apple.com/appleauth/auth/signin?widgetKey=22d448248055bab0dc197c6271d738c3').read()

    # Step 3: POST
    login_data = {'accountName': account, 'password': password, 'rememberMe': 'false'}  # account and password
    login_url = 'https://idmsa.apple.com/appleauth/auth/signin'
    head = {'Content-Type': 'application/json',  # header info
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'}
    # Note: the body is form-encoded even though Content-Type says JSON
    logindata = urllib.urlencode(login_data)
    request = urllib2.Request(login_url, logindata, head)
    try:
        response = urllib2.urlopen(request, timeout=6)  # fetch the page
    except urllib2.URLError as e:
        print e
        sys.exit(1)
    # Login complete; print cookie information
    for cookie in cj:
        print cookie.name, cookie.value
After the iTunes site was upgraded, this method no longer works, because iTunes added a new authentication mechanism that requires the crawler to send additional headers. The difficulty is that these headers have a complex structure and are produced by a series of JS operations, so it becomes very hard for the crawler to calculate their values. For example, after the upgrade you need to submit a header called "X-Apple-I-FD-Client-Info", and we could not work out how to compute it.
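One way to spot such a missing header is to compare the header names captured in the browser with those the crawler actually sends. The sketch below does only that comparison. The function name and the sample header sets are hypothetical, and it does not attempt to compute the header's value.

```python
def missing_headers(browser_headers, crawler_headers):
    """Return browser header names that the crawler does not send.

    HTTP header names are case-insensitive, so names are compared in
    lower case; the browser's original spelling is returned, sorted.
    """
    sent = {name.lower() for name in crawler_headers}
    return sorted(name for name in browser_headers if name.lower() not in sent)


# Sample data: values are placeholders, not real captured values.
browser = {
    "Content-Type": "application/json",
    "User-Agent": "Mozilla/5.0",
    "X-Apple-I-FD-Client-Info": "<computed by JS in the browser>",
}
crawler = {"content-type": "application/json", "user-agent": "Mozilla/5.0"}

print(missing_headers(browser, crawler))
```

A header that appears in this list is only a candidate: whether the server really requires it still has to be confirmed by testing.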
Summary
The advantages and disadvantages of login verification through packet analysis are as follows.
Advantages: Fast execution speed.
Disadvantages: Packet capture analysis is time-consuming, it is sometimes unclear which headers are necessary, and some headers cannot be computed.
When the headers required for verification cannot be obtained through analysis, consider skipping packet analysis and simulating clicks in a browser instead.