With some free time on my hands, I decided to write a crawler to fetch all of the posts (shuoshuo) and pictures from my own QQ space -.-
First, some preparation: open the mobile version of QQ space and analyze the page:
We find that the mobile version does not use traditional numbered pages. Instead it uses waterfall-flow loading, where a "View more" button loads further posts, so we need to analyze the request that is sent when "View more" is clicked:
As you can see, the XHR in the red box above is the request that was sent when "View more" was clicked. Looking at it more closely:
The request URL and the request headers in the red box are the information we need. First, we add the headers to the code:
    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.8',
        'cache-control': 'max-age=0',
        # The HTTP request header is named "cookie"; 'xxxxxx' is a placeholder.
        'cookie': 'xxxxxx',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Mobile Safari/537.36'
    }
The cookie is omitted here because it is too long.
Next, we analyze the request URL further:
After analysis, the key lies in the two places underlined in red. The number after %3D is the index of the post, ranging from 0 to 1758 (I have 1758 posts in total -.-), and count is the number of posts loaded per request; testing shows that the maximum is count=40.
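To make the arithmetic concrete, here is a small sketch of how the index and count could be stepped through all 1758 posts, 40 at a time. The function name page_windows is my own invention. Treating the index as the position loading starts from, and trimming the last request to the remaining posts, are assumptions; the server may just as well accept count=40 on the final page.

```python
def page_windows(total, page_size=40):
    """Yield (start, count) pairs that cover posts 0..total-1 in chunks."""
    for start in range(0, total, page_size):
        # The last chunk is trimmed to the posts that remain (an assumption).
        yield start, min(page_size, total - start)

windows = list(page_windows(1758))
print(len(windows))              # 44 requests in total
print(windows[0], windows[-1])   # (0, 40) (1720, 38)
```

So with 1758 posts and at most 40 per request, 44 requests are enough to cover everything.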
Visiting this URL, we find that it returns a page full of JSON data:
From this we can roughly settle on the approach for the crawler: request this URL, loading at most 40 posts at a time, loop until we reach 1758, and parse the JSON returned by each request to extract the posts and pictures from our space.
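The plan above might be sketched roughly as follows. This is only an outline under several assumptions: the real request URL is not reproduced in the text, so URL_TEMPLATE is a placeholder and "start" is a hypothetical name for the index parameter (only "count" comes from the analysis). The fetch argument stands for any function that takes a URL and returns the response body as text. Since the JSON layout is not shown here, the sketch simply collects the parsed pages without picking out fields.

```python
import json

# Placeholder only: the real request URL is not reproduced in the text.
# "start" is a hypothetical parameter name; "count" comes from the analysis.
URL_TEMPLATE = "https://example.invalid/feeds?start={start}&count={count}"

def crawl(fetch, total=1758, page_size=40):
    """Call fetch(url) -> str for every page and return the parsed JSON objects."""
    pages = []
    for start in range(0, total, page_size):
        url = URL_TEMPLATE.format(start=start, count=page_size)
        pages.append(json.loads(fetch(url)))
    return pages

# Offline stand-in for a real HTTP call, so the sketch runs without a network.
def fake_fetch(url):
    return json.dumps({"url": url})

pages = crawl(fake_fetch, total=100)
print(len(pages))  # 3 requests: start=0, 40, 80
```

In a real run, fetch could wrap an HTTP client that sends the headers defined earlier, for example the requests library via requests.get(url, headers=headers).text.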
QQ Space Python crawler (1)---website analysis