QQ Space Python crawler (1)---website analysis

Source: Internet
Author: User

With some free time on my hands, I decided to write a crawler to download all the status posts ("shuoshuo") and pictures from my own QQ Space -.-

First, the preparation: open the mobile version of QQ Space and analyze the page.

We find that the mobile version does not use traditional paging. Instead, it loads content as a waterfall (infinite-scroll) feed through a "View More" button, so we need to analyze the request that is sent when "View More" is clicked.

As you can see, the XHR request in the red box above is the one sent when "View More" is clicked. Let's analyze it further.

The request URL and the request headers in the red box are the information we need. First, we add the request headers to the code:

# Request headers copied from the browser's developer tools
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    # 'br' (Brotli) dropped: many HTTP clients cannot decode it without an extra package
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Cache-Control': 'max-age=0',
    'Cookie': 'xxxxxx',  # the header name is 'Cookie', not 'Cookies'; value omitted
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Mobile Safari/537.36',
}

The cookie value is omitted here because it is too long.
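As a minimal sketch of how these headers might be attached to a request, the code below uses only Python's standard library (urllib.request); the original crawler may well use a different HTTP library. The URL is left as a parameter because the real request URL only appears in the screenshot, and the 'xxxxxx' cookie must be replaced with the one from your own logged-in session. Only gzip is advertised in Accept-Encoding, because urllib does not decompress responses by itself and the sketch undoes gzip by hand.

```python
import gzip
import urllib.request

# Assumed headers: replace the placeholder cookie with your own.
HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip',  # only gzip, which fetch_text decompresses below
    'Cookie': 'xxxxxx',
    'User-Agent': ('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/60.0.3112.113 Mobile Safari/537.36'),
}


def build_request(url, headers=HEADERS):
    """Build (but do not send) a GET request carrying the copied headers."""
    return urllib.request.Request(url, headers=headers, method='GET')


def fetch_text(url, headers=HEADERS, timeout=10):
    """Send the request and return the body as text, undoing gzip if needed."""
    with urllib.request.urlopen(build_request(url, headers), timeout=timeout) as resp:
        body = resp.read()
        if resp.headers.get('Content-Encoding') == 'gzip':
            body = gzip.decompress(body)
        charset = resp.headers.get_content_charset() or 'utf-8'
        return body.decode(charset)
```

Splitting request construction from sending makes it possible to inspect the headers without touching the network.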

Next, we analyze the request URL further.

The key parts are the two places underlined in red. The number after %3D (the URL-encoded form of "=") is the offset of the first post to load, ranging from 0 to 1758 (I have 1758 posts in total -.-), and count is the number of posts loaded per request; testing shows that the maximum is count=40.
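The paging arithmetic and the %3D encoding can be illustrated with a short sketch. The parameter names attach and offset below are invented placeholders, since the real names are only visible in the screenshot; substitute the ones from your own request URL. The point is that when a "name=value" pair is nested inside another query parameter, urlencode percent-encodes its "=" as %3D, which matches the pattern in the URL.

```python
from urllib.parse import urlencode

MAX_COUNT = 40  # largest page size that worked in the author's tests


def page_offsets(total, count=MAX_COUNT):
    """Starting offsets needed to cover `total` posts, `count` at a time."""
    if not 1 <= count <= MAX_COUNT:
        raise ValueError('count must be between 1 and %d' % MAX_COUNT)
    return list(range(0, total, count))


def build_query(offset, count=MAX_COUNT):
    """Hypothetical query string; 'attach' and 'offset' are placeholder names.

    The inner 'offset=N' pair is the value of another parameter, so
    urlencode encodes its '=' as %3D.
    """
    return urlencode({'attach': 'offset=%d' % offset, 'count': count})
```

With 1758 posts and 40 per request, page_offsets(1758) yields 44 offsets (0, 40, ..., 1720), and the last request returns the remaining 38 posts.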

Visiting this URL returns a page full of JSON data.
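Since the response is JSON, Python's json module is enough to turn it into dictionaries and lists. The text does not describe the field layout of the response, so this sketch avoids assuming one: it walks the parsed structure and collects every value stored under a given key. The keys used in the example ('content', 'url') and the sample document are hypothetical; look up the real field names in your own response.

```python
import json


def find_values(node, key):
    """Recursively collect every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))  # keep searching nested objects
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# Hypothetical response shape, for illustration only
sample = ('{"data": {"feeds": [{"content": "hello", "pics": [{"url": "a.jpg"}]},'
          ' {"content": "world", "pics": []}]}}')
parsed = json.loads(sample)
print(find_values(parsed, 'content'))  # ['hello', 'world']
print(find_values(parsed, 'url'))      # ['a.jpg']
```

Once the real field names are known, direct indexing into the parsed object is simpler and faster than a generic search.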

So the crawler's approach is roughly clear: request this URL loading the maximum of 40 posts at a time, loop until the offset reaches 1758, and parse the JSON returned by each request to extract the text and picture information of all the posts in our space.
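A minimal sketch of that main loop might look like the following. It steps the offset from 0 toward the total in strides of 40 and parses each response with json.loads. The fetch_page callable is a hypothetical hook, so the loop does not depend on any particular HTTP library or on the exact URL, and the pause between requests is an added precaution rather than something the analysis above specifies.

```python
import json
import time


def crawl(fetch_page, total, count=40, delay=1.0):
    """Request every page of the feed and return the parsed JSON of each.

    `fetch_page(offset, count)` is a hypothetical callable you supply: it
    should build the request URL for that offset and page size, send it
    with your copied headers, and return the response body as a string.
    """
    pages = []
    for i, offset in enumerate(range(0, total, count)):
        if i and delay:
            time.sleep(delay)  # pause between requests to avoid hammering the server
        pages.append(json.loads(fetch_page(offset, count)))
    return pages
```

Passing the fetcher in as a parameter also makes the loop easy to test offline with a fake function that returns canned JSON.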
