How to use Python to crawl content loaded by JavaScript

Source: Internet
Author: User

This article shows how to use Python to crawl content that is loaded by JavaScript. It may be a useful reference for anyone who runs into the same problem.

When writing a crawler, you may find that the content you need is added to the page by JavaScript, so an ordinary request returns nothing for it. For example, the comment count on a Sina News article cannot be obtained by fetching and parsing the page in the usual way.


An ordinary GET request, for example:

    import requests
    from bs4 import BeautifulSoup

    res = requests.get('http://news.sina.com.cn/c/nd/2017-06-12/doc-ifyfzhac1650783.shtml')
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')

    # Get the comment count
    commentCount = soup.select_one('#commentCount1')
    print(commentCount.text)

The result is empty because the comment count is not in the page's HTML; it is loaded from a separate JS file.
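The empty result can be reproduced offline. The sketch below is only an illustration: the HTML string is a hypothetical stand-in for what the server sends before any JavaScript runs, and it uses the standard library's html.parser instead of BeautifulSoup so it runs without a network connection or extra packages.

```python
from html.parser import HTMLParser

# Hypothetical static HTML (an assumption, not the real Sina markup):
# the placeholder element exists, but nothing has filled it in yet.
STATIC_HTML = '<div><span id="commentCount1"></span> comments</div>'


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id.

    Simplified: assumes every tag inside the target is explicitly closed.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.found = False
        self.text = ''

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get('id') == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text += data


parser = TextById('commentCount1')
parser.feed(STATIC_HTML)

# The element is found, but its text is an empty string.
print(parser.found, repr(parser.text))
```

The selector matches, yet there is nothing inside the element to read, which is the same symptom the requests and BeautifulSoup code shows.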

So we need to find the JS file that stores the comments. Looking through the requests the page makes, we find the one that contains them.

Pasting its content into a JSON viewer shows that the JS file stores both the comments and the total comment count in JSON format.



In the request headers we can see the JS file's URL and its request method.
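The URL seen in the request headers is a base path plus a query string. As a minimal sketch, it can be assembled from its parts with the standard library; the parameter names and values are the ones that appear in this article's URL, while the variable names BASE, params, url, and query are just illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base path and query parameters as they appear in the comment request URL.
BASE = 'http://comment5.news.sina.com.cn/page/info'
params = {
    'version': 1,
    'format': 'js',
    'channel': 'gn',
    'newsid': 'comos-fyfzhac1650783',
}

# urlencode joins the pairs with "&" and escapes unsafe characters.
url = BASE + '?' + urlencode(params)
print(url)

# Going the other way: split an observed URL back into its parameters.
query = parse_qs(urlsplit(url).query)
print(query['newsid'])
```

Keeping the parameters in a dict makes it easier to swap in a different newsid later, assuming other articles use the same endpoint, which this article does not confirm.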

Code example:

    import json
    import requests

    comments = requests.get('http://comment5.news.sina.com.cn/page/info?version=1&format=js&channel=gn&newsid=comos-fyfzhac1650783')
    comments.encoding = 'utf-8'
    print(comments)

    # Remove the leading "var data=" so that the rest is valid JSON
    prefix = 'var data='
    text = comments.text.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    jd = json.loads(text)

    print(jd['result']['count']['total'])


Note: "var data=" has to be removed because the response is a JavaScript assignment, not pure JSON. With that prefix in place the text is not valid JSON, so it must be stripped from the response before parsing.
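One detail worth knowing when removing the prefix: str.strip('var data=') treats its argument as a set of characters to trim from both ends, not as a literal prefix. It happens to work when the remaining text starts with "{" and ends with "}", but it is fragile. The sketch below shows a prefix-based alternative; the helper name parse_js_assignment and the sample payload are hypothetical, and the trailing-semicolon handling is a precaution, not something the article confirms the response needs.

```python
import json

# Hypothetical, heavily trimmed response body (the real one has more fields).
SAMPLE = 'var data={"result": {"count": {"total": 42}}}'


def parse_js_assignment(body, prefix='var data='):
    """Drop a leading JavaScript assignment and parse the rest as JSON."""
    text = body.strip()
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Tolerate a trailing semicolon in case the script ends with one.
    return json.loads(text.rstrip().rstrip(';'))


jd = parse_js_assignment(SAMPLE)
print(jd['result']['count']['total'])

# strip() removes any of the characters v, a, r, space, d, t, = from
# both ends, so a value made only of those characters disappears entirely.
print(repr('var data=data'.strip('var data=')))
```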

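Why use jd['result']['count']['total'] to get the comment count? Because, as the JSON viewer shows, the total is nested in the data: it sits under the result key, then count, then total.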
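The lookup walks the parsed data one level at a time: result, then count, then total. A minimal sketch with a hypothetical, trimmed-down payload follows; the helper dig is illustrative and not part of any library. It returns a default instead of raising KeyError if the response ever lacks one of the levels.

```python
# Hypothetical parsed payload; only the keys used in this article are shown.
jd = {'result': {'count': {'total': 42}}}


def dig(data, *keys, default=None):
    """Follow keys into nested dicts, returning default if one is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


# Equivalent to jd['result']['count']['total'] when every level exists.
print(dig(jd, 'result', 'count', 'total'))

# A missing level yields the default rather than an exception.
print(dig(jd, 'result', 'missing', 'total'))
```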

