JavaScript: Is there a library that can parse JS?

Source: Internet
Author: User
I want to crawl content from websites, but much of it is generated by JavaScript. Is there a library that can parse or execute the JS so that such pages are easy to crawl? Examples are product information in online malls and QQ Space content. Any language is fine, as long as it allows quick development. Thank you.

Reply content:

This takes more than a JS parser; it takes a whole browser engine!

A few recommendations:

    • QtWebKit: known to have Python and C++ support
    • PhantomJS: known to support JavaScript, CoffeeScript, and Python; also built on the WebKit engine
    • SlimerJS: known to support JavaScript; uses the Gecko engine, the same as Firefox, and can also run on top of Firefox
    • CasperJS: known to support JavaScript; a higher-level wrapper built on top of the previous two

That said, your problem may not need anything that heavy.

You want to capture page content that you know comes from JS, so where does that JS get its data? It is either embedded in the page itself or fetched as JSON via Ajax.

Find the script or request that carries the data you need. If it is JSON, run it through a JSON parser; if it is simple JS, a regular expression is enough to extract it.
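A minimal Python sketch of the two cases described above: data embedded in an inline script, and data returned as JSON by an Ajax call. The page source, the variable name `productInfo`, and the response body are all made up for illustration; on a real site you would first locate them in the page source or in the browser's network panel.

```python
import json
import re

# Hypothetical page source: the product data sits in an inline <script>.
SAMPLE_HTML = """
<html><body>
<div id="product"></div>
<script>
var productInfo = {"name": "Example phone", "price": 1999, "tags": ["new", "sale"]};
render(productInfo);
</script>
</body></html>
"""

# Hypothetical body of an Ajax response.
SAMPLE_AJAX_BODY = '{"items": [{"id": 1, "title": "first post"}, {"id": 2, "title": "second post"}]}'


def extract_inline_json(html, var_name):
    """Return the object assigned to `var_name` in an inline script, or None.

    Assumes the right-hand side is valid JSON and the statement ends with
    "};". Anything more complicated needs a real JS engine.
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def parse_ajax_body(body):
    """Parse an Ajax response body and return its list of items."""
    return json.loads(body)["items"]


if __name__ == "__main__":
    print(extract_inline_json(SAMPLE_HTML, "productInfo"))
    print([item["title"] for item in parse_ajax_body(SAMPLE_AJAX_BODY)])
```

The regular expression only works while the embedded literal is plain JSON; if the site builds the object with JS expressions, this shortcut stops working and one of the heavier tools is needed.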

PhantomJS may be the best solution for you. CasperJS is based on PhantomJS, so it can also be a useful tool for grabbing web page content created by JavaScript or Ajax.

Try Node.js.

From your description, you are fetching pages whose content is produced by JS, so an ordinary page fetch brings back an empty shell with nothing in it. Right?

In that case I suggest a "headless browser". My first pick is PhantomJS, which the reply above mentioned. It is essentially a browser without a user interface that you drive from code; it can interact with your external program and hand back the final generated HTML, among other things.
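As a rough sketch of how external code can drive PhantomJS and get the generated HTML back, the Python below writes a small PhantomJS script to a temporary file and runs it as a subprocess. It assumes a `phantomjs` executable is on the PATH; the helper names `build_command` and `fetch_rendered_html` are invented for this example.

```python
import os
import subprocess
import tempfile

# PhantomJS script: open the URL passed on the command line and print the DOM
# as it stands when the load callback fires. Content that arrives through
# later Ajax calls may need an extra delay before reading page.content.
PHANTOM_SCRIPT = """
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        phantom.exit(1);
    } else {
        console.log(page.content);
        phantom.exit(0);
    }
});
"""


def build_command(script_path, url, phantomjs="phantomjs"):
    """Build the command line: phantomjs <script> <url>."""
    return [phantomjs, script_path, url]


def fetch_rendered_html(url, phantomjs="phantomjs", timeout=60):
    """Run PhantomJS on `url` and return the HTML it prints."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as handle:
        handle.write(PHANTOM_SCRIPT)
        script_path = handle.name
    try:
        result = subprocess.run(
            build_command(script_path, url, phantomjs),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return result.stdout
    finally:
        os.remove(script_path)
```

Calling `fetch_rendered_html("http://example.com/")` would return the page source after its scripts have run, which can then go through an ordinary HTML parser.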

Use Node.js directly and execute the returned content.

In this situation I usually read the JS code myself, find the relevant part, and then run or reimplement it in my own program. Java also seems to have a library that can execute JS code. For example, when I simulated a Sina Weibo login, I extracted the site's JS encryption function, executed it in my own code to get the result, and then sent the simulated request.
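One way to run an extracted JS function from your own code, sketched in Python under the assumption that a `node` executable is on the PATH. The function `encodePassword` is a made-up stand-in for whatever routine you copy out of the site's script, and the helper names are invented for this example.

```python
import json
import subprocess

# Hypothetical function copied out of a site's script; a stand-in for a real
# encryption routine.
EXTRACTED_JS = """
function encodePassword(password, nonce) {
    return password.split('').reverse().join('') + ':' + nonce;
}
"""


def build_node_script(js_source, func_name, args):
    """Append a call that prints the function's result as JSON."""
    # json.dumps turns Python values into literals that are also valid JS.
    call_args = ", ".join(json.dumps(arg) for arg in args)
    return f"{js_source}\nconsole.log(JSON.stringify({func_name}({call_args})));"


def call_js_function(js_source, func_name, args, node="node"):
    """Execute the function with Node.js and return its result."""
    result = subprocess.run(
        [node, "-e", build_node_script(js_source, func_name, args)],
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return json.loads(result.stdout)
```

A call such as `call_js_function(EXTRACTED_JS, "encodePassword", ["secret", "abc"])` hands the value back to Python, where it can be placed into the simulated login request.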
