JavaScript: is there a library that can parse JS?

Source: Internet
Author: User
I want to scrape content from websites, but much of that content is generated by JavaScript. Is there a library that can run the JS, so that I can easily capture the page's HTML after its scripts have executed? Examples are product information on shopping sites and QQ Space content. Any language is fine, as long as it allows quick development. Thank you.

Reply content:

This is not just about parsing JS; you also need a browser engine!

Recommended:

  • QtWebKit, known to support Python and C++
  • PhantomJS, known to support JavaScript, CoffeeScript, and Python; also built on the WebKit engine
  • SlimerJS, known to support JavaScript; uses the Gecko engine, the same one as Firefox, and can also run on top of Firefox
  • CasperJS, known to support JavaScript; a higher-level wrapper around the two above

I think your problem may not need anything that heavyweight.

You know that the content you want to capture comes from JS, so where does that JS get it? It is either embedded in a script on the page or loaded as JSON through an AJAX request.

Find the script or request that carries the content you need. If it is JSON, use a JSON parser. If it is JS, you can usually just extract the data with regular expressions.
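The two cases above can be sketched roughly as follows. This is a minimal illustration, not a general solution: the response bodies, the variable name `pageData`, and the helper names are all made up for the example, and the second helper only works when the value assigned in the script happens to be valid JSON (double-quoted keys and strings).

```python
import json
import re

# Hypothetical bodies, standing in for what the site's AJAX endpoint
# and inline script might return. Real sites will differ.
AJAX_BODY = '{"product": {"name": "Example phone", "price": 1999}}'
SCRIPT_BODY = (
    'var pageData = {"product": {"name": "Example phone", "price": 1999}};\n'
    "render(pageData);"
)


def parse_ajax_json(body: str) -> dict:
    """Case 1: the endpoint returns plain JSON, so a JSON parser is enough."""
    return json.loads(body)


def extract_assigned_json(script: str, var_name: str) -> dict:
    """Case 2: the data is a JSON literal assigned to a variable in a script.

    A regular expression locates the assignment, then the JSON decoder
    reads exactly one value starting there and ignores the rest of the script.
    """
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", script)
    if match is None:
        raise ValueError("no assignment to %r found" % var_name)
    value, _end = json.JSONDecoder().raw_decode(script, match.end())
    return value


if __name__ == "__main__":
    print(parse_ajax_json(AJAX_BODY)["product"]["name"])
    print(extract_assigned_json(SCRIPT_BODY, "pageData")["product"]["price"])
```

Letting the JSON decoder find the end of the value avoids writing a regular expression that has to match nested braces, which is where regex-only extraction tends to break.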

PhantomJS may be the best solution for you. CasperJS, which is built on PhantomJS, can also be a useful tool for grabbing page content created by JavaScript or AJAX.

Zookeeper Node.js

From your description, it sounds like you want to capture a page whose content is produced by JS, so fetching the page directly only gets you an empty shell. Right?

In that case, I recommend a "headless browser", of which PhantomJS is the best-known example. It is essentially a browser without a user interface. You drive it from code, it can interact with your own program, and in the end it hands you the final, rendered HTML.
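As a rough sketch of that workflow in Python: the example below swaps in Selenium driving headless Chrome instead of PhantomJS, which is my substitution and not something the replies above mention. It assumes Selenium and a Chrome installation are available; the function name `fetch_rendered_html` is hypothetical.

```python
def fetch_rendered_html(url, driver=None):
    """Load url in a browser and return the DOM serialized after scripts ran.

    If no driver is passed in, a headless Chrome is started through Selenium
    (an assumption of this sketch) and shut down again afterwards.
    """
    own_driver = driver is None
    if own_driver:
        # Imported lazily so this module loads even without Selenium installed.
        from selenium import webdriver

        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # page_source reflects the current DOM, not the raw server response.
        return driver.page_source
    finally:
        if own_driver:
            driver.quit()
```

Content that arrives through later AJAX calls may not be in the DOM yet when `driver.get` returns, so a real scraper would probably need to wait for a specific element before reading `page_source`.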

Use Node.js directly, and execute the returned content there.

In this situation I usually read the JS code myself, find the part I need, and reimplement it. In Java there seem to be libraries that can execute JS code. For example, when I simulated a login to Sina Weibo, I extracted the encryption function straight from the site's JS, executed that code to get the result, and then simulated the request.
