Use Selenium to crawl pages dynamically generated by JS

Source: Internet
Author: User

When crawling web page data, the traditional Jsoup approach only works for static pages. Many pages, however, are generated by JS, so other approaches are needed. The first idea is to analyze the JS program and replay the requests it makes. This suits crawling one specific page, but making it work generically across different target URLs is troublesome. The second idea, which is also the more mature practice, is to use a third-party driver to render the page and then download it. This article implements the second idea.
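To make the limitation of static parsing concrete, here is a minimal sketch that uses only the Java standard library. The page fragment, the class name StaticParseDemo, and the helper staticTextOf are illustrative assumptions, not taken from any real site or from Jsoup. It reads an element's text from the raw HTML exactly as the server delivered it, before any script has run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StaticParseDemo {
    // Hypothetical raw HTML as sent by the server: the span is empty, and
    // only a script (which is never executed here) would fill it in.
    static final String RAW_HTML =
        "<html><body>"
        + "<span id=\"CommentCount1\"></span>"
        + "<script>document.getElementById('CommentCount1').innerText = '42';</script>"
        + "</body></html>";

    // Returns the text inside the span with the given id, or null if there is
    // no such span. A regex is enough for this demo; it is not an HTML parser.
    static String staticTextOf(String html, String id) {
        Pattern p = Pattern.compile("<span id=\"" + Pattern.quote(id) + "\">(.*?)</span>");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = staticTextOf(RAW_HTML, "CommentCount1");
        // Without a JS engine the span is still empty.
        System.out.println("static text length = " + text.length());
    }
}
```

The span only receives its value once a browser executes the script, which is why a rendering driver is needed for pages like this.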

Selenium is an automated testing tool that simulates a browser. It provides a set of APIs that interact with a real browser kernel.

The Maven configuration in a Java environment is as follows:

<dependency>
  <groupId>org.seleniumhq.selenium</groupId>
  <artifactId>selenium-java</artifactId>
  <version>2.46.0</version>
</dependency>

The main third-party drivers are IEDriver, FirefoxDriver, ChromeDriver, and HtmlUnitDriver. HtmlUnit is also an automated testing tool: it simulates a running browser and returns the HTML of the page after its scripts have executed. HtmlUnitDriver is a wrapper around HtmlUnit. Because HtmlUnit has limited support for JS parsing, it is not commonly used in real projects.

Take Chrome as an example and download the corresponding driver. When downloading the driver, make sure it is compatible with your Selenium version, otherwise exceptions may occur; downloading the latest version generally works. Before running the program, tell Selenium where the driver is located, for example on Windows:
System.setProperty("webdriver.chrome.driver", "D:\\chromedriver\\chromedriver.exe");
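A hard-coded Windows path ties the program to one machine. The sketch below shows one possible way to choose the driver location by operating system before setting the webdriver.chrome.driver system property that ChromeDriver reads. The class name DriverLocator and the non-Windows path /usr/local/bin/chromedriver are assumptions for illustration; adjust both paths to wherever the driver actually lives.

```java
import java.util.Locale;

final class DriverLocator {
    // System property that Selenium's ChromeDriver reads to find the executable.
    static final String CHROME_DRIVER_PROPERTY = "webdriver.chrome.driver";

    // Chooses a driver path from the OS name. Both paths are example
    // locations (assumptions), not defaults defined by Selenium.
    static String driverPathFor(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows
            ? "D:\\chromedriver\\chromedriver.exe"
            : "/usr/local/bin/chromedriver";
    }

    // Sets the system property for the current OS and returns the chosen path.
    static String configure() {
        String path = driverPathFor(System.getProperty("os.name"));
        System.setProperty(CHROME_DRIVER_PROPERTY, path);
        return path;
    }

    public static void main(String[] args) {
        System.out.println(CHROME_DRIVER_PROPERTY + " = " + configure());
    }
}
```

Calling DriverLocator.configure() once, before new ChromeDriver(), would replace the hard-coded setProperty call.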

Get the entire page

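import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public static void testChromeDriver() {
    // Tell Selenium where the ChromeDriver executable is located.
    System.setProperty("webdriver.chrome.driver", "D:\\chromedriver\\chromedriver.exe");
    WebDriver webDriver = new ChromeDriver();
    // The target URL is truncated in the source; put the page to crawl here.
    webDriver.get("http://");
    // getPageSource() returns the HTML after the browser has rendered the page.
    String responseBody = webDriver.getPageSource();
    System.out.println(responseBody);
    // quit() closes the browser and also ends the driver process.
    webDriver.quit();
}

Get the Sina comment count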
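Printing the page source is enough for a demo, but a crawler usually stores what it downloaded. The following sketch, using only the standard library, writes an HTML string to a file as UTF-8. The class name PageSaver, the temporary file, and the sample HTML are assumptions for illustration; in practice the string would be the value returned by getPageSource().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class PageSaver {
    // Writes the HTML to the target file as UTF-8, replacing existing content.
    static Path save(String html, Path target) {
        try {
            return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rendered page source.
        String html = "<html><body>rendered</body></html>";
        Path target = Files.createTempFile("page", ".html");
        save(html, target);
        System.out.println("saved " + Files.size(target) + " bytes to " + target);
    }
}
```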

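import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

public static void waitForSomething() {
    System.setProperty("webdriver.chrome.driver", "D:\\chromedriver\\chromedriver.exe");
    WebDriver driver = new ChromeDriver();
    // The target URL is empty in the source; put the Sina article URL here.
    driver.get("");
    // Explicit wait: keep checking the condition for up to 10 seconds.
    WebDriverWait wait = new WebDriverWait(driver, 10);
    wait.until(new ExpectedCondition<Boolean>() {
        @Override
        public Boolean apply(WebDriver webDriver) {
            System.out.println("Searching ...");
            // The locator call was lost from the source; By.id is an assumption.
            return webDriver.findElement(By.id("CommentCount1")).getText().length() != 0;
        }
    });
    WebElement element = driver.findElement(By.id("CommentCount1"));
    System.out.println("element=" + element.getText());
    // Close the browser and end the driver process.
    driver.quit();
}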
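The idea behind the explicit wait is a polling loop: check a condition repeatedly until it holds or a timeout expires. The sketch below illustrates that idea with the standard library only. It is not Selenium's implementation, and the names PollingWait and waitUntil are hypothetical. One difference worth noting: this sketch returns false on timeout, whereas WebDriverWait.until throws a TimeoutException.

```java
import java.util.function.BooleanSupplier;

final class PollingWait {
    // Checks the condition every pollMillis until it holds or timeoutMillis
    // has elapsed. Returns true if the condition held in time, false on
    // timeout or if the waiting thread was interrupted.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for an element whose text a script fills in after 300 ms.
        long readyAt = System.nanoTime() + 300 * 1_000_000L;
        boolean found = waitUntil(() -> System.nanoTime() - readyAt >= 0, 2_000, 50);
        System.out.println("found = " + found);
    }
}
```

The condition is always checked once before the first sleep, so a page that is already complete costs no waiting time.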

Further reading: the Selenium API documentation and introduction, and the test reports for each driver.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
