Selenium Firefox WebDriver: Traversing All Links (an Alternative to a Crawler)

Source: Internet
Author: User
Tags: xpath

Take a look at the announcement listing page. To read an announcement you have to open its link, and there are a lot of announcements.

So I use Selenium to open every link and write the content of each announcement to a text file.

That takes a few steps.

1. Open the announcements one at a time

2. Switch focus to the new window, find the announcement, and write it to the text file

3. Close the new window

4. Switch back to the main window

5. Once every link on the current page has been visited, click "next page"

6. Repeat from step 1

The "next page" link makes a good flag for the loop condition, because the last page does not contain that element.
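
One way to picture the loop control, as a minimal sketch and not the author's code: the `Page` class below is a hypothetical in-memory stand-in for a listing page (no browser is involved), and its `next` field is left `null` on the last page to mimic the missing "next page" element. The loop handles the current page first and only then tests the flag, so the last page is not skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagingSketch {
    /** Hypothetical stand-in for a listing page: its links plus the page that follows it. */
    static final class Page {
        final List<String> links;
        final Page next; // null on the last page, like the missing "next page" element

        Page(List<String> links, Page next) {
            this.links = links;
            this.next = next;
        }
    }

    /** Visits every page in order and returns all links that were seen. */
    static List<String> collect(Page first) {
        List<String> seen = new ArrayList<>();
        Page page = first;
        while (page != null) {
            seen.addAll(page.links); // steps 1-4: handle the links of the current page
            page = page.next;        // step 5: follow "next page" if it exists
        }
        return seen;
    }

    public static void main(String[] args) {
        Page last = new Page(Arrays.asList("notice-3"), null);
        Page first = new Page(Arrays.asList("notice-1", "notice-2"), last);
        System.out.println(collect(first));
    }
}
```

With a real driver, the `page.next` check would become a lookup of the "next page" element on the current page.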

The next step is to find the relevant XPath

Number of list items: count(//tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != ''])
List: //tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != '']
Next page: //div/a[text() = 'next page']
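
The expressions can be checked without a browser. The sketch below runs them through the JDK's built-in XPath engine against a small, made-up document: the `SAMPLE` markup is an assumption that only imitates the listing page, and the `'next page'` label is the literal printed in this article, so the live site may use a different one.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathCheck {
    // Hypothetical markup that imitates the listing page; not copied from the real site.
    static final String SAMPLE =
        "<html><body><table>"
        + "<tr><td><a href='article_show.asp?id=1' title='Notice A'>Notice A</a></td></tr>"
        + "<tr><td><a href='article_show.asp?id=2' title=''>untitled</a></td></tr>"
        + "<tr><td><a href='other.asp?id=3' title='Other'>Other</a></td></tr>"
        + "</table><div><a href='article.asp?page=2'>next page</a></div></body></html>";

    static final String LIST_XPATH =
        "//tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != '']";
    static final String NEXT_XPATH = "//div/a[text() = 'next page']";

    /** Returns how many nodes of the given XML document match the expression. */
    static int count(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("announcement links: " + count(SAMPLE, LIST_XPATH));
        System.out.println("next-page links: " + count(SAMPLE, NEXT_XPATH));
    }
}
```

In the sample, only the first link satisfies both conditions: the second has an empty title and the third points at a different script.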

Each announcement link has target="_blank", so clicking it in a Selenium WebDriver test opens a new window, and the driver then has to switch to that window.

This uses the following calls:

    String currentWindow = driver.getWindowHandle();  // get the current window handle
    Set<String> handles = driver.getWindowHandles();  // get all window handles
    Iterator<String> it = handles.iterator();         // iterator over those handles

    WebDriver window = driver.switchTo().window(it.next());  // switch to the new window

    driver.switchTo().window(currentWindow);  // return to the original page

    driver = driver.switchTo().window(driver.getWindowHandle());  // make the next page the current driver

    currentWindow = driver.getWindowHandle();
    // Get all windows
    Set<String> handles = driver.getWindowHandles();
    for (String s : handles) {
        // Do not close the current page
        if (s.equals(currentWindow)) {
            continue;
        }
        window = driver.switchTo().window(s);
        window.close();
    }
    driver.switchTo().window(currentWindow);
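
The handle filtering in that loop can be isolated into a small helper and tried without a browser. This is a sketch under assumptions: `otherHandles` is a hypothetical name, and the handle strings in `main` are made up. In a real run the inputs would come from `driver.getWindowHandle()` and `driver.getWindowHandles()`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class WindowHandles {
    /** Returns every handle except the current one, without modifying the input set. */
    static Set<String> otherHandles(String current, Set<String> all) {
        Set<String> others = new LinkedHashSet<>(all);
        others.remove(current);
        return others;
    }

    public static void main(String[] args) {
        // Made-up handle strings; real ones are opaque IDs supplied by the driver.
        Set<String> all = new LinkedHashSet<>();
        all.add("main-window");
        all.add("popup-1");
        System.out.println(otherHandles("main-window", all));
    }
}
```

Copying the set before removing the current handle keeps the helper free of side effects on whatever set the caller passed in.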

The complete code

    package com.packt.webdriver.chapter3;

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;

    public class TraversalAllLinks {

        private static final String LINKS_XPATH =
                "//tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != '']";
        private static final String NEXT_PAGE_XPATH = "//tr/td/a[@title='next page']";

        public static void main(String[] args) {
            WebDriver driver = DriverFactory.getFirefoxDriver();
            driver.get("http://www.lhgtj.gov.cn/article.asp?ClassID=86&page=1");
            driver.manage().window().maximize();
            // Timeouts apply to the whole session, so they are set only once
            driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
            driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS);

            while (true) {
                List<WebElement> links = driver.findElements(By.xpath(LINKS_XPATH));
                for (WebElement link : links) {
                    System.out.println(link.getText());
                    writeToTxt(link.getText());
                    // Remember the main window before the click opens a new one
                    String currentWindow = driver.getWindowHandle();
                    link.click();
                    // Get all windows
                    Set<String> handles = driver.getWindowHandles();
                    for (String s : handles) {
                        // Do not close the current page
                        if (s.equals(currentWindow)) {
                            continue;
                        }
                        WebDriver window = driver.switchTo().window(s);
                        // Get all paragraphs of the announcement table
                        List<WebElement> tbs = window.findElements(By.xpath("//tbody/tr/td/p"));
                        for (WebElement tb : tbs) {
                            System.out.println(tb.getText());
                            writeToTxt(tb.getText() + "\n");
                        }
                        // Close the announcement window
                        window.close();
                    }
                    // Switch back to the main window
                    driver.switchTo().window(currentWindow);
                }

                // findElements returns an empty list on the last page, whereas
                // findElement would throw NoSuchElementException there
                List<WebElement> nextPage = driver.findElements(By.xpath(NEXT_PAGE_XPATH));
                if (nextPage.isEmpty()) {
                    break;
                }
                // Click "next page"
                nextPage.get(0).click();
                // Make the next page the current driver
                driver = driver.switchTo().window(driver.getWindowHandle());
            }
            driver.quit();
        }

        // Write logs
        public static void writeToTxt(String message) {
            // true = append to Report.txt instead of overwriting it;
            // try-with-resources closes the writer even if write() fails
            try (BufferedWriter bf = new BufferedWriter(new FileWriter("Report.txt", true))) {
                bf.write(message);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
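
A note on the log writer: `FileWriter` as used above encodes text with the platform default charset, which can garble non-ASCII announcement text on some systems. The sketch below is an alternative, not the author's method: it appends with an explicit UTF-8 encoding through `java.nio.file.Files`. The class name `ReportWriter` and the temporary file used in `main` are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class ReportWriter {
    /** Appends one line to the file as UTF-8, creating the file if it does not exist. */
    static void append(Path file, String message) {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(message);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file back as UTF-8 lines. */
    static List<String> readAll(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path report = Files.createTempFile("report", ".txt");
        append(report, "\u516c\u544a"); // two CJK characters, written as Unicode escapes
        append(report, "announcement body");
        System.out.println(readAll(report));
    }
}
```

Opening the file on every call mirrors the original `writeToTxt`. For a long crawl, keeping one writer open for the whole run would avoid reopening the file for each line.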

DriverFactory

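    package com.packt.webdriver.chapter3;

    import java.io.File;
    import java.io.IOException;

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;
    import org.openqa.selenium.firefox.FirefoxProfile;
    import org.openqa.selenium.os.WindowsUtils;

    public class DriverFactory {

        public static WebDriver getFirefoxDriver() {
            // Kill any Firefox process that is still running
            try {
                WindowsUtils.tryToKillByName("firefox.exe");
            } catch (Exception e) {
                System.out.println("Can not find Firefox process");
            }

            // Load the Firebug extension into a fresh profile
            File file = new File("D:\\firebug-2.0.4-fx.xpi");
            FirefoxProfile profile = new FirefoxProfile();
            try {
                profile.addExtension(file);
                profile.setPreference("extensions.firebug.currentVersion", "2.0.4");
                profile.setPreference("extensions.firebug.allPagesActivation", "on");
            } catch (IOException e3) {
                e3.printStackTrace();
            }

            WebDriver driver = new FirefoxDriver(profile);
            return driver;
        }
    }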