Take a look at this page: to read the content of an announcement you have to open its link, and there are a lot of links.
So I use Selenium to open every link and write the content of each announcement to a TXT file.
That takes a few steps:
1. Open each announcement in turn
2. Switch focus to the new window, find the announcement text, and write it to the TXT file
3. Close the new window
4. Switch back to the main window
5. After every link on the current page has been visited, click "next page"
6. Repeat from step 1
The "next page" link makes a good flag: the last page does not contain a "next page" element, so the presence of that element can be used as the loop condition.
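The control flow of these steps can be tried out without a browser. The following is only a minimal sketch of the loop, not Selenium code: `PaginationSketch`, `Page`, and `traverse` are hypothetical names, and each page is modeled as a list of announcement titles plus a flag saying whether a "next page" link exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PaginationSketch {

    /** Hypothetical stand-in for one listing page. */
    public static final class Page {
        final List<String> announcements;
        final boolean hasNextLink;

        public Page(boolean hasNextLink, String... announcements) {
            this.hasNextLink = hasNextLink;
            this.announcements = Arrays.asList(announcements);
        }
    }

    /** Visits every announcement on every page and stops at the page without a "next page" link. */
    public static List<String> traverse(List<Page> pages) {
        List<String> visited = new ArrayList<>();
        int current = 0;
        while (true) {
            Page page = pages.get(current);
            for (String announcement : page.announcements) {
                // Steps 1-4: open the link, read it, close it, return to the main window.
                visited.add(announcement);
            }
            if (!page.hasNextLink) {
                break; // last page: there is no "next page" link to click
            }
            current++; // step 5: "click" next page, then repeat from step 1
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
                new Page(true, "A1", "A2"),
                new Page(true, "B1"),
                new Page(false, "C1", "C2"));
        System.out.println(traverse(pages)); // prints [A1, A2, B1, C1, C2]
    }
}
```

Checking for the link after the current page has been processed means the last page is still visited before the loop ends.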
The next step is to find the relevant XPath expressions:
Number of list items: count(//tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != ''])
List items: //tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != '']
Next page: //div/a[text() = 'next page']
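The list expression can be sanity-checked outside the browser with the XPath engine that ships with the JDK. This is a sketch under assumptions: the `SAMPLE` markup is a made-up, well-formed stand-in for the real listing table, and `XPathCheck` and `countLinks` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathCheck {

    public static final String LINKS =
            "//tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title != '']";

    // Made-up sample: two matching links, one with an empty title, one with a different href.
    public static final String SAMPLE =
            "<table>"
            + "<tr><td><a href='article_show.asp?id=1' title='Notice A'>Notice A</a></td></tr>"
            + "<tr><td><a href='article_show.asp?id=2' title=''>icon</a></td></tr>"
            + "<tr><td><a href='article.asp?ClassID=86' title='List'>List</a></td></tr>"
            + "<tr><td><a href='article_show.asp?id=3' title='Notice B'>Notice B</a></td></tr>"
            + "</table>";

    /** Evaluates count(LINKS) against the given XML text. */
    public static int countLinks(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Double n = (Double) xpath.evaluate(
                    "count(" + LINKS + ")",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLinks(SAMPLE)); // prints 2
    }
}
```

The `@title != ''` test is what filters out the link with the empty title; the `starts-with` test drops links that do not point at an announcement.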
On the page under test, each link has target="_blank", so clicking it opens a new page and Selenium WebDriver has to switch to the new window.
This is done with:
    String currentWindow = driver.getWindowHandle(); // get the current window handle
    Set<String> handles = driver.getWindowHandles(); // get all window handles
    Iterator<String> it = handles.iterator(); // iterate over the handles, skipping currentWindow
    WebDriver window = driver.switchTo().window(it.next()); // switch to the new window
    driver.switchTo().window(currentWindow); // return to the original page
    driver = driver.switchTo().window(driver.getWindowHandle()); // make the next page the current driver

Closing every window except the current one:

    currentWindow = driver.getWindowHandle();
    // get all windows
    Set<String> handles = driver.getWindowHandles();
    for (String s : handles) {
        // don't close the current page
        if (s.equals(currentWindow)) {
            continue;
        }
        WebDriver window = driver.switchTo().window(s);
        window.close();
    }
    driver.switchTo().window(currentWindow);
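The handle bookkeeping above can be isolated into a small helper and tested without a browser. This is a minimal sketch, assuming handles are plain strings as WebDriver returns them; `HandleFilter` and `otherHandles` are hypothetical names and not part of Selenium.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class HandleFilter {

    /** Returns every handle except the current one, leaving the input set unchanged. */
    public static Set<String> otherHandles(String current, Set<String> all) {
        Set<String> others = new LinkedHashSet<>(all);
        others.remove(current);
        return others;
    }

    public static void main(String[] args) {
        // With Selenium, `all` would come from driver.getWindowHandles()
        // and `current` from driver.getWindowHandle().
        Set<String> all = new LinkedHashSet<>(Arrays.asList("main", "popup"));
        System.out.println(otherHandles("main", all)); // prints [popup]
    }
}
```

Each handle returned by the helper could then be passed to driver.switchTo().window(...), read, and closed before switching back to the main handle.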
The complete code:
    package com.packt.webdriver.chapter3;

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;

    public class TraversalAllLinks {
        private static String currentWindow;

        public static void main(String[] args) {
            WebDriver driver = DriverFactory.getFirefoxDriver();
            driver.get("http://www.lhgtj.gov.cn/article.asp?ClassID=86&page=1");
            driver.manage().window().maximize();
            driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
            driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS);

            while (true) {
                List<WebElement> links = driver.findElements(By.xpath(
                        "//tr/td/a[starts-with(@href, 'article_show.asp?id=') and @title!='']"));
                for (WebElement link : links) {
                    System.out.println(link.getText());
                    try {
                        writeToTxt(link.getText());
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                    link.click();
                    currentWindow = driver.getWindowHandle();
                    // get all windows
                    Set<String> handles = driver.getWindowHandles();
                    for (String s : handles) {
                        // don't close the current page
                        if (s.equals(currentWindow)) {
                            continue;
                        }
                        WebDriver window = driver.switchTo().window(s);
                        window.manage().window().maximize();
                        window.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
                        window.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS);
                        // get all paragraphs inside the table
                        List<WebElement> tbs = window.findElements(By.xpath("//tbody/tr/td/p"));
                        for (WebElement tb : tbs) {
                            System.out.println(tb.getText());
                            try {
                                writeToTxt(tb.getText() + "\n");
                            } catch (IOException e) {
                                e.printStackTrace();
                            }
                        }
                        // close the announcement window
                        window.close();
                    }
                    // switch back to the main window
                    driver.switchTo().window(currentWindow);
                }

                // findElements returns an empty list on the last page,
                // where findElement would throw NoSuchElementException
                List<WebElement> nextPage = driver.findElements(
                        By.xpath("//tr/td/a[@title='next page']"));
                if (nextPage.isEmpty() || !nextPage.get(0).isDisplayed()) {
                    break;
                }
                // click next page
                nextPage.get(0).click();
                // make the next page the current page
                driver = driver.switchTo().window(driver.getWindowHandle());
                driver.manage().window().maximize();
                driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
                driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS);
            }
        }

        // write the log
        public static void writeToTxt(String message) throws IOException {
            // the second argument (true) opens Report.txt in append mode;
            // try-with-resources closes the writer even if write() fails
            try (BufferedWriter bf = new BufferedWriter(new FileWriter("Report.txt", true))) {
                bf.write(message);
                bf.flush();
            }
        }
    }
DriverFactory
    public static WebDriver getFirefoxDriver() {
        try {
            // kill any Firefox process that is still running
            WindowsUtils.tryToKillByName("firefox.exe");
        } catch (Exception e) {
            System.out.println("Can not find Firefox process");
        }
        // load the Firebug extension into a fresh profile
        File file = new File("D:\\firebug-2.0.4-fx.xpi");
        FirefoxProfile profile = new FirefoxProfile();
        try {
            profile.addExtension(file);
            profile.setPreference("extensions.firebug.currentVersion", "2.0.4");
            profile.setPreference("extensions.firebug.allPagesActivation", "on");
        } catch (IOException e3) {
            e3.printStackTrace();
        }
        WebDriver driver = new FirefoxDriver(profile);
        return driver;
    }
Selenium FF WebDriver: traversing all links (an alternative crawler)