First, create a new ordinary Maven project and open pom.xml.
Add the HtmlUnit dependency:
    <dependency>
        <groupId>net.sourceforge.htmlunit</groupId>
        <artifactId>htmlunit</artifactId>
        <version>2.26</version>
    </dependency>
Next, write a test class that fetches www.baidu.com and prints the page's HTML and its text. This is similar to HttpClient, except that by default HtmlUnit also executes the page's JavaScript under the hood (HtmlUnit does provide a setting to turn JavaScript parsing off):
    package com.demo;

    import java.io.IOException;
    import java.net.MalformedURLException;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.DomElement;
    import com.gargoylesoftware.htmlunit.html.DomNodeList;
    import com.gargoylesoftware.htmlunit.html.HtmlDivision;
    import com.gargoylesoftware.htmlunit.html.HtmlForm;
    import com.gargoylesoftware.htmlunit.html.HtmlListItem;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
    import com.gargoylesoftware.htmlunit.html.HtmlTable;
    import com.gargoylesoftware.htmlunit.html.HtmlTableCell;
    import com.gargoylesoftware.htmlunit.html.HtmlTableRow;
    import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

    public class HtmlUnitTest {

        public static void main(String[] args) {
            // Impersonate the specified browser and send requests through a proxy
            WebClient webClient = new WebClient(BrowserVersion.FIREFOX_52, "202.106.16.36", 3128);
            webClient.getOptions().setCssEnabled(false);        // disable CSS support
            webClient.getOptions().setJavaScriptEnabled(false); // disable JavaScript support
            try {
                HtmlPage page = webClient.getPage("http://www.baidu.com"); // fetch and parse the page
                // Wait 10 seconds for HtmlUnit to execute JS (only matters when JavaScript is enabled)
                Thread.sleep(10000);

                // The element names and IDs below are sample values; they must match the page being parsed
                HtmlForm form = page.getFormByName("MyForm");                // get the search form
                HtmlTextInput textField = form.getInputByName("Q");          // get the query text box
                HtmlSubmitInput button = form.getInputByName("Submitbutton"); // get the submit button
                textField.setValueAttribute("Java");                         // "fill in" the text box
                HtmlPage page2 = button.click();                             // simulate a click

                HtmlTable table = page.getHtmlElementById("table1");
                for (HtmlTableRow row : table.getRows()) {        // traverse all rows
                    for (HtmlTableCell cell : row.getCells()) {   // traverse all columns
                        System.out.print(cell.asText() + " ");
                    }
                    System.out.println();
                }

                HtmlDivision div = page.getHtmlElementById("Navmenu"); // find the DOM element with the given ID
                System.out.println(div.asXml());
                System.out.println("======================");

                DomNodeList<DomElement> aList = page.getElementsByTagName("a"); // query all <a> tags by tag name
                for (int i = 0; i < aList.getLength(); i++) {
                    DomElement a = aList.get(i);
                    System.out.println(a.asXml());
                }
                System.out.println("======================");

                // XPath lookup
                HtmlListItem item = (HtmlListItem) page.getByXPath("//div[@id='Navmenu'][1]/ul/li").get(0);
                System.out.println(item.asXml());

                System.out.println("Web HTML: " + page.asXml());      // get the page HTML
                System.out.println("====================");
                System.out.println("Web page text: " + page.asText()); // get the page text
            } catch (FailingHttpStatusCodeException e) {
                e.printStackTrace();
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag set during Thread.sleep
            } finally {
                webClient.close(); // close the client and free memory
            }
        }
    }
Where do you find a proxy IP? Many websites provide them; one example is http://www.66ip.cn
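The WebClient constructor used in the test class takes the proxy host and port as separate arguments, while proxy lists often publish entries as a single host:port string (an assumption; check the format of the list you use). The following is a minimal sketch of splitting such an entry using only the JDK. The class name ProxyEntry and its parse method are hypothetical helpers introduced for illustration and are not part of HtmlUnit.

```java
import java.net.InetSocketAddress;

// Hypothetical helper (not part of HtmlUnit): splits a "host:port" proxy entry.
final class ProxyEntry {

    private ProxyEntry() {
    }

    /**
     * Parses a "host:port" string into an unresolved socket address.
     * No DNS lookup or connection attempt is made.
     */
    static InetSocketAddress parse(String entry) {
        String trimmed = entry.trim();
        int colon = trimmed.lastIndexOf(':');
        // Reject entries with no colon, an empty host, or an empty port
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("Expected host:port, got: " + entry);
        }
        String host = trimmed.substring(0, colon);
        int port;
        try {
            port = Integer.parseInt(trimmed.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid port in: " + entry, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range in: " + entry);
        }
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        // Same sample proxy as in the test class
        InetSocketAddress proxy = parse("202.106.16.36:3128");
        System.out.println(proxy.getHostString() + " " + proxy.getPort());
    }
}
```

The resulting getHostString() and getPort() values can then be passed as the second and third arguments of new WebClient(BrowserVersion.FIREFOX_52, host, port). Free proxies tend to go offline without notice, so validating an entry's format does not guarantee that the proxy is reachable.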
HtmlUnit simple operations