Many predecessors have already written detailed introductions; this post draws on article 78644484.
The following is a code example using HttpClient:
package com.http.client;

import java.io.IOException;

import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.params.ConnRouteParams;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.util.EntityUtils;
import org.apache.log4j.Logger;

/**
 * @author oo
 * @date 2018-04-04
 */
public class MyHttpClient {

    private static Logger logger = Logger.getLogger(MyHttpClient.class);

    /**
     * Goal: crawl site data using HttpClient.
     */
    public static void main(String[] args) {
        // Create an HttpClient object
        HttpClient hClient = new DefaultHttpClient();
        // Set the connection and socket timeouts, and route requests through a
        // proxy server (the proxy helps keep our IP from being blocked)
        hClient.getParams()
                .setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 20000)
                .setParameter(CoreConnectionPNames.SO_TIMEOUT, 20000)
                .setParameter(ConnRouteParams.DEFAULT_PROXY, new HttpHost("111.155.116.237", 8123));
        HttpGet hGet = new HttpGet("http://www.itcast.cn/");
        String content = "";
        try {
            // Send the request to the site and get the page source
            HttpResponse execute = hClient.execute(hGet);
            // The EntityUtils utility class converts the response entity to a string
            content = EntityUtils.toString(execute.getEntity(), "utf-8");
        } catch (ClientProtocolException e) {
            e.printStackTrace();
            logger.error("********ClientProtocolException" + e);
        } catch (IOException e) {
            e.printStackTrace();
            logger.error("********IOException" + e);
        }
        System.out.println(content);
    }
}
Making a request with Jsoup:
package com.http.client;

import java.io.IOException;

import org.apache.log4j.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MyJsoup {

    private static Logger logger = Logger.getLogger(MyJsoup.class);

    public static void main(String[] args) {
        try {
            // Send the request with Jsoup and parse the response into a Document
            Document document = Jsoup.connect("http://www.itcast.cn").get();
            // System.out.println(document);
            Elements elements = document.getElementsByTag("a");
            String val = elements.text();
            System.out.println(val);
            // Print each link's text and its href attribute
            for (Element element : elements) {
                System.out.println(element.text() + ":" + element.attr("href"));
            }
        } catch (IOException e) {
            e.printStackTrace();
            logger.error("***********IOException: connection failed" + e);
        }
    }
}
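For comparison, here is a minimal standard-library sketch of the same text:href extraction done with a regular expression (class name and sample HTML are mine). A hand-rolled regex breaks on malformed or nested real-world HTML, which is exactly why a proper parser like Jsoup is preferable; this only illustrates what getElementsByTag("a") plus attr("href") is doing for us:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {

    // Naive pattern for <a href="...">text</a>; Jsoup handles real-world HTML far more robustly
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns link text -> href, mirroring element.text() + ":" + element.attr("href")
    public static Map<String, String> extractLinks(String html) {
        Map<String, String> links = new LinkedHashMap<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.put(m.group(2).trim(), m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/a\">Home</a> <a href=\"/b\">About</a></p>";
        extractLinks(html).forEach((text, href) -> System.out.println(text + ":" + href));
        // prints:
        // Home:/a
        // About:/b
    }
}
```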
HttpClient combined with Jsoup:
package com.http.client;

import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HttpClientAndJsoup {

    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Create an HttpClient object
        HttpClient hClient = new DefaultHttpClient();
        // Most crawler URLs are GET requests, so create a GET request object
        HttpGet hGet = new HttpGet("http://www.itcast.cn/");
        // Send the request to the site and get the page source
        HttpResponse response = hClient.execute(hGet);
        // The EntityUtils utility class converts the response entity to a string
        String content = EntityUtils.toString(response.getEntity(), "utf-8");
        // Jsoup is responsible for parsing the page
        Document doc = Jsoup.parse(content);
        // Select page content with a CSS element selector
        Elements elements = doc.select("div.salary_con li");
        // System.out.println(elements.text());
        for (Element element : elements) {
            String text = element.text();
            System.out.println(text);
        }
    }
}
A simple application of HttpClient and Jsoup for crawling.