Capturing Web Page Information with HttpClient + Jsoup (NetEase Precious Metals as an Example)



Let's talk about what we are going to do today.

We will use HttpClient and Jsoup to capture webpage information. HttpClient is a client-side programming toolkit for the HTTP protocol.

Jsoup is an HTML parser for the Java platform. It can parse a URL or a string of HTML directly. It also provides a very convenient API for extracting and manipulating data with jQuery-like methods.

 

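HttpClient documentation: http://hc.apache.org/httpcomponents-client-5.0.x/index.html (note that the code in this article uses the 4.x API, whose classes live in the org.apache.http package)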

Jsoup documentation: http://jsoup.org/

Here, NetEase's precious metals news page is used as a case study. O(∩_∩)O

First, we need to analyze the structure of the page's HTML source.

Then we can start coding. First, let's go over the steps for using HttpClient:

1. Create an HttpClient object;

2. Create an instance of the request method (for example HttpGet) and specify the URL to access;

3. Call the HttpClient object's execute method to send the request. It returns an HttpResponse; check whether response.getStatusLine().getStatusCode() is 200;

4. Call the HttpResponse's methods (for example getEntity()) to obtain the response content;

5. Release the connection.
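The five steps can be sketched as follows. To keep the sketch runnable without extra jars, it swaps in the JDK's built-in java.net.http.HttpClient (Java 11+) for Apache HttpClient. This substitution is an assumption, not the author's code, and the class name FetchSketch is hypothetical. The sketch returns null for any non-200 response, mirroring the status check in step 3.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: uses the JDK HttpClient, not Apache HttpClient
public class FetchSketch {

    // Returns the page body, or null when the status code is not 200
    public static String fetch(String url) throws IOException, InterruptedException {
        // Step 1: create the client
        HttpClient client = HttpClient.newHttpClient();
        // Step 2: build a GET request for the target URL
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // Step 3: send the request and check the status code
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            return null;
        }
        // Step 4: ofString() has already read the whole body into a String.
        // Step 5: the JDK client has no explicit release call; once the body
        // is fully read, the client can reuse or close the connection itself.
        return response.body();
    }
}
```

The sketch lets IOException and InterruptedException propagate rather than catching them, so the caller decides how to handle network failures.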

Of course, you need to import the relevant jar packages when creating the project. The source code and jar packages for this article are available at http://pan.baidu.com/s/1sl55d85

StockUtils.java

package cn.clay.httpclient.utils;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * Takes a web page link and returns the page's HTML source.
 * @author ClayZhang
 */
public class StockUtils {

    // Fetch the HTML source of the given URL; returns null on failure
    public static String getHtmlByUrl(String url) throws IOException {
        String html = null;
        // create the HttpClient object; try-with-resources closes it at the end
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            HttpGet httpget = new HttpGet(url);
            // closing the response releases the connection
            try (CloseableHttpResponse response = httpClient.execute(httpget)) {
                int resStatu = response.getStatusLine().getStatusCode();
                if (resStatu == HttpStatus.SC_OK) {
                    HttpEntity entity = response.getEntity();
                    if (entity != null) {
                        html = EntityUtils.toString(entity); // obtain the HTML source
                    }
                }
            } catch (Exception e) {
                System.out.println("access [" + url + "] exception!");
                e.printStackTrace();
            }
        }
        return html;
    }
}

Then use Jsoup to write the test class StockTest.java:

package cn.clay.httpclient.utils.test;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import cn.clay.httpclient.utils.StockUtils;

/**
 * @author ClayZhang
 */
public class StockTest {

    public static void main(String[] args) throws IOException {
        String content = StockUtils.getHtmlByUrl("http://fa.163.com/zx/gjs/1/");
        // getHtmlByUrl returns null on failure, and Jsoup.parse rejects null
        if (content == null) {
            System.out.println("No HTML was returned; nothing to parse.");
            return;
        }
        parserHtml(content);
    }

    public static void parserHtml(String content) {
        Document doc = Jsoup.parse(content);
        // each news item is a <dl> inside the element with class "g-news"
        Elements links = doc.getElementsByClass("g-news").select("dl");
        for (Element e : links) {
            // get the page link
            Elements linkHref = e.select("a");
            // the title selector was lost from the original listing;
            // using the link text as the title is an assumption
            System.out.println("News title: " + linkHref.text());
            // capture the time string
            Elements timeStr = e.select("span[class=f-fr]");
            // brief summary
            Elements comment = e.select("span[class=f-fl f-ofe u-digest]");
            System.out.println("News link: " + linkHref.attr("href"));
            System.out.println("Release date: " + timeStr.text());
            System.out.println("Summary: " + comment.text());
            System.out.println("==============================================================");
        }
    }
}

The output after running is as follows:

The copyright of this article belongs to the author and the blog. When reposting, please credit the author and link to the original article.

http://www.cnblogs.com/clayzhang
