Capturing webpage content from a URL in Java

Source: Internet
Author: User


I just learned how to deploy Git on a remote server but had nothing to use it for yet, so I wrote a small tool that captures information from a webpage. Turning some of the hard-coded values into parameters would make it more extensible. I hope this is a good start, and it has made me more comfortable with reading and handling strings. One thing worth noting: in Java 1.8, when strings are concatenated with +, the compiler automatically performs the concatenation with a StringBuilder, which helps String performance. Inside a loop, however, each += still creates a new StringBuilder, so an explicit one is the better choice there. Enough talk, here is the code.
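As a small illustration of the string-concatenation remark above, the sketch below builds the same string once with += in a loop and once with a single explicit StringBuilder. The class and method names (ConcatDemo, joinWithPlus, joinWithBuilder) are made up for this example and are not part of the original tool.

```java
class ConcatDemo {
    // Concatenates with +=. With the Java 8 compiler each += is rewritten
    // into a temporary StringBuilder, so a new builder is created on
    // every iteration of the loop.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Reuses one StringBuilder for the whole loop.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"http://", "baike.baidu.com", "/search"};
        // Both calls produce the same text; only the amount of
        // temporary garbage differs.
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

For a handful of short strings the difference is negligible; it only starts to matter when a loop appends many characters, as the URL-reading loop in this tool does.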


 

First open Baidu Baike (Baidu's encyclopedia), search for an entry such as "actor", and then press F12 to view the page source.

 

Capture the desired tags and put their contents into a LinkedHashMap, with the entry title as the key and its link as the value. Easy, right? Here is the code:
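Before the full listing, here is a minimal offline sketch of the capture step on its own: pull the link and the title out of each line with indexOf and substring, then store them in a LinkedHashMap so the entries keep the order in which they appeared. The sample lines are invented for illustration and are not copied from the real page; the names TagCaptureDemo, between and capture are also assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TagCaptureDemo {
    // Returns the text between the first occurrence of marker and the
    // next occurrence of end, or "" if either one is missing.
    static String between(String line, String marker, String end) {
        int start = line.indexOf(marker);
        if (start < 0) {
            return "";
        }
        start += marker.length();
        int stop = line.indexOf(end, start);
        if (stop < 0) {
            return "";
        }
        return line.substring(start, stop);
    }

    // Maps each captured title to its link, preserving encounter order.
    static Map<String, String> capture(String[] lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String href = between(line, "href=\"", "\"");
            String title = between(line, "target=\"_blank\">", "</a>")
                    .replaceAll("</?em>", ""); // drop highlight tags
            if (!href.isEmpty() && !title.isEmpty()) {
                entries.put(title, href);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical markup, only shaped like a search result line.
        String[] lines = {
            "<a href=\"/item/actor\" target=\"_blank\"><em>actor</em></a>",
            "<p>no link on this line</p>",
            "<a href=\"/item/actress\" target=\"_blank\">actress</a>"
        };
        System.out.println(capture(lines));
        // prints {actor=/item/actor, actress=/item/actress}
    }
}
```

A plain HashMap would hold the same pairs but gives no guarantee about iteration order, which is presumably why a LinkedHashMap is used for search results.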

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;

/**
 * Created by chunmiao on 17-3-10.
 */
public class ReadBaiduSearch {

    // Saves the returned result
    private LinkedHashMap<String, String> mapOfBaike;

    // Get the search information
    public LinkedHashMap<String, String> getInfomationOfBaike(String infomationWords) throws IOException {
        mapOfBaike = getResult(infomationWords);
        return mapOfBaike;
    }

    // Get the information over the network
    private static LinkedHashMap<String, String> getResult(String keywords) throws IOException {
        // Search URL; the keyword is URL-encoded so that non-ASCII terms survive
        String keyUrl = "http://baike.baidu.com/search?word=" + URLEncoder.encode(keywords, "UTF-8");
        // Start node of the search result list
        String startNode = "<dl class=\"search-list\">";
        // Marker that precedes the entry link
        String keyOfHref = "href=\"";
        // Marker that precedes the entry title
        String keyOfTitle = "target=\"_blank\">";
        // End node of the search result list
        String endNode = "</dl>";

        boolean isNode = false;
        String title;
        String href;
        String rLine;

        LinkedHashMap<String, String> keyMap = new LinkedHashMap<String, String>();

        // Start the network request
        URL url = new URL(keyUrl);
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();

        // try-with-resources closes the readers even if reading fails
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
            // Read the webpage content line by line
            while ((rLine = bufferedReader.readLine()) != null) {
                // Check whether the target node has appeared
                if (rLine.contains(startNode)) {
                    isNode = true;
                }
                // Once the target node has appeared, capture data
                if (isNode) {
                    // Stop at the end node to save reading time
                    if (rLine.contains(endNode)) {
                        break;
                    }
                    // Skip lines that yield no title or no link
                    title = getName(rLine, keyOfTitle);
                    href = getHref(rLine, keyOfHref);
                    if (!title.isEmpty() && !href.isEmpty()) {
                        keyMap.put(title, href);
                    }
                }
            }
        } finally {
            urlConnection.disconnect();
        }
        return keyMap;
    }

    // Get the URL of the entry
    private static String getHref(String rLine, String keyOfHref) {
        String baikeUrl = "http://baike.baidu.com";
        StringBuilder result = new StringBuilder();
        if (rLine.contains(keyOfHref)) {
            // Copy characters up to the closing quote of the href attribute
            for (int j = rLine.indexOf(keyOfHref) + keyOfHref.length();
                 j < rLine.length() && rLine.charAt(j) != '"'; j++) {
                result.append(rLine.charAt(j));
            }
            // The captured URL may not contain baikeUrl; if not, prepend it
            if (result.indexOf(baikeUrl) < 0) {
                result.insert(0, baikeUrl);
            }
        }
        return result.toString();
    }

    // Get the name of the entry
    private static String getName(String rLine, String keyOfTitle) {
        String result = "";
        if (rLine.contains(keyOfTitle)) {
            // Take everything after the title marker
            result = rLine.substring(rLine.indexOf(keyOfTitle) + keyOfTitle.length());
            // Remove the <em> and <a> tags contained in the title
            result = result.replaceAll("</?em>|</?a>", "");
        }
        return result;
    }

}
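The idea of turning the fixed values into parameters could look roughly like the sketch below. It keeps only the scanning logic (start collecting at a start node, stop at an end node) and takes the source and both nodes as arguments, so it can be tried on an in-memory string without a network request. SectionScanner and linesBetween are hypothetical names, and this is one possible shape for the change, not the author's implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SectionScanner {
    // Collects the lines from the one containing startNode up to, but not
    // including, the one containing endNode, and then stops reading.
    static List<String> linesBetween(Reader source, String startNode, String endNode)
            throws IOException {
        List<String> section = new ArrayList<>();
        boolean inSection = false;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(startNode)) {
                    inSection = true;
                }
                if (inSection) {
                    if (line.contains(endNode)) {
                        break; // nothing after the end node is needed
                    }
                    section.add(line);
                }
            }
        }
        return section;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a downloaded page; the markup is invented.
        String page = "<html>\n"
                + "<dl class=\"search-list\">\n"
                + "<dd>first</dd>\n"
                + "<dd>second</dd>\n"
                + "</dl>\n"
                + "<dd>ignored</dd>\n";
        List<String> section = linesBetween(
                new StringReader(page), "<dl class=\"search-list\">", "</dl>");
        System.out.println(section);
    }
}
```

With this shape, the same method could be pointed at the InputStreamReader of an HttpURLConnection for a different site simply by passing other start and end nodes.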

 

It's so late now. Go to bed...
