Capturing web page content from a URL in Java
I had just learned how to deploy to a remote server with git and had some free time, so I put together a small tool that captures information from a web page. If some of the hard-coded values in it were turned into parameters, it would be easier to extend. I hope this is a good start, and it also got me more comfortable with reading and processing strings. One thing worth noting: in Java 1.8, when you splice Strings with "+", the compiler turns the concatenation into StringBuilder calls behind the scenes, which helps String performance a lot (inside a loop, though, each pass still creates a new builder, so an explicit StringBuilder is better there). Enough talk, here is my code ~
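To make the string-splicing remark concrete, here is a minimal sketch. It compares "+=" in a loop, where every pass produces a new intermediate String, with a single StringBuilder reused across the loop. The class and method names (ConcatDemo, joinWithPlus, joinWithBuilder) are made up for illustration and are not part of the tool.

```java
// Illustrative sketch only; the class and method names are hypothetical.
class ConcatDemo {

    // Concatenation with += in a loop: each pass creates a new String.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Same result, but one explicit StringBuilder is reused across the loop.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"http://baike.baidu.com", "/search?word=", "actor"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

Both methods return the same text; the difference only shows up as extra allocations when the loop runs many times.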
How it runs:
First, open Baidu Baike (Baidu Encyclopedia), search for an entry such as "actor", and then press F12 to inspect the page source.
Then capture the tags you want and put them into a LinkedHashMap. Easy, right? Here is the code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;

/**
 * Created by chunmiao on 17-3-10.
 */
public class ReadBaiduSearch {

    // Saves the returned result
    private LinkedHashMap<String, String> mapOfBaike;

    // Obtains the search information
    public LinkedHashMap<String, String> getInfomationOfBaike(String infomationWords) throws IOException {
        mapOfBaike = getResult(infomationWords);
        return mapOfBaike;
    }

    // Obtains the information through the network link
    private static LinkedHashMap<String, String> getResult(String keywords) throws IOException {
        // Search URL; the keyword is URL-encoded so that non-ASCII search terms work
        String keyUrl = "http://baike.baidu.com/search?word=" + URLEncoder.encode(keywords, "UTF-8");
        // Start node of the search result list
        String startNode = "<dl class=\"search-list\">";
        // Marker that precedes the link
        String keyOfHref = "href=\"";
        // Marker that precedes the title
        String keyOfTitle = "target=\"_blank\">";
        // End node of the search result list
        String endNode = "</dl>";

        boolean isNode = false;
        String title;
        String href;
        String rLine;

        LinkedHashMap<String, String> keyMap = new LinkedHashMap<String, String>();

        // Start the network request
        URL url = new URL(keyUrl);
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();

        // try-with-resources closes the reader on every exit path
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream(), "UTF-8"))) {
            // Read the web page content line by line
            while ((rLine = bufferedReader.readLine()) != null) {
                // Check whether the target start node has appeared
                if (rLine.contains(startNode)) {
                    isNode = true;
                }
                // Once the target node has appeared, capture data
                if (isNode) {
                    // Stop at the end node to save reading time
                    if (rLine.contains(endNode)) {
                        break;
                    }
                    // Skip lines where either value is empty
                    // (compare with isEmpty(), not != "", which compares references)
                    title = getName(rLine, keyOfTitle);
                    href = getHref(rLine, keyOfHref);
                    if (!title.isEmpty() && !href.isEmpty()) {
                        keyMap.put(title, href);
                    }
                }
            }
        }
        return keyMap;
    }

    // Obtains the URL of the entry
    private static String getHref(String rLine, String keyOfHref) {
        String baikeUrl = "http://baike.baidu.com";
        String result = "";
        if (rLine.contains(keyOfHref)) {
            // Copy characters after the marker up to the closing quote
            StringBuilder sb = new StringBuilder();
            for (int j = rLine.indexOf(keyOfHref) + keyOfHref.length();
                 j < rLine.length() && rLine.charAt(j) != '"'; j++) {
                sb.append(rLine.charAt(j));
            }
            result = sb.toString();
            // The captured URL may not contain baikeUrl; if not, prepend it
            if (!result.contains(baikeUrl)) {
                result = baikeUrl + result;
            }
        }
        return result;
    }

    // Obtains the name of the entry
    private static String getName(String rLine, String keyOfTitle) {
        String result = "";
        // Take everything after the title marker
        if (rLine.contains(keyOfTitle)) {
            result = rLine.substring(rLine.indexOf(keyOfTitle) + keyOfTitle.length());
            // Remove the tags contained in the title
            result = result.replaceAll("<em>|</em>|</a>|<a>", "");
        }
        return result;
    }

}
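The parsing step can also be tried without any network access. The sketch below is only an illustration under assumptions: the sample HTML line is invented to resemble a search-result row (it is not copied from a real Baidu page), and the class ParseDemo with its extract and collect helpers is hypothetical. It passes the markers and the base URL in as parameters, which is the kind of change I mentioned at the top.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical offline demo; the sample line in main is invented, not real Baidu output.
class ParseDemo {

    // Returns the text that follows `marker` up to (not including) `terminator`,
    // or "" when the marker is absent.
    static String extract(String line, String marker, String terminator) {
        int start = line.indexOf(marker);
        if (start < 0) {
            return "";
        }
        start += marker.length();
        int end = line.indexOf(terminator, start);
        if (end < 0) {
            end = line.length();
        }
        return line.substring(start, end);
    }

    // Parses one line and stores title -> url when both parts are present.
    static void collect(String line, String baseUrl, Map<String, String> out) {
        String href = extract(line, "href=\"", "\"");
        String title = extract(line, "target=\"_blank\">", "</a>").replaceAll("</?em>", "");
        if (title.isEmpty() || href.isEmpty()) {
            return;
        }
        // Relative links get the site prefix, much like the tool above does.
        if (!href.startsWith("http")) {
            href = baseUrl + href;
        }
        out.put(title, href);
    }

    public static void main(String[] args) {
        Map<String, String> result = new LinkedHashMap<>();
        String sample = "<a class=\"result-title\" href=\"/item/demo\" target=\"_blank\">"
                + "<em>actor</em> (demo entry)</a>";
        collect(sample, "http://baike.baidu.com", result);
        collect("<p>no link here</p>", "http://baike.baidu.com", result);
        System.out.println(result);
    }
}
```

A LinkedHashMap keeps the entries in the order they were found, so the results come back in the same order as on the page.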
It's late now. Off to bed...