Java-based data collection (1)
For the principles behind the collection, see the previous two articles. Without further ado, here is the code.

GetData.java (encapsulates the data collection method) is really just a simple regular-expression match:
group(): returns the input subsequence matched by the previous match operation.
find(): attempts to find the next subsequence of the input sequence that matches the pattern.

    package com.lcw.curl;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GetData {

        /**
         * @param regex   regular expression
         * @param content text to search
         * @return the first match, or an empty string if nothing matches
         */
        public String getData(String regex, String content) {
            Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE); // compile the regular expression, case-insensitive
            Matcher matcher = pattern.matcher(content);
            if (matcher.find()) {
                return matcher.group();
            } else {
                return "";
            }
        }

    }

CurlMain.java (main program)
InputStreamReader is a bridge from byte streams to character streams.
openStream() opens a connection to this URL and returns a byte stream (InputStream) for reading from that connection.

    package com.lcw.curl;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CurlMain {

        /**
         * @param args
         */
        public static void main(String[] args) {
            try {
                String address = "http://www.footballresults.org/league.php?league=EngDiv1";
                URL url = new URL(address);
                // open the address and decode the returned bytes into characters as UTF-8
                InputStreamReader inputStreamReader = new InputStreamReader(url.openStream(), "UTF-8");
                // read text from the character input stream, buffering characters for efficient reading of characters, arrays, and lines
                BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

                GetData data = new GetData();
                String content = ""; // holds each line that is read
                int flag = 0; // the team information comes right after the date information and both teams match the same regular expression, so this flag tells them apart
                String dateRegex = "\\d{1,2}\\.\\d{1,2}\\.\\d{4}"; // date regular expression
                String teamRegex = ">[^<>]*</a>"; // team regular expression
                String scoreRegex = ">(\\d{1,2}-\\d{1,2})</TD>"; // score regular expression
                int i = 0; // number of records

                while ((content = bufferedReader.readLine()) != null) { // read one line at a time
                    // get the match date
                    String dateInfo = data.getData(dateRegex, content);
                    if (!dateInfo.equals("")) {
                        System.out.println("Date: " + dateInfo);
                        flag++;
                    }
                    // get the team information; flag was incremented when the date was read
                    // the match is case-insensitive, so cut at the last '<' rather than searching for a literal "</a>"
                    String teamInfo = data.getData(teamRegex, content);
                    if (!teamInfo.equals("") && flag == 1) {
                        teamInfo = teamInfo.substring(1, teamInfo.lastIndexOf('<'));
                        System.out.println("Home team: " + teamInfo);
                        flag++;
                    } else if (!teamInfo.equals("") && flag == 2) {
                        teamInfo = teamInfo.substring(1, teamInfo.lastIndexOf('<'));
                        System.out.println("Away team: " + teamInfo);
                        flag = 0;
                    }
                    // get the score
                    String scoreInfo = data.getData(scoreRegex, content);
                    if (!scoreInfo.equals("")) {
                        scoreInfo = scoreInfo.substring(1, scoreInfo.lastIndexOf('<'));
                        System.out.println("Score: " + scoreInfo);
                        System.out.println();
                        i++;
                    }
                }
                bufferedReader.close();
                System.out.println("A total of " + i + " records");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

    }
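
The matching logic can be tried without a network connection. The following is a minimal sketch, not the original program: the class name MatchSketch, the helpers firstMatch and parse, and the sample HTML rows are all made up for illustration, and the real page markup may differ. It reuses the three patterns from CurlMain but reads the value from a capture group instead of trimming the match with substring, and it sets the flag to 1 when a date is seen rather than incrementing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchSketch {

    // Same patterns as in CurlMain; the team pattern gains a capture group (an assumption of this sketch)
    private static final Pattern DATE = Pattern.compile("\\d{1,2}\\.\\d{1,2}\\.\\d{4}");
    private static final Pattern TEAM = Pattern.compile(">([^<>]*)</a>", Pattern.CASE_INSENSITIVE);
    private static final Pattern SCORE = Pattern.compile(">(\\d{1,2}-\\d{1,2})</td>", Pattern.CASE_INSENSITIVE);

    /** Returns the requested group of the first match, or "" if the line does not match. */
    public static String firstMatch(Pattern pattern, int group, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(group) : "";
    }

    /** Walks the lines once and builds one "date | home | away | score" record per score cell. */
    public static List<String> parse(List<String> lines) {
        List<String> records = new ArrayList<>();
        String date = "", home = "", away = "";
        int flag = 0; // 0 = waiting for a date, 1 = next team is the home team, 2 = next team is the away team
        for (String line : lines) {
            String d = firstMatch(DATE, 0, line);
            if (!d.isEmpty()) {
                date = d;
                flag = 1;
            }
            String team = firstMatch(TEAM, 1, line);
            if (!team.isEmpty() && flag == 1) {
                home = team;
                flag = 2;
            } else if (!team.isEmpty() && flag == 2) {
                away = team;
                flag = 0;
            }
            String score = firstMatch(SCORE, 1, line);
            if (!score.isEmpty()) {
                records.add(date + " | " + home + " | " + away + " | " + score);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical rows, invented for this example
        List<String> sample = Arrays.asList(
                "<td>1.2.2013</td>",
                "<td><a href=\"team.php?id=1\">Team A</a></td>",
                "<td><a href=\"team.php?id=2\">Team B</a></td>",
                "<TD>2-1</TD>");
        for (String record : parse(sample)) {
            System.out.println(record); // prints: 1.2.2013 | Team A | Team B | 2-1
        }
    }
}
```

Reading the value from group(1) avoids depending on the exact case of the closing tag, which matters because the patterns are compiled with CASE_INSENSITIVE.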