Java-based data collection (1)

Source: Internet
Author: User

The previous two articles covered the principles of data collection, so the rest of this one goes straight to the code.

GetData.java (encapsulates the data collection method) is really just a simple regular expression match.
find(): attempts to find the next subsequence of the input sequence that matches the pattern.
group(): returns the input subsequence matched by the previous match operation.

    package com.lcw.curl;

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class GetData {

        /**
         * Returns the first match of a regular expression in a piece of text.
         *
         * @param regex   regular expression
         * @param content text to search
         * @return the first matching substring, or "" if there is no match
         */
        public String getData(String regex, String content) {
            // Compile the regular expression, case insensitive
            Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(content);
            if (matcher.find()) {
                return matcher.group();
            } else {
                return "";
            }
        }

    }

CurlMain.java (main program).
openStream(): opens a connection to this URL and returns a byte stream for reading from that connection.
InputStreamReader: a bridge from byte streams to character streams.

    package com.lcw.curl;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CurlMain {

        /**
         * @param args
         */
        public static void main(String[] args) {
            try {
                String address = "http://www.footballresults.org/league.php?league=EngDiv1";
                URL url = new URL(address);
                // Open the address and decode the returned bytes as UTF-8 characters
                InputStreamReader inputStreamReader = new InputStreamReader(url.openStream(), "UTF-8");
                // Buffer the character stream so that characters, arrays, and lines can be read efficiently
                BufferedReader bufferedReader = new BufferedReader(inputStreamReader);

                GetData data = new GetData();
                String content = ""; // holds each line as it is read
                // Team lines come right after the date line, and the home and away
                // teams match the same regular expression, so this flag tells them apart
                int flag = 0;
                String dateRegex = "\\d{1,2}\\.\\d{1,2}\\.\\d{4}"; // date regular expression
                String teamRegex = ">[^<>]*</a>"; // team regular expression
                String scoreRegex = ">(\\d{1,2}-\\d{1,2})</TD>"; // score regular expression
                int i = 0; // number of records

                while ((content = bufferedReader.readLine()) != null) { // read one line at a time
                    // Get the match date
                    String dateInfo = data.getData(dateRegex, content);
                    if (!dateInfo.equals("")) {
                        System.out.println("Date: " + dateInfo);
                        flag++;
                    }
                    // Get the team names; flag was incremented when the date was read
                    String teamInfo = data.getData(teamRegex, content);
                    if (!teamInfo.equals("") && flag == 1) {
                        teamInfo = teamInfo.substring(1, teamInfo.indexOf("</a>"));
                        System.out.println("Home: " + teamInfo);
                        flag++;
                    } else if (!teamInfo.equals("") && flag == 2) {
                        teamInfo = teamInfo.substring(1, teamInfo.indexOf("</a>"));
                        System.out.println("Away: " + teamInfo);
                        flag = 0;
                    }
                    // Get the score
                    String scoreInfo = data.getData(scoreRegex, content);
                    if (!scoreInfo.equals("")) {
                        scoreInfo = scoreInfo.substring(1, scoreInfo.indexOf("</TD>"));
                        System.out.println("Score: " + scoreInfo);
                        System.out.println();
                        i++;
                    }
                }
                bufferedReader.close();
                System.out.println("A total of " + i + " records");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

    }
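To try the same idea without hitting the network, here is a minimal sketch that runs the three regular expressions over a few lines of made-up HTML. The class name MatchLineParser, the helper firstGroup, and the SAMPLE markup are all hypothetical and only assume the page puts the date, the two team links, and the score on separate lines. It also swaps in two techniques that are not in the original code: capture groups read with group(1) instead of substring/indexOf, and try-with-resources to close the reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLineParser {

    // Hypothetical sample markup (an assumption, not the real page).
    static final String SAMPLE =
            "<td>01.02.2014</td>\n"
          + "<td><a href=\"team.php?id=1\">Team A</a></td>\n"
          + "<td><a href=\"team.php?id=2\">Team B</a></td>\n"
          + "<td>2-1</td>\n";

    // Compiled once and reused; group(1) is the text inside the parentheses.
    static final Pattern DATE = Pattern.compile("(\\d{1,2}\\.\\d{1,2}\\.\\d{4})");
    static final Pattern TEAM = Pattern.compile(">([^<>]*)</a>", Pattern.CASE_INSENSITIVE);
    static final Pattern SCORE = Pattern.compile(">(\\d{1,2}-\\d{1,2})</td>", Pattern.CASE_INSENSITIVE);

    /** Returns capture group 1 of the first match, or null when nothing matches. */
    static String firstGroup(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    /** Reads the stream line by line; returns one "date home score away" entry per score found. */
    static List<String> parse(InputStream in) {
        List<String> results = new ArrayList<>();
        String date = null, home = null, away = null;
        // InputStreamReader decodes bytes to characters; BufferedReader adds readLine().
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String d = firstGroup(DATE, line);
                if (d != null) {            // a date starts a new record
                    date = d;
                    home = null;
                    away = null;
                }
                String t = firstGroup(TEAM, line);
                if (t != null && date != null) {
                    if (home == null) {     // first team after the date is the home team
                        home = t;
                    } else if (away == null) {
                        away = t;
                    }
                }
                String s = firstGroup(SCORE, line);
                if (s != null && away != null) {
                    results.add(date + " " + home + " " + s + " " + away);
                    date = null;
                    home = null;
                    away = null;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        for (String row : parse(in)) {
            System.out.println(row);
        }
    }
}
```

Running main prints 01.02.2014 Team A 2-1 Team B. To read the live page instead, the stream returned by url.openStream() could be passed to parse in place of the ByteArrayInputStream. Reading group(1) also sidesteps a weak spot in the substring approach: because the pattern is case insensitive, it can match a lowercase closing tag that indexOf("</TD>") would not find.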
