Original article: http://blog.csdn.net/zxciop110/article/details/8544649
Overview
In the previous installment, we learned how to collect data from HTML pages. So that the collected data can be reused later, this installment shows how to store it in a MySQL database.
Data collection page
Premier League team results for the 2011-2012 season
How to work with MySQL in Java
Before working with a MySQL database from Java, we need to add a jar package (mysql-connector-java-5.1.18-bin) to the project.
You can download it from the official MySQL website:
Connector/J 5.1.18
Using MySQL for the first time? See: Connect to MySQL using Java
How do you import a jar package into a Java project? See: How to import jar packages in Eclipse
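Once the jar has been added, a quick way to confirm that it really is on the classpath is to try loading the driver class by name. The sketch below assumes the Connector/J 5.1 driver class name `com.mysql.jdbc.Driver`, the same one the MySQL class later in this post loads; `DriverCheck` and `isClassAvailable` are hypothetical names introduced only for illustration.

```java
/**
 * Hypothetical helper (not part of the original project): reports whether
 * a class, such as the MySQL JDBC driver, can be loaded from the classpath.
 */
public class DriverCheck {

    // Returns true if Class.forName can find and load the named class
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class name used by Connector/J 5.1.x (assumption for this sketch)
        String driver = "com.mysql.jdbc.Driver";
        if (isClassAvailable(driver)) {
            System.out.println("Found " + driver + ": the jar is on the classpath");
        } else {
            System.out.println("Missing " + driver + ": check that the jar was added to the project");
        }
    }
}
```

If the "Missing" message appears when running from Eclipse, the jar usually has not been added to the project's build path yet.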
About the MySQL database
If you are a beginner who wants to use MySQL, the easiest way to start is to
download the XAMPP package from the official XAMPP website (Chinese edition).
XAMPP (Apache + MySQL + PHP + Perl) is a powerful integrated software package. It is easy to use and works without editing any configuration files.
With all the preparations done, let's start writing code.
Open MySQL and create the database and table (copy the following statements into MySQL and run them directly).
Create a MySQL database
```sql
# Create the database htmldatacollection
create database htmldatacollection;
# Select the database htmldatacollection before creating the table
use htmldatacollection;
# Create the table Premiership in this database to store the collected data
# To keep things simple, every field is stored as a string
create table Premiership (
    date     varchar(15),
    hometeam varchar(20),
    awayteam varchar(20),
    result   varchar(20)
);
```
After creating them, let's take a look at the database structure.
Main program code
With the database ready, we can implement the Java code. Below is a brief introduction to each class and the methods it contains.
The DataStorage class and its dataStore() method collect the data and store it.
DataStorage class
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

/**
 * DataStorage class for data collection and storage
 * @author SoFlash - Blog Park http://www.cnblogs.com/longwu
 */
public class DataStorage {

    public void dataStore() {
        // First, hold the web page link in a string
        String strUrl = "http://www.footballresults.org/league.php?all=1&league=engprem";
        String sqlLeagues = "";
        try {
            // Create a URL object pointing to the page whose link is passed in the parentheses
            // For more information, see http://wenku.baidu.com/view/8186caf4f61fb7360b4c6547.html
            URL url = new URL(strUrl);
            // InputStreamReader converts the bytes it reads into characters
            // More at http://blog.sina.com.cn/s/blog_44a05959010004il.html
            InputStreamReader isr = new InputStreamReader(url.openStream(), "UTF-8"); // use UTF-8 throughout
            // Buffer the characters with a BufferedReader
            BufferedReader br = new BufferedReader(isr);
            String strRead = ""; // holds each line read by the BufferedReader
            // Three regular expressions that capture the data we need
            String regularDate = "(\\d{1,2}\\.\\d{1,2}\\.\\d{4})";
            String regularTwoTeam = ">[^<>]*</a>";
            String regularResult = ">(\\d{1,2}-\\d{1,2})</td>";
            // GroupMethod object, so we can call its regularGroup() method later
            GroupMethod gMethod = new GroupMethod();
            // DataStructure object for temporary data storage
            DataStructure ds = new DataStructure();
            // MySql object for executing MySQL statements
            MySql ms = new MySql();
            int i = 0;     // number of match records collected
            int index = 0; // tells the two teams apart, since both use the same regular expression
            // Read the page line by line until the end of the stream
            while ((strRead = br.readLine()) != null) {
                /** Capture the date */
                String strGet = gMethod.regularGroup(regularDate, strRead);
                if (!strGet.equals("")) {
                    // System.out.println("Date:" + strGet);
                    ds.date = strGet; // save the date in the data structure
                    ++index;          // in the HTML source, the team data comes right after the date
                }
                /** Capture the two teams */
                strGet = gMethod.regularGroup(regularTwoTeam, strRead);
                if (!strGet.equals("") && index == 1) {
                    // index 1 is the home team: drop the leading '>' and the trailing "</a>"
                    // (cut by length, which also works if the page writes the tag as "</A>")
                    strGet = strGet.substring(1, strGet.length() - "</a>".length());
                    ds.homeTeam = strGet;
                    index++; // index is now 2
                } else if (!strGet.equals("") && index == 2) {
                    // index 2 is the away team
                    strGet = strGet.substring(1, strGet.length() - "</a>".length());
                    ds.awayTeam = strGet;
                    index = 0; // reset; the next date sets it back to 1 for the next home team
                }
                /** Capture the match result */
                strGet = gMethod.regularGroup(regularResult, strRead);
                if (!strGet.equals("")) {
                    // Drop the leading '>' and the trailing "</td>" to keep only the score
                    strGet = strGet.substring(1, strGet.length() - "</td>".length());
                    ds.result = strGet; // save the result in the data structure
                    // MySQL insert statement
                    sqlLeagues = "insert into Premiership values (\"" + ds.date + "\", \""
                            + ds.homeTeam + "\", \"" + ds.awayTeam + "\", \"" + ds.result + "\")";
                    // Call dataToMySql() of the MySql class to execute the insert
                    ms.dataToMySql(sqlLeagues);
                    i++; // one more record inserted
                    System.out.println("Record " + i);
                }
            }
            // Close the BufferedReader once all data has been read
            br.close();
            // When storage is complete, print the number of records collected
            System.out.println("Data storage is complete: " + i + " records inserted into the database");
        } catch (IOException e) {
            // Print the stack trace if anything goes wrong
            e.printStackTrace();
        }
    }
}
```
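Because dataStore() depends on a live web page and a database, it can be handy to check the parsing logic offline. The sketch below replays the same three regular expressions and the same `index` bookkeeping over a few hand-written HTML lines, and collects each completed match into a list instead of inserting it into MySQL. The sample markup and team names are made up for illustration (the real page's HTML may differ), and `ParseDemo`/`parse` are hypothetical names. The closing tag is stripped by its length, since every match ends with it, which also works when a page uses uppercase tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseDemo {

    // Same patterns as in DataStorage, compiled once
    static final Pattern DATE = Pattern.compile("(\\d{1,2}\\.\\d{1,2}\\.\\d{4})", Pattern.CASE_INSENSITIVE);
    static final Pattern TEAM = Pattern.compile(">[^<>]*</a>", Pattern.CASE_INSENSITIVE);
    static final Pattern RESULT = Pattern.compile(">(\\d{1,2}-\\d{1,2})</td>", Pattern.CASE_INSENSITIVE);

    // Returns the first match of p in line, or "" (mirrors GroupMethod.regularGroup)
    static String find(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group() : "";
    }

    // Walks the lines with the same index state machine as dataStore()
    // and returns one "date|home|away|result" string per completed match
    public static List<String> parse(List<String> lines) {
        List<String> records = new ArrayList<>();
        String date = "", home = "", away = "";
        int index = 0;
        for (String line : lines) {
            String s = find(DATE, line);
            if (!s.isEmpty()) {
                date = s;
                ++index; // team names follow the date
            }
            s = find(TEAM, line);
            if (!s.isEmpty() && index == 1) {
                home = s.substring(1, s.length() - "</a>".length());
                index++;
            } else if (!s.isEmpty() && index == 2) {
                away = s.substring(1, s.length() - "</a>".length());
                index = 0;
            }
            s = find(RESULT, line);
            if (!s.isEmpty()) {
                String score = s.substring(1, s.length() - "</td>".length());
                records.add(date + "|" + home + "|" + away + "|" + score);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Hand-written sample lines, NOT copied from the real page
        List<String> sample = Arrays.asList(
                "<td>01.01.2000</td>",
                "<td><a href=\"#\">Home FC</a></td>",
                "<td><a href=\"#\">Away FC</a></td>",
                "<td>2-1</td>");
        parse(sample).forEach(System.out::println); // prints 01.01.2000|Home FC|Away FC|2-1
    }
}
```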
DataStructure is a simple data structure whose fields temporarily hold the collected data.
DataStructure class
```java
/**
 * DataStructure class: a simple data structure
 * @author SoFlash - Blog Park http://www.cnblogs.com/longwu
 */
public class DataStructure {
    // Data fields
    public String homeTeam;
    public String awayTeam;
    public String date;
    public String result;
}
```
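On Java 16 or later, the same four fields could also be held in an immutable record. This is a hypothetical alternative, not the original project's design; the name `MatchRecord` and the `isComplete()` helper are illustrative. Because a record cannot be modified after construction, the collection loop would keep the date and team names in local variables and create one `MatchRecord` when the result line is found.

```java
/**
 * Hypothetical alternative to DataStructure (requires Java 16+):
 * an immutable record holding one collected match.
 */
public record MatchRecord(String date, String homeTeam, String awayTeam, String result) {

    // True once all four fields have been collected
    public boolean isComplete() {
        return date != null && homeTeam != null && awayTeam != null && result != null;
    }
}
```

Records also generate equals(), hashCode() and toString() automatically, which makes collected matches easy to compare or print while debugging.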
The GroupMethod class and its regularGroup() method run regular expressions against the HTML source code to match the data.
GroupMethod class
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * GroupMethod class is used to match and capture HTML page data
 * @author SoFlash - Blog Park http://www.cnblogs.com/longwu
 */
public class GroupMethod {
    // Takes two string parameters: pattern is the regular expression we use,
    // matcher is the HTML source text to search
    public String regularGroup(String pattern, String matcher) {
        Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(matcher);
        if (m.find()) {        // if a match is found
            return m.group();  // return the captured data
        } else {
            return "";         // otherwise return an empty string
        }
    }
}
```
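regularGroup() returns the whole match, so dataStore() has to trim the surrounding `>` and closing tag with substring(). A possible variant, sketched below, returns a capturing group instead, so the parentheses in the pattern mark exactly the text to keep. `GroupMethodDemo` and the three-argument `regularGroup` overload are hypothetical, and the team pattern `>([^<>]*)</a>` is the original team pattern with a capturing group added.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical variant of GroupMethod: returns a chosen capturing group
 * instead of the whole match, so callers need no substring() clean-up.
 */
public class GroupMethodDemo {

    // Returns capturing group `group` of the first match, or "" if there is none
    public static String regularGroup(String pattern, String text, int group) {
        Matcher m = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        // Group 1 wraps exactly the team name or the score
        String team = regularGroup(">([^<>]*)</a>", "<td><a href=\"#\">Home FC</a></td>", 1);
        String result = regularGroup(">(\\d{1,2}-\\d{1,2})</td>", "<td>2-1</td>", 1);
        System.out.println(team + " " + result); // prints "Home FC 2-1"
    }
}
```

The date pattern already wraps the whole date in a group, so group 1 and the whole match are the same there.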
The MySql class and its dataToMySql() method execute the SQL insert statement, writing the data temporarily held in the data structure into the MySQL database.
MySql class
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * MySql class for MySQL database operations
 * @author SoFlash - Blog Park http://www.cnblogs.com/longwu
 */
public class MySql {
    // MySQL driver, database URL, user name and password,
    // plus the statement and connection objects
    public String driver = "com.mysql.jdbc.Driver";
    public String url = "jdbc:mysql://127.0.0.1:3306/htmldatacollection";
    public String user = "root";
    public String password = "root";
    public Statement stmt = null;
    public Connection conn = null;

    // Insert data by executing the given SQL statement
    public void dataToMySql(String insertSql) {
        try {
            // Load the JDBC driver
            Class.forName(driver);
        } catch (ClassNotFoundException e) {
            System.out.println("Unable to find the local driver");
            e.printStackTrace();
            return;
        }
        try {
            // Open a connection
            conn = DriverManager.getConnection(url, user, password);
            // Create a Statement object to send SQL statements to the database
            stmt = conn.createStatement();
            // Execute the SQL insert statement
            stmt.executeUpdate(insertSql);
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                // Close the statement and then the connection after execution
                if (stmt != null) stmt.close();
                if (conn != null) conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}
```
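The MySQL class above builds each insert by string concatenation and opens a new connection for every row. A common JDBC alternative is sketched below: a PreparedStatement with `?` placeholders, so a team name containing a quote cannot break the statement, plus a caller-supplied Connection, so one connection can be reused for every insert. `MySqlPrepared` and `insertMatch` are hypothetical names; the column names come from the `create table Premiership` statement earlier in this post.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Hypothetical variant of the MySQL class: binds the values with a
 * PreparedStatement instead of concatenating them into the SQL text.
 */
public class MySqlPrepared {

    static final String INSERT_SQL =
            "insert into Premiership (date, hometeam, awayteam, result) values (?, ?, ?, ?)";

    // Inserts one match using an already-open connection; returns the affected row count.
    // try-with-resources closes the PreparedStatement, but not the caller's connection.
    public static int insertMatch(Connection conn, String date, String homeTeam,
                                  String awayTeam, String result) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, date);
            ps.setString(2, homeTeam);
            ps.setString(3, awayTeam);
            ps.setString(4, result);
            return ps.executeUpdate();
        }
    }
}
```

To use it, dataStore() would open one connection with DriverManager.getConnection(url, user, password) before the while loop (ideally in a try-with-resources block) and call insertMatch(conn, ds.date, ds.homeTeam, ds.awayTeam, ds.result) wherever it currently builds sqlLeagues.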
The main function, which starts the data collection and output
Main class
```java
/**
 * Main function for data output
 * @author SoFlash - Blog Park http://www.cnblogs.com/longwu
 */
public class Main {
    public static void main(String[] args) {
        // Call the dataStore() method of the DataStorage class from the main function
        DataStorage ds = new DataStorage();
        ds.dataStore();
    }
}
```
Running the program
Now let's run it and look at the results.
Screenshot: HTML page, first records
Screenshot: MySQL database, first records
Screenshot: HTML page, last records
Screenshot: MySQL database, last records
The program reports 189 records collected
The MySQL database shows 189 rows of data
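To double-check the row count from Java rather than from a MySQL client, a `select count(*)` query is enough. The sketch below is a hypothetical helper (`RowCount` and `countRows` are illustrative names); it only reads from the Premiership table created earlier and takes an already-open Connection.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical helper: counts the rows in the Premiership table so the
 * number can be compared with the count printed by the collection program.
 */
public class RowCount {

    public static int countRows(Connection conn) throws SQLException {
        // try-with-resources closes the ResultSet and the Statement, in that order
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from Premiership")) {
            rs.next();           // count(*) always returns exactly one row
            return rs.getInt(1); // first (and only) column
        }
    }
}
```

Called with a connection opened the same way as in the MySQL class, the returned number should match the total printed at the end of the run.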
With that, we have collected all the match records of the 2011-2012 Premier League season and stored them in the MySQL database. :)
Source code address