A Simple Java Data Collection (Web Scraping) Program
[Target task] Collect the mobile phone number segments for the whole country from the website used in the code (http://www.hiphop8.com) and store them in a database table.
[Completion process]
1. Learn to write simple regular expressions.
2. Fetch the content of a single web page, learning the basic IO streams in Java along the way.
3. Insert the collected data into a MySQL database table, covering basic JDBC programming.
4. Build the complete URL of each city by URL concatenation.
5. Collect the number segments of the entire website and insert them into the database table in batches, using batch processing with precompiled statements.
6. Use StringBuilder to speed things up.
[Database table] Note: if you create the table from the cmd command line, the backquotes around the field names are not required.
create table number_segment (
    `id` bigint not null auto_increment unique,
    `segment` char(7) not null primary key,
    `province` varchar(255) not null,
    `city` varchar(255) not null
) default charset=utf8;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test_ZhengZe {
    public static void main(String[] args) {
        // 13 followed by five digits, then one character that is not '<'
        Pattern p = Pattern.compile("(13\\d{5}[^<])");
        String s = "/mobile/guangzhou_1300040.>1300040</a></li><a href=\"../../mobile/guangzhou_1300041.html\">1300041</a></li><a";
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println("printed number section:" + m.group(0));
        }
        // groupCount() is the number of capturing groups in the pattern, not the number of matches
        System.out.print("captured data:" + m.groupCount());
    }
}
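The regex test above prints `group(0)`, the whole match. As a minimal sketch of the same idea, the class below anchors the pattern between `>` and `<` and reads only the capture group, so the digits inside the link text are returned without surrounding characters. The class name `SegmentRegexSketch`, the method `extractSegments`, and the sample HTML are illustrative assumptions, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentRegexSketch {
    // Seven digits starting with 13, sitting between '>' and '<'
    // (optional whitespace is tolerated). Group 1 holds only the digits.
    private static final Pattern SEGMENT = Pattern.compile(">\\s*(13\\d{5})\\s*<");

    // Returns every number segment found in the given HTML, in order of appearance.
    static List<String> extractSegments(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SEGMENT.matcher(html);
        while (m.find()) {
            found.add(m.group(1)); // group(1) = capture group, group(0) = whole match
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical snippet modelled on the string used in the test above
        String html = "<a href=\"../../mobile/guangzhou_1300040.html\">1300040</a></li>"
                + "<a href=\"../../mobile/guangzhou_1300041.html\">1300041</a></li>";
        List<String> segments = extractSegments(html);
        System.out.println(segments);
        // The number of matches is the size of the list, not Matcher.groupCount()
        System.out.println("matches: " + segments.size());
    }
}
```

Collecting the matches in a list also gives a match count for free, which `groupCount()` does not provide.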
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class getHtml {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        String str_url = "http://www.hiphop8.com/city/guangdong/guangzhou.php";
        // match number segments starting with 13, 15, 18 or 147 that sit between '>' and '<'
        Pattern p = Pattern.compile(">(13\\d{5}|15\\d{5}|18\\d{5}|147\\d{4})<");
        String html = get_Html(str_url);
        Matcher m = p.matcher(html);
        int num = 0;
        while (m.find()) {
            System.out.println("printed number section:" + m.group(1) + " Number " + (++num));
        }
        System.out.println(num);
        long end = System.currentTimeMillis();
        System.out.println("time spent " + (end - start) + " millisecond");
    }

    // read the whole page line by line into one string
    public static String get_Html(String str_url) throws IOException {
        URL url = new URL(str_url);
        String content = "";
        StringBuffer page = new StringBuffer();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            while ((content = in.readLine()) != null) {
                page.append(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return page.toString();
    }
}
[Insert the collected content into the database]
The general steps for connecting Java to a MySQL database are as follows:
Load the MySQL driver --- create a database connection --- create a Statement object to execute SQL --- define the SQL statement as a String --- call the Statement's execute method --- close the Statement object and the database connection.
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class database {
    public static String driver = "com.mysql.jdbc.Driver";
    public static String url = "jdbc:mysql://127.0.0.1:3306/tele_dat?autoReconnect=true&characterEncoding=UTF-8";
    public static String user = "root";
    public static String password = "123456";
    public static Statement statement = null;
    public static java.sql.Connection conn = null;
    public static int i = 0;

    // create a data insertion method
    public static void datatoMySql(String sql) throws SQLException {
        try {
            Class.forName(driver);
        } catch (ClassNotFoundException e) {
            System.out.println("failed to load the driver");
            e.printStackTrace();
        }
        conn = DriverManager.getConnection(url, user, password); // create a connection
        statement = conn.createStatement(); // create a Statement object to send the SQL
        statement.executeUpdate(sql);
    }

    public static void close() throws SQLException {
        // the null checks avoid a NullPointerException when the connection was never opened
        if (statement != null) {
            statement.close(); // close the database operation object
        }
        if (conn != null) {
            conn.close(); // close the database connection
        }
    }

    // test database connection example
    public static void main(String args[]) {
        String sql = "insert into number_segment (segment, province, city) "
                + "values (123458, 'guangdong 1', 'guangzhou')";
        try {
            datatoMySql(sql);
            System.out.println("inserted successfully");
        } catch (SQLException e) {
            System.out.println("insertion failed");
            e.printStackTrace();
        }
        try {
            close();
            System.out.print("Close Database");
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
I am using the MySQL database bundled with WampServer and operate it from the cmd command line. A reference of common MySQL commands is useful here, and if you are not familiar with JDBC programming it is worth reading an introductory tutorial first.
[Get the URLs of all cities on the website]
By viewing the source code of the website's home page you can obtain the URL of each province. By then looking at a province page you can obtain the suffix of each city's URL in that province. Concatenating the two gives the complete URL of a city.
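The concatenation described above can be sketched on its own, without any network access. The class below is only an illustration: the class name `CityUrlSketch`, its two methods, and the HTML snippets in `main` are assumptions made for the example, and the real page markup may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CityUrlSketch {
    // Province link as it is assumed to appear on the home page,
    // e.g. www.hiphop8.com/city/guangdong
    private static final Pattern PROVINCE = Pattern.compile("www\\.hiphop8\\.com/city/\\w+");
    // City link on a province page; group 1 is the relative suffix, e.g. guangzhou.php
    private static final Pattern CITY_SUFFIX =
            Pattern.compile("<LI><A href=\"(.*?)\" target=_blank>");

    // Turns every province link found on the home page into a URL prefix ending in '/'.
    static List<String> provinceUrls(String homeHtml) {
        List<String> urls = new ArrayList<>();
        Matcher m = PROVINCE.matcher(homeHtml);
        while (m.find()) {
            urls.add("http://" + m.group() + "/");
        }
        return urls;
    }

    // Appends every city suffix found on a province page to that province's prefix.
    static List<String> cityUrls(String provinceUrl, String provinceHtml) {
        List<String> urls = new ArrayList<>();
        Matcher m = CITY_SUFFIX.matcher(provinceHtml);
        while (m.find()) {
            urls.add(provinceUrl + m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical fragments standing in for the downloaded pages
        String homeHtml = "<a href=\"http://www.hiphop8.com/city/guangdong\">Guangdong</a>";
        String provinceHtml = "<LI><A href=\"guangzhou.php\" target=_blank>Guangzhou</A></LI>";
        for (String province : provinceUrls(homeHtml)) {
            for (String city : cityUrls(province, provinceHtml)) {
                System.out.println(city);
            }
        }
    }
}
```

Keeping the matching separate from the download makes the two patterns easy to check against a saved copy of a page before running them on the whole site.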
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class get_all_city_url {
    public static void main(String[] args) throws Exception {
        String home_url = "http://www.hiphop8.com";
        String pattern_pro = "\\w{3}\\.\\w{7}\\.\\w{3}\\/\\w{4}\\/\\w+"; // matches the URL of a province
        String pattern_city_hz = "<LI><A href=\"(.*?)\" target=_blank>"; // the city suffix
        Matcher mat_home = get(home_url, pattern_pro);
        int i = 0;
        // you can use an ArrayList to save all URLs, and a StringBuilder to
        // concatenate the strings, but in my tests the time was almost the same
        long start = System.currentTimeMillis();
        while (mat_home.find()) {
            String city_url_qz = "http://" + mat_home.group() + "/"; // province URL used as the prefix
            Matcher mat_city_hz = get(city_url_qz, pattern_city_hz);
            while (mat_city_hz.find()) {
                i++;
                String city_url = city_url_qz + mat_city_hz.group(1); // prefix + suffix = complete city URL
                System.out.println(i + " " + city_url);
            }
        }
        long end = System.currentTimeMillis();
        long time = end - start;
        System.out.println("Total time " + time);
    }

    // download the page at str and return a Matcher for pattern pa over its content
    public static Matcher get(String str, String pa) throws Exception {
        String urlsource = get_Html(str);
        Pattern p = Pattern.compile(pa);
        Matcher m = p.matcher(urlsource);
        return m;
    }

    public static String get_Html(String str_url) throws IOException {
        URL url = new URL(str_url);
        String content = "";
        StringBuffer page = new StringBuffer();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            while ((content = in.readLine()) != null) {
                page.append(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return page.toString();
    }
}
With the groundwork above, we can collect the number segments of the entire website. Because more than 200,000 rows are inserted into the table, efficiency matters.
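The final collection program is not shown here, so the following is only a sketch of what batch processing with a precompiled statement could look like in JDBC. It assumes the number_segment table defined earlier and an already opened `java.sql.Connection`. The class name `BatchInsertSketch`, the method `insertAll`, and the `batchSize` parameter are hypothetical names chosen for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsertSketch {
    // Precompiled once; only the three parameters change per row
    static final String INSERT_SQL =
            "insert into number_segment (segment, province, city) values (?, ?, ?)";

    // Inserts rows of {segment, province, city} in batches of batchSize inside one
    // transaction and returns how many batches were sent to the database.
    static int insertAll(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit once at the end instead of once per row
        int batches = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // send the accumulated rows together
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the last, partially filled batch
                ps.executeBatch();
                batches++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo everything sent since the last commit
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return batches;
    }
}
```

A call might look like `BatchInsertSketch.insertAll(conn, rows, 1000)`, where 1000 is an arbitrary batch size to tune by measurement. Since `segment` is the primary key, a duplicate segment would make the batch fail, so the rows should be de-duplicated first. With MySQL Connector/J it may also be worth looking at the `rewriteBatchedStatements` connection property in the driver documentation.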
There are many Java scraping tutorials on the Internet, some of them very well written, and I know that my own experience with Java is still limited. I wrote this post first to summarize what I learned and keep a record, and second in the hope that it helps beginners and starts a discussion. Of course, if anything in the article is wrong, I hope more experienced readers will point it out.