Simple Java Collection Program 2: Java Collection

Source: Internet
Author: User
Tags: mysql command line

Following on from the first simple Java collection program, this part completes the task of collecting the phone number segments of the entire website.

[Use pre-compilation + batch processing to collect webpage content into a database table]

 

Previously, we used the Statement class to create the objects that execute SQL statements and insert rows into the database. However, because a large amount of data is inserted, continuing to use Statement would take a lot of time, so we switch to its subinterface PreparedStatement.

PreparedStatement pre-compiles the SQL statement; you then only need to pass the parameters through methods such as setString(). This improves efficiency and also improves security, because bound parameters prevent SQL injection.
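As a rough illustration of the difference, here is a minimal sketch that assumes the number_segment table used later in this article. The class name BindingSketch and both method names are hypothetical, and the select statement exists only to show how a quote inside a concatenated value changes the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BindingSketch {
    // Naive approach: the value is pasted into the SQL text, so a quote
    // inside the value changes the structure of the statement.
    static String concatenatedSql(String segment) {
        return "select * from number_segment where segment = '" + segment + "'";
    }

    // Pre-compiled approach: the SQL text is fixed and each value is bound
    // separately with setString(), so it is always treated as data.
    static void insertSegment(Connection conn, String segment, String province, String city)
            throws SQLException {
        String sql = "insert ignore into number_segment (segment, province, city) values (?, ?, ?)";
        try (PreparedStatement pst = conn.prepareStatement(sql)) {
            pst.setString(1, segment);
            pst.setString(2, province);
            pst.setString(3, city);
            pst.executeUpdate();
        }
    }
}
```

Passing the value 1300000' or '1'='1 to concatenatedSql yields a where clause that matches every row, whereas insertSegment would store the same text as an ordinary string.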

 

In addition, we can call its addBatch() and executeBatch() methods to implement batch inserts.

The code is as follows. I like to put the database connection in a separate class.
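The batching pattern can be sketched on its own as follows. This is only an illustration under assumptions: the class BatchInsertSketch, the method insertAll and the batchSize parameter are hypothetical names, and each row is assumed to be a {segment, province, city} array.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    static final String INSERT_SQL =
        "insert ignore into number_segment (segment, province, city) values (?, ?, ?)";

    // Queues every row with addBatch() and sends the queue to the database
    // with executeBatch() each time batchSize rows have accumulated.
    // Returns the number of rows queued.
    static int insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        int queued = 0;
        try (PreparedStatement pst = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                pst.setString(1, row[0]); // segment
                pst.setString(2, row[1]); // province
                pst.setString(3, row[2]); // city
                pst.addBatch();
                queued++;
                if (queued % batchSize == 0) {
                    pst.executeBatch();
                }
            }
            if (queued % batchSize != 0) {
                pst.executeBatch(); // send the rows left over after the last full batch
            }
        }
        return queued;
    }
}
```

With 5 rows and a batchSize of 2, this calls addBatch() five times and executeBatch() three times.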

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class database {
    public static String driver = "com.mysql.jdbc.Driver";
    public static String url = "jdbc:mysql://127.0.0.1:3306/tele_dat?autoReconnect=true&characterEncoding=UTF-8";
    public static String user = "root";
    public static String password = "123456";
    public static Connection conn = null;

    // Return a database Connection object
    public static Connection ConnectToDataBase() {
        try {
            Class.forName(driver);
        } catch (ClassNotFoundException e) {
            System.out.println("failed to load the driver");
            e.printStackTrace();
        }
        try {
            conn = DriverManager.getConnection(url, user, password);
            System.out.println("connection successful");
        } catch (SQLException e) {
            System.out.println("connection failed");
            e.printStackTrace();
        }
        return conn;
    }

    // Test the database connection
    public static void main(String args[]) {
        database.ConnectToDataBase();
    }
}

Main Program

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Crawl {
    private static Connection conn = database.ConnectToDataBase();
    static String home_url = "http://www.hiphop8.com";
    static String pattern_pro_city = "<DIV class=title><SPAN>(.*?)-(.*?)<\\/SPAN><\\/DIV>"; // matches the province name and city name
    static String pattern_number = ">(13\\d{5}|15\\d{5}|18\\d{5}|147\\d{4})<"; // matches the number segment
    static String pattern_pro = "\\w{3}\\.\\w{7}\\.\\w{3}\\/\\w{4}\\/\\w+"; // province URL
    static String pattern_city_hz = "<LI><A href=\"(.*?)\" target=_blank>"; // suffix of the city URL
    // SQL statement to pre-compile
    static String insertSQL = "insert ignore into number_segment (segment, province, city) values (?, ?, ?)";
    static PreparedStatement pst = null;
    static int num_pro = 0;
    static int num_city = 0;
    static int all_num_tele = 0;

    public static void main(String[] args) throws Exception {
        pst = conn.prepareStatement(insertSQL);
        Matcher mat_home = get(home_url, pattern_pro);
        long start = System.currentTimeMillis();
        while (mat_home.find()) {
            num_pro++;
            System.out.println("------ No." + num_pro + " province -----");
            String city_url_qz = "http://" + mat_home.group() + "/";
            int len = city_url_qz.length();
            // Reuse one StringBuffer to build each city URL
            StringBuffer city_ur = new StringBuffer();
            city_ur.append(city_url_qz);
            Matcher mat_city_hz = get(city_url_qz, pattern_city_hz);
            while (mat_city_hz.find()) { // obtain the complete URL of the city
                num_city++;
                System.out.println("City No." + num_city);
                String last_city_url = city_ur.append(mat_city_hz.group(1)).toString();
                // String last_city_url = city_url_qz + mat_city_hz.group(1);
                int len2 = last_city_url.length();
                One_City_Tele_to_DB(last_city_url);
                city_ur.delete(len, len2); // cut the buffer back to the province prefix
            }
        }
        long end = System.currentTimeMillis();
        long time = (end - start) / (1000 * 60);
        conn.close();
        System.out.println("Total number of phone number segments queried: " + all_num_tele);
        System.out.println("the time spent is: " + time);
    }

    public static void One_City_Tele_to_DB(String url) throws Exception {
        int this_city_num = 0;
        String pro = null;
        String city = null;
        Matcher mat_pro_city = get(url, pattern_pro_city); // get the province and city names
        while (mat_pro_city.find()) {
            String long_pro = mat_pro_city.group(1);
            pro = long_pro.substring(0, long_pro.length() - 1);
            String long_city = mat_pro_city.group(2);
            city = long_city.substring(0, long_city.length() - 10);
            System.out.println("province: " + pro + " city: " + city + " inserting number segments into the database");
        }
        Matcher mat_number = get(url, pattern_number); // get the number segments
        while (mat_number.find()) {
            pst.setString(1, mat_number.group(1));
            pst.setString(2, pro);
            pst.setString(3, city);
            pst.addBatch();
            this_city_num++;
            all_num_tele++;
        }
        pst.executeBatch(); // insert the number segments of one city per batch
        pst.clearBatch();
        System.out.println("the number of number segments inserted in the city is: " + this_city_num);
    }

    // Regular expression match
    public static Matcher get(String str_url, String pattern) throws Exception {
        String urlsource = get_Html(str_url);
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(urlsource);
        return m;
    }

    // Get the webpage content
    public static String get_Html(String str_url) throws IOException {
        URL url = new URL(str_url);
        String content = "";
        StringBuffer page = new StringBuffer();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            while ((content = in.readLine()) != null) {
                page.append(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return page.toString();
    }
}
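To see what the number-segment pattern captures, here is a small self-contained sketch. The HTML fragment is made up for illustration and is not taken from the real site; the class name SegmentRegexSketch and the method extractSegments are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SegmentRegexSketch {
    // Same expression as pattern_number: a 7-digit segment that starts with
    // 13, 15, 18 or 147 and sits directly between '>' and '<'.
    static final Pattern NUMBER =
        Pattern.compile(">(13\\d{5}|15\\d{5}|18\\d{5}|147\\d{4})<");

    static List<String> extractSegments(String html) {
        List<String> segments = new ArrayList<>();
        Matcher m = NUMBER.matcher(html);
        while (m.find()) {
            segments.add(m.group(1)); // group 1 is the segment without '>' and '<'
        }
        return segments;
    }

    public static void main(String[] args) {
        // Made-up sample markup, for illustration only
        String html = "<LI><A href=\"#\">1300710</A></LI>"
                    + "<LI><A href=\"#\">1477123</A></LI>"
                    + "<LI>12345</LI>";
        System.out.println(extractSegments(html)); // prints [1300710, 1477123]
    }
}
```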

During the actual run of the program, we found more than 500 duplicate number segments. Because Xiangfan city was renamed Xiangyang city, the number segments of the two cities are identical, and the database table uses segment (the number segment) as the primary key. Adding ignore to the insert statement makes MySQL automatically skip a row whose primary key already exists.

In addition, the id column is set to auto_increment. If the rows in the table are simply deleted, the id does not start from 1 again. To make the id start from 1, enter truncate table table_name in the mysql command line.

The second version, SecondCrawl, builds multi-row insert statements with a StringBuilder and executes one every 2000 rows:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondCrawl {
    private static Connection conn = database.ConnectToDataBase();
    // Pre-compiled statement + StringBuilder
    static StringBuilder PreStat = new StringBuilder();
    static String Qz = "insert ignore into number_segment (segment, province, city) values";
    static String insertSQL = "insert ignore into number_segment (segment, province, city) values (?, ?, ?)";
    static int len1 = Qz.length();
    static PreparedStatement pst = null;
    static String home_url = "http://www.hiphop8.com";
    static String pattern_pro_city = "<DIV class=title><SPAN>(.*?)-(.*?)<\\/SPAN><\\/DIV>"; // matches the province name and city name
    static String pattern_number = ">(13\\d{5}|15\\d{5}|18\\d{5}|147\\d{4})<"; // matches the number segment
    static String pattern_pro = "\\w{3}\\.\\w{7}\\.\\w{3}\\/\\w{4}\\/\\w+"; // province URL
    static String pattern_city_hz = "<LI><A href=\"(.*?)\" target=_blank>"; // suffix of the city URL
    static int num_pro = 0;
    static int num_city = 0;
    static int all_num_tele = 0;

    public static void main(String[] args) throws Exception {
        Matcher mat_home = get(home_url, pattern_pro);
        conn.setAutoCommit(true);
        PreStat.append(Qz);
        pst = conn.prepareStatement(insertSQL); // pre-compile
        long start = System.currentTimeMillis();
        while (mat_home.find()) {
            num_pro++;
            System.out.println("------ No." + num_pro + " province -----");
            String city_url_qz = "http://" + mat_home.group() + "/";
            int len = city_url_qz.length();
            StringBuffer city_ur = new StringBuffer();
            city_ur.append(city_url_qz);
            Matcher mat_city_hz = get(city_url_qz, pattern_city_hz);
            while (mat_city_hz.find()) { // obtain the URL of the city
                num_city++;
                System.out.println("City No." + num_city);
                String city_url = city_ur.append(mat_city_hz.group(1)).toString();
                int len2 = city_url.length();
                One_City_Tele_to_DB(city_url);
                city_ur.delete(len, len2);
            }
        }
        long end = System.currentTimeMillis();
        long time = (end - start) / (1000 * 60);
        pst.executeBatch(); // execute the remaining statements queued at the end
        conn.close();
        System.out.println("Total number of phone number segments queried: " + all_num_tele);
        System.out.println("the time spent is: " + time + " minutes\n" + "in milliseconds: " + (end - start) + " ms");
    }

    // Collect the number segments of one city and insert them into the database
    public static void One_City_Tele_to_DB(String url) throws Exception {
        String city = null;
        String pro = null;
        int this_city_num = 0;
        Matcher mat_pro_city = get(url, pattern_pro_city);
        while (mat_pro_city.find()) {
            String long_pro = mat_pro_city.group(1);
            pro = long_pro.substring(0, long_pro.length() - 1);
            String long_city = mat_pro_city.group(2);
            city = long_city.substring(0, long_city.length() - 10);
            System.out.println("province: " + pro + " city: " + city + " inserting number segments into the database...");
        }
        String temp = ", '" + pro + "','" + city + "'),";
        Matcher mat_number = get(url, pattern_number);
        while (mat_number.find()) {
            PreStat.append("(" + mat_number.group(1)).append(temp);
            this_city_num++;
            all_num_tele++;
            if (all_num_tele <= 208000 && all_num_tele % 2000 == 0) {
                PreStat.deleteCharAt(PreStat.length() - 1); // remove the trailing comma of the SQL statement
                // Note: the JDBC documentation says addBatch(String) cannot be called on a
                // PreparedStatement; it worked with the MySQL driver used here, but a plain
                // Statement is the portable choice.
                pst.addBatch(PreStat.toString());
                pst.executeBatch();
                pst.clearBatch();
                PreStat.delete(len1, PreStat.length()); // keep only the statement prefix
            }
            if (all_num_tele > 208000) { // fewer than 2000 rows remain: queue them without executing
                PreStat.deleteCharAt(PreStat.length() - 1);
                pst.addBatch(PreStat.toString());
                PreStat.delete(len1, PreStat.length());
            }
        }
        System.out.println("the number of number segments inserted in the city is: " + this_city_num);
    }

    // Regular expression match
    public static Matcher get(String str_url, String pattern) throws Exception {
        String urlsource = get_Html(str_url);
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(urlsource);
        return m;
    }

    // Get the webpage content
    public static String get_Html(String str_url) throws IOException {
        URL url = new URL(str_url);
        String content = "";
        StringBuffer page = new StringBuffer();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            while ((content = in.readLine()) != null) {
                page.append(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return page.toString();
    }
}
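A multi-row insert can also keep its values as bound parameters instead of concatenating them into the SQL text. The following is a sketch of that alternative, not the approach used in this article: the class MultiRowInsertSketch and its methods are hypothetical, and each row is assumed to be a {segment, province, city} array.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class MultiRowInsertSketch {
    static final String PREFIX =
        "insert ignore into number_segment (segment, province, city) values ";

    // Builds one statement with a "(?, ?, ?)" group per row.
    static String buildSql(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be at least 1");
        }
        StringBuilder sql = new StringBuilder(PREFIX);
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append("(?, ?, ?)");
        }
        return sql.toString();
    }

    // Binds all values of a chunk and sends them to the database as a single statement.
    static void insertChunk(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement pst = conn.prepareStatement(buildSql(rows.size()))) {
            int index = 1; // JDBC parameter indexes start at 1
            for (String[] row : rows) {
                pst.setString(index++, row[0]); // segment
                pst.setString(index++, row[1]); // province
                pst.setString(index++, row[2]); // city
            }
            pst.executeUpdate();
        }
    }
}
```

The trade-off is that the statement text depends on the chunk size, so a fixed chunk size (such as the 2000 rows used above) lets the same text be reused for every full chunk.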

Run result

 

After several test runs, the running time is about 2 minutes, which is much faster. There is still a lot of room for improvement, however: in testing, when the program only performs the inserts of the more than 200,000 rows, it finishes within a few seconds, which suggests that most of the time is spent fetching the web pages.

For the next optimization, my idea is to use multiple threads so that fetching the website URLs and inserting into the database run concurrently. I am currently learning Java multithreading and will try to rewrite the collection program with multiple threads. If you know a better method, please leave a message; I would like to share my progress with you.
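One possible shape for the multithreaded version is sketched below. It is only an outline under assumptions: the class ConcurrentCrawlSketch, the method crawlAll and the cityTask parameter are hypothetical, and cityTask stands in for the work done per city URL (fetch the page, insert its segments, return the number inserted).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ConcurrentCrawlSketch {
    // Runs cityTask for every city URL on a fixed-size thread pool and
    // returns the sum of the per-city results.
    static int crawlAll(List<String> cityUrls, Function<String, Integer> cityTask, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String url : cityUrls) {
                Callable<Integer> job = () -> cityTask.apply(url);
                results.add(pool.submit(job));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits until that city has finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real crawler each worker would probably need its own Connection and PreparedStatement rather than the shared static ones used in the programs above, since those objects are not designed to be used from several threads at once.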


How do I write the simplest Java program that computes 2*8? What is the code?

Steps:
1. Create a new .java file. The file name is case-sensitive and must match the public class name, for example Test.java.
2. Write the code
public class Test {
    public static void main(String[] args) {
        System.out.println("2*8 = " + 2 * 8);
    }
}
3. Compile and run
Without an IDE (integrated development environment):
Run javac Test.java at the command prompt
Then run java Test
Note: the current directory of the command prompt must be the directory where Test.java is located.
I have tested this. No problem!

A simple Java program

Finding the saddle point of a two-dimensional array in Java

Code:
public class Dort
{
    public static void main(String args[])
    {
        int[][] mat = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; // assumed values: only 3, 6 and 9 are legible in the source
        for (int i = 0; i < mat.length; i++) // output the two-dimensional array elements
        {
            // code to be supplied
        }
        boolean find = false; // flag: saddle point found
        int row = 0; // row subscript, starting from the first row
        int max = 0; // column subscript of the maximum value in the current row
        while (!find && row < mat.length)
        {
            max = 0; // initially assume the first column holds the maximum of the row
            for (int j = 1; j < mat[row].length; j++) // find the maximum value in the row
                if (mat[row][j] > mat[row][max]) // mat[row][max] is the maximum value of this row
                    max = j;
            boolean yes = true; // whether mat[row][max] is the smallest value in its column
            int j = 0;
            while (yes && j < mat.length)
            {
                // code to be supplied
            }
            if (yes)
                find = true;
            else
                row++;
        }
        if (find)
            System.out.println("The dort: " + mat[row][max]);
        else
            System.out.println("The dort: null");
    }
}
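One possible way to complete the exercise is sketched below, assuming a saddle point means an element that is the maximum of its row and the minimum of its column, as the comments above describe. The class SaddlePointSketch, the method findSaddlePoint and the sample matrix are illustrative choices, not part of the original question.

```java
import java.util.Arrays;

final class SaddlePointSketch {
    // Returns {row, column} of the first saddle point found, or null if there is none.
    static int[] findSaddlePoint(int[][] mat) {
        for (int row = 0; row < mat.length; row++) {
            int max = 0; // column of the (first) largest value in this row
            for (int j = 1; j < mat[row].length; j++) {
                if (mat[row][j] > mat[row][max]) {
                    max = j;
                }
            }
            boolean isColumnMin = true; // is mat[row][max] the smallest value in its column?
            for (int i = 0; i < mat.length && isColumnMin; i++) {
                if (mat[i][max] < mat[row][max]) {
                    isColumnMin = false;
                }
            }
            if (isColumnMin) {
                return new int[] {row, max};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[][] mat = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; // sample data for illustration
        for (int[] line : mat) {
            System.out.println(Arrays.toString(line)); // output the array row by row
        }
        int[] pos = findSaddlePoint(mat);
        if (pos != null) {
            System.out.println("The dort: " + mat[pos[0]][pos[1]]); // prints "The dort: 3"
        } else {
            System.out.println("The dort: null");
        }
    }
}
```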
