Loading massive log files into the database

Source: Internet
Author: User
There are 10 log files under the log directory. Each file is about 60 MB after compression and has a .gz suffix, e.g. a.gz, b.gz. Each line of a file contains fields such as id = 2112112, email = xxx@163.com, and so on:

id = 2112112, email = xxx@163.com, and other fields
id = 2112112, email = xxx@163.com, and other fields
id = 2112112, email = xxx@163.com, and other fields

Now I want to insert the contents of every file in this directory into the database. The tables in the database are sharded by email into about 1000 tables: log_1, log_2, and so on up to log_1000. Could someone provide a detailed solution? In particular, how can I make sure each file is loaded into the database quickly, so that the script runs efficiently?
Here is the code I am currently using:

error_reporting(E_ALL & ~E_NOTICE);

// Connection parameters
$mysql_host = 'xx.xx';
$mysql_user = 'xxx';
$mysql_pass = 'xx';
$mysql_port = 3306;
$mysql_db = 'test';
$table_pre = 'log_';
$gz_log_file = 'a.gz';

// Script execution log
$exec_log = '/data_log/record.txt';
file_put_contents($exec_log, '***************************** START *****************************' . "\r\n", FILE_APPEND);
file_put_contents($exec_log, 'param is mysql_host=' . $mysql_host . ' mysql_user=' . $mysql_user . ' mysql_pass=' . $mysql_pass . ' mysql_port=' . $mysql_port . ' mysql_db=' . $mysql_db . ' table_pre=' . $table_pre . ' gz_log_file=' . $gz_log_file . ' start_time=' . date('Y-m-d H:i:s') . "\r\n", FILE_APPEND);

// Open the compressed log for reading
$z_handle = gzopen($gz_log_file, 'r');
$time_start = microtime(true); // the original called an undefined microtime_float() helper

// Connect to the database
$conn = mysql_connect("$mysql_host:$mysql_port", $mysql_user, $mysql_pass);
if (!$conn) {
    file_put_contents($exec_log, 'could not connect database error, error=' . mysql_error() . "\r\n", FILE_APPEND);
    exit;
}
$select_db = mysql_select_db($mysql_db);
if (!$select_db) {
    file_put_contents($exec_log, 'select database error, database=' . $mysql_db . "\r\n", FILE_APPEND);
    exit;
}

while (!gzeof($z_handle)) {
    $each_gz_line = gzgets($z_handle, 4096);
    $line_to_array = explode("\t", $each_gz_line);
    // Filter invalid logs
    if (!empty($line_to_array[3]) && !empty($line_to_array[2]) && !empty($line_to_array[4])) {
        // Shard routing: crc32 of the email field modulo 1000 picks the target table.
        // (In the original this ran after $insert_sql was built, so $table_name was used before it was set.)
        $table_id = abs(crc32($line_to_array[2]) % 1000);
        $table_name = $table_pre . $table_id;
        $insert_value = "('" . $line_to_array[3] . "','" . $line_to_array[2] . "','" . $line_to_array[1] . "','" . $line_to_array[4] . "','" . $line_to_array[0] . "')";
        // The original column list named four columns for five values; an id column is assumed here.
        $insert_sql = "INSERT INTO $table_name (uid, email, ip, ctime, id) VALUES $insert_value";
        $result = mysql_query($insert_sql);
        if (!$result) {
            // Record failed rows so they can be retried later
            file_put_contents($exec_log, 'table_name=' . $table_name . ' email=' . $line_to_array[2] . "\r\n", FILE_APPEND);
        }
    }
}

$time_end = microtime(true);
$diff = $time_end - $time_start;
file_put_contents($exec_log, 'success to insert database, log_file is ' . $gz_log_file . ', time-consuming is ' . $diff . 's' . "\r\n", FILE_APPEND);
file_put_contents($exec_log, '*****************************************************************' . "\r\n", FILE_APPEND);
gzclose($z_handle);

The code above is intolerably slow. Please help me speed it up.
------ Solution --------------------
Change the table type to InnoDB and wrap the inserts in a transaction;
failing that, use LOAD DATA INFILE.
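
To make the first suggestion concrete, here is a minimal sketch of the single-transaction approach, reusing $z_handle and $table_pre from the script above and the same assumed (uid, email, ip, ctime, id) column list; the 1000-row batch size is an arbitrary illustration, not something prescribed in this thread:

// Sketch: one transaction plus multi-row INSERTs, assuming InnoDB tables.
mysql_query('START TRANSACTION');
$batch = array(); // table name => pending "(v1,v2,...)" tuples
while (!gzeof($z_handle)) {
    $f = explode("\t", rtrim(gzgets($z_handle, 4096)));
    if (empty($f[2]) || empty($f[3]) || empty($f[4])) {
        continue; // same validity filter as the original script
    }
    // Same shard routing as before: crc32(email) % 1000
    $table_name = $table_pre . abs(crc32($f[2]) % 1000);
    $batch[$table_name][] = "('" . mysql_real_escape_string($f[3]) . "','"
        . mysql_real_escape_string($f[2]) . "','" . mysql_real_escape_string($f[1]) . "','"
        . mysql_real_escape_string($f[4]) . "','" . mysql_real_escape_string($f[0]) . "')";
    // Flush a shard as one multi-row INSERT once 1000 rows are pending
    if (count($batch[$table_name]) >= 1000) {
        mysql_query("INSERT INTO $table_name (uid, email, ip, ctime, id) VALUES "
            . implode(',', $batch[$table_name]));
        $batch[$table_name] = array();
    }
}
// Flush the leftovers, then pay the commit cost exactly once
foreach ($batch as $t => $rows) {
    if (!empty($rows)) {
        mysql_query("INSERT INTO $t (uid, email, ip, ctime, id) VALUES " . implode(',', $rows));
    }
}
mysql_query('COMMIT');

The multi-row INSERTs cut the per-statement round trips, and the single COMMIT avoids paying a flush per row.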
------ Solution --------------------
For InnoDB, opening a transaction should not make things slower: even without an explicit transaction, every statement runs as its own transaction, so opening one transaction and committing once should be faster than committing per statement (although, as I recall, the gain is not dramatic). MyISAM, by contrast, handles inserts in a single thread, and when the total data volume in the table is small it can actually be faster than InnoDB, especially in an environment with only about 60 MB of data.

LOAD DATA INFILE is definitely much faster, but you first have to convert your file into a tab-separated "xxx\txxx" format; loading that should be several times faster than individual INSERTs.
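
A sketch of that two-step flow, again with the same assumed field order; the /tmp paths and per-shard file naming are illustrative choices, not from the thread:

// Sketch: rewrite the .gz into one tab-separated file per shard table,
// then bulk-load each file with LOAD DATA INFILE.
$z_handle = gzopen($gz_log_file, 'r');
$fh = array(); // table id => handle of that shard's intermediate file
while (!gzeof($z_handle)) {
    $f = explode("\t", rtrim(gzgets($z_handle, 4096)));
    if (empty($f[2]) || empty($f[3]) || empty($f[4])) {
        continue;
    }
    $table_id = abs(crc32($f[2]) % 1000);
    if (!isset($fh[$table_id])) {
        $fh[$table_id] = fopen("/tmp/shard_$table_id.txt", 'w');
    }
    // Columns in the order the table expects: uid, email, ip, ctime, id
    fwrite($fh[$table_id], $f[3] . "\t" . $f[2] . "\t" . $f[1] . "\t" . $f[4] . "\t" . $f[0] . "\n");
}
gzclose($z_handle);
foreach ($fh as $table_id => $h) {
    fclose($h);
    // One LOAD DATA per shard replaces thousands of single-row INSERTs
    mysql_query("LOAD DATA LOCAL INFILE '/tmp/shard_$table_id.txt'"
        . " INTO TABLE {$table_pre}{$table_id}"
        . " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
        . " (uid, email, ip, ctime, id)");
}

Note that with up to 1000 shards this can hold up to 1000 file handles open at once; raise the process's open-file limit or process the shards in passes if that is a problem. LOAD DATA LOCAL also requires local_infile to be enabled on both client and server.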
------ Solution --------------------
Use LOAD DATA: load the data, then compare the row counts. Do not bother with transactions; the probability of an error is very low, and even if one occurs, deleting and re-importing is faster. PS: this volume of data does not count as massive.
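
For the count comparison, a minimal sketch, assuming the per-shard files from the previous sketch still exist and that the log_N tables were empty before the import:

// Sketch: compare lines written with rows actually loaded.
$expected = 0;
foreach (glob('/tmp/shard_*.txt') as $path) {
    $expected += count(file($path)); // one line per row written
}
$loaded = 0;
for ($i = 0; $i < 1000; $i++) {
    $res = mysql_query("SELECT COUNT(*) FROM {$table_pre}{$i}");
    $row = mysql_fetch_row($res);
    $loaded += (int)$row[0];
}
if ($loaded != $expected) {
    // Per the advice above: delete and re-import rather than
    // wrapping the load in a transaction.
    echo "count mismatch: expected $expected, loaded $loaded\n";
}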
------ Solution --------------------
I don't know why this has to be stored in a database at all.
By your description, each file expands to roughly 20 × 60 MB, i.e. over 1 GB, or even more, once decompressed.
It is strange that you insert the rows one by one.
------ Solution --------------------
Importing historical data is a one-off task, so there is no real question of "efficiency."
You could import each file directly into a text field and then split it out with UPDATE statements.
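
One reading of that suggestion, sketched with a hypothetical one-column staging table (log_staging is an invented name) and SUBSTRING_INDEX to carve out fields, using INSERT ... SELECT rather than UPDATE for the split; routing rows to their shard tables is omitted for brevity:

// Sketch: bulk-load raw lines into a staging table, split fields in SQL.
mysql_query("CREATE TABLE log_staging (raw TEXT)");
mysql_query("LOAD DATA LOCAL INFILE '/tmp/a.txt' INTO TABLE log_staging"
    . " LINES TERMINATED BY '\\n' (raw)");
// SUBSTRING_INDEX(raw, '\t', n) keeps everything before the n-th tab;
// wrapping it with a -1 count then isolates the n-th field itself.
mysql_query("INSERT INTO log_1 (uid, email)"
    . " SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(raw, '\\t', 4), '\\t', -1),"
    . "        SUBSTRING_INDEX(SUBSTRING_INDEX(raw, '\\t', 3), '\\t', -1)"
    . " FROM log_staging");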

If you do not want to change the log processing method, appending the incremental logs to the database is just a routine job (the cycle should be at least one day);
there is no real efficiency concern there either.
