Massive log storage problem. There are 10 log files in a log directory, each about 60 MB after compression, with the suffix .gz (e.g. a.gz, b.gz). Each line of a file looks like:
id = 2112112, email = xxx@163.com, plus other fields
Now I want to insert the content of every file in this directory into the database. The tables are sharded by email into roughly 1000 tables: log_1, log_2, and so on through log_1000. I would like a detailed solution. In particular, how can I get each file into the database quickly so that the script runs efficiently?
Here is the code I have so far:
error_reporting(E_ALL & ~E_NOTICE);

// Connection parameters
$mysql_host = 'xx.xx';
$mysql_user = 'xxx';
$mysql_pass = 'xx';
$mysql_port = 3306;
$mysql_db   = 'test';
$table_pre  = 'log_';
$gz_log_file = 'a.gz';

// Script execution log
$exec_log = '/data_log/record.txt';
file_put_contents($exec_log, "************************ START ************************\r\n", FILE_APPEND);
file_put_contents($exec_log, 'param is mysql_host=' . $mysql_host . ' mysql_user=' . $mysql_user . ' mysql_pass=' . $mysql_pass . ' mysql_port=' . $mysql_port . ' mysql_db=' . $mysql_db . ' table_pre=' . $table_pre . ' gz_log_file=' . $gz_log_file . ' start_time=' . date('Y-m-d H:i:s') . "\r\n", FILE_APPEND);

// Open the compressed log for reading
$z_handle = gzopen($gz_log_file, 'r');
$time_start = microtime(true); // microtime(true) replaces the undefined microtime_float()

// Connect to the database
$conn = mysql_connect("$mysql_host:$mysql_port", $mysql_user, $mysql_pass);
if (!$conn) {
    file_put_contents($exec_log, 'could not connect database, error=' . mysql_error() . "\r\n", FILE_APPEND);
    exit;
}
$select_db = mysql_select_db($mysql_db);
if (!$select_db) {
    file_put_contents($exec_log, 'select database error, database=' . $mysql_db . "\r\n", FILE_APPEND);
    exit;
}

while (!gzeof($z_handle)) {
    $each_gz_line = gzgets($z_handle, 4096);
    $line_to_array = explode("\t", $each_gz_line);
    // Filter invalid logs
    if (!empty($line_to_array[3]) && !empty($line_to_array[2]) && !empty($line_to_array[4])) {
        // Shard by email: crc32 of the email, modulo the number of tables
        // (the table name must be computed before the SQL is built)
        $table_id = abs(crc32($line_to_array[2]) % 1000);
        $table_name = $table_pre . $table_id;
        $insert_value = "('" . $line_to_array[3] . "','" . $line_to_array[2] . "','" . $line_to_array[1] . "','" . $line_to_array[4] . "','" . $line_to_array[0] . "')";
        // Note: the original column list (uid, email, ip, ctime) has four names
        // but five values; one column name is missing and must be added.
        $insert_sql = "insert into $table_name (uid, email, ip, ctime) values $insert_value";
        $result = mysql_query($insert_sql);
        if (!$result) {
            // Record failed inserts
            file_put_contents($exec_log, 'table_name=' . $table_name . ' email=' . $line_to_array[2] . "\r\n", FILE_APPEND);
        }
    }
}
gzclose($z_handle);

$time_end = microtime(true);
$diff = $time_end - $time_start;
file_put_contents($exec_log, 'success to insert database, log_file is ' . $gz_log_file . ', time consumed=' . $diff . "s\r\n", FILE_APPEND);
file_put_contents($exec_log, "************************* END *************************\r\n", FILE_APPEND);
The code above runs intolerably slowly. Please help me speed it up.
------ Solution --------------------
Change the table type to InnoDB and wrap the inserts in transactions.
If that is not enough, use LOAD DATA INFILE.
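A minimal sketch of the batching idea, assuming the tables have been converted to InnoDB: buffer the parsed lines, group them by shard table, and flush each group as a single multi-row INSERT inside one transaction. The helper below only builds the SQL strings (it uses the same crc32 sharding as the original script); the fifth column name, atime, is an assumption, since the original column list is one name short.

```php
<?php
// Hypothetical helper: group parsed log lines by shard table and build one
// multi-row INSERT statement per table. Column names follow the original
// script; "atime" is an assumed name for the missing fifth column.
function build_batched_inserts(array $lines, $table_pre = 'log_', $shards = 1000)
{
    $rows_by_table = array();
    foreach ($lines as $line) {
        $f = explode("\t", trim($line));
        if (empty($f[3]) || empty($f[2]) || empty($f[4])) {
            continue; // skip invalid lines, as the original filter does
        }
        // Same sharding rule as the original script: crc32 of the email
        $table = $table_pre . abs(crc32($f[2]) % $shards);
        $rows_by_table[$table][] =
            "('" . $f[3] . "','" . $f[2] . "','" . $f[1] . "','" . $f[4] . "','" . $f[0] . "')";
    }
    $sqls = array();
    foreach ($rows_by_table as $table => $rows) {
        // One statement per table, many rows per statement
        $sqls[] = "insert into $table (uid, email, ip, atime, ctime) values "
                . implode(',', $rows);
    }
    return $sqls;
}
```

Inside the read loop you would collect, say, a few thousand lines into a buffer, then flush it roughly like this (untested against a live server):
mysql_query('START TRANSACTION'); foreach (build_batched_inserts($buffer) as $sql) { mysql_query($sql); } mysql_query('COMMIT');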
------ Solution --------------------
For InnoDB, opening an explicit transaction should not make things slower: even without one, every statement already runs as its own transaction, so wrapping many inserts in a single transaction with one commit should beat committing per statement (though as I recall the gain is not always large). MyISAM, on the other hand, executes inserts in a single thread, and when the total data volume in the table is still small it can actually be faster than InnoDB, especially in an environment with only about 60 MB of data.
LOAD DATA INFILE is definitely much faster, but you first have to convert your file into a tab-separated "xxx\txxx" format; loading it should then be several times faster than individual inserts.
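A sketch of the LOAD DATA approach, assuming you first split the decompressed log into one tab-separated file per shard table. The helper below only builds the statement; the file path and the column list (including the assumed "atime" column) are illustrative, not taken from the original schema.

```php
<?php
// Hypothetical helper: build a LOAD DATA statement for one shard file.
// The column list mirrors the INSERT in the original script; "atime" is an
// assumed name for the missing fifth column.
function build_load_sql($file, $table)
{
    return "LOAD DATA LOCAL INFILE '" . addslashes($file) . "'"
         . " INTO TABLE $table"
         . " FIELDS TERMINATED BY '\\t'"
         . " LINES TERMINATED BY '\\n'"
         . " (uid, email, ip, atime, ctime)";
}
```

Usage would be one call per shard file, e.g. mysql_query(build_load_sql('/tmp/log_1.tsv', 'log_1')); note that LOCAL requires the client and server to allow local infile.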
------ Solution --------------------
Use LOAD DATA, then compare the row counts after loading. Don't bother with transactions: the error probability is very low, and even if something goes wrong, deleting and re-importing is faster. PS: this amount of data does not qualify as massive.
------ Solution --------------------
I don't understand why it has to go into a database at all.
By your description, each file will be far larger once decompressed, on the order of 60 MB times 20, or even more.
It's strange that you insert the rows one by one.
------ Solution --------------------
If this is historical data, it is a one-time task, so "efficiency" hardly matters.
You could load each file directly into a text field and then split it out with UPDATE statements.
If you don't want to change how the logs are produced, make appending the incremental logs to the database a regular job (with a cycle of at least one day).
There is no real efficiency concern there either.