Loading massive log files into the database

Source: Internet
Author: User
There are 10 log files under the log directory. Each file is about 60 MB after compression and has a .gz suffix, e.g. a.gz, b.gz. Each line of a file contains fields such as id = 2112112, email = xxx@163.com, and so on:

id = 2112112, email = xxx@163.com, and other fields
id = 2112112, email = xxx@163.com, and other fields
id = 2112112, email = xxx@163.com, and other fields

Now I want to insert the contents of every file in this directory into the database. The tables in the database are sharded by email into about 1000 tables: log_1, log_2, and so on up to log_1000. Could someone provide a detailed solution? In particular, how can I make sure each file is loaded into the database quickly, so that the script runs efficiently?
Here is the code I am currently using:

error_reporting(E_ALL & ~E_NOTICE);

// Connection parameters
$mysql_host = 'xx.xx';
$mysql_user = 'xxx';
$mysql_pass = 'xx';
$mysql_port = 3306;
$mysql_db = 'test';
$table_pre = 'log_';
$gz_log_file = 'a.gz';

// Script execution log
$exec_log = '/data_log/record.txt';
file_put_contents($exec_log, '***************************** START *****************************' . "\r\n", FILE_APPEND);
file_put_contents($exec_log, 'param is mysql_host=' . $mysql_host . ' mysql_user=' . $mysql_user . ' mysql_pass=' . $mysql_pass . ' mysql_port=' . $mysql_port . ' mysql_db=' . $mysql_db . ' table_pre=' . $table_pre . ' gz_log_file=' . $gz_log_file . ' start_time=' . date('Y-m-d H:i:s') . "\r\n", FILE_APPEND);

// Open the compressed log for reading
$z_handle = gzopen($gz_log_file, 'r');
$time_start = microtime(true); // the original called an undefined microtime_float() helper

// Connect to the database
$conn = mysql_connect("$mysql_host:$mysql_port", $mysql_user, $mysql_pass);
if (!$conn) {
    file_put_contents($exec_log, 'could not connect database error, error=' . mysql_error() . "\r\n", FILE_APPEND);
    exit;
}
$select_db = mysql_select_db($mysql_db);
if (!$select_db) {
    file_put_contents($exec_log, 'select database error, database=' . $mysql_db . "\r\n", FILE_APPEND);
    exit;
}

while (!gzeof($z_handle)) {
    $each_gz_line = gzgets($z_handle, 4096);
    $line_to_array = explode("\t", $each_gz_line);
    // Filter invalid logs
    if (!empty($line_to_array[3]) && !empty($line_to_array[2]) && !empty($line_to_array[4])) {
        // Shard routing: crc32 of the email field modulo 1000 picks the target table.
        // (In the original this ran after $insert_sql was built, so $table_name was used before it was set.)
        $table_id = abs(crc32($line_to_array[2]) % 1000);
        $table_name = $table_pre . $table_id;
        $insert_value = "('" . $line_to_array[3] . "','" . $line_to_array[2] . "','" . $line_to_array[1] . "','" . $line_to_array[4] . "','" . $line_to_array[0] . "')";
        // The original column list named four columns for five values; an id column is assumed here.
        $insert_sql = "INSERT INTO $table_name (uid, email, ip, ctime, id) VALUES $insert_value";
        $result = mysql_query($insert_sql);
        if (!$result) {
            // Record failed rows so they can be retried later
            file_put_contents($exec_log, 'table_name=' . $table_name . ' email=' . $line_to_array[2] . "\r\n", FILE_APPEND);
        }
    }
}

$time_end = microtime(true);
$diff = $time_end - $time_start;
file_put_contents($exec_log, 'success to insert database, log_file is ' . $gz_log_file . ', time-consuming is ' . $diff . 's' . "\r\n", FILE_APPEND);
file_put_contents($exec_log, '*****************************************************************' . "\r\n", FILE_APPEND);
gzclose($z_handle);

The code above is intolerably slow. Please help me speed it up.
------ Solution --------------------
Change the table type to InnoDB and wrap the inserts in a transaction;
failing that, use LOAD DATA INFILE.
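
To make the first suggestion concrete, here is a minimal sketch of the single-transaction approach, reusing $z_handle and $table_pre from the script above and the same assumed (uid, email, ip, ctime, id) column list; the 1000-row batch size is an arbitrary illustration, not something prescribed in this thread:

// Sketch: one transaction plus multi-row INSERTs, assuming InnoDB tables.
mysql_query('START TRANSACTION');
$batch = array(); // table name => pending "(v1,v2,...)" tuples
while (!gzeof($z_handle)) {
    $f = explode("\t", rtrim(gzgets($z_handle, 4096)));
    if (empty($f[2]) || empty($f[3]) || empty($f[4])) {
        continue; // same validity filter as the original script
    }
    // Same shard routing as before: crc32(email) % 1000
    $table_name = $table_pre . abs(crc32($f[2]) % 1000);
    $batch[$table_name][] = "('" . mysql_real_escape_string($f[3]) . "','"
        . mysql_real_escape_string($f[2]) . "','" . mysql_real_escape_string($f[1]) . "','"
        . mysql_real_escape_string($f[4]) . "','" . mysql_real_escape_string($f[0]) . "')";
    // Flush a shard as one multi-row INSERT once 1000 rows are pending
    if (count($batch[$table_name]) >= 1000) {
        mysql_query("INSERT INTO $table_name (uid, email, ip, ctime, id) VALUES "
            . implode(',', $batch[$table_name]));
        $batch[$table_name] = array();
    }
}
// Flush the leftovers, then pay the commit cost exactly once
foreach ($batch as $t => $rows) {
    if (!empty($rows)) {
        mysql_query("INSERT INTO $t (uid, email, ip, ctime, id) VALUES " . implode(',', $rows));
    }
}
mysql_query('COMMIT');

The multi-row INSERTs cut the per-statement round trips, and the single COMMIT avoids paying a flush per row.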
------ Solution --------------------
For InnoDB, opening a transaction should not make things slower: even without an explicit transaction, every statement runs as its own transaction, so opening one transaction and committing once should be faster than committing per statement (although, as I recall, the gain is not dramatic). MyISAM, by contrast, handles inserts in a single thread, and when the total data volume in the table is small it can actually be faster than InnoDB, especially in an environment with only about 60 MB of data.

LOAD DATA INFILE is definitely much faster, but you first have to convert your file into a tab-separated "xxx\txxx" format; loading that should be several times faster than individual INSERTs.
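
A sketch of that two-step flow, again with the same assumed field order; the /tmp paths and per-shard file naming are illustrative choices, not from the thread:

// Sketch: rewrite the .gz into one tab-separated file per shard table,
// then bulk-load each file with LOAD DATA INFILE.
$z_handle = gzopen($gz_log_file, 'r');
$fh = array(); // table id => handle of that shard's intermediate file
while (!gzeof($z_handle)) {
    $f = explode("\t", rtrim(gzgets($z_handle, 4096)));
    if (empty($f[2]) || empty($f[3]) || empty($f[4])) {
        continue;
    }
    $table_id = abs(crc32($f[2]) % 1000);
    if (!isset($fh[$table_id])) {
        $fh[$table_id] = fopen("/tmp/shard_$table_id.txt", 'w');
    }
    // Columns in the order the table expects: uid, email, ip, ctime, id
    fwrite($fh[$table_id], $f[3] . "\t" . $f[2] . "\t" . $f[1] . "\t" . $f[4] . "\t" . $f[0] . "\n");
}
gzclose($z_handle);
foreach ($fh as $table_id => $h) {
    fclose($h);
    // One LOAD DATA per shard replaces thousands of single-row INSERTs
    mysql_query("LOAD DATA LOCAL INFILE '/tmp/shard_$table_id.txt'"
        . " INTO TABLE {$table_pre}{$table_id}"
        . " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
        . " (uid, email, ip, ctime, id)");
}

Note that with up to 1000 shards this can hold up to 1000 file handles open at once; raise the process's open-file limit or process the shards in passes if that is a problem. LOAD DATA LOCAL also requires local_infile to be enabled on both client and server.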
------ Solution --------------------
Use LOAD DATA: load the data, then compare the row counts. Do not bother with transactions; the probability of an error is very low, and even if one occurs, deleting and re-importing is faster. PS: this volume of data does not count as massive.
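
For the count comparison, a minimal sketch, assuming the per-shard files from the previous sketch still exist and that the log_N tables were empty before the import:

// Sketch: compare lines written with rows actually loaded.
$expected = 0;
foreach (glob('/tmp/shard_*.txt') as $path) {
    $expected += count(file($path)); // one line per row written
}
$loaded = 0;
for ($i = 0; $i < 1000; $i++) {
    $res = mysql_query("SELECT COUNT(*) FROM {$table_pre}{$i}");
    $row = mysql_fetch_row($res);
    $loaded += (int)$row[0];
}
if ($loaded != $expected) {
    // Per the advice above: delete and re-import rather than
    // wrapping the load in a transaction.
    echo "count mismatch: expected $expected, loaded $loaded\n";
}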
------ Solution --------------------
I don't know why this has to be stored in a database at all.
By your description, each file expands to roughly 20 × 60 MB, i.e. over 1 GB, or even more, once decompressed.
It is strange that you insert the rows one by one.
------ Solution --------------------
Importing historical data is a one-off task, so there is no real question of "efficiency."
You could import each file directly into a text field and then split it out with UPDATE statements.
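
One reading of that suggestion, sketched with a hypothetical one-column staging table (log_staging is an invented name) and SUBSTRING_INDEX to carve out fields, using INSERT ... SELECT rather than UPDATE for the split; routing rows to their shard tables is omitted for brevity:

// Sketch: bulk-load raw lines into a staging table, split fields in SQL.
mysql_query("CREATE TABLE log_staging (raw TEXT)");
mysql_query("LOAD DATA LOCAL INFILE '/tmp/a.txt' INTO TABLE log_staging"
    . " LINES TERMINATED BY '\\n' (raw)");
// SUBSTRING_INDEX(raw, '\t', n) keeps everything before the n-th tab;
// wrapping it with a -1 count then isolates the n-th field itself.
mysql_query("INSERT INTO log_1 (uid, email)"
    . " SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(raw, '\\t', 4), '\\t', -1),"
    . "        SUBSTRING_INDEX(SUBSTRING_INDEX(raw, '\\t', 3), '\\t', -1)"
    . " FROM log_staging");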

If you do not want to change the log processing method, appending the incremental logs to the database is just a routine job (the cycle should be at least one day);
there is no real efficiency concern there either.
