How to optimize tens of millions of rows of MySQL data

Source: Internet
Author: User

This post was last edited by demofere on 2015-02-11 13:56:31


function insert() {
    global $m;

    if (!isset($_REQUEST['strs']) || !isset($_REQUEST['type'])
        || !isset($_REQUEST['hash'])) {
        echo 'param error';
        return;
    }

    // strs is the full list of strings
    $poststr = $_REQUEST['strs'];

    $xstrs = json_decode(stripslashes($poststr), true);

    $type = $_REQUEST['type'];
    $hash = $_REQUEST['hash'];

    if (count($xstrs) <= 0) {
        $msg = 'str error';
        DsLog::errLog($msg . $poststr);
        echo $msg;
        return;
    }

    if ($type == '0') {
        $table = 'white';
    } else if ($type == '1') {
        $table = 'black';
    } else {
        $msg = 'type error';
        DsLog::errLog($msg);
        echo $msg;
        return;
    }

    $strs = array();

    for ($i = 0; $i < count($xstrs); $i++) {
        $strtmp = $xstrs[$i];
        $strtmp = trim($strtmp);
        $strtmp = strtolower($strtmp);
        $strtmp = addslashes($strtmp);
        if (strlen($strtmp) > 256) {
            $strtmp = substr($strtmp, 0, 256);
        }
        if (strlen($strtmp) >= 7) {
            array_push($strs, $strtmp);
        }
    }

    // Concatenate all strings
    $tmp = '("' . implode('","', $strs) . '")';

    // Fetch the strings that already exist
    $sql = "select * from $table where str in $tmp";
    $ret = mysql_query($sql, $m);
    if (!$ret) {
        $msg = 'exec error: ' . mysql_error($m) . ', ' . $sql;
        DsLog::errLog($msg);
        echo $msg;
        return;
    }

    $exists = array();
    $notexists = array();
    $count = mysql_num_rows($ret);
    for ($i = 0; $i < $count; $i++) {
        $item = mysql_fetch_assoc($ret);
        if (!$item) {
            break;
        }
        array_push($exists, $item['str']);
    }

    for ($i = 0; $i < count($strs); $i++) {
        if (in_array($strs[$i], $exists)) {
            continue;
        }
        array_push($notexists, $strs[$i]);
    }

    for ($i = 0; $i < count($exists); $i++) {
        $exists[$i] = addslashes($exists[$i]);
    }
    for ($i = 0; $i < count($notexists); $i++) {
        $notexists[$i] = addslashes($notexists[$i]);
    }

    if (count($exists) > 0) {
        // Update the count field of the existing strings
        $tmp = '("' . implode('","', $exists) . '")';
        $time = date('ymdhi');
        $sql = "update $table set count = count + 1 where str in $tmp";
        $ret = mysql_query($sql, $m);
        if (!$ret) {
            $msg = 'exec error: ' . mysql_error($m) . ', ' . $sql;
            DsLog::errLog($msg);
            echo $msg;
            return;
        }

        // Update the upd field of the existing strings
        $sql = "update $table set upd = '$time' where str in $tmp";
        $ret = mysql_query($sql, $m);
        if (!$ret) {
            $msg = 'exec error: ' . mysql_error($m) . ', ' . $sql;
            DsLog::errLog($msg);
            echo $msg;
            return;
        }
    }

    // Insert the new strings
    if (count($notexists) > 0) {
        $time = date('ymdhi');
        $sql = "insert ignore into $table (str, hash, count, upd) values ";
        for ($i = 0; $i < count($notexists); $i++) {
            $str = $notexists[$i];
            $crc = sprintf("%u", crc32($str));
            $sql .= "('$str','$crc','1','$time'),";
        }

        $sql = substr($sql, 0, strlen($sql) - 1);
        $ret = mysql_query($sql, $m);
        if (!$ret) {
            $msg = 'insert error: ' . mysql_error($m) . ', ' . $sql;
            DsLog::errLog($msg);
            echo $msg;
            return;
        }
    }

    echo !!$ret;
}


I want to keep statistics on string occurrences, essentially a map of string to count; the database also stores each string's CRC.
The insert function receives strs from POST; it is a string array, averaging about 1500 strings per POST.
The table currently holds a few hundred thousand records, and each insert call takes about 20 seconds on average (often hitting the 30-second timeout).
How can this be optimized? Only a small portion of the statistics has been collected so far; the estimated final volume is about 1-10 billion entries.
------ Solution ----------------------
Are you sure the bottleneck lies in those three query and update statements?
You said each POST averages 1500 strings.
Is that 1500 words? Even if it is 1500 characters (total string length), that is about 75 words at an average of 20 characters per word.
Your filter condition is str in $tmp, i.e. str in ('XXX', 'XXX', ...). Every record in the table must be compared against that list up to 1500 (or 75) times, with a hit rate of at most 1/1500 (1/75). Do you think that is appropriate?
A database's strength is comparing records against records; you are making it compare one column against a long literal list, which is its weak point.
You can load the input data into a temporary table (one word per row) and join it with the master table. That way the database can play to its strength.
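A minimal sketch of that temporary-table approach, assuming the white(str, hash, count, upd) schema from the code above; the tmp_strs name and the sample values are illustrative, not from the original post:

```sql
-- Load the posted strings into a temp table, one row each,
-- with a primary key so the join can use an index
CREATE TEMPORARY TABLE tmp_strs (
    str VARCHAR(256) NOT NULL,
    PRIMARY KEY (str)
);
INSERT IGNORE INTO tmp_strs (str) VALUES ('string-one'), ('string-two');

-- Record-to-record comparison instead of a 1500-item IN list
SELECT w.str FROM white w JOIN tmp_strs t ON w.str = t.str;
```

With an index on white.str, each temp-table row becomes one index lookup instead of a scan against the whole IN list.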

What's more, can't your two consecutive UPDATE statements be merged into one?
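For example, the two UPDATEs over the same IN list can be one statement (the values shown are placeholders for the $tmp and $time built in the code):

```sql
-- One pass over the table instead of two
UPDATE white
SET count = count + 1,
    upd = '1502111356'
WHERE str IN ('string-one', 'string-two');
```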
------ Solution ----------------------
1. MySQL has a MEMORY (heap) table engine; you have not used it.
2. Merge the two UPDATEs into one, which can save about 10 seconds.
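A sketch of point 1, reusing the illustrative staging-table idea from the first reply; whether MEMORY actually helps depends on the working set fitting in RAM:

```sql
-- The MEMORY (HEAP) engine keeps the staging table entirely in RAM
CREATE TEMPORARY TABLE tmp_strs (
    str VARCHAR(256) NOT NULL,
    PRIMARY KEY (str)
) ENGINE=MEMORY;
```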
------ Solution ----------------------
The old mysql_* library functions are not efficient and are no longer officially recommended; I have hardly used them. When I was learning PHP I went straight to PDO.

Also, with the IN keyword the query apparently does not use the index.
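For illustration, a minimal PDO sketch of the same existence check using a prepared statement; the DSN and credentials are placeholders, not from the original post:

```php
<?php
// Hypothetical connection parameters; adjust to your environment
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One placeholder per string instead of concatenating values into the SQL,
// which also removes the need for addslashes()
$strs = array('example-string-one', 'example-string-two');
$marks = implode(',', array_fill(0, count($strs), '?'));
$stmt = $pdo->prepare("SELECT str FROM white WHERE str IN ($marks)");
$stmt->execute($strs);
$exists = $stmt->fetchAll(PDO::FETCH_COLUMN);
```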
------ Solution ----------------------
Reference:
The effect is almost the same.

That is probably a problem with your testing environment; in most cases join queries are faster than IN.
Your test result may be due to:
1. The table being queried does not contain enough data yet.
2. The two query methods return different result sets.
3. The queried field is not indexed, or the wrong index is used.
4. Interference from too many indexes.
5. Other reasons.
The most likely one is reason 2.
When you join the tables without specifying the query fields, the joined result necessarily carries more fields than the non-join query. Joins also have one-to-many cases, where one record becomes several and needs a GROUP BY to collapse them.
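To make the two result sets comparable, select only the master table's fields and collapse one-to-many matches; a sketch reusing the illustrative tmp_strs table from above:

```sql
-- Same columns as the IN query, de-duplicated with GROUP BY
SELECT w.str
FROM white w
JOIN tmp_strs t ON w.str = t.str
GROUP BY w.str;
```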
