How to optimize MySQL with tens of millions of rows
This post was last edited by Demofere on 2015-02-11 13:56:31
function insert() {
    global $m;
    if (!isset($_REQUEST['strs']) || !isset($_REQUEST['type'])
        || !isset($_REQUEST['hash'])) {
        echo 'param error';
        return;
    }
    // strs holds all of the posted strings
    $poststr = $_REQUEST['strs'];
    $xstrs = json_decode(stripslashes($poststr), true);
    $type = $_REQUEST['type'];
    $hash = $_REQUEST['hash'];
    if (count($xstrs) <= 0) {
        $msg = 'str error';
        DsLog::errLog($msg . $poststr);
        echo $msg;
        return;
    }
    if ($type == '0') {
        $table = 'white';
    } else if ($type == '1') {
        $table = 'black';
    } else {
        $msg = 'type error';
        DsLog::errLog($msg);
        echo $msg;
        return;
    }
    $strs = array();
    for ($i = 0; $i < count($xstrs); $i++) {
        $strtmp = $xstrs[$i];
        $strtmp = trim($strtmp);
        $strtmp = strtolower($strtmp);
        $strtmp = addslashes($strtmp);
        if (strlen($strtmp) > 256) {
            $strtmp = substr($strtmp, 0, 256);
        }
        if (strlen($strtmp) >= 7) {
            array_push($strs, $strtmp);
        }
    }
    // join all strings into one quoted IN list
    $tmp = "('" . implode("','", $strs) . "')";
    // fetch the strings that already exist
    $sql = "SELECT * FROM $table WHERE str IN $tmp";
    $ret = mysql_query($sql, $m);
    if (!$ret) {
        $msg = 'exec error: ' . mysql_error($m) . ', ' . $sql;
        DsLog::errLog($msg);
        echo $msg;
        return;
    }
    $exists = array();
    $notexists = array();
    $count = mysql_num_rows($ret);
    for ($i = 0; $i < $count; $i++) {
        $item = mysql_fetch_assoc($ret);
        if (!$item) {
            break;
        }
        array_push($exists, $item['str']);
    }
    for ($i = 0; $i < count($strs); $i++) {
        if (in_array($strs[$i], $exists)) {
            continue;
        }
        array_push($notexists, $strs[$i]);
    }
    for ($i = 0; $i < count($exists); $i++) {
        $exists[$i] = addslashes($exists[$i]);
    }
    for ($i = 0; $i < count($notexists); $i++) {
        $notexists[$i] = addslashes($notexists[$i]);
    }
    if (count($exists) > 0) {
        // bump the count field for the strings that already exist
        $tmp = "('" . implode("','", $exists) . "')";
        $time = date('YmdHi');
        $sql = "UPDATE $table SET count = count + 1 WHERE str IN $tmp";
        $ret = mysql_query($sql, $m);
        if (!$ret) {
            $msg = 'exec error: ' . mysql_error($m) . ', ' . $sql;
            DsLog::errLog($msg);
            echo $msg;
            return;
        }
        // refresh the upd field for the strings that already exist
        $sql = "UPDATE $table SET upd = '$time' WHERE str IN $tmp";
        $ret = mysql_query($sql, $m);
        if (!$ret) {
            $msg = 'exec error: ' . mysql_error($m) . ', ' . $sql;
            DsLog::errLog($msg);
            echo $msg;
            return;
        }
    }
    // insert the strings that are new
    if (count($notexists) > 0) {
        $time = date('YmdHi');
        $sql = "INSERT IGNORE INTO $table (str, hash, count, upd) VALUES ";
        for ($i = 0; $i < count($notexists); $i++) {
            $str = $notexists[$i];
            $crc = sprintf('%u', crc32($str));
            $sql .= "('$str','$crc','1','$time'),";
        }
        $sql = substr($sql, 0, strlen($sql) - 1);
        $ret = mysql_query($sql, $m);
        if (!$ret) {
            $msg = 'insert error: ' . mysql_error($m) . ', ' . $sql;
            DsLog::errLog($msg);
            echo $msg;
            return;
        }
    }
    echo !!$ret;
}
I'm counting strings: for each string I track how many times it has been seen, like a map of string => count; the table also stores the CRC of each string and some other information.
The insert function receives the posted strs, a JSON array of strings; each POST carries about 1500 strings on average.
The table currently holds about 12 million records, and each call to insert takes 20 seconds on average (it often hits the 30-second timeout).
How should I optimize this? Only a small part of the statistics are done so far; the final data volume is expected to be around 11 billion.
------ solution idea ----------------------
Are you sure the bottleneck is in the three statements the database runs for the query and the updates?
You said each POST carries about 1500 strings on average.
Does that mean 1500 words? If it's actually the string length, then at an average of 20 characters per word that's only about 75 words.
Your filter condition is str IN ('xxx', 'xxx', ...), which compares every record in the table against 1500 (or 75) strings, with a hit rate of at most 1/1500 (or 1/75). Do you think that's a good fit?
A database's strength is comparing records against records; you're making it do a much weaker record-against-list comparison.
You can load the incoming data into a temporary table (one word per row) and then join it against the main table, which plays to the database's strengths, as sketched below.
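A minimal sketch of that idea, keeping the post's old mysql_* API; the staging-table name tmp_strs is a placeholder, and it assumes $strs, $table and $time are set up exactly as in the original function (strings already escaped):

    // Stage the incoming words in a temporary table, one per row,
    // then let the server join it against the main table.
    mysql_query("CREATE TEMPORARY TABLE tmp_strs (str VARCHAR(256) PRIMARY KEY)", $m);

    // Multi-row insert of the posted strings.
    $sql = "INSERT IGNORE INTO tmp_strs (str) VALUES ('" . implode("'),('", $strs) . "')";
    mysql_query($sql, $m);

    // One join-based UPDATE touches only the matching rows instead of
    // comparing every record against a 1500-element IN list.
    $sql = "UPDATE $table t JOIN tmp_strs x ON t.str = x.str " .
           "SET t.count = t.count + 1, t.upd = '$time'";
    mysql_query($sql, $m);

    // New strings are the staged rows with no match in the main table;
    // MySQL's CRC32() matches the sprintf('%u', crc32()) in the post.
    $sql = "INSERT IGNORE INTO $table (str, hash, count, upd) " .
           "SELECT x.str, CRC32(x.str), 1, '$time' " .
           "FROM tmp_strs x LEFT JOIN $table t ON t.str = x.str " .
           "WHERE t.str IS NULL";
    mysql_query($sql, $m);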
One more thing: can't your two consecutive UPDATE statements be merged into one?
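For example, reusing the post's own $table, $tmp and $time variables:

    // One statement, one scan, instead of two passes over the same rows.
    $sql = "UPDATE $table SET count = count + 1, upd = '$time' WHERE str IN $tmp";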
------ solution idea ----------------------
1. MySQL has MEMORY tables; why aren't you using them?
2. Merging the two UPDATEs (see the sketch above) should shave off about 10 seconds.
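A sketch of point 1; only the engine clause differs from the temporary-table idea in the first reply:

    // A MEMORY (HEAP) table lives entirely in RAM, and its PRIMARY KEY
    // defaults to a HASH index, so the equality join stays cheap.
    mysql_query("CREATE TEMPORARY TABLE tmp_strs (str VARCHAR(256) PRIMARY KEY) ENGINE=MEMORY", $m);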
------ solution idea ----------------------
The old mysql_* library functions aren't very efficient and are officially deprecated; I've hardly ever used them. When I learned PHP I went straight to PDO.
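A minimal PDO sketch of the count-bump update; the DSN and credentials are placeholders, and bound parameters make the addslashes() calls unnecessary:

    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // One placeholder per string; PDO quotes the values itself.
    $marks = implode(',', array_fill(0, count($strs), '?'));
    $stmt = $pdo->prepare("UPDATE white SET count = count + 1 WHERE str IN ($marks)");
    $stmt->execute($strs);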
Also, I don't think the IN keyword will use the index here.
------ solution idea ----------------------
Quote:
    It feels about the same.
That's probably a problem with your test environment; in most cases the join query is faster than IN.
The result you measured probably comes down to one of the following:
1. The queried table doesn't hold enough data yet
2. The two query methods return different result sets
3. The query fields aren't indexed, or the index isn't being used
4. Interference from too many other indexes
5. Other reasons
I think the obvious one is reason 2.
You didn't specify the query fields when you joined the tables, so the joined query necessarily returns more fields than the non-joined one.
A join against the table can also be one-to-many, turning one record into several, so name the fields and add a GROUP BY.
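For instance, naming the columns and grouping so each main-table row comes back once (tmp_strs is the staging table from the first reply's sketch):

    // Explicit column list plus GROUP BY collapses one-to-many matches.
    $sql = "SELECT t.str, MAX(t.count) AS cnt " .
           "FROM $table t JOIN tmp_strs x ON t.str = x.str " .
           "GROUP BY t.str";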