I've recently been building a statistics back end for a game. Its most basic feature is analyzing registration and login logs to display user data. During internal testing at the company the user count was small, so no performance problems surfaced. But over the past two days, once everything came together in the real test environment, users came pouring in. Starting in the afternoon, the online-user statistics began to lag, taking several seconds to return data, while registration queries were still acceptably fast. By evening, the online-user statistics were basically timing out on every load. Meanwhile the game side had some bug of its own that kept causing player problems, so the online and registration counts weren't even that large. Yet even on this little data my queries were slow, which was embarrassing.
While they look into the game bug, I'm looking into where the statistics back end's performance goes. First of all, my statistics are read from a replica database; the game runs against the master, and only a handful of administrators use my side, so it's unlikely my queries are what's hurting the game servers.
Today the project lead redirected the database to a server inside the company, and I took a copy to my local machine to see where the statistics platform's performance problems were. It turned out that even the registration statistics were very slow: about two seconds to return on the server, more than twenty seconds locally, and often a timeout (PHP's default max execution time is 30 seconds); the online-user statistics wouldn't load at all. Looking at the database, there were about 3,500 registration records for the day (including fake data). Counting in five-minute buckets, one day is 288 intervals. Of course I wasn't querying the database 288 times in a loop; I'd be scolded to death for that.
The logic for counting registrations per interval is very simple: for each interval, traverse the data, compare the times, and add 1 to the count on a match. But that's only 288 × 3,500 ≈ 1,000,000 comparisons. How can such simple logic take a full half minute to run?
The key problem lies in the time comparison. As everyone knows, comparing timestamps is the "scientific" way to compare times, and since the database stores times as YYYY-MM-DD HH:ii:ss strings, PHP's strtotime() converts them to timestamps. However, after 288 for-iterations times 3,500 foreach-iterations of strtotime(), the execution time here climbs to half a minute:
$nowDayDT = strtotime(date('Y-m-d'));
$__startt = microtime(true);
for ($i = 0; $i < $allTime; $i += $gapTime) {
    $count = 0;
    $startDT = $nowDayDT + $i;                         // for data comparison
    $endDT   = $nowDayDT + $i + $gapTime;
    $xAxis1  = date('H:i', $nowDayDT + $i);            // for display
    $xAxis2  = date('H:i', $nowDayDT + $i + $gapTime);
    foreach ($rawData as $line) {
        $time = strtotime($line['log_dt']);            // converted again on every pass
        if ($startDT <= $time && $time < $endDT) {
            $count++;
        }
    }
    $resArr[] = [
        'date'   => $xAxis1 . '~' . $xAxis2,
        'number' => $count
    ];
}
echo microtime(true) - $__startt;
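If you did want to keep comparing integer timestamps, the conversion could simply be hoisted out of the interval loop, so each row is parsed once instead of 288 times. A minimal standalone sketch of that idea (the sample rows, dates, and the five-minute $gapTime are made up for the demo; only the 'log_dt' column name follows the code above):

```php
<?php
// Sketch: hoist strtotime() out of the per-interval loop.
// Sample data is invented; real code would load it from the database.
$rawData = [
    ['log_dt' => '2024-05-01 00:02:11'],
    ['log_dt' => '2024-05-01 00:07:45'],
    ['log_dt' => '2024-05-01 00:08:03'],
];

// Convert every row exactly once, up front: O(n) strtotime() calls
// instead of O(288 * n).
$times = array_map(fn ($line) => strtotime($line['log_dt']), $rawData);

$nowDayDT = strtotime('2024-05-01');
$gapTime  = 300;                            // five-minute buckets
$resArr   = [];
for ($i = 0; $i < 600; $i += $gapTime) {    // just two buckets for the demo
    $startDT = $nowDayDT + $i;
    $endDT   = $startDT + $gapTime;
    $count   = 0;
    foreach ($times as $time) {             // plain integer comparison, no parsing
        if ($startDT <= $time && $time < $endDT) {
            $count++;
        }
    }
    $resArr[] = [
        'date'   => date('H:i', $startDT) . '~' . date('H:i', $endDT),
        'number' => $count
    ];
}
print_r($resArr);   // bucket 00:00~00:05 holds 1 row, 00:05~00:10 holds 2
```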
So if strtotime() is basically unusable here, is there no way left to compare times? The answer is simple and crude: PHP can compare two datetime strings directly! Because the YYYY-MM-DD HH:ii:ss format orders its fields from most significant to least significant and zero-pads each one, lexicographic string order agrees with chronological order. The changed code is as follows, and the run time is now about 0.3 seconds.
$__startt = microtime(true);
for ($i = 0; $i < $allTime; $i += $gapTime) {
    $count = 0;
    $startDT = date('Y-m-d H:i:s', $nowDayDT + $i);            // for data comparison
    $endDT   = date('Y-m-d H:i:s', $nowDayDT + $i + $gapTime);
    $xAxis1  = date('H:i', $nowDayDT + $i);                    // for display
    $xAxis2  = date('H:i', $nowDayDT + $i + $gapTime);
    foreach ($rawData as $line) {
        $time = $line['log_dt'];                               // plain string, no conversion
        if ($startDT <= $time && $time < $endDT) {
            $count++;
        }
    }
    $resArr[] = [
        'date'   => $xAxis1 . '~' . $xAxis2,
        'number' => $count
    ];
}
echo microtime(true) - $__startt;
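A quick standalone sanity check (sample values invented for the demo) that lexicographic comparison of Y-m-d H:i:s strings really does match timestamp comparison:

```php
<?php
// Sketch: string order vs. timestamp order for 'Y-m-d H:i:s' values.
$a = '2024-05-01 09:59:59';
$b = '2024-05-01 10:00:00';
$c = '2024-05-02 00:00:00';

var_dump($a < $b);                                         // bool(true)
var_dump($b < $c);                                         // bool(true)
var_dump(($a < $b) === (strtotime($a) < strtotime($b)));   // bool(true)

// Caveat: this only works because every field is zero-padded and ordered
// from year down to second. Mixed formats (e.g. '2024-5-1 9:0:0') or
// different time zones in the same column would break the equivalence.
```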
Traversal optimization
You may have noticed another problem: the foreach nested inside the for is a bit worrying. Is it really necessary for the inner foreach to traverse everything? Actually, no. Just have the SQL query return the data sorted by time (ORDER BY the log time column). The optimized time comparison then looks like this:
for (...) {
    ...
    foreach ($rawData as $line) {
        $time = $line['log_dt'];      // was: strtotime($line['log_dt'])
        // optimized comparison (requires $rawData sorted by log_dt)
        if ($time < $startDT) continue;   // before the start time: skip this row
        if ($time >= $endDT) break;       // past the end time: stop this interval
        $count++;                         // otherwise it falls in the interval

        // original comparison
        // if ($startDT <= $time && $time < $endDT) {
        //     $count++;
        // }
    }
    ...
}
The continue and break keywords come in handy here: one skips the current iteration, the other ends the whole loop. With sorted data, each interval early in the day can skip straight past most rows and bail out as soon as it passes its end time. The total traversal time shrinks to about 0.12 seconds.
When processing data at scale, try to avoid doing conversions inside loops, and be wary of functions with complex internals, such as strtotime().
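Taking this one step further (my own sketch, not part of the original fix): since each row belongs to exactly one bucket, the whole count can be done in a single pass by computing the bucket index directly with integer division, which is O(n) no matter how many intervals there are. The sample rows and dates below are invented; only the 'log_dt' column name and five-minute bucket width follow the article:

```php
<?php
// Sketch: single-pass bucket counting -- assign each row to its
// five-minute bucket by integer division, so the interval loop disappears.
$rawData = [
    ['log_dt' => '2024-05-01 00:02:11'],
    ['log_dt' => '2024-05-01 00:07:45'],
    ['log_dt' => '2024-05-01 00:08:03'],
];

$nowDayDT = strtotime('2024-05-01');
$gapTime  = 300;                                   // bucket width in seconds
$buckets  = array_fill(0, 86400 / $gapTime, 0);    // 288 buckets, all zero

foreach ($rawData as $line) {
    $offset = strtotime($line['log_dt']) - $nowDayDT;
    if ($offset >= 0 && $offset < 86400) {
        $buckets[intdiv($offset, $gapTime)]++;     // one conversion, one increment
    }
}

echo $buckets[0], ' ', $buckets[1];    // counts for 00:00~00:05 and 00:05~00:10
```

This trades the sorted-data requirement for a strtotime() call per row, so it's a different balance: one pass and no ORDER BY, at the cost of parsing each row once.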