Efficiency of Code Execution
In my performance tuning guide, I said that to tune performance you need to find the hotspots in your program, that is, the places that are called most often. In such places, even a small optimization can improve performance dramatically. Here are three examples of code execution efficiency (all of them come from the Internet).
First example
Efficiency of getters and setters in PHP (source: Reddit)
This example is relatively simple, and you can skip it.
Consider the following PHP code. It shows that going through the getter/setter is more than twice as slow as reading and writing the member variable directly.
    <?php
    // dog_naive.php
    class dog {
        public $name = "";

        public function setName($name) {
            $this->name = $name;
        }

        public function getName() {
            return $this->name;
        }
    }

    $rover = new dog();

    // Using the getter/setter
    for ($x = 0; $x < 10; $x++) {
        $t = microtime(true);
        for ($i = 0; $i < 1000000; $i++) {
            $rover->setName("rover");
            $n = $rover->getName();
        }
        echo microtime(true) - $t;
        echo "\n";
    }

    // Accessing the member variable directly
    for ($x = 0; $x < 10; $x++) {
        $t = microtime(true);
        for ($i = 0; $i < 1000000; $i++) {
            $rover->name = "rover";
            $n = $rover->name;
        }
        echo microtime(true) - $t;
        echo "\n";
    }
    ?>
There is nothing surprising about this, because a function call has a cost: it has to push and pop the stack, pass values, and sometimes even trigger an interrupt. That is a lot of work. More code to execute means slower execution. All languages behave this way, which is why C++ introduced inline, and Java can optimize such calls away when optimization is turned on. For dynamic languages, however, this becomes somewhat harder.
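To make the call overhead easy to try out, here is a minimal sketch in Python rather than PHP. That swap is mine, so the numbers will not match the PHP figures above. The Dog class and the two helper functions are hypothetical names made up for illustration. The only difference between the two loops is that the accessor version makes two extra method calls per iteration.

```python
import timeit


class Dog:
    """Hypothetical Python stand-in for the PHP dog class."""

    def __init__(self):
        self.name = ""

    def set_name(self, name):
        self.name = name

    def get_name(self):
        return self.name


def via_accessors(rover, n):
    """Write and read the name n times through set_name/get_name."""
    result = None
    for _ in range(n):
        rover.set_name("rover")
        result = rover.get_name()
    return result


def via_attributes(rover, n):
    """Write and read the name n times by touching the attribute directly."""
    result = None
    for _ in range(n):
        rover.name = "rover"
        result = rover.name
    return result


if __name__ == "__main__":
    rover = Dog()
    for label, func in (("accessors", via_accessors), ("attributes", via_attributes)):
        # Time 5 runs of 100,000 iterations each; absolute numbers depend on the machine.
        seconds = timeit.timeit(lambda: func(rover, 100_000), number=5)
        print(f"{label}: {seconds:.3f}s")
```

How large the gap is depends on the interpreter and version, so treat the output as a rough indication rather than a benchmark.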
You might think that the following code (using magic methods) would be better, but its performance is actually even worse.
    class dog {
        private $_name = "";

        function __set($property, $value) {
            if ($property == 'name')
                $this->_name = $value;
        }

        function __get($property) {
            if ($property == 'name')
                return $this->_name;
        }
    }
The efficiency of dynamic languages is always a problem. If you need better performance from PHP, you may need to use Facebook's HipHop to compile your PHP into C++.
Second example
Why does Python code run faster inside a function? (source: StackOverflow)
Consider the following code, once inside a function body and once as global code.
Inside a function, the code takes 1.8s:
    def main():
        for i in xrange(10**8):
            pass
    main()
Outside a function body, the code takes 4.5s:
    for i in xrange(10**8):
        pass
Don't get hung up on the exact timings; this is just an example, but the gap is clearly large. Why is that? If we use the dis module to disassemble the bytecode of the function body, and the compile builtin to disassemble the global bytecode, we see the following (note the STORE_ instruction in each listing).
Disassembly of the main function:
    13 FOR_ITER          6 (to 22)
    16 STORE_FAST        1 (i)
    19 JUMP_ABSOLUTE    13
Disassembly of the global code:
    13 FOR_ITER          6 (to 22)
    16 STORE_NAME        1 (i)
    19 JUMP_ABSOLUTE    13
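As we can see, the difference is STORE_FAST versus STORE_NAME, and the former is much faster than the latter. In the global code, the variable i is a global variable, whereas inside the function i lives in the local variable table, and looking a variable up in the global variable table is much slower. If you declare global i inside main, the function slows down as well. The reason is that local variables are stored in an array and accessed with an integer constant, while global variables are stored in a dictionary, which is slower to query.
(Note: in C++, this is not a problem.)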
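If you want to see the two store instructions for yourself, the following is a small sketch using the standard dis module. It assumes Python 3, so it uses range instead of xrange, and a tiny loop because only the bytecode matters here, not the run time.

```python
import dis


def main():
    for i in range(10):
        pass


# The same loop compiled as module-level (global) code instead of a function body.
global_code = compile("for i in range(10):\n    pass\n", "<global>", "exec")

# Collect the opcode names emitted for each version of the loop.
function_ops = [ins.opname for ins in dis.get_instructions(main)]
global_ops = [ins.opname for ins in dis.get_instructions(global_code)]

# Show only the instructions that store the loop variable i.
print("function:", [op for op in function_ops if op.startswith("STORE")])
print("global:  ", [op for op in global_ops if op.startswith("STORE")])
```

The function version should store i with STORE_FAST and the global version with STORE_NAME. The remaining opcode names vary between CPython releases, so the full listing may not look exactly like the one above.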
Third example
Why is sorted data faster to traverse? (source: StackOverflow)
Consider the following C++ code:
    for (unsigned i = 0; i < 100000; ++i) {
        // primary loop
        for (unsigned j = 0; j < arraySize; ++j) {
            if (data[j] >= 128)
                sum += data[j];
        }
    }
If the data array is sorted, the run takes 1.93s; if it is not sorted, it takes 11.54s, more than 5 times slower. The effect is essentially the same in C, C++, Java, or any other language.
The cause is branch prediction. The great StackOverflow gave a very good explanation.
Consider a railroad junction. When a train arrives, the switchman knows which track leads where, but does not know where the train is going. The driver knows where he wants to go, but does not know which track to take. So the train has to stop while the driver and the switchman talk to each other. That performs poorly.
So we can optimize by guessing. We have at least a 50% chance of guessing right. If we guess right, the train keeps running at full speed; if we guess wrong, the train has to back up. If we guess right most of the time, performance is high; if we keep guessing wrong, performance is very poor.
Image by Mecanismo, from Wikimedia Commons: http://commons.wikimedia.org/wiki/file:entroncamento_do_transpraia.jpg
Our if-else is like this railroad junction; the red arrow in the image points to the track switch.
So how does the switchman guess in advance? By using past history: if more than 90% of the earlier trains went left, guess left. That is why it is easier to guess right when the data is sorted.
Sorted
    T = branch taken (condition is true)
    N = branch not taken (condition is false)

    data[] = 0, 1, 2, 3, 4, ... 126, 127, 128, 129, 130, ... 251, 252, ...
    branch = N  N  N  N  N  ...   N    N    T    T    T  ...   T    T  ...
           = NNNNNNNNNNNN ... NNNNNNNTTTTTTTTT ... TTTTTTTTTT (easy to predict)
Unsorted
    data[] = 226, 185, 125, 158, 198, 144, 217, 79, 202, 118,  14, 150, 177, 182, 133, ...
    branch =   T,   T,   N,   T,   T,   T,   T,  N,   T,   N,   N,   T,   T,   T,   T  ...
           = TTNTTTTNTNNTTTT ... (completely random - hard to predict)
From the above we can see that branches over sorted data are much easier to predict.
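To get a feel for "guessing from history", here is a toy simulation. It is only a sketch under my own assumptions: it models the switchman as a 2-bit saturating counter, a textbook predictor, and does not claim to describe how any particular CPU works. The function name predictor_accuracy and the fixed random seed are made up for illustration.

```python
import random


def predictor_accuracy(values, threshold=128):
    """Fraction of `value >= threshold` branches a 2-bit saturating counter guesses right."""
    counter = 0  # states 0 and 1 predict "not taken"; states 2 and 3 predict "taken"
    correct = 0
    for value in values:
        taken = value >= threshold
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the history one step toward what actually happened.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return correct / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.randrange(256) for _ in range(32768)]

unsorted_accuracy = predictor_accuracy(data)
sorted_accuracy = predictor_accuracy(sorted(data))

print(f"unsorted: {unsorted_accuracy:.3f}")
print(f"sorted:   {sorted_accuracy:.3f}")
```

With sorted input this toy predictor only guesses wrong around the single switch from N to T, while on random input it is right about half the time, which is the same picture as the two tables above.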
So what can we do about it? We need to remove the if-else statement from this loop. For example, we can turn the conditional statement:
    if (data[j] >= 128)
        sum += data[j];
into:
    int t = (data[j] - 128) >> 31;
    sum += ~t & data[j];
The branchless version performs about the same as the branching version on sorted data, whether in C/C++ or in Java.
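The bit trick is easy to get wrong, so here is a small Python sketch that only checks that it computes the same sum as the if statement. It assumes the values are in the range 0 to 255. It relies on Python's >> on negative integers behaving like an arithmetic shift, which is also what the C++ version needs from a 32-bit int. Python will not show the speed-up, since the interpreter overhead dominates; the function names are made up for illustration.

```python
def sum_with_branch(data):
    """Reference version: add a value only when it is at least 128."""
    total = 0
    for value in data:
        if value >= 128:
            total += value
    return total


def sum_branchless(data):
    """Same sum, with the condition replaced by the shift-and-mask trick."""
    total = 0
    for value in data:
        t = (value - 128) >> 31  # -1 (all one bits) if value < 128, otherwise 0
        total += ~t & value      # ~t is 0 or all one bits, so this adds 0 or value
    return total


values = list(range(256))
print(sum_with_branch(values), sum_branchless(values))
```

The mask works because ~t is either all zero bits, which wipes the value out, or all one bits, which lets it through unchanged.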
Note: under GCC, if you compile with -O3 or -ftree-vectorize, GCC will turn the branching statement into a branchless one for you. VC++ 2010 does not have this feature.
Finally, I recommend a website, Google Speed, which has tutorials that show you how to write faster web programs.
(End of full text)