Write high-performance Lua code

Source: Internet
Author: User
Tags: polyline, rehash


Preface


Lua is a scripting language known for its performance and is widely used, especially in games. For example, "World of Warcraft" plug-ins and mobile games such as "big Masters", "The Divine Comedy" and "Lost Land" all implement their logic in Lua.



Because Lua is fast, most of the time we don't need to think about performance. Knuth has a famous saying: "Premature optimization is the root of all evil." It means that optimizing too early is unnecessary, wastes a lot of time, and easily makes code messy.



So before optimizing for performance, a good programmer asks two questions. First: "Does my program really need to be optimized?" If the answer is yes, then: "Which part should be optimized?"



We cannot decide which part to optimize by imagining and guessing; the efficiency of the code must be measured. We need a profiler to find the performance bottlenecks before optimizing, and after optimizing we need the profiler again to check whether the change really helped.
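When no profiler is at hand, a rough measurement with os.clock is often enough to compare two versions of a function. The following is a minimal sketch; time_it and sum_to are hypothetical names used for illustration, not part of Lua or of the original article.

```lua
-- Hypothetical helper: time a function call with os.clock, which
-- reports the CPU time used by the program in seconds.
local function time_it(f, ...)
    local start = os.clock()
    f(...)
    return os.clock() - start
end

-- Example workload: sum the integers 1..n.
local function sum_to(n)
    local total = 0
    for i = 1, n do
        total = total + i
    end
    return total
end

local elapsed = time_it(sum_to, 1000000)
print(string.format("sum_to took %.4f seconds", elapsed))
```

Timings like this vary from run to run and from machine to machine, so it is worth repeating a measurement several times before drawing conclusions.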



I think the best approach is to follow best practices and write high-performance code from the start, rather than write a pile of junk code and think about optimization later. I believe anyone who has had to clean up afterwards knows how tedious after-the-fact optimization is.



Once you decide to write high-performance Lua code, the sections below show which kinds of code can be optimized in Lua, which are slow, and how to speed them up.


Using local variables


Before running, Lua precompiles the source code into an intermediate form, similar to what the Java virtual machine executes. This intermediate code is then interpreted by an interpreter written in C. The whole process is essentially a while loop containing a large switch...case statement, with one case for each instruction.



Since Lua 5.0, Lua has used a register-based virtual machine. Lua uses a stack to store its registers. For each active function, Lua allocates an activation record on the stack, which holds that function's registers. Each function can use up to 250 registers, because an instruction has only 8 bits to refer to a register.



With so many registers, the Lua precompiler can keep all local variables in registers. This makes access to local variables very fast.



For example, assuming that a and b are local variables, a = a + b compiles to a single instruction:


ADD 0 0 1


However, if neither a nor b is declared as a local variable, precompilation produces the following instructions:


GETGLOBAL 0 0 ;get a
GETGLOBAL 1 1 ;get b
ADD 0 0 1 ;do add
SETGLOBAL 0 0 ;set a


So: when writing Lua code, use local variables wherever you can.



Here are a few comparison tests where you can copy the code into your editor and test it.


a = os.clock()
for i = 1,10000000 do
    local x = math.sin(i)
end
b = os.clock()
print(b-a) -- 1.113454


Now assign math.sin to a local variable sin:


a = os.clock()
local sin = math.sin
for i = 1,10000000 do
    local x = sin(i)
end
b = os.clock()
print(b-a) --0.75951


Using math.sin directly takes 1.11 seconds; storing math.sin in the local variable sin takes 0.76 seconds. That is an efficiency gain of about 30%!
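In real code the same idea usually appears as a handful of locals at the top of a file. The sketch below assumes a hypothetical hot function named length. Inside length, sqrt is an external local (an upvalue), which is somewhat slower to reach than a register but still avoids the global lookup and the table index that math.sqrt needs on every call.

```lua
-- Cache the library function once, at the top of the file.
local sqrt = math.sqrt

-- Hypothetical hot function: length of a 2D vector.
local function length(x, y)
    return sqrt(x * x + y * y)
end

print(length(3, 4))
```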


About Tables (table)


Tables are used very frequently in Lua, because the table stands in for almost every kind of container. So a quick look at how Lua implements tables under the hood helps us write better Lua code.



A Lua table is divided into two parts: the array part and the hash part. The array part holds entries with integer keys from 1 to n, for some particular n; all other keys are stored in the hash part.



The hash part is a hash table, which is essentially an array: a hash function converts each key into an index. If two different keys map to the same index (a collision), the colliding keys are linked together in a chain that starts at that index. This way of resolving collisions is called chaining. (In Lua's implementation the chained entries live inside the same array rather than in separately allocated lists.)
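To make the idea of chaining concrete, here is a toy hash table written in Lua itself. It only illustrates the general technique: it is not how Lua lays out its tables internally, and the names hash, new_map, map_set and map_get are invented for this sketch.

```lua
-- Toy hash table with chaining, for illustration only.
local function hash(key, nbuckets)
    local h = 0
    for i = 1, #key do
        h = (h * 31 + key:byte(i)) % nbuckets
    end
    return h + 1 -- Lua arrays are 1-based
end

local function new_map(nbuckets)
    return { nbuckets = nbuckets, buckets = {} }
end

local function map_set(map, key, value)
    local idx = hash(key, map.nbuckets)
    local node = map.buckets[idx]
    while node do
        if node.key == key then
            node.value = value -- key already present: overwrite
            return
        end
        node = node.next
    end
    -- New key: push it onto the front of this slot's chain.
    map.buckets[idx] = { key = key, value = value, next = map.buckets[idx] }
end

local function map_get(map, key)
    local node = map.buckets[hash(key, map.nbuckets)]
    while node do
        if node.key == key then
            return node.value
        end
        node = node.next
    end
    return nil
end

-- With a single bucket every key collides, so all entries share one chain.
local m = new_map(1)
map_set(m, "x", 1)
map_set(m, "y", 2)
map_set(m, "x", 10)
print(map_get(m, "x"), map_get(m, "y"), map_get(m, "z"))
```

The longer a chain gets, the more comparisons a lookup needs, which is why a hash table has to grow and rehash as it fills up.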



When we assign a new key to a table and both the array part and the hash part are full, a rehash is triggered. Rehashing is expensive: Lua first allocates new arrays of the new sizes, then hashes every entry again and moves it from the old arrays into the new ones. The size of the new hash part is the smallest power of 2 that can hold all of its entries.



When an empty table is created, the sizes of both the array part and the hash part are initialized to 0, which means no array is allocated for either. Let's look at what happens inside Lua when you execute this piece of code:


local a = {}
for i=1,3 do
    a[i] = true
end


Initially, Lua creates an empty table a. In the first iteration, a[1] = true triggers a rehash: Lua sets the size of the array part to 2^0, which is 1, and the hash part stays empty. In the second iteration, a[2] = true triggers another rehash, and the array part is resized to 2^1, which is 2. The last iteration triggers yet another rehash, which sets the size of the array part to 2^2, that is, 4.
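The growth pattern described above can be imitated with a few lines of arithmetic. This is a simplified model of the description only: simulate_appends is a hypothetical helper and does not inspect Lua's real internals.

```lua
-- Simplified model: whenever index i does not fit, the array part
-- doubles in size (1, 2, 4, ...), and each resize counts as one rehash.
local function simulate_appends(n)
    local size, rehashes = 0, 0
    for i = 1, n do
        if i > size then
            size = (size == 0) and 1 or size * 2
            rehashes = rehashes + 1
        end
    end
    return rehashes, size
end

-- Three appends: three rehashes, final array size 4.
print(simulate_appends(3))
```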



The following code:


a = {}
a.x = 1; a.y = 2; a.z = 3


This behaves like the previous code, except that it triggers three rehashes of the table's hash part.



A table with only three elements performs three rehashes, whereas a table with 1 million elements performs only 20, because 2^20 = 1048576 > 1000000. So rehashing costs little for large tables, but if you create a very large number of very small tables (such as coordinate points: point = {x=0,y=0}), it can have a huge impact.



If you need to create many very small tables, you can pre-fill them in the constructor to avoid rehashes. For example, with {true,true,true} Lua knows the table has three elements, so it directly creates an array part of size three. Similarly, for {x=1, y=2, z=3} Lua creates a hash part of size 4.



The following code execution time is 1.53 seconds:


a = os.clock()
for i = 1,2000000 do
    local a = {}
    a[1] = 1; a[2] = 2; a[3] = 3
end
b = os.clock()
print(b-a)  --1.528293




If we give the table its final size when we create it, the code takes only 0.75 seconds, twice as fast!


a = os.clock()
for i = 1,2000000 do
    local a = {1,1,1}
    a[1] = 1; a[2] = 2; a[3] = 3
end
b = os.clock()
print(b-a)  --0.746453


So when you need to create a very large number of small tables, pre-fill them to their final size.
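For record-like tables, the same advice means listing every field in the constructor. The two factories below are hypothetical examples: they build the same point, but only the first tells Lua up front how many fields the table will hold.

```lua
-- The constructor lists every field, so the table is created with
-- room for both keys from the start.
local function new_point(x, y)
    return { x = x, y = y }
end

-- Slower alternative for comparison: start empty, add fields one by one.
local function new_point_slow(x, y)
    local p = {}
    p.x = x
    p.y = y
    return p
end

local p = new_point(3, 4)
local q = new_point_slow(3, 4)
print(p.x, p.y, q.x, q.y)
```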


About strings


Lua's implementation of strings differs from that of other mainstream scripting languages in two ways.



First, Lua keeps only one copy of every string. When a new string appears, Lua checks whether an identical copy already exists; if not, it creates one, otherwise it points to the existing copy. This makes string comparison and table indexing quite fast, because comparing strings only requires checking whether the references are the same. But it also makes creating strings slower, because Lua has to perform that lookup first.
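The single-copy rule is internal to Lua, but its visible effect is easy to check: strings with the same contents are the same value no matter how they were built. A small sketch follows. As far as I know, recent Lua versions apply the single-copy rule only to short strings, so treat this as a demonstration of behavior rather than of the internals.

```lua
-- Two strings built in different ways but with the same contents.
local a = "hel" .. "lo"
local b = table.concat({ "h", "e", "l", "l", "o" })

-- They compare equal and select the same table entry.
local t = {}
t[a] = "found"
print(a == b, t[b])
```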



Second, a string variable holds only a reference to the string, not the buffer in which it is stored. This makes string assignment very cheap. In Perl, for example, $x = $y copies the entire buffer of $y into the buffer of $x, which can be costly when the string is long. In Lua, the same assignment copies only a reference, which is very efficient.



But holding only references makes string concatenation slower. In Perl, the efficiency gap between $s = $s . 'x' and $s .= 'x' is staggering. The former makes a copy of the entire $s and appends 'x' to the copy, while the latter appends 'x' directly to the end of the $s buffer.



Because the latter needs no copy, its cost is independent of the length of $s, so it is very efficient.



Lua does not support the second, faster operation. The following code takes 6.65 seconds:


 
a = os.clock()
local s = ''
for i = 1,300000 do
    s = s .. 'a'
end
b = os.clock()
print(b-a)  --6.649481


We can use a table to simulate a buffer. The following code takes only 0.72 seconds, more than 9 times faster:


a = os.clock()
local s = ''
local t = {}
for i = 1,300000 do
    t[#t + 1] = 'a'
end
s = table.concat(t, '')
b = os.clock()
print(b-a)  --0.07178


So: when concatenating a large number of strings, avoid the .. operator. Use a table to simulate a buffer, then call table.concat to get the final string.
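If the buffer pattern is needed in many places, it can be wrapped in a few small functions. This is only a sketch: new_buffer, buffer_add and buffer_tostring are hypothetical names, and the n field is my own addition for tracking the length.

```lua
-- Hypothetical buffer helper built on a plain table.
local function new_buffer()
    return { n = 0 }
end

local function buffer_add(buf, s)
    local n = buf.n + 1
    buf[n] = s
    buf.n = n -- track the length ourselves instead of calling #buf
end

local function buffer_tostring(buf)
    -- Concatenate entries 1..n with an empty separator.
    return table.concat(buf, "", 1, buf.n)
end

local buf = new_buffer()
for i = 1, 5 do
    buffer_add(buf, tostring(i))
end
print(buffer_tostring(buf))
```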


3R principle


The 3R principle (the rules of 3R) is shorthand for three principles: reducing, reusing and recycling.



The 3R principle is the principle of circular economy and environmental protection, but the same applies to Lua.


Reducing


There are many ways to avoid creating new objects and to save memory. For example, if your program uses too many tables, consider changing the data structure that represents your data.



For example, suppose your program has a polyline type and you use a table to store its vertices:


polyline = {
    { x = 1.1, y = 2.9 },
    { x = 1.1, y = 3.7 },
    { x = 4.6, y = 5.2 },
    ...
}


This data structure is natural and easy to understand, but each vertex needs a hash part to store its fields. Storing the coordinates in the array part instead reduces memory consumption:


polyline = {
    { 1.1, 2.9 },
    { 1.1, 3.7 },
    { 4.6, 5.2 },
    ...
}


For 1 million vertices, memory use drops from 153.3MB to 107.6MB, at the cost of less readable code.



The most extreme approach is:


polyline = {
    x = {1.1, 1.1, 4.6, ...},
    y = {2.9, 3.7, 5.2, ...}
}


For 1 million vertices, this uses only 32MB, about 1/5 of the original. You have to choose between performance and code readability.
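Some of the lost readability can be recovered by hiding the two-array layout behind small accessor functions. The helpers below are hypothetical, and the n field that tracks the number of vertices is my own addition for this sketch.

```lua
-- Hypothetical helpers for the two-array layout.
local function new_polyline()
    return { x = {}, y = {}, n = 0 }
end

local function add_point(poly, x, y)
    local n = poly.n + 1
    poly.x[n] = x
    poly.y[n] = y
    poly.n = n
end

-- Returns the coordinates as two values, so no per-point table is created.
local function get_point(poly, i)
    return poly.x[i], poly.y[i]
end

local poly = new_polyline()
add_point(poly, 1.1, 2.9)
add_point(poly, 1.1, 3.7)
add_point(poly, 4.6, 5.2)
print(get_point(poly, 3))
```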



Inside loops, we need to pay even more attention to object creation.


for i = 1, n do
    local t = {1, 2, 3, 'hi'}
    -- execute logic that does not change t
    -- ...
end


Anything that does not change inside the loop should be created outside the loop:


 
 
local t = {1, 2, 3, 'hi'}
for i = 1, n do
    -- execute logic that does not change t
    -- ...
end


Reusing



If we can't avoid creating new objects, we should consider reusing old ones.



Consider this piece of code:


 
 
local t = {}
for i = 1970, 2000 do
    t[i] = os.time({year = i, month = 6, day = 14})
end


In each iteration of the loop, a new table {year = i, month = 6, day = 14} is created, but only year varies.



The following code reuses the table:


 
local t = {}
local aux = {year = nil, month = 6, day = 14}
for i = 1970, 2000 do
    aux.year = i;
    t[i] = os.time(aux)
end


Another way to reuse is to cache previously computed results so that later computations are not repeated. When the same input comes up again, you can look the result up in a table directly. This is in fact why dynamic programming is efficient: in essence, it trades space for time.
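Caching computed results is often called memoization. A minimal sketch for functions of one argument might look like this; memoize, slow_square and fast_square are hypothetical names chosen for the example.

```lua
-- Hypothetical memoize helper for functions of one non-nil argument.
-- Results that are nil are not cached and get recomputed each time.
local function memoize(f)
    local cache = {}
    return function(x)
        local v = cache[x]
        if v == nil then
            v = f(x)
            cache[x] = v
        end
        return v
    end
end

local calls = 0
local function slow_square(x)
    calls = calls + 1 -- count how often the real work is done
    return x * x
end

local fast_square = memoize(slow_square)
local first = fast_square(12)
local second = fast_square(12) -- served from the cache
print(first, second, calls)
```

The cache grows with every distinct argument, so this pays off only when the same inputs really do recur.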


Recycling


Lua comes with a garbage collector, so we don't generally need to consider garbage collection.



Still, understanding how Lua collects garbage gives us more freedom when programming.



The Lua garbage collector works incrementally. That is, collection is done in many small steps.



Frequent garbage collection can reduce the operational efficiency of the program.



We can use Lua's collectgarbage function to control the garbage collector.



The collectgarbage function offers several operations: stopping garbage collection, restarting garbage collection, forcing a full collection cycle, forcing a single collection step, getting the amount of memory Lua is using, and setting two parameters that affect the collector's frequency and step size.



For a batch-processing Lua program, stopping garbage collection with collectgarbage("stop") is more efficient, because all memory is freed anyway when the batch program ends.



As for the collector's step size, it is hard to generalize. A faster collector consumes more CPU time but frees more memory, which can in turn reduce the time spent on paging. Only careful experimentation tells us which setting is more appropriate.
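The options mentioned above can be tried out in a few lines. The sketch below uses only the "count", "stop", "restart" and "collect" options of collectgarbage; the workload is made up, and the numbers printed will differ between machines and Lua versions.

```lua
-- Sketch of a batch-style job that runs with the collector stopped.
local before = collectgarbage("count") -- memory in use, in kilobytes

collectgarbage("stop") -- no automatic collection from here on
local items = {}
for i = 1, 100000 do
    items[i] = { id = i } -- allocations pile up while the collector is off
end
local during = collectgarbage("count")

collectgarbage("restart") -- turn automatic collection back on
items = nil -- drop the only reference to the data
collectgarbage("collect") -- force a full collection cycle
local after = collectgarbage("count")

print(string.format("before %.0f KB, during %.0f KB, after %.0f KB",
    before, during, after))
```

A real batch program would normally skip the restart and the forced collection and simply exit, letting the operating system reclaim the memory.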


Conclusion


We should write code to a high standard from the start and try to avoid optimizing after the fact.



If there is a real performance problem, we need tools to quantify efficiency, find the bottlenecks, and optimize them. After optimizing, we must measure again to see whether the optimization worked.



During optimization we face many trade-offs: code readability versus execution speed, trading CPU for memory, trading memory for CPU, and so on. We need to keep testing against the actual situation to find the right balance.



Finally, there are two ultimate weapons:



First, use LuaJIT. LuaJIT can make your code about 5 times faster on average without any changes to the code. See LuaJIT's performance comparison for x86/x64.



Second, rewrite the bottleneck in C/C++. Lua and C are closely related by design, so the two can be mixed in one program. But the communication between C and Lua offsets some of the benefit of using C.



Note: the two do not combine well. The more Lua code you rewrite in C, the less benefit LuaJIT brings.


Statement


This article is based on a translation of "Lua Performance Tips" by Roberto Ierusalimschy, the creator of the Lua language, from Lua Programming Gems. It is not a literal translation, and a great deal has been cut, so it is best regarded as a set of notes.



Thanks to Roberto for his hard work and dedication on Lua!



Reposted from http://blog.jobbole.com/65991/


