How to write high-performance Lua code


Preface

Lua is a scripting language known for its performance and is widely used in many fields, especially games. World of Warcraft add-ons, as well as mobile games such as "big Masters", "The Divine Comedy", and "The Lost Land", have their logic written in Lua.

So most of the time we do not need to think about performance. Knuth famously said: "Premature optimization is the root of all evil." In other words, optimizing too early is unnecessary, wastes a lot of time, and easily leads to messy code.

So before thinking about performance, a good programmer must ask two questions. First: "Does my program really need to be optimized?" If the answer is yes, then: "Which part should be optimized?"

We cannot decide which part to optimize by conjecture and guesswork; the efficiency of code must be measured. We need a profiler to locate the performance bottleneck before we start optimizing. After optimizing, we still need the profiler to check whether the change really helped.
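When no profiler is at hand, a rough first measurement can be taken with the standard os.clock function. The following is a minimal sketch only: the helper name time_it and the workload function are hypothetical names made up for this illustration, and a real profiler reports far more detail.

```lua
-- Hypothetical helper: call f(...) `runs` times and return the CPU seconds used.
local function time_it(runs, f, ...)
  local start = os.clock()
  for _ = 1, runs do
    f(...)
  end
  return os.clock() - start
end

-- Example workload (an assumption, for illustration only).
local function workload(n)
  local sum = 0
  for i = 1, n do
    sum = sum + i % 7
  end
  return sum
end

local elapsed = time_it(10, workload, 100000)
print(string.format("10 runs took %.4f s", elapsed))
```

Measure before and after each change with the same helper and the same input, so that the two numbers are comparable.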

I think the best approach is to write high-performance code by following best practices from the start, rather than writing a pile of sloppy code and optimizing it later. Anyone who has worked in the industry has felt how tedious after-the-fact optimization can be.

Once you decide to write high-performance Lua code, the sections below point out which Lua code can be optimized, which code is slow, and how to speed it up.

Use local

Before running a program, Lua compiles the source code into an intermediate bytecode, much as Java does for its virtual machine. The bytecode is then executed by an interpreter written in C, which is essentially a while loop wrapped around a large switch...case statement, with one case for each instruction.

Since Lua 5.0, Lua has used a register-based virtual machine. The registers live on a stack: each active function has an activation record, a slice of the stack in which the function stores its registers. Each function can use up to 250 registers, because each instruction has only 8 bits to refer to a register.

With so many registers available, the Lua precompiler can keep all local variables in registers. This makes access to local variables very fast.

For example, suppose a and b are local variables. Compiling a = a + b produces a single instruction:

; a is register 0, b is register 1
ADD 0 0 1

However, if neither a nor b is declared as a local variable, the compiler produces the following instructions:

GETGLOBAL 0 0 ; get a
GETGLOBAL 1 1 ; get b
ADD 0 0 1 ; do add
SETGLOBAL 0 0 ; set a

So when writing Lua code, you should use local variables wherever possible.

Here are two comparison tests that you can copy into your editor and run.

local a = os.clock()
for i = 1, 10000000 do
    local x = math.sin(i)
end
local b = os.clock()
print(b - a) -- 1.113454

Now assign math.sin to the local variable sin:

local a = os.clock()
local sin = math.sin
for i = 1, 10000000 do
    local x = sin(i)
end
local b = os.clock()
print(b - a) -- 0.75951

Calling math.sin directly takes 1.11 seconds, while caching it in the local variable sin takes 0.76 seconds. That is a speedup of roughly 30%!

About tables

Tables are used very frequently in Lua, because they serve as almost every kind of container. A quick look at how Lua implements tables internally therefore helps us write better Lua code.

A Lua table has two parts: the array part and the hash part. The array part stores entries with integer keys from 1 to some n, and all other keys are stored in the hash part.

The hash part is a hash table. A hash table is essentially an array: a hash function turns each key into an array index. When two different keys map to the same index (a collision), the colliding entries are linked together in a chain, and a lookup follows that chain until it finds the matching key. In Lua's implementation the chained entries are stored in the hash array itself rather than in separately allocated list nodes.

When we assign a new key to a table and both the array part and the hash part are full, Lua triggers a rehash. A rehash is expensive. Lua first allocates new arrays of the new sizes in memory, then hashes every entry again, moving the entries from the old arrays to the new ones. The size of the new hash part is the smallest power of 2 that can hold all of its entries.
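The sizing rule for the hash part can be sketched in plain Lua. This illustrates the arithmetic only; it is not Lua's internal C code, and the function name is made up for this example.

```lua
-- Smallest power of 2 that is greater than or equal to n (for n >= 1).
local function smallest_pow2_at_least(n)
  local size = 1
  while size < n do
    size = size * 2
  end
  return size
end

print(smallest_pow2_at_least(3))       --> 4
print(smallest_pow2_at_least(1000000)) --> 1048576
```

Because the size doubles each time, the number of rehashes grows only with the logarithm of the number of entries.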

When an empty table is created, both the array part and the hash part have size 0; no array is allocated for either. Let's look at what happens inside Lua when the following code runs:

local a = {}
for i = 1, 3 do
    a[i] = true
end

Lua starts by creating the empty table a. In the first iteration, a[1] = true triggers a rehash, and Lua sets the size of the array part to 2^0, that is, 1, while the hash part stays empty. In the second iteration, a[2] = true triggers another rehash, setting the size of the array part to 2^1, that is, 2. The last iteration triggers yet another rehash, setting the size of the array part to 2^2, that is, 4.

Now consider the following code:

local a = {}
a.x = 1; a.y = 2; a.z = 3

Like the previous code, it triggers three rehashes, this time of the hash part of the table.

A table with only three elements goes through three rehashes, whereas a table with 1 million elements goes through only about 20, because 2^20 = 1048576 > 1000000. However, if you create a very large number of very small tables (such as coordinate points: point = {x = 0, y = 0}), the rehashes can have a huge impact.

If you need to create many very small tables, you can fill them in the constructor to avoid rehashes. For example, given {true, true, true}, Lua knows the table has three elements, so it creates an array part of size three directly. Similarly, given {x = 1, y = 2, z = 3}, Lua creates a hash part of size 4.

The following code takes 1.53 seconds to run:

local a = os.clock()
for i = 1, 2000000 do
    local t = {}
    t[1] = 1; t[2] = 2; t[3] = 3
end
local b = os.clock()
print(b - a) -- 1.528293

If we give the table its final size when creating it, the code takes only 0.75 seconds, about twice as fast!

local a = os.clock()
for i = 1, 2000000 do
    local t = {1, 1, 1}
    t[1] = 1; t[2] = 2; t[3] = 3
end
local b = os.clock()
print(b - a) -- 0.746453

So when you need to create a very large number of small tables, give each table its size in advance by filling it in the constructor.
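The same advice applies to the hash part. A minimal sketch for the coordinate point case, assuming two hypothetical factory functions written only for this illustration:

```lua
-- Starts from an empty table, so the hash part may be resized as fields are added.
local function new_point_incremental(x, y)
  local p = {}
  p.x = x
  p.y = y
  return p
end

-- The constructor names both fields, so Lua can size the hash part up front.
local function new_point(x, y)
  return { x = x, y = y }
end

local p = new_point(3, 4)
print(p.x, p.y)
```

Both functions return equivalent tables; only the amount of resizing work during creation differs.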

About strings

Lua's implementation of strings differs from that of other mainstream scripting languages in two ways.

First, Lua keeps only one copy of each string. When a new string appears, Lua checks whether an identical copy already exists. If it does, Lua reuses that copy; otherwise it creates a new one. This makes string comparison and table indexing quite fast, because comparing two strings only requires checking whether the references are the same. It also makes creating strings slower, because Lua has to perform that lookup.

Second, a string variable holds only a reference to the string, not its buffer. This makes string assignment very efficient. In Perl, for example, $x = $y copies the entire buffer of $y into the buffer of $x, which becomes very expensive when the string is long. In Lua, the same assignment copies only a reference and is very cheap.

However, holding only references slows down string concatenation. In Perl, the efficiency gap between $s = $s . 'x' and $s .= 'x' is staggering. The former creates a copy of the entire $s and appends 'x' to its end, while the latter appends 'x' directly to the end of the $s buffer.

Since the latter does not need to copy anything, its cost is independent of the length of $s, which makes it very efficient.

Lua does not support the second, faster operation. The following code takes 6.65 seconds:

local a = os.clock()
local s = ''
for i = 1, 300000 do
    s = s .. 'a'
end
local b = os.clock()
print(b - a) -- 6.649481

We can use a table to simulate a buffer. The following code takes only about 0.07 seconds, more than 90 times faster:

local a = os.clock()
local s = ''
local t = {}
for i = 1, 300000 do
    t[#t + 1] = 'a'
end
s = table.concat(t, '')
local b = os.clock()
print(b - a) -- 0.07178

So when concatenating a large number of strings, avoid the .. operator. Use a table to simulate a buffer, then call table.concat to get the final string.
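The buffer technique also handles separators, since table.concat accepts one as its second argument. A minimal sketch, where join_numbers is a hypothetical helper written for this illustration:

```lua
-- Build "1,2,...,n" by collecting the pieces in a table and joining them once.
local function join_numbers(n)
  local buf = {}
  for i = 1, n do
    buf[#buf + 1] = tostring(i)
  end
  return table.concat(buf, ",")
end

print(join_numbers(5)) --> 1,2,3,4,5
```

Each piece is stored once and the final string is created in a single step, instead of a new intermediate string being created on every iteration.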

3R principle

The 3R principle is shorthand for three rules: reduce, reuse, and recycle.

It originally comes from the circular economy and environmental protection, but it applies to Lua as well.

Reducing

There are many ways to avoid creating new objects and to save memory. For example, if your program uses too many tables, consider representing the data with a different structure.

For example, suppose your program has a polyline and stores its vertices in a table:

polyline = {
    {x = 1.1, y = 2.9},
    {x = 1.1, y = 3.7},
    {x = 4.6, y = 5.2},
    ...
}

This data structure is natural and easy to understand, but each vertex needs a hash part to store its fields. Storing the coordinates in the array part instead reduces the memory footprint:

polyline = {
    {1.1, 2.9},
    {1.1, 3.7},
    {4.6, 5.2},
    ...
}

With 1 million vertices, memory use drops from 153.3 MB to 107.6 MB, at the cost of less readable code.

The most extreme approach is:

polyline = {
    x = {1.1, 1.1, 4.6, ...},
    y = {2.9, 3.7, 5.2, ...}
}

With 1 million vertices, this layout occupies only 32 MB, about 1/5 of the original. You need to weigh performance against code readability.
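The three layouts can be compared on your own machine with collectgarbage("count"), which reports the memory in use in kilobytes. The following is a rough sketch under stated assumptions: the helper retained_kb and the three builder functions are hypothetical, the vertex count is reduced to keep the run short, and the absolute numbers depend on the Lua version and platform.

```lua
local N = 10000 -- far fewer vertices than the 1 million discussed above

-- Approximate number of KB retained by the value that build() returns.
local function retained_kb(build)
  collectgarbage("collect")
  local before = collectgarbage("count")
  local data = build()      -- kept in a local so it survives the next collection
  collectgarbage("collect")
  local after = collectgarbage("count")
  return after - before
end

local function build_records()  -- { {x = ..., y = ...}, ... }
  local t = {}
  for i = 1, N do t[i] = { x = i + 0.5, y = i + 0.25 } end
  return t
end

local function build_pairs()    -- { {..., ...}, ... }
  local t = {}
  for i = 1, N do t[i] = { i + 0.5, i + 0.25 } end
  return t
end

local function build_columns()  -- { x = {...}, y = {...} }
  local xs, ys = {}, {}
  for i = 1, N do
    xs[i] = i + 0.5
    ys[i] = i + 0.25
  end
  return { x = xs, y = ys }
end

local kb_records = retained_kb(build_records)
local kb_pairs = retained_kb(build_pairs)
local kb_columns = retained_kb(build_columns)
print(string.format("records: %.0f KB, pairs: %.0f KB, columns: %.0f KB",
                    kb_records, kb_pairs, kb_columns))
```

The ordering of the three results, rather than the exact figures, is what matters when choosing a layout.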

Inside loops, we need to pay particular attention to object creation:

for i = 1, n do
    local t = {1, 2, 3, 'hi'}
    -- do some processing, but t is never changed
    ...
end

Anything that does not change inside the loop should be created once, outside the loop:

local t = {1, 2, 3, 'hi'}
for i = 1, n do
    -- do some processing, but t is never changed
    ...
end

Reusing

If creating new objects cannot be avoided, consider reusing old ones.

Consider the following code:

local t = {}
for i = 1970, 2000 do
    t[i] = os.time({year = i, month = 6, day = 14})
end

Each iteration of the loop creates a new table {year = i, month = 6, day = 14}, even though only year varies.

The following code reuses a single table:

local t = {}
local aux = {year = nil, month = 6, day = 14}
for i = 1970, 2000 do
    aux.year = i
    t[i] = os.time(aux)
end

Another way to reuse is to cache results that were computed earlier, so the same computation is not repeated. When the same input comes up again, the result can be looked up directly in a table. This is in fact why dynamic programming is so efficient: in essence, it trades space for time.
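The caching idea can be sketched as a small memoization wrapper. This is an illustrative sketch only: the names memoize, slow_square, and fast_square are hypothetical, and the wrapper assumes a function of one argument that never returns nil.

```lua
-- Wrap f so that each distinct argument is computed only once.
local function memoize(f)
  local cache = {}
  return function(x)
    local value = cache[x]
    if value == nil then
      value = f(x)
      cache[x] = value
    end
    return value
  end
end

local calls = 0
local function slow_square(n)
  calls = calls + 1 -- count how often the real computation runs
  return n * n
end

local fast_square = memoize(slow_square)
local first = fast_square(12)
local second = fast_square(12) -- served from the cache
print(first, second, calls)
```

The cache grows with every new argument, so this only pays off when the same inputs recur and the memory cost is acceptable.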

Recycling

Lua has its own garbage collector, so we generally do not need to worry about garbage collection.

Still, understanding how Lua collects garbage gives us more freedom when programming.

Lua's garbage collector is incremental: a collection cycle is broken down into many small steps (increments) that are interleaved with program execution.

Collecting garbage too frequently can slow your program down.

We can control the garbage collector through Lua's collectgarbage function.

collectgarbage offers several operations: stopping the collector, restarting it, forcing a full collection cycle, forcing a single collection step, and reporting the memory Lua is using. It also exposes two parameters that affect the frequency and pace of collection.

Stopping the collector with collectgarbage("stop") can make batch Lua programs more efficient, because all of the memory is freed anyway when the batch program ends.
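A minimal sketch of that idea, using only the "stop" and "restart" options of collectgarbage. The wrapper name run_batch_without_gc and the sample job are hypothetical, and whether this actually helps depends on how much memory the job allocates.

```lua
-- Run job() with the collector stopped, then restart it even if job() fails.
local function run_batch_without_gc(job)
  collectgarbage("stop")
  local ok, result = pcall(job)
  collectgarbage("restart")
  if not ok then
    error(result, 0) -- re-raise the original error
  end
  return result
end

local total = run_batch_without_gc(function()
  local sum = 0
  for i = 1, 1000 do
    local tmp = { i, i * 2 } -- short-lived garbage that is not collected meanwhile
    sum = sum + tmp[2]
  end
  return sum
end)
print(total) --> 1001000
```

With the collector stopped, memory use only grows, so this suits short batch jobs rather than long-running programs.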

It is hard to generalize about how to tune the pace of the garbage collector. A faster collector consumes more CPU, but it frees memory sooner, which can in turn reduce the time lost to paging. Only careful experimentation can tell which setting works best.

Conclusion

We should write code to a high standard from the start and avoid after-the-fact optimization as much as possible.

If there really is a performance problem, we need tools to measure efficiency, find the bottleneck, and then optimize it. After optimizing, we must measure again to see whether the optimization worked.

When optimizing, we face many trade-offs: code readability versus run-time efficiency, CPU for memory, memory for CPU, and so on. Finding the right balance takes continuous testing against the actual situation.

Finally, there are two ultimate weapons:

First, use LuaJIT. LuaJIT can give you a speedup of about 5 times on average without any changes to your code. See LuaJIT's performance comparison for x86/x64.

Second, rewrite the bottleneck in C/C++. Because Lua and C are so closely related, they can be mixed in one program. However, the cost of communication between C and Lua offsets some of the gains from C.

Note: these two approaches do not combine well. The more Lua code you rewrite in C, the less benefit LuaJIT brings.

Statement

This article is based on "Lua Performance Tips" by Roberto Ierusalimschy, the creator of Lua, published in Lua Programming Gems. It is not a literal translation; much has been abridged, and it can be regarded as a set of notes.

Thanks to Roberto for his hard work and dedication to Lua!
