Delphi code optimization

Source: Internet
Author: User
Come from: http://www.optimalcode.com
Contents
1. String Optimization
1.1. Don't initialize redundantly
1.2. Use SetLength to pre-allocate long strings (AnsiString)
1.3. Thread safety for strings and dynamic arrays
1.4. Avoid short strings
1.5. Avoid the Copy function
1.6. Always use long strings and convert to PChar when necessary
2. Integer code optimization
2.1. Prefer 32-bit variables
2.2. Avoid subrange types
2.3. Simplify expressions
2.4. No need to fear multiplication
2.5. Temporary subrange sets
2.6. Big integer operations
3. Floating Point Optimization
3.1. Be cautious with Extended
3.2. Change the FPU control word
3.3. Prefer Round
3.4. Passing floating-point parameters
3.5. Do it yourself
3.6. Reduce division
3.7. Floating-point zero check
4. Other Optimizations
4.1. Local variables
4.2. Local procedures
4.3. Procedure parameters
4.4. Pointer variables
4.5. Arrays
4.6. Flow control
4.7. Type casts
4.8. Enumerations and sets
4.9. New issues introduced by the Pentium II
4.10. CPU view
4.11. Loop statements
4.12. Case statements
4.13. Filling and moving memory
4.14. Interfaces and virtual methods
4.15. Code alignment
4.16. Code style
4.17. Trust the compiler
4.18. Code timing
4.19. Closing remarks
 

String Optimization
Delphi has three kinds of strings. Short strings (string[N], N = 1..255) have statically allocated storage whose size is fixed at compile time; this type is inherited from BP for DOS. Character arrays (PChar) exist mainly for compatibility with the various APIs; they already appeared in BP7 and are even more widely used in Delphi. Their storage can be allocated statically as a character array or manually with GetMem. Long strings (AnsiString) are unique to Delphi; their storage is allocated dynamically at run time, which makes them the most flexible kind and also the easiest to abuse.

Don't initialize redundantly

Delphi's default string type, AnsiString, is automatically initialized to an empty string. Consider the following code:


var
  S: string;
begin
  S := '';
  ...
end;

The statement S := ''; is redundant. Note, however, that this does not apply to the Result of a function, which is not initialized for you. In general, passing a string as a var parameter is faster than returning it as a function result.

Use SetLength to pre-allocate long strings (AnsiString)

Dynamic memory allocation is the great strength of AnsiString, but it easily backfires. A typical example:

S2 := '';
for I := 2 to Length(S1) do S2 := S2 + S1[I];

Leaving aside that Delete could do this job, the main problem is that the memory for S2 is reallocated over and over, which is quite time-consuming. A simple and effective alternative is:

SetLength(S2, Length(S1) - 1);
for I := 2 to Length(S1) do S2[I - 1] := S1[I];

In this way, the S2 memory is allocated only once.
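
The two approaches can be put side by side in a small console program. This is only a sketch: the function names DropFirstSlow and DropFirstFast are made up for illustration, and AnsiString is spelled out so that the code means the same thing in compilers where string is a Unicode type.

```pascal
program PreallocDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: appends one character at a time, so the result
// may be reallocated on every iteration.
function DropFirstSlow(const S1: AnsiString): AnsiString;
var
  I: Integer;
begin
  Result := '';
  for I := 2 to Length(S1) do
    Result := Result + S1[I];
end;

// Hypothetical helper: allocates the result once, then fills it in place.
function DropFirstFast(const S1: AnsiString): AnsiString;
var
  I: Integer;
begin
  if Length(S1) < 2 then
  begin
    Result := '';
    Exit;
  end;
  SetLength(Result, Length(S1) - 1);
  for I := 2 to Length(S1) do
    Result[I - 1] := S1[I];
end;

begin
  WriteLn(DropFirstSlow('optimize'));
  WriteLn(DropFirstFast('optimize'));
end.
```

Both calls print the input without its first character. The guard for inputs shorter than two characters is an addition of this sketch, so that SetLength is never called with a negative length.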

Thread safety for strings and dynamic arrays

Before Delphi 5, the reference-count updates performed for long strings and dynamic arrays were not thread safe. Since Delphi 5, a lock prefix has been added in front of the critical instructions to fix this. Unfortunately the change is expensive: on the Pentium II a locked instruction can take up to 28 additional cycles, so overall efficiency drops by at least half. The only remedy is to modify the Delphi RTL itself. After backing up the original file, replace every lock in source/RTL/sys/system.pas with {lock}, using whole-word replacement. That is not the complete optimization yet. The next step is to remove the xchg instructions that date from the Delphi 4 runtime library, because xchg carries an implicit lock prefix. In the _LStrAsg and _LStrLAsg routines in system.pas, replace xchg edx, [eax] with the following code:

mov ecx, [eax]
mov [eax], edx
mov edx, ecx

Then recompile and overwrite system.dcu. The affected operations then run about 6 times faster than in Delphi 5 and 2 times faster than in Delphi 4. Keep in mind that this gives up the thread safety the lock prefix provides, so it only suits programs that do not share strings or dynamic arrays between threads.

Avoid short strings

Many string operations first convert a short string to a long string, which slows execution. It is therefore better to use short strings sparingly.

Avoid the Copy function

This, too, is a matter of misusing memory management. A typical case:

if Copy(S1, 23, 64) = Copy(S2, 15, 64) then ...

This allocates two temporary strings, which reduces efficiency. It should be replaced with the following code:

I := 0;
F := False;
repeat
  F := S1[I + 23] <> S2[I + 15];
  Inc(I);
until F or (I > 63);
if not F then ...
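
The same idea can be wrapped in a reusable routine. The following is a sketch under stated assumptions: SameRange is a hypothetical helper, not an RTL function, and unlike Copy it simply returns False when a range falls outside either string instead of truncating it.

```pascal
program RangeCompareDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: True when Count characters of S1 starting at Start1
// equal Count characters of S2 starting at Start2. No temporary strings
// are allocated.
function SameRange(const S1: AnsiString; Start1: Integer;
  const S2: AnsiString; Start2, Count: Integer): Boolean;
var
  I: Integer;
begin
  Result := False;
  // Reject ranges that fall outside either string.
  if (Start1 < 1) or (Start2 < 1) or (Count < 0) or
     (Start1 + Count - 1 > Length(S1)) or
     (Start2 + Count - 1 > Length(S2)) then
    Exit;
  for I := 0 to Count - 1 do
    if S1[Start1 + I] <> S2[Start2 + I] then
      Exit; // first mismatch ends the comparison
  Result := True;
end;

begin
  WriteLn(SameRange('xxhello', 3, 'hello!!', 1, 5));
  WriteLn(SameRange('xxhello', 3, 'help!!!', 1, 5));
end.
```

The first call compares 'hello' with 'hello' and the second compares 'hello' with 'help!', so only the first range matches.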

Similarly, the following statement is quite inefficient:

S := Copy(S, 1, Length(S) - 10);

It should be changed to:

Delete(S, Length(S) - 9, 10);

By the way, when concatenating strings, S := S1 + S2; is simple and effective, but in Delphi 2, S := Format('%s%s', [S1, S2]); may be a little faster.

Always use long strings and convert to PChar when necessary

First look at how an AnsiString is laid out (a conceptual definition, not compilable code):

type
  AnsiString = packed record
    AllocSiz: LongInt; // dynamically allocated size
    RefCnt: LongInt;   // reference count
    Length: LongInt;   // actual length
    ChrArr: array[1..AllocSiz - 6] of Char; // character data
  end;

AString[1] returns the content of AString.ChrArr[1]. Many people think AnsiString is inherently inefficient. In fact, the inefficiency comes largely from poorly written code, careless memory management, and a lack of supporting functions. As mentioned above, once its memory has been allocated, a long string is just a linear sequence of bytes with no efficiency problems. Of course, more efficient supporting functions would make things better still. There are three ways to convert an AnsiString to a PChar:

1. P := @S[1]; this triggers a call to UniqueString.
2. P := PChar(S); this checks whether S is empty. If it is, a pointer to a null character is returned; otherwise the address of S[1] is returned.
3. P := Pointer(S); this causes no implicit call, so it is the best choice when S is known not to be empty.
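
The difference between the last two conversions shows up on an empty string. The sketch below uses two hypothetical wrapper functions, RawPtr and CheckedPtr, and spells the types as AnsiString and PAnsiChar so that it reads the same in compilers where string and PChar are Unicode types.

```pascal
program PCharDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical wrapper around the plain pointer cast (method 3).
function RawPtr(const S: AnsiString): Pointer;
begin
  Result := Pointer(S);
end;

// Hypothetical wrapper around the checked cast (method 2).
function CheckedPtr(const S: AnsiString): PAnsiChar;
begin
  Result := PAnsiChar(S);
end;

var
  S: AnsiString;
  P1, P2, P3: PAnsiChar;
begin
  S := 'hello';
  P1 := @S[1];         // method 1: makes S unique first
  P2 := PAnsiChar(S);  // method 2: checked cast
  P3 := Pointer(S);    // method 3: no implicit call
  WriteLn(P1 = P2, ' ', P2 = P3);

  // On an empty string the raw cast yields nil,
  // while the checked cast yields a pointer to a #0 character.
  WriteLn(RawPtr('') = nil);
  WriteLn(CheckedPtr('') <> nil);
end.
```

For a non-empty string all three pointers refer to the same first character; only the empty case differs, which is why method 3 needs the caller to know that S is not empty.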

Integer code optimization

Prefer 32-bit variables

In 32-bit code, 32-bit variables are the native format. Operations on the 16-bit types (Word, SmallInt, WideChar) temporarily switch the processor to 16-bit operand mode, which roughly doubles the processing time. By contrast, the 8-bit types (Byte, Char) are not much slower as long as they are not mixed with other sizes. If you need to use an 8-bit or 16-bit variable many times, consider converting it to a 32-bit variable temporarily; a single assignment such as ADWord := AWord; is all it takes.

Avoid subrange types

One of Pascal's strengths is its rich set of data types, and Delphi's Object Pascal inherits this tradition; enumerations and subrange types belong to it. Unfortunately they make optimization awkward, because the number of bytes they occupy depends on the size of their range. For example, an enumeration with no more than 256 elements occupies 1 byte, whereas MyYear = 1900..2000 occupies two bytes, and as noted above 16-bit variables are slow.

Simplify expressions

An expression that is too complex may impede the compiler's automatic optimization. In that case, consider introducing temporary variables to simplify it. This both helps optimization and improves the readability of the code.

No need to fear multiplication

Before the PII, multiplication was quite time-consuming, so the classic optimization of the day was to turn certain multiplications into shifts and additions. On the PII, multiplication is much improved and, like most other operations, needs only one instruction cycle. The Delphi compiler will still turn operations such as * 2 into shl 1, which does no harm.

Temporary subrange sets

These exist to make up for the shortcomings of subrange types, and they are used as follows. A statement like

if (X >= 0) and (X <= 10) or (X >= 20) and (X <= 30) then ...

can be rewritten as

if X in [0..10, 20..30] then ...

The more subranges there are, the more noticeable the gain. There is no free lunch, however: the price here is the use of a temporary register.

movzx and xor/mov are two different ways of reading data smaller than 32 bits. The latter had the advantage before the PII, while the former is more efficient on the PII because of its out-of-order execution. The compiler seems to have complicated rules for choosing between them; if necessary, it is better to use inline assembly.
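
That the two forms are equivalent can be checked exhaustively over the Byte range. This is a minimal sketch: the function names are hypothetical, and the parameter is declared as Byte on purpose, because Delphi set elements are limited to ordinal values 0..255.

```pascal
program RangeSetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: the chained comparisons.
function InRangesCompare(X: Byte): Boolean;
begin
  Result := (X >= 0) and (X <= 10) or (X >= 20) and (X <= 30);
end;

// Hypothetical helper: the same test written with a temporary set.
function InRangesSet(X: Byte): Boolean;
begin
  Result := X in [0..10, 20..30];
end;

var
  X, Hits: Integer;
  Same: Boolean;
begin
  Same := True;
  Hits := 0;
  for X := 0 to 255 do
  begin
    if InRangesCompare(X) <> InRangesSet(X) then
      Same := False;
    if InRangesSet(X) then
      Inc(Hits);
  end;
  // 0..10 and 20..30 contain 11 values each, so Hits ends up as 22.
  WriteLn('forms agree: ', Same, ', members: ', Hits);
end.
```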

Big integer operations

There are four weapons for dealing with integers larger than 32 bits (why not seven? Ask Borland, not me): Int64, Comp, Double and Extended. Apart from the 64-bit integer Int64, all of them are floating-point types whose operations are carried out by FPU instructions. The storage format of Comp is exactly the same as that of Int64. Borland's official position is that Comp should be replaced by Int64, for a simple reason: integer operations are always faster than floating-point operations. However, tests on the PII show that although Int64 has an unmatched advantage in addition and subtraction, its multiplication and division are slower than floating point. Fortunately Comp can be put to use here, though doing so is a little involved. First declare the variables as Int64, then declare auxiliary Comp variables overlaid on them:

var
  A, B, C, D, E: Int64;
  CA: Comp absolute A;
  CC: Comp absolute C;

// Addition and subtraction stay as they are. Division becomes:
C := Trunc(CA / B); // faster than C := A div B
// Multiplication becomes:
E := Round(CA * B + CC * D); // faster than E := A * B + C * D

Floating Point Optimization

Be cautious with Extended

Extended is large (10 bytes, or 12 bytes when aligned) and slow to read and write, which makes it an enemy of optimization. In addition, Delphi 2 to 4 have a bug in the alignment of Extended. Therefore, do not use Extended unless you have to.

Also, in expressions that mix floating-point types, the compiler stores temporaries as Extended so as not to lose precision, so avoid mixing floating-point types.

Finally, constants declared with const default to Extended when no type is given. The solution is to declare typed constants, together with the $J directive.

Change the FPU control word

With the default FPU control word, division and square root on the PII/PIII are slow but precise. When such precision is not required, Set8087CW can be used to make the FPU "lazy".

For the Single type: Set8087CW(Default8087CW and $FCFF)

For the Double type: Set8087CW((Default8087CW and $FCFF) or $0200)

For the Extended type: Set8087CW(Default8087CW or $0300)
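
What the three masks do can be seen by applying them to a sample value. Bits 8 and 9 of the x87 control word select the precision (00 for single, 10 for double, 11 for extended). The sketch below only does the bit arithmetic and does not touch the FPU; the function names are hypothetical, and the sample value $1332 is an assumption (it is the usual Default8087CW of 32-bit Delphi, but check your own RTL).

```pascal
program FpuPrecisionBits;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Assumption for illustration only: the usual Default8087CW value
  // in 32-bit Delphi. Check your own RTL before relying on it.
  SampleCW = $1332;

// Clears precision bits 8 and 9: single precision.
function SinglePrecisionCW(CW: Word): Word;
begin
  Result := CW and $FCFF;
end;

// Clears both bits, then sets bit 9: double precision.
function DoublePrecisionCW(CW: Word): Word;
begin
  Result := (CW and $FCFF) or $0200;
end;

// Sets both bits: extended precision.
function ExtendedPrecisionCW(CW: Word): Word;
begin
  Result := CW or $0300;
end;

begin
  WriteLn('single:   $', IntToHex(SinglePrecisionCW(SampleCW), 4));
  WriteLn('double:   $', IntToHex(DoublePrecisionCW(SampleCW), 4));
  WriteLn('extended: $', IntToHex(ExtendedPrecisionCW(SampleCW), 4));
end.
```

With the assumed sample value the three results are $1032, $1232 and $1332. In real code the computed word would be passed to Set8087CW, and it is prudent to restore the previous control word afterwards.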

Prefer Round

Trunc reads and writes the FPU control word, but Round does not. Use Round whenever you can.

Passing floating-point parameters

For a function that returns a floating-point value, entry and exit involve an extra push and pop, for example:

function Func(X: SomeType): SomeFloat;

Rewrite it as follows:

procedure Func(X: SomeType; var FP: SomeFloat);

There is no point in declaring unmodified floating-point parameters as const, because apart from adding a compile-time check it does nothing. The remedy is to declare the parameter as var, which forces it to be passed by address.

Do it yourself

Delphi does not optimize floating-point operations, so you have to optimize them by hand.

Note that in Delphi a floating-point exception is raised not immediately after the faulting operation but just before the next floating-point instruction. The common practice is therefore to add an fwait instruction after a floating-point operation completes.

Reduce division

Division is expensive, so it pays to reduce the number of divisions.

In addition, for a simple division such as A / 5, the compiler does not necessarily (?!) turn it into the multiplication A * 0.2. For example:

FP := FP * 3 * 4 / 5 + 3 * 4 / 2;

is compiled by Delphi 4 as:

FP := FP * 3 * 4 / 5 + 6;

and only:

FP := 3 * 4 / 5 * FP + 3 * 4 / 2;

is compiled as:

FP := 2.4 * FP + 6;

Given how complex the compiler's rules are, it is advisable to do this optimization by hand.

Floating-point zero check

To check whether a floating-point number is zero, the simple form AFloat = 0 has the 0 converted to a floating-point zero before the comparison. A better solution is as follows.

For the Single type:

(DWORD(Pointer(ASingle)) shl 1) = 0

For the Double type:

type
  DoubleData = record
    Lo, Hi: DWORD;
  end;
var
  ADouble: Double;
  DD: DoubleData absolute ADouble;
begin
  ...
  if ((DD.Hi shl 1) or DD.Lo) = 0 then ...
end;

This method improves efficiency on the PII by 30% to 40%.
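
A variation on the same bit test is sketched below. It masks off the sign bit instead of shifting it out, which avoids depending on the width the compiler uses for shl; this is a substitution made for the sketch, not the form shown above. The function names are hypothetical. Both +0.0 and -0.0 count as zero, because only the sign bit differs between them.

```pascal
program FloatZeroCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: True when every bit except the sign bit is clear.
function IsZeroSingle(X: Single): Boolean;
begin
  Result := (PCardinal(@X)^ and Cardinal($7FFFFFFF)) = 0;
end;

// Hypothetical helper: the same test on the 64-bit pattern of a Double.
function IsZeroDouble(X: Double): Boolean;
begin
  Result := (PInt64(@X)^ and $7FFFFFFFFFFFFFFF) = 0;
end;

begin
  WriteLn(IsZeroSingle(0.0), ' ', IsZeroSingle(1.5));
  WriteLn(IsZeroDouble(0.0), ' ', IsZeroDouble(-2.0));
end.
```

Whether this integer test actually beats the FPU comparison depends on the compiler and CPU, so it is worth timing before adopting it.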

Other Optimizations

Local variables

Unlike C, Delphi has no register-like directive and cannot explicitly declare a register variable, because the Delphi compiler handles this step intelligently: some local variables automatically become register variables. Delphi has its own internal criteria, of course; in general, frequently referenced variables are the ones that get optimized. Global variables do not enjoy this benefit. There are exceptions, though. An array of simple elements is better declared as a global variable, which saves a register, and "heap variables" such as strings, dynamic arrays, and objects do not necessarily benefit from being local. (They are called "heap variables" because, as local variables, they keep only a pointer on the stack that points to storage allocated on the heap, which requires additional entry and exit code. Borland's official explanation is that the stack is faster than the heap.)

Local procedures

Local (nested) procedures are a distinctive piece of Delphi syntax. However, calling a local procedure brings extra stack operations so that the variables of the parent procedure can be accessed from inside it. Where speed matters, move the local procedure out and pass the variables it needs as parameters.

Procedure parameters

Delphi's default calling convention is register. In this mode EAX, EDX, and ECX are used to pass parameters, so a procedure should generally have no more than three parameters. For methods of object types, no more than two parameters are recommended because of the implicit Self pointer.

Pointer variables

Pointers are very useful things; Java discarded them and C# brought them back. In Delphi a pointer is 4 bytes in size and can be held in a register. Sometimes we can "hint" the compiler to do this by using a with clause, for example:

with SomeStructure.SomeVar[I] do // the elements of SomeVar are classes or records
begin
  ...
end;

In this way, SomeStructure.SomeVar[I], which would otherwise not be optimized, is kept in a register.

Arrays

With dynamic arrays and the much improved multiplication of the PII, arrays have become far more capable. Linked lists appear in textbooks but are rarely used in real programming, and arrays are indeed much faster than traditional linked lists.

Delphi has static arrays (var A: array[0..9] of Byte), dynamic arrays (var A: array of Byte), pointer arrays (pointers to a static array), and open arrays (for parameter passing only). Static arrays and pointer arrays have the advantage of speed, and dynamic arrays have the advantage of variable size. A compromise is to declare a dynamic array and convert it to a pointer when necessary.

Note that a dynamic array parameter declared without const or var is passed as an ordinary value parameter, and that declaring a dynamic array parameter as const does not mean its elements cannot be modified (if you do not believe it, add A[1] := 0; to a procedure that takes a const dynamic array A; the compiler will not report an error). Prefer Length(A) to High(A), because High calls Length.
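
The point about const can be tried out directly. In this sketch the type name TDemoBytes and both routines are made up for illustration.

```pascal
program ConstDynArrayDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TDemoBytes = array of Byte; // hypothetical type name

// const protects the array reference, not the elements it points to.
procedure ClearSecond(const A: TDemoBytes);
begin
  if Length(A) > 1 then
    A[1] := 0; // accepted by the compiler despite const
end;

// Uses Length directly for the loop bound.
function SumAll(const A: TDemoBytes): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Length(A) - 1 do
    Inc(Result, A[I]);
end;

var
  Data: TDemoBytes;
begin
  SetLength(Data, 3);
  Data[0] := 1;
  Data[1] := 2;
  Data[2] := 3;
  ClearSecond(Data);
  WriteLn(SumAll(Data)); // the caller's array now holds 1, 0, 3
end.
```

Because a dynamic array variable is a reference to shared storage, the change made inside ClearSecond is visible to the caller.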

Flow control

Advocates of structured programming discourage Break, Continue, and Exit, but the code they generate is the most concise, so they still have their place in programming.

Delphi introduced exceptions, which should be counted as a major improvement to Object Pascal. However, exception handling comes at the cost of additional code. Wrapping a try block around just a few statements, or handling exceptions inside a loop, hurts efficiency. In addition, swallowing exceptions without handling them is not a good habit.

Type casts

Many people like to use absolute for type conversion, but this prevents the variable from becoming a register variable. Using a type cast inside the procedure is therefore the better choice.

Enumerations and sets

For set types, adding or removing a single element with Include and Exclude is faster than S := S + [A]; this needs no further explanation.
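
The two spellings give the same result, as the sketch below shows. TOption, TOptions and the three functions are hypothetical names; wrapping the calls in functions is only for demonstration and is not how you would write speed-critical code.

```pascal
program SetOpsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical enumeration and set type for the example.
  TOption = (opRead, opWrite, opExec);
  TOptions = set of TOption;

// Adds one element with the set union operator.
function AddWithUnion(S: TOptions; O: TOption): TOptions;
begin
  Result := S + [O];
end;

// Adds one element with Include.
function AddWithInclude(S: TOptions; O: TOption): TOptions;
begin
  Include(S, O);
  Result := S;
end;

// Removes one element with Exclude.
function RemoveWithExclude(S: TOptions; O: TOption): TOptions;
begin
  Exclude(S, O);
  Result := S;
end;

var
  S: TOptions;
begin
  S := AddWithInclude([], opRead);
  S := AddWithInclude(S, opExec);
  S := RemoveWithExclude(S, opRead);
  // opExec is still in the set, opRead has been removed.
  WriteLn(opExec in S, ' ', opRead in S);
end.
```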

In addition, the {$Zn} directive sets the size of enumeration types; making them four bytes with {$Z4} may be faster.

New issues introduced by the Pentium II

The most distinctive feature of the Pentium II is that it executes in superscalar, multi-pipeline, out-of-order fashion. "Multi-pipeline" means that the CPU has three decode pipelines (two of which can only handle simple instructions), five execution pipelines (one for integer operations, one for integer and floating-point operations, one for address operations, and two for data access), and three retirement pipelines. "Out-of-order execution" allows instructions that do not depend on each other to execute in the same clock cycle in different pipelines. The effect on code is that some instructions cost one or two extra clock cycles (for example consecutive floating-point operations), while others cost no extra cycles because they run in parallel (for example a jump that follows a computation). This is only an overview; for details, refer to a dedicated Pentium optimization guide and the Intel documentation.

CPU view

The 32-bit Delphi IDE has a CPU view (in Delphi 2 and 3 it can be enabled by editing a registry key). Inspecting the generated assembly while debugging is a very effective way to understand how the code was optimized, and even to count the required clock cycles exactly (if you have the patience).

Loop statements

Delphi has its own effective way of compiling loop statements, and it works well in most cases, but sometimes other tricks are needed, for example using a while loop, which is closer to the underlying assembly, for a small loop. In addition, unrolling a compact loop into straight-line code seems to suit the branch prediction of the PII better.

An example of optimizing a loop:

for I := 1 to 40 do
begin
  if I = 20 then A[I] := A[I] + 20 else A[I] := A[I] + 10;
end;

Rewritten:

for I := 1 to 19 do A[I] := A[I] + 10;
A[20] := A[20] + 20;
for I := 21 to 40 do A[I] := A[I] + 10;

This increases the amount of code but reduces the number of tests. Reducing the condition tests inside a loop is also a key to speed.

Case statements

When a case statement has many branches, consider dividing them into several groups and nesting another level of case inside each group.

When one or two branches of a case statement are taken much more often than the rest, test for them with if before the case.
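
The second tip might look like the sketch below. Classify and its labels are hypothetical, and the assumption that 0 is by far the most frequent input is made up for the example.

```pascal
program CaseFastPathDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical classifier: 0 is assumed to be the hot value,
// so it is tested with a plain if before the case statement.
function Classify(Code: Integer): string;
begin
  if Code = 0 then
  begin
    Result := 'idle';
    Exit;
  end;
  case Code of
    1..9:     Result := 'low';
    10..99:   Result := 'medium';
    100..999: Result := 'high';
  else
    Result := 'unknown';
  end;
end;

begin
  WriteLn(Classify(0));
  WriteLn(Classify(42));
  WriteLn(Classify(-7));
end.
```

Whether the early test pays off depends on how skewed the input really is, so it is worth measuring with representative data.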

Filling and moving memory

When filling or moving large amounts of memory, it is best to write the assembly yourself, using 32-bit instructions. However, with instructions such as movsd and stosd it is easy to run into a problem: the data address or size (especially the latter) is not dword aligned. What can be done? The answer is that there is a loophole to exploit: most data is always dword aligned when it is allocated, so you can handle only the dword-aligned case. Given the potential risks and bugs of this practice, exercise caution.

Interfaces and virtual methods

Like Java, Object Pascal does not support multiple inheritance, but it can be achieved with interfaces. In Delphi, however, an interface means a double pointer.

To call a virtual method, the VMT pointer has to be fetched through the object pointer, and then the method pointer has to be fetched from the VMT. If necessary, you can use a workaround to avoid this.

Code alignment

Code alignment has the disadvantage of increasing code size, but the speed gain makes the sacrifice worthwhile, so it is generally recommended to turn it on.

Code style

Pascal is a beautiful language (and, compared with C++, a concise one; no offence intended). Personally, I am unwilling to spoil this beauty for the sake of optimization. Fortunately, Delphi does not force that choice on me. On the contrary, it is messy code that causes problems, so it pays to maintain a good code style.

Trust the compiler

I believe Borland has the best compiler in the world (you may have a better one in mind). It is not only fast but also first-rate at optimization. In most cases, therefore, naturally written code is already efficient. You do not have to agonize over every line, as long as the critical parts are fast enough.

Code timing

Timing code is a very effective technique during optimization, and plenty of software is available for it. There is no need to obsess over scores the way some magazines do with their xxxMark figures, but it is very satisfying to quantify how much your code has actually improved.
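
A very simple way to time a piece of code is sketched below. It assumes an RTL that ships the DateUtils unit, and SumTo is a hypothetical workload. Wall-clock timing through Now is coarse, so a dedicated profiler or a high-resolution counter is the better tool for short code paths.

```pascal
program TimingDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils;

// Hypothetical workload: sums the integers 1..N.
function SumTo(N: Integer): Int64;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to N do
    Inc(Result, I);
end;

var
  Started: TDateTime;
  Total: Int64;
begin
  Started := Now;
  Total := SumTo(50000000);
  // Printing the result keeps the work from being optimized away.
  WriteLn('result: ', Total);
  WriteLn('elapsed ms: ', MilliSecondsBetween(Now, Started));
end.
```

Run the measurement several times and compare like with like (same build settings, same machine load) before drawing conclusions.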

Closing remarks

People tend to hope for a magic set of rules that handles every situation. Unfortunately, no such article exists, and code optimization is no exception. The most effective optimization of all is optimizing the algorithm. Keeping an open mind and continuing to learn and practice is therefore the best route to success.
