Designing for Performance (Performance Optimization)


Designing for Performance

Translator Signature: [email protected]

Translator Link: http://admires.iteye.com/

Version: Android 3.1 r1

Original: http://developer.android.com/guide/practices/design/performance.html

Performance optimization

Android apps run on mobile devices with limited computing power, storage space, and battery life. As a result, apps must be efficient. Battery life alone can be a reason to optimize your app, even if it already seems to run fast enough. Battery life matters to users, so if your app drives up power consumption, users will sooner or later find out that your app is to blame.

Although this document mainly covers micro-optimizations, these will almost never make or break your software. Choosing the right algorithms and data structures should always be your first priority, but that is outside the scope of this document.

Introduction

There are two basic rules for writing efficient code:

• Don't do work that you don't need to do.

• Don't allocate memory if you can avoid it.

Optimize judiciously

This document is about Android-specific micro-optimization, so make sure you already know which code needs to be optimized and how to measure the effect (good or bad) of any change you make. You have only so much engineering time to invest, so it is important to spend it wisely.

(See the Conclusion for more on profiling and benchmarking.)

This document also assumes that you have already made the best choices of data structures and algorithms, and that you have considered the future performance consequences of your API decisions. Using the right data structures and algorithms makes more difference than any of the advice here, and thinking about the performance consequences of your API decisions makes it easier to switch to a better implementation later. (This matters more for library code than for application code.)

(If you need that kind of advice, see Josh Bloch's Effective Java, item 47.)

One of the trickiest problems you will face when micro-optimizing an Android app is that it is almost certain to run on several hardware platforms: different versions of the VM running on different processors at different speeds. It is not even generally true that device A is simply some fixed factor faster or slower than device B, so you cannot scale results from one device to another. In particular, measurements on the emulator tell you very little about performance on any real device. There are also huge differences between devices with and without a JIT: code that performs well on a device with a JIT sometimes performs poorly on a device without one.

If you want to know how your app performs on a given device, you need to test on that device.

Avoid creating unnecessary objects

Object creation is never free. A generational garbage collector with per-thread allocation pools for temporary objects can make allocation cheaper, but allocating memory is always more expensive than not allocating memory.

If you allocate objects in a user interface loop, you force periodic garbage collections, which create small hiccups in the user experience. The concurrent collector introduced in Gingerbread helps, but unnecessary work should always be avoided.

You should therefore avoid creating object instances you don't need. Some examples of things that help:

If you have a method that returns a string, and you know its result will always be appended to a StringBuffer anyway, change the method's signature and implementation so that it does the append directly, instead of creating a short-lived temporary object.

When extracting strings from a set of input data, try to return a substring of the original data instead of creating a copy. You still create a new String object, but it shares its char[] with the original data. (The trade-off is that if you use only a small part of the original input, you keep all of it in memory anyway.)
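As a sketch of the first suggestion (appending to the caller's buffer instead of returning a new string), here is a hypothetical formatting helper written both ways. It uses StringBuilder, the unsynchronized counterpart of StringBuffer; the class and method names are illustrative assumptions, not an existing API.

```java
// Hypothetical helper contrasting two ways to produce "(x, y)" text.
class PointFormatter {
    // Returning version: every call allocates a temporary String that the
    // caller then copies into its own buffer.
    static String format(int x, int y) {
        return "(" + x + ", " + y + ")";
    }

    // Appending version: writes straight into the caller's buffer, so no
    // intermediate String is created for each point.
    static void appendTo(StringBuilder sb, int x, int y) {
        sb.append('(').append(x).append(", ").append(y).append(')');
    }

    // Formats every point into a single buffer, separated by spaces.
    static String joinAll(int[] xs, int[] ys) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < xs.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            appendTo(sb, xs[i], ys[i]);
        }
        return sb.toString();
    }
}
```

For example, joinAll(new int[] {1, 3}, new int[] {2, 4}) builds "(1, 2) (3, 4)" in one buffer without a temporary String per point.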

A somewhat more radical idea is to slice multidimensional arrays up into parallel one-dimensional arrays:

• An array of ints is much better than an array of Integer objects. More generally, two parallel arrays of ints are also a lot more efficient than an array of (int,int) objects. The same goes for any combination of primitive types.

• If you need to implement a container that stores tuples of (Foo,Bar) objects, remember that two parallel Foo[] and Bar[] arrays are generally much better than a single array of custom (Foo,Bar) objects. (The exception is when you are designing an API for other code to use; there, it is usually better to trade a small amount of speed for a good API design. But in your own internal code, try to be as efficient as possible.)
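A minimal sketch of the two layouts for (int,int) pairs, assuming hypothetical class names IntPair and IntPairList: the first stores one object per element, the second keeps two parallel int[] arrays whose index i together holds element i.

```java
import java.util.Arrays;

// Array-of-objects layout: one heap object (with its own header) per element.
final class IntPair {
    final int first;
    final int second;

    IntPair(int first, int second) {
        this.first = first;
        this.second = second;
    }

    static long sumOfProducts(IntPair[] pairs) {
        long total = 0;
        for (IntPair p : pairs) {
            total += (long) p.first * p.second;
        }
        return total;
    }
}

// Parallel-array layout: element i lives at firsts[i] and seconds[i],
// so adding an element allocates nothing unless the arrays must grow.
final class IntPairList {
    private int[] firsts = new int[8];
    private int[] seconds = new int[8];
    private int size;

    void add(int first, int second) {
        if (size == firsts.length) {
            // Grow both arrays together so the indices stay aligned.
            firsts = Arrays.copyOf(firsts, size * 2);
            seconds = Arrays.copyOf(seconds, size * 2);
        }
        firsts[size] = first;
        seconds[size] = second;
        size++;
    }

    int size() {
        return size;
    }

    int first(int i) {
        checkIndex(i);
        return firsts[i];
    }

    int second(int i) {
        checkIndex(i);
        return seconds[i];
    }

    long sumOfProducts() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += (long) firsts[i] * seconds[i];
        }
        return total;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
    }
}
```

The parallel-array version is internal-implementation style; as the text notes, a public API may reasonably prefer the object form for clarity.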

Generally speaking, avoid creating short-lived temporary objects if you can. Fewer objects created means less-frequent garbage collection, which has a direct impact on the user experience.

Performance myths

Previous versions of this document made various misleading claims. Here are some clarifications:

On devices without a JIT, invoking a method through a variable with an exact type rather than an interface type is slightly more efficient. (For example, it is cheaper to invoke methods on a HashMap map than on a Map map, even though in both cases the map is a HashMap.) The difference is not 2x, though; it is closer to 6%. With a JIT, the two calls are effectively indistinguishable.

On devices without a JIT, caching a field in a local variable is about 20% faster than repeatedly accessing the field. With a JIT, field access costs about the same as local access, so this optimization isn't worthwhile unless you feel it makes your code easier to read. (This is true of final, static, and static final fields too.)

Prefer static over virtual

If you don't need to access an object's fields, make your method static. Invocations will be about 15% to 20% faster. It is also good practice, because you can tell from the method signature that calling the method cannot alter the object's state.
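A small illustration with a hypothetical Geometry class: scaled reads an instance field and so remains an instance (virtual) method, while clamp touches no fields and can be declared static.

```java
// Hypothetical class used only to illustrate the advice.
final class Geometry {
    private final float scale;

    Geometry(float scale) {
        this.scale = scale;
    }

    // Reads the scale field, so it needs an instance to work on.
    float scaled(float value) {
        return value * scale;
    }

    // Uses only its arguments: static makes the call non-virtual and shows
    // from the signature that it cannot change any Geometry's state.
    static int clamp(int value, int min, int max) {
        return value < min ? min : (value > max ? max : value);
    }
}
```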

Avoid internal getters/setters

In native languages like C++ it is common practice to use getters (i = getCount()) instead of accessing the field directly (i = mCount). This is a good habit in C++, because the compiler can usually inline the access, and if you need to restrict or debug field access you can add the code at any time.

On Android, this is a bad idea. Virtual method calls are expensive, much more so than direct field access. It is reasonable to follow common object-oriented practice and have getters and setters in the public interface, but within a class you should access fields directly.

Without a JIT, direct field access is about 3x faster than invoking a trivial getter. With a JIT (where direct field access is as cheap as accessing a local variable), direct field access is about 7x faster. This is true in Froyo, but may improve in future releases if the JIT inlines getter methods.
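A minimal sketch with a hypothetical Counter class: the public getter stays for callers outside the class, while code inside the class reads the field directly.

```java
// Hypothetical class illustrating internal field access.
final class Counter {
    private int mCount;

    // Public accessor for code outside this class.
    public int getCount() {
        return mCount;
    }

    public void increment() {
        mCount++;
    }

    // Inside the class, read mCount directly rather than calling getCount().
    public boolean isAbove(int threshold) {
        return mCount > threshold;
    }
}
```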

Use static final for constants

Consider the following declaration at the top of a class:
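A declaration matching the description below might look like this; the field names intVal and strVal and the value 42 come from the text, while the class name and the string value are illustrative assumptions:

```java
// Hypothetical class name; non-final statics are set up in <clinit>.
class ConstantsBefore {
    static int intVal = 42;
    static String strVal = "Hello, world!";
}
```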

The compiler generates a class initializer method, called <clinit>, that is executed when the class is first used. The method stores the value 42 into intVal and obtains a reference to strVal from the class file's string constant table. When these values are referenced later on, they are accessed with field lookups.

We can improve matters with the final keyword:
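With final added, an illustrative declaration of the same two constants becomes the following (the class name and string value are again assumptions):

```java
// Hypothetical class name; compile-time constants need no <clinit>.
class ConstantsAfter {
    static final int intVal = 42;
    static final String strVal = "Hello, world!";
}
```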

The class no longer requires a <clinit> method, because the constants go into static field initializers in the dex file. Code that refers to intVal will use the integer value 42 directly, and accesses to strVal will use a relatively inexpensive "string constant" instruction instead of a field lookup. (This optimization applies only to primitive types and String constants, not to arbitrary reference types.) Still, it is good practice to declare constants static final whenever possible.

Use the enhanced for loop syntax

The enhanced for loop (sometimes called the "for-each" loop) can be used for arrays and for collections that implement the Iterable interface. With collections, an iterator is allocated to make interface calls to hasNext() and next(). With an ArrayList, a hand-written counted loop is about 3x faster (with or without a JIT), but for other collections the enhanced for loop syntax is exactly as efficient as explicit iterator usage.

Here are several ways to iterate through an array:
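Three loops matching the descriptions of zero(), one(), and two() below might look like this. The element type Foo, its field mSplat, and the holder class ArrayLoops are illustrative assumptions, and each method returns its sum so the results can be checked:

```java
class ArrayLoops {
    static class Foo {
        int mSplat;

        Foo(int splat) {
            mSplat = splat;
        }
    }

    Foo[] mArray;

    ArrayLoops(Foo[] array) {
        mArray = array;
    }

    // zero(): reads the field mArray and its length on every iteration.
    int zero() {
        int sum = 0;
        for (int i = 0; i < mArray.length; ++i) {
            sum += mArray[i].mSplat;
        }
        return sum;
    }

    // one(): copies the array reference and its length into locals first.
    int one() {
        int sum = 0;
        Foo[] localArray = mArray;
        int len = localArray.length;
        for (int i = 0; i < len; ++i) {
            sum += localArray[i].mSplat;
        }
        return sum;
    }

    // two(): the enhanced for loop (Java 5 and later).
    int two() {
        int sum = 0;
        for (Foo a : mArray) {
            sum += a.mSplat;
        }
        return sum;
    }
}
```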

zero() is slowest, because the JIT cannot yet optimize away the cost of getting the array length on every iteration through the loop.

one() is faster. It pulls everything out into local variables, avoiding the lookups. Of the values it caches, only the array length actually improves performance.

two() is fastest on devices without a JIT, and indistinguishable from one() on devices with a JIT. It uses the enhanced for loop syntax introduced in JDK 1.5.

In conclusion: use the enhanced for loop by default, but consider a hand-written counted loop for performance-critical ArrayList iteration.

(See Effective Java item 46.)
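A minimal sketch of that advice, with hypothetical method names: the enhanced for loop as the default for any Iterable, and a counted loop reserved for a hot path over an ArrayList, where it avoids allocating an Iterator.

```java
import java.util.ArrayList;

final class ListSums {
    // Default choice: works for any Iterable and reads clearly.
    static int sumEnhanced(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) { // unboxes each Integer
            sum += v;
        }
        return sum;
    }

    // Performance-critical ArrayList iteration: index directly instead of
    // going through an Iterator's hasNext()/next() calls.
    static int sumCounted(ArrayList<Integer> values) {
        int sum = 0;
        int n = values.size();
        for (int i = 0; i < n; i++) {
            sum += values.get(i);
        }
        return sum;
    }
}
```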

Consider package access instead of private access with private inner classes

Consider the following definition:
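A definition matching the description below (a private inner class Foo$Inner that reads Foo's private field mValue and calls its private method doStuff, printing "Value is 27") might look like this sketch:

```java
class Foo {
    // Compiled to the separate class Foo$Inner.
    private class Inner {
        void stuff() {
            // Reads a private field and calls a private method of Foo.
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}
```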

The key thing to note is that we define a private inner class (Foo$Inner) that directly accesses a private method and a private instance field in the outer class. This is legal, and the code prints "Value is 27" as expected.

The problem is that the VM considers direct access to Foo's private members from Foo$Inner to be illegal, because Foo and Foo$Inner are different classes, even though the Java language allows an inner class to access its outer class's private members. To bridge the gap, the compiler generates a couple of synthetic static methods in Foo.

The inner class calls these static methods whenever it needs to access the mValue field or invoke the doStuff method in the outer class. This means that code which appears to access member fields directly actually accesses them through accessor methods. We discussed earlier how accessors are slower than direct field access, so this is an example of a language idiom causing an "invisible" performance hit.
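To make the cost visible, here is a hand-written model of what that bridging amounts to on the toolchains this document describes. The accessor names accessValue and accessDoStuff are hypothetical stand-ins for the compiler's synthetic methods, and Foo$Inner is modeled as a separate top-level class because that is how the VM sees it:

```java
class DesugaredFoo {
    private int mValue;

    // Stand-ins for the compiler-generated synthetic accessors.
    static int accessValue(DesugaredFoo foo) {
        return foo.mValue;
    }

    static void accessDoStuff(DesugaredFoo foo, int value) {
        foo.doStuff(value);
    }

    public void run() {
        DesugaredFooInner in = new DesugaredFooInner(this);
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}

// Models Foo$Inner: it cannot touch DesugaredFoo's private members itself.
class DesugaredFooInner {
    private final DesugaredFoo outer;

    DesugaredFooInner(DesugaredFoo outer) {
        this.outer = outer;
    }

    void stuff() {
        // A source-level field read plus a method call becomes two static calls.
        DesugaredFoo.accessDoStuff(outer, DesugaredFoo.accessValue(outer));
    }
}
```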

If you use code like this in a performance hotspot, you can avoid the overhead by declaring the fields and methods accessed by inner classes to have package access rather than private access. Unfortunately, this means those fields can be accessed directly by other classes in the same package, so you shouldn't do this in a public API.
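Applied to the Foo class described above, the change might look like the following sketch, where mValue and doStuff drop their private modifier so that Inner can reach them without the bridging accessors for that field and method (the class name PackageFoo is illustrative):

```java
class PackageFoo {
    private class Inner {
        void stuff() {
            // Package-access members: no synthetic accessor is needed here.
            doStuff(mValue);
        }
    }

    /* package */ int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    /* package */ void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}
```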

Use floating-point numbers judiciously

As a rule of thumb, floating-point arithmetic is about 2x slower than integer arithmetic on Android devices. This holds both on the G1, which lacks an FPU and a JIT, and on the Nexus One, which has both. (The absolute speed difference between those two devices is about 10x for arithmetic operations.)

In terms of speed, there is no difference between float and double on more modern hardware. In terms of space, a double is twice as large. As on desktop machines, you should prefer double to float, assuming space isn't an issue.

Even among integer types, some processors have hardware multiply but lack hardware divide. In such cases, integer division and modulus operations are performed in software, which is worth keeping in mind if you are designing a hash table or doing lots of arithmetic.
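One common way to keep division out of a hash table's hot path, sketched here with hypothetical helper names, is to use a power-of-two table size so that a bitwise AND replaces the remainder operation. Math.floorMod (Java 8 and later) is used only as the general-case comparison.

```java
// Hypothetical helpers for choosing a hash-table bucket.
final class Buckets {
    // General form: needs an integer remainder, which is computed in
    // software on CPUs without a hardware divide instruction.
    static int indexByModulo(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }

    // If the table size is a power of two, a bitwise AND yields the same
    // non-negative index with no division at all.
    static int indexByMask(int hash, int powerOfTwoSize) {
        return hash & (powerOfTwoSize - 1);
    }
}
```

For a table of 16 buckets, both methods map a hash of 37 to bucket 5 and a hash of -1 to bucket 15.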

Know and use the libraries

In addition to all the usual reasons to prefer library code over rolling your own, bear in mind that the system is free to replace calls to library methods with hand-coded assembly, which may be better than the best code the JIT can produce for the equivalent Java. The typical example is String.indexOf, which Dalvik replaces with an inlined intrinsic. Similarly, the System.arraycopy method is about 9x faster than a hand-coded loop on a Nexus One with the JIT.
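For comparison, here is a hand-written copy loop next to the System.arraycopy call the text recommends. The wrapper class and method names are illustrative; the sketch shows the two equivalent forms, not the speed difference.

```java
final class Copies {
    // Hand-written element-by-element copy. Unlike System.arraycopy, it does
    // not handle overlapping ranges within the same array correctly.
    static void copyLoop(int[] src, int srcPos, int[] dst, int dstPos, int length) {
        for (int i = 0; i < length; i++) {
            dst[dstPos + i] = src[srcPos + i];
        }
    }

    // Library call that the runtime is free to implement natively.
    static void copyLibrary(int[] src, int srcPos, int[] dst, int dstPos, int length) {
        System.arraycopy(src, srcPos, dst, dstPos, length);
    }
}
```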

(See Effective Java item 47.)

Use native methods judiciously

Native code is not necessarily more efficient than Java. For one thing, there is a cost associated with the Java-to-native transition, and the JIT cannot optimize across it. If you allocate native resources (memory on the native heap, file descriptors, and so on), it can be significantly more difficult to arrange for their timely release. You also need to compile your code for each architecture you want to run on (rather than relying on the JIT). You may even have to compile multiple versions for what you consider the same architecture: native code compiled for the ARM processor in the G1 cannot take full advantage of the ARM in the Nexus One, and native code compiled for the ARM in the Nexus One will not run on the ARM in the G1.

Native code is primarily useful when you have an existing native codebase that you want to port to Android, not for speeding up parts of a Java app.

(See Effective Java item 54.)

Conclusion

One last thing: always measure. Before you start optimizing, make sure you actually have a problem. Make sure you can accurately measure your existing performance, or you won't be able to measure the benefit of the alternatives you try.

Every claim in this document is backed up by a benchmark. The source code for these benchmarks can be found in the "dalvik" project on code.google.com.

The benchmarks are built with Caliper, a microbenchmarking framework for Java. Microbenchmarks are hard to get right, so Caliper goes out of its way to do the hard work for you, and even detects some cases where you are not measuring what you think you are measuring (for example, because the VM has optimized your code away). We strongly recommend using Caliper to run your own microbenchmarks.

You may also find Traceview useful for profiling, but it is important to realize that it does not currently work with the JIT, which may cause it to misattribute time to code that the JIT would be able to speed up. It is especially important, after making changes suggested by Traceview data, to make sure the resulting code actually runs faster when run without Traceview.
