A Detailed Analysis of the Android ART Runtime


At this year's Google I/O conference, Google released details about the newest runtime on Android: the Android Runtime (ART). ART will replace the Dalvik virtual machine as the execution environment for Java code on the Android platform. Although there has been news about ART since Android KitKat, most of it has been announcement-level and short on technical detail. This article tries to pull together the information that is publicly available, along with the ROM from the recently released Android L Preview, into a detailed analysis of the ART runtime.

Mobile platforms such as iOS, Windows Phone and Tizen compile software directly to native code that runs on a particular hardware platform. Android is different: software is first compiled into a generic bytecode, and that bytecode is then translated into native instructions on the specific mobile device it runs on.

In the years since Android was born, Dalvik has grown from a very simple Java bytecode virtual machine, gradually adding new features to satisfy applications' performance requirements and to cooperate with the underlying hardware. This includes the just-in-time compiler (JIT) introduced in Android 2.2, subsequent multithreading support, and a number of other optimizations.

Over the past two years, however, the demands the Android ecosystem places on its virtual machine have outgrown what Dalvik can deliver. Dalvik was designed for mobile devices with weak processors, very limited memory, and 32-bit systems. So Google started building a new virtual machine to better face future trends: one whose performance scales easily to today's multi-core processors, or even future 8-core parts, and that supports large storage and large amounts of memory. Thus ART appeared.

1 Architecture Introduction

First, ART's primary design requirement is full compatibility with the bytecode that runs on Dalvik: DEX (Dalvik Executable). For programmers this means there is no need to recompile existing programs; the same APK runs on both the Dalvik and ART virtual machines. The biggest change ART brings is the use of ahead-of-time (AOT) compilation to replace the just-in-time (JIT) compilation used in Dalvik. Previously the virtual machine had to compile bytecode into native code every time the application ran; under ART the compilation is performed only once, and subsequent runs of the application simply execute the saved native code directly. Of course, this precompilation requires additional storage space to hold the native code. It is only because the storage capacity of mobile devices keeps growing that this technique has become practical.
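For developers who want to confirm which runtime an APK is actually executing on, Google's ART documentation at the time described reading the java.vm.version system property: Dalvik reports a 1.x version string, while ART reports 2.0.0 or higher. A minimal sketch (the version parsing here is only illustrative):

    // Minimal sketch: distinguish ART from Dalvik by inspecting java.vm.version
    // (ART reports "2.0.0" or higher; Dalvik reports a 1.x version string).
    public final class RuntimeCheck {
        public static boolean isRunningOnArt() {
            String vmVersion = System.getProperty("java.vm.version");
            if (vmVersion == null) {
                return false;
            }
            String[] parts = vmVersion.split("\\.");
            try {
                return Integer.parseInt(parts[0]) >= 2;
            } catch (NumberFormatException e) {
                return false;
            }
        }

        public static void main(String[] args) {
            System.out.println(isRunningOnArt()
                    ? "Running on ART" : "Running on Dalvik (or unknown)");
        }
    }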

Precompilation also makes many compiler optimizations possible on the new Android platform that were previously out of reach. Because the code is compiled and optimized only once, it is worth spending more time on that compilation to optimize more aggressively. Google says the application's code can now be optimized as a whole, because the compiler finally sees all of it; with just-in-time compilation the compiler could only see and optimize a single method, or an even smaller piece of code, at a time. With ART, much of the overhead of exception checks in the code can be avoided, and method and interface calls become considerably faster. This work is done by the new dex2oat component, which replaces the corresponding dexopt component in Dalvik. The ODEX file (optimized DEX) of Dalvik is likewise replaced by an ELF file in ART.

Because ART compiles to an ELF executable, the kernel can manage the code loaded into memory with ordinary demand paging, which gives more efficient memory management and a smaller memory footprint. I am very curious what effect KSM (kernel same-page merging) will have under ART; it should bring good results. We will wait and see.

The impact of ART on battery life is also significant. Because bytecode no longer needs to be interpreted and the JIT no longer has to run while the program executes, the CPU has fewer instructions to execute, which reduces power consumption.

Because more analysis and optimization happen during precompilation, compilation takes longer; this is the side effect ART brings. Compared with the Dalvik virtual machine, the first boot of a device and the first installation of an application both take longer. Google claims the increase is not that scary: they hope and expect ART to complete these steps in roughly the same time as Dalvik does today, or even less.

The chart Google showed indicates that the performance improvement brought by ART is very clear: for the same code, performance roughly doubles. Google said that by the time Android L is finally released, the gains are expected to be even higher, with up to a 3x speedup in benchmarks such as Chessbench.

2 Garbage Collection: Theory and Practice

Android virtual machines rely on automatic memory management, that is, automatic garbage collection. It is a cornerstone of the Java programming model and has been a very important part of the Android system from day one. For readers unfamiliar with the concept: with automatic garbage collection, programmers are not responsible for allocating and releasing physical memory. They simply create the variables or objects they need using a fixed pattern and then use them directly. The program's runtime environment automatically allocates the appropriate memory to store each variable or object, and automatically frees that memory once the variable or object is no longer needed. This is the biggest difference from lower-level languages that require manual memory management. The benefit is that programmers do not have to worry about memory management while programming; the cost is that they cannot control when memory is allocated and released, and therefore cannot optimize those points when needed (the Java language does offer interfaces for manual tuning, but the control they give is limited in both manner and granularity).
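A minimal illustration of what this means in practice: objects are created with new and simply dropped when no longer needed; the runtime reclaims them on its own, and the only manual lever the standard Java API offers is a non-binding hint such as System.gc(). The sizes and counts below are arbitrary.

    // Conceptual sketch of automatic memory management: allocate, drop, never free().
    import java.util.ArrayList;
    import java.util.List;

    public final class GcDemo {
        public static void main(String[] args) {
            List<byte[]> buffers = new ArrayList<>();
            for (int i = 0; i < 256; i++) {
                buffers.add(new byte[64 * 1024]);   // allocate; the runtime finds the space
            }
            buffers = null;                         // drop the only reference...
            System.gc();                            // ...and at most *hint* that collection is welcome
            // When and how the memory is actually reclaimed is entirely up to the VM.
        }
    }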

Android has been plagued by Dalvik's garbage collection mechanism for a long time. Memory on Android devices is generally small, and whenever an application requests an allocation that the heap (the chunk of memory granted to the application) cannot satisfy, the Dalvik garbage collector kicks in. The collector traverses the entire heap, inspects every object the application has allocated, marks all reachable objects (that is, the objects still in use), and frees the objects that were not marked.
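To make the mark-and-sweep idea concrete, here is a deliberately simplified, stand-alone sketch of the textbook algorithm the paragraph describes; it is not Dalvik's implementation, only an illustration of "traverse from the roots, mark what is reachable, sweep the rest".

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Toy mark-and-sweep over a hand-built object graph (illustration only, not Dalvik code).
    final class ToyHeap {
        static final class Obj {
            final String name;
            final List<Obj> refs = new ArrayList<>();
            boolean marked;
            Obj(String name) { this.name = name; }
        }

        final List<Obj> heap = new ArrayList<>();   // every object the "application" allocated
        final List<Obj> roots = new ArrayList<>();  // stacks, static fields, registers...

        Obj alloc(String name) { Obj o = new Obj(name); heap.add(o); return o; }

        void gc() {
            // Mark phase: everything reachable from the roots is marked as live.
            Deque<Obj> work = new ArrayDeque<>(roots);
            while (!work.isEmpty()) {
                Obj o = work.pop();
                if (!o.marked) { o.marked = true; work.addAll(o.refs); }
            }
            // Sweep phase: anything left unmarked is unreachable and gets freed.
            heap.removeIf(o -> !o.marked);
            heap.forEach(o -> o.marked = false);    // reset marks for the next cycle
        }

        public static void main(String[] args) {
            ToyHeap vm = new ToyHeap();
            Obj a = vm.alloc("a");
            Obj b = vm.alloc("b");
            vm.alloc("orphan");                      // allocated but never referenced again
            a.refs.add(b);
            vm.roots.add(a);
            vm.gc();                                 // "orphan" is swept, "a" and "b" survive
            vm.heap.forEach(o -> System.out.println("live: " + o.name));
        }
    }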

In the Dalvik virtual machine, a garbage collection run causes two application pauses:

The first during the traversal of the heap address space,

The other during the marking phase.

A pause means that all executing threads of the application are suspended. If the pauses last too long, the application drops frames while rendering, the user sees visible stutter, and the experience suffers badly.

Google claims the average length of these pauses was 54ms on a Nexus 5. At 60fps the rendering budget is roughly 16.7ms per frame, so a pause of that length costs an average of about 4 dropped frames per garbage collection while the application is rendering.

My own experience and testing show that, depending on the application, the pauses can be much longer. For example, in a typical run of the official FIFA application, the garbage collection pauses are very severe:

07-01 15:56:14.275: D/dalvikvm(30615): GC_FOR_ALLOC freed 4442K, 25% free 20183K/26856K, paused 24ms, total 24ms
07-01 15:56:16.785: I/dalvikvm-heap(30615): Grow heap (frag case) to 38.179MB for 8294416-byte allocation
07-01 15:56:17.225: I/dalvikvm-heap(30615): Grow heap (frag case) to 48.279MB for 7361296-byte allocation
07-01 15:56:17.625: I/Choreographer(30615): Skipped frames! The application may be doing too much work on its main thread.
07-01 15:56:19.035: D/dalvikvm(30615): GC_CONCURRENT freed 35838K, 43% free 51351K/89052K, paused 3ms+5ms, total 106ms
07-01 15:56:19.035: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 96ms
07-01 15:56:19.815: D/dalvikvm(30615): GC_CONCURRENT freed 7078K, 42% free 52464K/89052K, paused 14ms+4ms, total 96ms
07-01 15:56:19.815: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 74ms
07-01 15:56:20.035: I/Choreographer(30615): Skipped frames! The application may be doing too much work on its main thread.
07-01 15:56:20.275: D/dalvikvm(30615): GC_FOR_ALLOC freed 4774K, 45% free 49801K/89052K, paused 168ms, total 168ms
07-01 15:56:20.295: I/dalvikvm-heap(30615): Grow heap (frag case) to 56.900MB for 4665616-byte allocation
07-01 15:56:21.315: D/dalvikvm(30615): GC_FOR_ALLOC freed 1359K, 42% free 55045K/93612K, paused 95ms, total 95ms
07-01 15:56:21.965: D/dalvikvm(30615): GC_CONCURRENT freed 6376K, 40% free 56861K/93612K, paused 16ms+8ms, total 126ms
07-01 15:56:21.965: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 111ms
07-01 15:56:21.965: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 97ms
07-01 15:56:22.085: I/Choreographer(30615): Skipped frames! The application may be doing too much work on its main thread.
07-01 15:56:22.195: D/dalvikvm(30615): GC_FOR_ALLOC freed 1539K, 40% free 56833K/93612K, paused 87ms, total 87ms
07-01 15:56:22.195: I/dalvikvm-heap(30615): Grow heap (frag case) to 60.588MB for 1331732-byte allocation
07-01 15:56:22.475: D/dalvikvm(30615): GC_FOR_ALLOC freed 308K, 39% free 59497K/96216K, paused 84ms, total 84ms
07-01 15:56:22.815: D/dalvikvm(30615): GC_FOR_ALLOC freed 287K, 38% free 60878K/97516K, paused 95ms, total 95ms

The log above was captured over a few seconds of running the FIFA application. The garbage collector ran 9 times in a span of about 8 seconds, pausing the application for a total of 603ms and dropping as many as 214 frames. Most of the stutter comes from allocation-triggered collections, marked with the GC_FOR_ALLOC tag in the log.

ART redesigns and reimplements the entire garbage collection system. For comparison, the following log was captured while running the same application in the same scenario under ART:

07-01 16:00:44.531: I/art(198): Explicit concurrent mark sweep GC freed (30KB) AllocSpace objects, 0 (0B) LOS objects, 7 92% free, 18MB/21MB, paused 186us total 12.763ms
07-01 16:00:44.545: I/art(198): Explicit concurrent mark sweep GC freed 7 (240B) AllocSpace objects, 0 (0B) LOS objects, 792% free, 18MB/21MB, paused 198us total 9.465ms
07-01 16:00:44.554: I/art(198): Explicit concurrent mark sweep GC freed 5 (160B) AllocSpace objects, 0 (0B) LOS objects, 792% free, 18MB/21MB, paused 224us total 9.045ms
07-01 16:00:44.690: I/art(801): Explicit concurrent mark sweep GC freed 65595 (3MB) AllocSpace objects, 9 (4MB) LOS objects, 810% free, 38MB/58MB, paused 1.195ms total 87.219ms
07-01 16:00:46.517: I/art(29197): Background partial concurrent mark sweep GC freed 74626 (3MB) AllocSpace objects, (4MB) LOS objects, 1496% free, 25MB/32MB, paused 4.422ms total 1.371747s
07-01 16:00:48.534: I/Choreographer(29197): Skipped frames! The application may be doing too much work on its main thread.
07-01 16:00:48.566: I/art(29197): Background sticky concurrent mark sweep GC freed 70319 (3MB) AllocSpace objects, (5MB) LOS objects, 825% free, 49MB/56MB, paused 6.139ms total 52.868ms
07-01 16:00:49.282: I/Choreographer(29197): Skipped frames! The application may be doing too much work on its main thread.
07-01 16:00:49.652: I/art(1287): Heap transition to ProcessStateJankImperceptible took 45.636146ms saved at least 723KB
07-01 16:00:49.660: I/art(1256): Heap transition to ProcessStateJankImperceptible took 52.650677ms saved at least 966KB

The ART run is very different from the Dalvik one: the new runtime's memory management paused the application for only 12.364ms in total, across 4 foreground collections and 2 background collections. During the run the application's heap never grew, whereas under the Dalvik virtual machine it grew 4 times. The number of dropped frames also fell to 63 under ART.

The example above is simply a worst-case scenario from an imperfect application: even under ART it still drops a good number of frames. The comparison is still instructive, though; after all, great programmers are rare, most Android applications are not written perfectly, and Android needs to be able to handle that situation.

ART takes some of the work normally done by the garbage collector and moves it into the application itself. As a result, the first pause Dalvik introduced for traversing the heap is eliminated entirely. The second pause is greatly shortened by a pre-cleaning technique ("packard pre-cleaning"): after the cleaning completes, only a brief pause is needed to check and validate the result. Google claims it has managed to shorten these pauses to around 3ms, a dramatic reduction compared with the Dalvik collector's 54ms average.

ART also introduces a special large object space (LOS), which is separate from the heap but still resides in the application's memory space. This design lets ART manage larger objects, such as bitmaps, more sensibly. Objects of this size cause trouble when the heap becomes fragmented: allocating them triggers the garbage collector far more often than allocating ordinary objects. With the LOS, the number of collections triggered by heap fragmentation drops sharply, so the collector can make more reasonable allocation decisions and runtime overhead goes down.
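As a hedged illustration of why bitmaps are the canonical example: the pixel data behind even a modest image is a single primitive array of several megabytes, far above the large-object threshold used in AOSP builds of ART (on the order of a few kilobytes; the exact value is an implementation detail and may vary by version), so allocations like the one sketched below are the kind that end up in the LOS rather than the main heap. The sizes here are only for illustration.

    // Illustration of the kind of allocation that lands in ART's large object space:
    // a single large primitive array, such as the pixel buffer behind a bitmap.
    public final class LargeAllocDemo {
        public static void main(String[] args) {
            int width = 1920, height = 1080;
            // ARGB_8888-style storage: 4 bytes per pixel => ~8 MB in one contiguous array.
            int[] pixels = new int[width * height];
            System.out.printf("pixel buffer: %.1f MB in one primitive array%n",
                    pixels.length * 4 / (1024.0 * 1024.0));
            // Under ART such an array is expected to be placed in the LOS, so filling and
            // discarding buffers like this no longer fragments the main heap.
        }
    }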

A good example is the Hangouts application. Running it under the Dalvik virtual machine, we can see several pauses caused by allocating memory and running the GC:

07-01 06:37:13.481: D/dalvikvm(7403): GC_EXPLICIT freed 2315K, 46% free 18483K/34016K, paused 3ms+4ms, total 40ms
07-01 06:37:13.901: D/dalvikvm(9871): GC_CONCURRENT freed 3779K, 22% free 21193K/26856K, paused 3ms+3ms, total 36ms
07-01 06:37:14.041: D/dalvikvm(9871): GC_FOR_ALLOC freed 368K, 21% free 21451K/26856K, paused 25ms, total 25ms
07-01 06:37:14.041: I/dalvikvm-heap(9871): Grow heap (frag case) to 24.907MB for 147472-byte allocation
07-01 06:37:14.071: D/dalvikvm(9871): GC_FOR_ALLOC freed 4K, 20% free 22167K/27596K, paused 25ms, total 25ms
07-01 06:37:14.111: D/dalvikvm(9871): GC_FOR_ALLOC freed 9K, 19% free 23892K/29372K, paused 27ms, total 28ms

The excerpt above was taken from the full garbage collection log. GC_EXPLICIT and GC_CONCURRENT are the more routine cleanup and maintenance collections. GC_FOR_ALLOC is invoked when the allocator tries to allocate new memory but the heap cannot satisfy the request. In the log we can see the heap being grown because of fragmentation, yet it still cannot accommodate the large object; across the whole process of allocating it, the pauses add up to about 90ms.
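The different log tags map directly onto what the application is doing. A hedged sketch of the kind of code that produces them: an explicit System.gc() call shows up as GC_EXPLICIT on Dalvik (or an "Explicit concurrent mark sweep GC" line on ART), while a burst of retained allocations that outgrows the current heap produces GC_FOR_ALLOC and "Grow heap" messages like the ones above. The exact messages and thresholds depend on the Android version and device.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the two behaviours behind the log lines above.
    public final class GcLogDemo {
        public static void main(String[] args) {
            // 1) Explicit request -> GC_EXPLICIT (Dalvik) / "Explicit ... GC" (ART) in logcat.
            System.gc();

            // 2) Allocation pressure -> GC_FOR_ALLOC and "Grow heap" when the current
            //    heap cannot satisfy the request without being collected or expanded.
            List<byte[]> retained = new ArrayList<>();
            for (int i = 0; i < 32; i++) {
                retained.add(new byte[1024 * 1024]); // keep 1 MB chunks alive to force growth
            }
            System.out.println("retained " + retained.size() + " MB of live data");
        }
    }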

In contrast, the following log was captured from an ART run on the Android L Preview:

07-01 06:35:19.718: I/art(10844): Heap transition to ProcessStateJankPerceptible took 17.989063ms saved at least -138KB
07-01 06:35:24.171: I/art(1256): Heap transition to ProcessStateJankImperceptible took 42.936250ms saved at least 258KB
07-01 06:35:24.806: I/art(801): Explicit concurrent mark sweep GC freed 85790 (3MB) AllocSpace objects, 4 (10MB) LOS objects, 850% free, 35MB/56MB, paused 961us total 83.110ms

We do not yet know exactly what the "Heap transition" messages mean, but they appear to be a mechanism for resizing the heap. After the application started, the only garbage collection paused it for just 961us, and we found no other collector activity before the captured log. The interesting part of this log is the LOS statistics: there are 4 large objects in the LOS, totalling about 10MB. This memory is not allocated in the main heap; otherwise we would expect to see a "Grow heap" message like Dalvik's.

ART's memory allocator itself has also been rewritten. Although ART already brings roughly a 25% improvement in allocation performance over Dalvik, Google was clearly not satisfied, so a new memory allocator has been introduced to replace the malloc-style allocator currently in use.

The new memory allocator, RosAlloc (runs-of-slots allocator), is designed around the characteristics of multithreaded Java applications. It uses a finer-grained locking scheme, locking individual allocation units rather than the entire space being allocated from, and small objects allocated in a thread-local region can ignore locks entirely. Without lock acquisition and release, thread-local small-object allocation becomes much faster. A conceptual sketch of this idea follows.
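The following is only a toy illustration of the idea, not ART's RosAlloc code: give each thread its own run of fixed-size slots so the common allocation path never touches a shared lock, and fall back to a synchronized global path only when a thread's local run is exhausted. All names and sizes here are invented for the example.

    // Conceptual sketch of a "runs of slots" style allocator: per-thread runs make the
    // common case lock-free; only refilling a run goes through the shared, locked path.
    public final class ToySlotAllocator {
        private static final int SLOTS_PER_RUN = 256;   // slots handed to a thread at a time

        private int nextRunId;                          // guarded by the allocator lock (slow path)

        // Per-thread state: [currentRunId, nextFreeSlotIndex]; touched only by its own thread.
        private final ThreadLocal<int[]> localRun =
                ThreadLocal.withInitial(() -> new int[] { -1, SLOTS_PER_RUN });

        /** Returns an abstract slot id; a real allocator would hand back a memory address. */
        public long allocateSlot() {
            int[] run = localRun.get();
            if (run[1] >= SLOTS_PER_RUN) {              // local run exhausted:
                run[0] = refillRun();                   // take the lock only on this slow path
                run[1] = 0;
            }
            return (long) run[0] * SLOTS_PER_RUN + run[1]++;
        }

        private synchronized int refillRun() {          // the only shared, locked operation
            return nextRunId++;
        }

        public static void main(String[] args) throws InterruptedException {
            ToySlotAllocator alloc = new ToySlotAllocator();
            Runnable worker = () -> { for (int i = 0; i < 10_000; i++) alloc.allocateSlot(); };
            Thread t1 = new Thread(worker);
            Thread t2 = new Thread(worker);
            t1.start(); t2.start();
            t1.join(); t2.join();
            System.out.println("runs handed out: " + alloc.nextRunId);
        }
    }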

This new memory allocator dramatically increases allocation speed, by as much as 10x.

At the same time, ART's garbage collection algorithms continue to be improved to smooth out the user experience and avoid visible stutter. These algorithms are still under development inside Google; so far only one new collector has been introduced, the "moving garbage collector", whose core idea is to compact the application's heap while the application is running in the background.

3 64-bit Support

ART was designed from the start with modular support for the various platforms it might run on in the future. As a result, it provides compiler backends for the architectures commonly used today, ARM, x86 and MIPS, as well as 64-bit backends for ARM64, x86-64 and MIPS64, some of which are not yet complete.

The benefits of ARM's 64-bit architecture are already familiar to many readers: a larger memory address space, broad performance improvements, stronger cryptography capabilities and performance, plus compatibility with existing 32-bit applications.

In addition, Google has introduced a reference-compression technique in ART to avoid the extra memory that 64-bit pointers would otherwise consume inside the heap. In practice, object references in the heap remain 32 bits wide during execution, rather than the 64-bit pointers one would expect on a 64-bit system.
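A hedged back-of-the-envelope calculation shows why this matters: the reference slots in an object array scale with pointer width, so keeping heap references at 32 bits roughly halves that part of the footprint on a 64-bit device. The numbers below are purely illustrative, not measurements of ART.

    // Illustrative arithmetic only: cost of the reference slots in an Object[]
    // under 32-bit (compressed) versus 64-bit references.
    public final class ReferenceWidthDemo {
        public static void main(String[] args) {
            long references = 1_000_000L;
            double mb32 = references * 4 / (1024.0 * 1024.0);
            double mb64 = references * 8 / (1024.0 * 1024.0);
            System.out.printf("1M references: %.1f MB with 32-bit slots vs %.1f MB with 64-bit slots%n",
                    mb32, mb64);
        }
    }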

Google has shown performance comparisons between 64-bit and 32-bit mode on ARM and x86 platforms, though these are only preview-quality numbers. The x86 tests were run on Intel Bay Trail systems, where gains range from 2x to 4.5x across different RenderScript test programs. On the ARM platform, crypto performance was compared on A57 and A53 systems. Because these are very small microbenchmarks, they are not representative of real-world scenarios.

However, Google has also released some interesting numbers from a panorama workload it uses for internal testing. Simply switching from the 32-bit ABI to the 64-bit ABI yields a 13% to 19% performance boost. Another welcome conclusion is that, according to these numbers, ARM's Cortex-A53 in AArch64 mode can deliver more performance than an A57 core.

Google also claims that 85% of the applications in the Play Store can run in 64-bit mode as-is, meaning that only 15% use native code to some extent and need to be recompiled for 64-bit platforms. This is a big advantage for Google: next year, when most chip makers start pushing 64-bit systems, the transition from 32-bit Android to 64-bit Android can happen very quickly.

4 Conclusion

Taking all of the above together, ART is Google's big performance release, and it solves many problems that have plagued the Android system for years. ART removes much of the cost of interpreted execution and provides an automatic, efficient memory management system. For developers, many performance problems that used to require hand-written workarounds are now handled by ART.

It also means the Android system can finally match iOS in system smoothness and application performance. For consumers, that is very good news.

Google will keep investing heavily in ART in the coming years. ART today is already very different from what it was six months ago, and it is expected to change substantially again by the time Android L is actually released. The future looks bright; let's wait and see.
