How to optimize your App's startup time

Last Update:2017-10-18 Source: Internet

Author: User


How an app launches

What happens before main() is executed
Mach-O format
Virtual memory basics
Mach-O binary loading

Theory crash course

Mach-O terminology

Mach-O is the file format for executables at run time.

File types:

Executable: the main binary of an application
Dylib: a dynamic library (known on other platforms as a DSO or DLL)
Bundle: a special kind of dylib that cannot be linked against; it can only be loaded at run time with dlopen() and is used for plug-ins on macOS

Image: an executable, dylib, or bundle
Framework: a directory containing a dylib together with its resources and header files

Mach-O image file

A Mach-O file is divided into segments, and each segment is divided into sections.

Segment names are uppercase, and a segment's size is always a whole number of pages. The page size depends on the hardware: 16 KB on arm64 and 4 KB on other architectures.

Sections are not restricted to a whole number of pages, but sections never overlap.
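As a rough illustration of the page-granularity rule, the sketch below rounds a segment's size up to a whole number of pages and reports the padding that results. The helper names `round_up_to_page` and `page_padding` are invented for this example and do not come from any Apple tool.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `page_size`.
 * `page_size` must be a power of two (for example 4096 or 16384). */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/* Bytes of padding needed to make `size` a whole number of pages. */
size_t page_padding(size_t size, size_t page_size)
{
    return round_up_to_page(size, page_size) - size;
}
```

For instance, a 5000-byte segment occupies two 4 KB pages (8192 bytes) but a single 16 KB page.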

Almost every Mach-O file contains these three segments: __TEXT, __DATA, and __LINKEDIT:

__TEXT contains the Mach header, the executable code, and read-only constants (such as C strings). It is read-only and executable (r-x).
__DATA contains global variables, static variables, and so on. It is readable and writable (rw-).
__LINKEDIT contains the loader's "metadata", such as function names and addresses. It is read-only (r--).

Mach-O universal files

A FAT binary merges Mach-O files for several architectures into one file. A fat header records the offset of each architecture's slice within the file, and the fat header itself occupies one page.

Aligning segments and headers to page boundaries wastes some space, but it makes virtual memory much easier to implement.

Virtual memory

Virtual memory is a layer of indirection. There is a maxim in software engineering that any problem can be solved by adding a layer of indirection. Virtual memory solves the problem of managing physical RAM across all processes. Adding a level of indirection gives each process its own logical address space, whose pages can be mapped to physical pages of RAM. The mapping is not one-to-one: a logical address may not be mapped to RAM at all, and several logical addresses may map to the same physical RAM. In the first case, a page fault is triggered when the process accesses that logical address; the second case is how memory is shared between processes.

A file can be read through a paged mapping (mmap()) instead of being read all at once. That is, a slice of the file is mapped onto pages of the process's logical memory. When the process touches a page that is not yet in memory, a page fault occurs and the kernel reads just that page, which gives lazy loading of the file.
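A minimal sketch of file-backed mapping with POSIX mmap() follows. It is not how dyld itself is written; the function names `read_byte_via_mmap` and `make_demo_file` and the temporary file path are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map `path` read-only and return the byte at `offset`, or -1 on error.
 * Only the page that contains `offset` has to be brought into memory. */
int read_byte_via_mmap(const char *path, size_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 || offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED) return -1;

    int value = base[offset];       /* first touch of the page may fault it in */
    munmap(base, len);
    return value;
}

/* Demo helper (hypothetical): create a temporary file of `size` bytes in
 * which byte i holds i % 251, and store its path in `path_out`. */
int make_demo_file(char *path_out, size_t path_len, size_t size)
{
    unsigned char *buf = malloc(size);
    if (buf == NULL) return -1;
    for (size_t i = 0; i < size; i++) buf[i] = (unsigned char)(i % 251);

    snprintf(path_out, path_len, "/tmp/mmap_demo_XXXXXX");
    int fd = mkstemp(path_out);
    int ok = fd >= 0 && write(fd, buf, size) == (ssize_t)size;
    if (fd >= 0) close(fd);
    free(buf);
    return ok ? 0 : -1;
}
```

Calling `read_byte_via_mmap(path, 20000)` on a 40000-byte file touches only the page that holds byte 20000; the rest of the mapping is never read.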

This means that the __TEXT segment of a Mach-O file can be mapped into multiple processes, loaded lazily, and shared between them. The __DATA segment is readable and writable, so it relies on copy-on-write (COW). When several processes share a page and one of them writes to it, the kernel copies the page's contents and remaps that process's logical address to the new RAM page, so the writing process gets its own private copy. This introduces the notion of clean and dirty pages. A dirty page contains process-specific data, whereas a clean page can be regenerated by the kernel (by rereading it from disk). Dirty pages are therefore more expensive than clean pages.
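Copy-on-write can be observed from user space with two private mappings of the same file. The sketch below is only an illustration within a single process; `cow_demo` and the temporary file name are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a one-page temp file whose first byte is `original`, map it twice
 * with MAP_PRIVATE, and write `new_value` through the first mapping.
 * Stores what the writer sees in `*writer_view` and returns what the second
 * mapping sees (or -1 on error). */
int cow_demo(unsigned char original, unsigned char new_value,
             unsigned char *writer_view)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                   /* the file goes away once fd is closed */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || ftruncate(fd, page) != 0 || write(fd, &original, 1) != 1) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)page;
    unsigned char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    unsigned char *b = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }

    a[0] = new_value;               /* the write gives mapping `a` a private, dirty copy */
    *writer_view = a[0];
    int other_view = b[0];          /* mapping `b` still sees the clean file page */

    munmap(a, len);
    munmap(b, len);
    return other_view;
}
```

The writer sees its own modified copy of the page, while the other mapping and the file on disk keep the original contents.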

Mach-O image loading

So when multiple processes load the same Mach-O image, __TEXT and __LINKEDIT can be shared because they are read-only, whereas __DATA is readable and writable and therefore produces dirty pages. Once dyld has finished its work, __LINKEDIT is no longer needed and its pages can be reclaimed.

Security

ASLR (address space layout randomization): images are loaded at randomized addresses. The technique itself is ten to twenty years old.

Code signing: one might assume that Xcode hashes the entire file and signs it. In fact, so that the signature of a Mach-O file can be verified at run time without rereading the whole file each time, a separate cryptographic hash is generated for every page and stored in __LINKEDIT. This lets each page be checked for tampering at the moment it is paged in.
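The idea of hashing each page separately can be sketched as follows. Note the loud assumption: real code signatures use cryptographic hashes, whereas this example substitutes FNV-1a, a simple non-cryptographic hash, purely to keep the code short. The function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, a NON-cryptographic hash used here only as a stand-in. */
uint64_t fnv1a(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Store one hash per page of `file` in `hashes`; return the number of pages hashed. */
size_t hash_pages(const unsigned char *file, size_t file_len, size_t page_size,
                  uint64_t *hashes, size_t max_hashes)
{
    size_t count = 0;
    for (size_t off = 0; off < file_len && count < max_hashes; off += page_size) {
        size_t n = file_len - off < page_size ? file_len - off : page_size;
        hashes[count++] = fnv1a(file + off, n);
    }
    return count;
}

/* Check a single page against its recorded hash without touching other pages.
 * Returns 1 if the page matches, 0 otherwise. */
int page_is_intact(const unsigned char *file, size_t file_len, size_t page_size,
                   size_t page_index, const uint64_t *hashes)
{
    size_t off = page_index * page_size;
    if (off >= file_len) return 0;
    size_t n = file_len - off < page_size ? file_len - off : page_size;
    return fnv1a(file + off, n) == hashes[page_index];
}
```

Because each page has its own hash, verifying page 1 never requires reading pages 0 or 2, which is what makes verification at page-in time practical.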

From exec() to main()

exec() is a system call. The kernel maps the application into a new address space, at a starting address that is random on every launch (because of ASLR). The region from address 0x000000 up to that starting address is marked inaccessible: not readable, not writable, and not executable. For a 32-bit process this region is at least 4 KB; for a 64-bit process it is at least 4 GB. It catches both NULL pointer dereferences and pointer truncation errors.

dyld loads the dylibs

Life was simple for the first twenty years of Unix because dynamic libraries had not been invented yet. Once they existed, a helper was needed to load them: on Apple platforms this is dyld, and other Unix systems have ld.so. When the kernel has finished mapping the process, it maps the Mach-O file named dyld into the process at a random address, sets the PC register to dyld's address, and lets it run. dyld runs inside the app's process with the same permissions as the app; its job is to load every dynamic library the app depends on and get everything ready to run.

dyld's timeline consists of the following steps, in this order:

Load dylibs, Rebase, Bind, ObjC, Initializers

Load dylibs

dyld reads the list of dependent dynamic libraries from the header of the main executable, which the kernel has already mapped. It then has to locate each dylib, open it, and read the start of the file to make sure it is a Mach-O file. Next it finds the code signature and registers it with the kernel, and then calls mmap() on each segment of the dylib. The dylibs an app depends on may themselves depend on other dylibs, so dyld loads the whole set recursively. A typical app loads 100 to 400 dylibs, but most of them are system dylibs, which are pre-computed and cached and therefore load quickly.
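The recursive walk over dependencies can be pictured with a toy model. This is not dyld's code: `struct toy_image`, `load_recursively`, and the fixed-size dependency array are all assumptions made for the sketch, and "loading" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* A toy image record: a name plus the indices of the images it links against. */
struct toy_image {
    const char *name;
    int deps[MAX_DEPS];
    int dep_count;
};

/* Depth-first walk: mark `index` as loaded, then load each dependency that
 * is not loaded yet. Returns the number of images newly loaded. */
int load_recursively(const struct toy_image *images, int index, int *loaded)
{
    if (loaded[index]) return 0;    /* already loaded: shared dependency or cycle */
    loaded[index] = 1;

    int newly_loaded = 1;
    for (int i = 0; i < images[index].dep_count; i++)
        newly_loaded += load_recursively(images, images[index].deps[i], loaded);
    return newly_loaded;
}
```

Marking an image before visiting its dependencies means a library shared by several dylibs is loaded only once.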

Fix-ups

Once all the dynamic libraries are loaded, they are still independent of one another and need to be bound together; this is what fix-ups do. Because of code signing, instructions cannot be modified to make one dylib call into another, so a lot of indirection is needed instead.

Modern code generation is dynamic PIC (position-independent code), meaning the code can be loaded at any address. For a call into another dylib, code-gen creates a pointer to the callee in the __DATA segment; the call site loads that pointer and jumps to it.

So what dyld does is fix up pointers and data. There are two kinds of fix-up: rebasing and binding.
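The indirection can be mimicked in plain C with a function pointer held in writable data. The names below (`square_slot`, `fix_up_square_slot`, `call_square`) are hypothetical; in a real binary the compiler and dyld do this for you.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for a function that lives in another dylib. */
int external_square(int x) { return x * x; }

/* Pointer slot in writable data; the calling code only ever goes through it. */
static int (*square_slot)(int) = NULL;

/* What the loader does: fill the slot once the callee's address is known. */
void fix_up_square_slot(int (*target)(int))
{
    square_slot = target;
}

/* What the compiled call site does: load the pointer, then jump through it. */
int call_square(int x)
{
    assert(square_slot != NULL);    /* must be fixed up before the first call */
    return square_slot(x);
}
```

The instructions in `call_square` never change; only the data slot is written, which is why signed code pages can stay read-only.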

Rebasing and binding

Rebasing: adjusting pointers that point within the image
Binding: setting pointers that point outside the image

Rebase and bind information can be inspected from the command line:

`xcrun dyldinfo -rebase -bind -lazy_bind myapp.app/myapp`

This command lists all the fix-ups. The rebase, bind, weak_bind, and lazy_bind information is stored in the __LINKEDIT segment, and the LC_DYLD_INFO_ONLY load command records the offset and size of each kind.

MachOView is recommended as a more convenient and intuitive way to inspect this.

The following is a brief look at rebasing and binding at the level of the dyld source code.

ImageLoader is the base class for loading executable files. It is responsible for linking images but knows nothing about specific file formats, which are left to subclasses. Each executable file corresponds to one ImageLoader instance. ImageLoaderMachO is the ImageLoader subclass used to load Mach-O files, and ImageLoaderMachOClassic and ImageLoaderMachOCompressed in turn inherit from ImageLoaderMachO; they load Mach-O files whose __LINKEDIT is in the classic and the compressed format, respectively.

Because dylibs depend on one another, many ImageLoader operations recurse along the dependency chain. Rebasing and binding are no exception; they correspond to the recursiveRebase() and recursiveBind() methods. Because of the recursion, doRebase() and doBind() are called bottom-up, so a dylib that is depended on is always rebased and bound before the dylibs that depend on it. The arguments passed to doRebase() and doBind() include a LinkContext, which holds a collection of state and related functions for the executable.

Both rebasing and binding first check whether prebinding has already been done. If prebinding is usable, the rebasing and binding fix-ups are unnecessary because the prebound addresses are already in place.

There are five reasons why an ImageLoaderMachO instance does not use prebinding:

The MH_PREBOUND flag in the Mach-O header is 0
The image is loaded at an offset from its preferred address (described later)
A library it depends on has changed
The image uses a flat namespace, so part of the prebinding is ignored
Environment variables in the LinkContext disable prebinding

doRebase() in ImageLoaderMachO does roughly the following:

If prebinding is used, increment the fgImagesWithUsedPrebinding counter and return; otherwise go on to the second step
If the MH_PREBOUND flag is 1 (that is, the image could be prebound but the prebinding is not being used) and the image is not in the shared cache, reset all lazy pointers in the context. (If the image is in the shared cache, it is bound later during binding, so no reset is needed)
If the image's load-address offset is 0, no rebasing is needed and the method returns; otherwise go on to the fourth step
Call the rebase() method, which does the actual rebasing work. If the TEXT_RELOC_SUPPORT macro is enabled, rebase() is allowed to write to the __TEXT segment to fix it up, so the read-only property of __TEXT is not absolute.

ImageLoaderMachOClassic and ImageLoaderMachOCompressed each implement their own doBind() method. The logic is much the same in both: they first check whether prebinding is used, and during the actual binding work they check the TEXT_RELOC_SUPPORT macro to decide whether to write to the __TEXT segment. Finally, both call setupLazyPointerHandler to set dyld's entry point in the image; this call comes last so that the main executable has its __dyld or __program_vars set up.

Rebasing

In the past, a dylib was loaded at its specified address, so all its pointers and data were already correct and dyld had no fix-ups to do. With ASLR, the dylib is now loaded at a new random address (actual_address) that differs from the address the code and data pointers were built for (preferred_address). dyld has to correct this difference (the slide) by adding it to every pointer inside the dylib. The slide is calculated as follows:

slide = actual_address - preferred_address

dyld then walks the __DATA segment and adds the slide to each pointer that needs rebasing. This involves page faults and COW, and can become an I/O bottleneck. However, because the rebase entries are ordered by address, the kernel sees sequential access, can read ahead, and so reduces the I/O cost.
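The slide arithmetic is small enough to show directly. The sketch below models the __DATA segment as an array of 64-bit words and the rebase information as a list of indices into it; both representations, and the names `compute_slide` and `apply_rebase`, are simplifications chosen for the example rather than dyld's actual encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* slide = actual_address - preferred_address */
uint64_t compute_slide(uint64_t actual_address, uint64_t preferred_address)
{
    return actual_address - preferred_address;
}

/* Add `slide` to every word of `data` listed in `rebase_indices`.
 * Words that are not listed (plain integers, not pointers) are left alone. */
void apply_rebase(uint64_t *data, const size_t *rebase_indices, size_t count,
                  uint64_t slide)
{
    for (size_t i = 0; i < count; i++)
        data[rebase_indices[i]] += slide;
}
```

With a preferred address of 0x100000000 and an actual address of 0x104000000, the slide is 0x4000000, so an internal pointer 0x100001000 becomes 0x104001000.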

Binding

Binding handles pointers that point into other dylibs. They are bound by symbol name, which is a string. __LINKEDIT stores which pointers need binding and the symbol each one should point to. dyld has to find the implementation for each symbol, which takes a lot of computation to search the symbol tables. Once found, the address is stored into the pointer in the __DATA segment. Binding is computationally more expensive than rebasing, but needs very little I/O, because rebasing has already paged in the data.
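Binding by name can be sketched as a string lookup followed by a pointer store. The linear search and the types `toy_symbol` and `toy_bind` are assumptions for the illustration; dyld's real symbol lookup uses more elaborate data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An exported symbol: its name and the address of its implementation. */
struct toy_symbol { const char *name; void *address; };

/* A bind entry: the symbol wanted and the pointer slot that should receive it. */
struct toy_bind { const char *symbol_name; void **slot; };

/* Look up `name` in the table; returns NULL if it is not exported. */
void *lookup_symbol(const struct toy_symbol *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].address;
    return NULL;
}

/* Resolve every bind entry; returns how many were resolved.
 * Slots whose symbol cannot be found are left untouched. */
size_t apply_binds(const struct toy_bind *binds, size_t bind_count,
                   const struct toy_symbol *table, size_t symbol_count)
{
    size_t resolved = 0;
    for (size_t i = 0; i < bind_count; i++) {
        void *address = lookup_symbol(table, symbol_count, binds[i].symbol_name);
        if (address != NULL) {
            *binds[i].slot = address;
            resolved++;
        }
    }
    return resolved;
}
```

The cost here is the string comparison per symbol, which is why binding is dominated by computation rather than I/O.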

ObjC runtime

Many Objective-C data structures are fixed up by rebasing and binding, such as a class's pointer to its superclass and its pointers to methods.

ObjC is a dynamic language: an object can be instantiated from the name of its class. The ObjC runtime therefore has to maintain a global table mapping class names to classes. When a dylib is loaded, every class it defines must be registered in that global table.

C++ suffers from the fragile base class problem. ObjC does not, because instance variable offsets are adjusted dynamically by fix-ups at load time.

In ObjC, a category can add methods to an existing class. Sometimes the class being extended lives in another dylib rather than in your own image (that is, you are modifying a system class or someone else's class), and this also requires some fix-ups.

Selectors in ObjC must be unique.

Initializers

C++ generates initializers for statically allocated objects. ObjC has a method called +load, but its use is now discouraged; +initialize is recommended instead. See also: http://stackoverflow.com/questions/13326435/nsobject-load-and-initialize-what-do-they-do

You now have the main executable and a pile of dylibs whose dependencies form a large graph. In what order do the initializers run? From the bottom up! Following the dependency relationships, the leaf nodes are initialized first, then the intermediate nodes, and finally the root. This order is safe because it guarantees that every dylib an image depends on has already been initialized before the image's own initializers run.
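Bottom-up ordering is a post-order traversal of the dependency graph. The toy sketch below records the order in which initializers would run; `struct init_node` and `init_order` are hypothetical names and the graph is hard-coded by the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INIT_DEPS 4

/* A node in the dependency graph: a name plus the indices it depends on. */
struct init_node {
    const char *name;
    int deps[MAX_INIT_DEPS];
    int dep_count;
};

/* Post-order walk: run the initializers of all dependencies first, then the
 * image's own. Appends image indices to `order` and returns the new length. */
int init_order(const struct init_node *nodes, int index, int *done,
               int *order, int count)
{
    if (done[index]) return count;  /* each image is initialized only once */
    done[index] = 1;

    for (int i = 0; i < nodes[index].dep_count; i++)
        count = init_order(nodes, nodes[index].deps[i], done, order, count);

    order[count++] = index;         /* dependencies are done, now this image */
    return count;
}
```

For an app that links libA and libB, both of which link libSystem, the resulting order is libSystem, libA, libB, app.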

Finally, dyld calls main(), and main() calls UIApplicationMain().

Improving startup time

An animation plays between tapping the app icon and the app's launch screen appearing, and the app should finish starting before the animation ends. Startup time differs between devices, but it is best kept within 400 ms. Note that once startup takes longer than 20 s, the system assumes the app is stuck in an infinite loop and kills the process. Startup time should, of course, be measured on the lowest-spec device the app supports. Startup is considered complete once applicationWillFinishLaunching has been called.

Measuring startup time

Warm launch: the app and its data are already in memory
Cold launch: the app is not in the kernel's buffer cache

Cold launch time is the figure that matters. To measure it accurately, reboot the device before measuring. Time spent before main() is hard to measure, but fortunately dyld has a built-in facility: in Xcode, under Edit Scheme, Run, Arguments, set the environment variable DYLD_PRINT_STATISTICS to 1. The console output looks like this:

Total pre-main time: 228.41 milliseconds (100.0%)
    dylib loading time: 82.35 milliseconds (36.0%)
    rebase/binding time: 6.12 milliseconds (2.6%)
    ObjC setup time: 7.82 milliseconds (3.4%)
    initializer time: 132.02 milliseconds (57.8%)
    slowest intializers :
        libSystem.B.dylib : 122.07 milliseconds (53.4%)
        CoreFoundation : 5.59 milliseconds (2.4%)
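If you want to track these numbers over time, the lines are easy to parse. The sketch below assumes the "label: N milliseconds (P%)" line format seen in the sample output above; other dyld versions may print something different, and `parse_stat_line` is a name made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line such as "dylib loading time: 82.35 milliseconds (36.0%)".
 * On success stores the time and percentage and returns 1; otherwise returns 0. */
int parse_stat_line(const char *line, double *ms, double *percent)
{
    const char *colon = strrchr(line, ':');     /* the value follows the last colon */
    if (colon == NULL) return 0;
    return sscanf(colon + 1, " %lf milliseconds (%lf%%)", ms, percent) == 2;
}
```

Header lines without a value, such as the "slowest intializers :" line, are simply rejected.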

Optimizing startup time

Each step that runs before the app starts can be optimized.

Load dylibs

As mentioned earlier, system dylibs load quickly because they are optimized. Loading embedded dylibs, however, is expensive, so try to merge several embedded dylibs into one, or use static archives instead. Lazy loading with dlopen() at run time is not recommended: it can cause problems and the overall cost is higher.

Rebase/binding

As mentioned earlier, rebasing spends most of its time on I/O, whereas the binding that follows needs hardly any I/O and spends its time on computation. The time for the two steps is therefore reported together.

As described earlier, the pointers that need fixing up live in the __DATA segment, so reducing the number of such pointers reduces the time spent here. For ObjC, this means reducing the amount of metadata: classes, selectors, and categories. Coding guidelines and design patterns encourage many small, focused classes and methods, and splitting each piece of functionality into its own category, but this actually increases startup time. For C++, reduce the number of virtual functions, because they create vtables, which in turn create structures in the __DATA segment. C++ virtual functions cost less at startup than ObjC metadata, but the cost is still not negligible. Finally, Swift structs are recommended because they need fewer fix-ups.

ObjC setup

There is little to do for this step directly: almost everything that reduces fix-ups in the rebasing and binding steps also makes this step cheaper.

Initializers

Explicit initialization

Use +initialize instead of +load
Do not use __attribute__((constructor)) to mark a function explicitly as an initializer; instead, have the initialization run when the function is first called, for example with dispatch_once(), pthread_once(), or std::call_once(). Initialization then happens on first use, which defers part of the work.
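A minimal sketch of first-use initialization with pthread_once() follows. The lookup table, `lookup_square`, and the call counter are invented for the example; the point is only that nothing runs before main() and the setup cost is paid by the first caller.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int table[256];
static int init_calls = 0;          /* counts how often the initializer ran */

/* The work that would otherwise sit in a load-time initializer. */
static void build_table(void)
{
    init_calls++;
    for (int i = 0; i < 256; i++)
        table[i] = i * i;
}

/* The first caller triggers build_table(); later callers skip it.
 * pthread_once() also makes this safe when several threads race here. */
int lookup_square(int i)
{
    assert(i >= 0 && i < 256);
    pthread_once(&table_once, build_table);
    return table[i];
}

/* Exposed only so the example can show the initializer ran exactly once. */
int table_init_calls(void)
{
    return init_calls;
}
```

On some toolchains this needs to be built with `-pthread`.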

Implicit initialization

For C++ static variables with non-trivial constructors:

Move the initialization to the call site.
Use only simple value types (POD: plain old data), so that the static linker can pre-compute the contents of __DATA and no fix-up work is needed.
Use the compiler warning flag -Wglobal-constructors to find code that performs implicit initialization.
Rewrite the code in Swift, which already handles this for you; this is strongly recommended.

Do not call dlopen() in an initializer, because it hurts performance. dyld runs before the app starts, when the process is single-threaded and the system can skip locking; dlopen() makes the process multi-threaded, so the system has to take locks, which seriously hurts performance and can cause deadlocks and other unpredictable behavior. For the same reason, do not create threads in initializers.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
