Optimize your App's startup time

Source: Internet
Author: User

These are study notes on WWDC 2016 Session 406. They cover how to optimize app startup time, from the underlying principles to practice.

App Launch Theory
    • What happens before main() executes

    • The Mach-O format

    • Virtual memory basics

    • Loading Mach-O binaries

Theory Crash Course: Mach-O Terminology

Mach-O is the file format used for the different kinds of runtime executables.

File types:

    • Executable: the main binary of the application.

    • Dylib: a dynamic library (also known as a DSO or DLL).

    • Bundle: a dylib that cannot be linked against. It can only be loaded at runtime with dlopen(), and is used for things like macOS plug-ins.

Image: an executable, a dylib, or a bundle.

Framework: a folder containing a dylib together with its resource files and header files.

Mach-O Image File

A Mach-O file is divided into segments, and each segment is divided into sections.

Segment names are uppercase, and every segment's size is a whole number of pages. The page size depends on the hardware: on arm64 a page is 16KB, and elsewhere it is 4KB.

Sections are not required to be a whole number of pages, but sections never overlap.
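The page-multiple rule for segments is simple arithmetic. Below is a minimal sketch in C, assuming a segment's size is just its content size rounded up to the next page boundary. round_up_to_page and the two PAGE_SIZE_* constants are illustrative names, not linker or dyld APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Page sizes quoted in the text: 16KB on arm64, 4KB elsewhere. */
#define PAGE_SIZE_ARM64 16384u
#define PAGE_SIZE_OTHER 4096u

/* Round a segment's raw content size up to a whole number of pages.
 * page_size must be a power of two, which lets us round with a bit mask. */
uint64_t round_up_to_page(uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With these numbers, 20000 bytes of segment content occupy 5 pages (20480 bytes) with 4KB pages, but 2 pages (32768 bytes) with 16KB pages.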

Almost every Mach-O file contains these three segments: __TEXT, __DATA and __LINKEDIT:

    • __TEXT contains the Mach header, the executable code, and read-only constants (such as C strings). It is read-only and executable (r-x).

    • __DATA contains global variables, static variables, and so on. It is readable and writable (rw-).

    • __LINKEDIT contains "metadata" for the loader, such as function names and addresses. It is read-only (r--).

Mach-O Universal File

A FAT (universal) binary combines Mach-O files for multiple architectures. A FAT header records the offset of each architecture's slice within the file, and the FAT header occupies one full page.

Storing these segments and headers in whole pages wastes some space, but it makes virtual memory easier to implement.
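The FAT header layout can be made concrete with a small parser. This sketch assumes the classic 32-bit FAT format (the fat_header and fat_arch structures from <mach-o/fat.h>, stored big-endian). It restates the constants locally so it builds on any platform. find_fat_slice is a hypothetical helper, and the 64-bit FAT variant is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from <mach-o/fat.h> and <mach/machine.h>, restated so the sketch
 * builds anywhere. FAT headers are always stored big-endian. */
#define FAT_MAGIC_BE        0xCAFEBABEu
#define CPU_TYPE_X86_64_ID  0x01000007u
#define CPU_TYPE_ARM64_ID   0x0100000Cu

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Layout: fat_header { magic, nfat_arch } followed by nfat_arch entries of
 * fat_arch { cputype, cpusubtype, offset, size, align }, all 32-bit BE.
 * Returns 0 and fills *offset and *size for the slice matching cputype,
 * or -1 if the buffer is not a FAT file or has no such slice. */
int find_fat_slice(const uint8_t *buf, size_t len, uint32_t cputype,
                   uint32_t *offset, uint32_t *size)
{
    if (len < 8 || read_be32(buf) != FAT_MAGIC_BE)
        return -1;
    uint32_t nfat_arch = read_be32(buf + 4);
    for (uint32_t i = 0; i < nfat_arch; i++) {
        size_t entry = 8 + (size_t)i * 20;   /* each fat_arch is 5 x 4 bytes */
        if (entry + 20 > len)
            return -1;                        /* truncated header */
        if (read_be32(buf + entry) == cputype) {
            *offset = read_be32(buf + entry + 8);
            *size   = read_be32(buf + entry + 12);
            return 0;
        }
    }
    return -1;
}
```

A loader picking the arm64 slice would call find_fat_slice(buf, len, CPU_TYPE_ARM64_ID, &off, &size) and then treat the bytes starting at off as an ordinary Mach-O file.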

Virtual memory

Virtual memory is a layer of indirection. A maxim in software engineering says that any problem can be solved by adding a level of indirection, and virtual memory solves the problem of managing how all processes use physical RAM. With this indirection, each process works in its own logical address space, whose pages can be mapped onto physical pages of RAM. The mapping is not one-to-one: a logical address may not be mapped to RAM at all, or several logical addresses may map to the same physical RAM. In the first case, accessing that logical address triggers a page fault. The second case is how multiple processes share memory.

A file does not have to be read all at once. mmap() maps it page by page, so that a fragment of the file is mapped onto a page of the process's logical memory. When a page that is being read is not yet in memory, a page fault is triggered and the kernel reads in just that page. This implements lazy loading of the file.

This means the __TEXT segment of a Mach-O file can be mapped into multiple processes, loaded lazily, and shared between them. The __DATA segment is readable and writable, so it uses copy-on-write (COW). When multiple processes share a page of memory and one of them writes to it, the kernel copies the page's contents and remaps that process's logical address to the new RAM page, so the process gets its own copy of the page. This is where clean and dirty pages come in. A dirty page holds information specific to the process, while a clean page can be regenerated by the kernel (by rereading it from disk). Dirty pages are therefore more expensive than clean pages.
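The two kinds of mapping that are not one-to-one can be illustrated with a toy single-level page table. This is a deliberately simplified model, since real MMUs use multi-level tables managed by the kernel. page_table, pt_init, pt_translate and the TOY_* constants are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 8u
#define UNMAPPED      UINT32_MAX

/* A toy page table: entry i holds the physical frame number backing
 * logical page i, or UNMAPPED if no RAM is assigned yet. */
typedef struct {
    uint32_t frame[TOY_NUM_PAGES];
} page_table;

void pt_init(page_table *pt)
{
    for (uint32_t i = 0; i < TOY_NUM_PAGES; i++)
        pt->frame[i] = UNMAPPED;
}

/* Translate a logical address. Returns 0 and stores the physical address
 * on success; returns -1 to signal a page fault (unmapped or out of range). */
int pt_translate(const page_table *pt, uint64_t logical, uint64_t *physical)
{
    uint64_t page = logical / TOY_PAGE_SIZE;
    uint64_t offset = logical % TOY_PAGE_SIZE;
    if (page >= TOY_NUM_PAGES || pt->frame[page] == UNMAPPED)
        return -1;   /* page fault: the kernel would step in here */
    *physical = (uint64_t)pt->frame[page] * TOY_PAGE_SIZE + offset;
    return 0;
}
```

Mapping logical page 1 of one process and logical page 3 of another onto the same frame models shared memory. Translating an address on an UNMAPPED page models the page fault.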
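The lazy, page-by-page loading that mmap() provides can be seen in a short POSIX sketch. read_byte_via_mmap is a hypothetical helper. It maps a whole file read-only but touches only one byte, so only the page holding that byte needs to be read in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and return the byte at `offset`.
 * mmap() only sets up the mapping; the kernel reads a page from disk
 * when it is first touched (page fault). Returns -1 on error. */
int read_byte_via_mmap(const char *path, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }
    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;
    int value = base[offset];        /* only this page is faulted in */
    munmap(base, (size_t)st.st_size);
    return value;
}
```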
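Copy-on-write can be observed directly with a MAP_PRIVATE mapping, which behaves like the writable __DATA case described above. The first write gives the process its own private (dirty) copy of the page, and the file on disk is left untouched. Below is a minimal POSIX sketch. private_write_is_copy_on_write is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if a write through a MAP_PRIVATE mapping is visible in the
 * mapping but not in the file, 0 if not, -1 on error (e.g. empty file). */
int private_write_is_copy_on_write(const char *path, unsigned char new_byte)
{
    int fd = open(path, O_RDONLY);   /* read-only is enough for MAP_PRIVATE */
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < 1) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    unsigned char original = p[0];
    p[0] = new_byte;                 /* write fault: kernel copies the page */
    unsigned char on_disk = 0;
    ssize_t n = pread(fd, &on_disk, 1, 0);
    int result = (n == 1 && p[0] == new_byte && on_disk == original) ? 1 : 0;
    munmap(p, len);
    close(fd);
    return result;
}
```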

Mach-O Image Loading

So when multiple processes load the same Mach-O image, __TEXT and __LINKEDIT are read-only and their memory can be shared, while __DATA is writable and produces dirty pages. Once dyld has finished its work, __LINKEDIT is no longer needed, and the corresponding memory pages are reclaimed.

Security

ASLR (address space layout randomization): images are loaded at random addresses. This is actually a technique that has been around for one or two decades.

Code signing: you might think that Xcode hashes the entire file and signs the result. In fact, a Mach-O file's signature has to be verifiable at runtime without reading the whole file every time. So a separate cryptographic hash is generated for each page and stored in __LINKEDIT. This lets the contents of each page be checked for tampering as the page is loaded.
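Hashing per page means any single page can be verified on its own when it is paged in. The sketch below models only that idea. It uses 64-bit FNV-1a as a stand-in hash, which is not cryptographic, whereas real code signing uses cryptographic hashes. hash_pages and page_is_valid are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash: 64-bit FNV-1a. NOT secure; used only to keep the sketch
 * dependency-free. Real code signatures use cryptographic hashes. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash each page of `image` separately (the last page may be partial).
 * Writes one hash per page into `out` and returns the page count. */
size_t hash_pages(const uint8_t *image, size_t len, size_t page_size, uint64_t *out)
{
    size_t pages = 0;
    for (size_t off = 0; off < len; off += page_size, pages++) {
        size_t n = (len - off < page_size) ? len - off : page_size;
        out[pages] = fnv1a64(image + off, n);
    }
    return pages;
}

/* Verify one page as it is paged in, without reading the rest of the file. */
int page_is_valid(const uint8_t *image, size_t len, size_t page_size,
                  size_t page_index, const uint64_t *hashes)
{
    size_t off = page_index * page_size;
    if (off >= len)
        return 0;
    size_t n = (len - off < page_size) ? len - off : page_size;
    return fnv1a64(image + off, n) == hashes[page_index];
}
```

Tampering with one byte invalidates only the page that contains it, and the other pages still verify.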

From exec() to main()

exec() is a system call. The kernel maps the application into a new address space, starting at a random location each time (because of ASLR). The range from address 0 up to that starting location is marked as not readable, not writable and not executable. This range is at least 4KB for a 32-bit process and at least 4GB for a 64-bit process, so both NULL-pointer dereferences and pointer-truncation errors are caught by it.
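A small sketch shows why that region catches both kinds of bug, assuming the minimum sizes quoted above (4KB for 32-bit, 4GB for 64-bit). hits_pagezero and truncate_pointer are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum sizes of the no-access region at the bottom of the address
 * space, as given in the text. */
#define PAGEZERO_32 0x1000ULL          /* 4KB */
#define PAGEZERO_64 0x100000000ULL     /* 4GB */

/* Would an access to `addr` land in the no-access region? */
int hits_pagezero(uint64_t addr, uint64_t pagezero_size)
{
    return addr < pagezero_size;
}

/* Simulate a classic 64-bit bug: storing a pointer in a 32-bit integer.
 * The upper bits are lost, so the result is always below 4GB. */
uint64_t truncate_pointer(uint64_t ptr)
{
    return (uint32_t)ptr;
}
```

For example, truncating the 64-bit pointer 0x100005234 to 32 bits yields 0x5234. That address falls inside the 4GB no-access region, so dereferencing it crashes immediately instead of silently reading the wrong memory.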

dyld Loads the Dylib Files

Unix had an easy time for its first 20 years because dynamic libraries had not been invented yet. Dynamic libraries brought with them a helper for loading them: dyld on Apple platforms, and ld.so on other Unix systems. When the kernel has finished mapping the process, it maps dyld's Mach-O file into the process at a random address, sets the PC register to dyld's address, and lets it run. dyld runs inside the app process with the same permissions as the app. Its job is to load all the dynamic libraries the app depends on and get everything ready to run.

dyld's timeline is made up of the following steps:

Load dylibs, Rebase, Bind, ObjC, Initializers

Load Dylib

dyld first reads the list of dependent dylibs from the header of the main executable, which the kernel has already mapped. For each dylib, it must find the file, open it, read the start of the file to make sure it is a Mach-O file, find the code signature and register it with the kernel, and then mmap() each segment of the dylib. The dylibs an app depends on may themselves depend on other dylibs, so dyld collects the list of libraries recursively. A typical app loads 100 to 400 dylibs, but most of them are system dylibs, which are precomputed and cached so they load quickly.
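The recursive collection can be sketched as a depth-first walk that loads each library once, even when several libraries depend on it. The dylib struct, load_all and the index-based graph are hypothetical stand-ins for dyld's real data structures. The "load" step is reduced to recording an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LIBS 16
#define MAX_DEPS 4

/* Hypothetical model of a dylib: the indices of the libraries named in
 * its load commands. Not dyld's real types. */
typedef struct {
    const char *name;
    int deps[MAX_DEPS];
    int ndeps;
} dylib;

/* Depth-first walk from `lib`, recording each library exactly once,
 * so shared dependencies are loaded a single time. Returns the new count. */
static size_t collect(const dylib *libs, int lib, bool *seen, int *order, size_t count)
{
    if (seen[lib])
        return count;
    seen[lib] = true;
    order[count++] = lib;   /* stands in for: open, validate, register signature, mmap */
    for (int i = 0; i < libs[lib].ndeps; i++)
        count = collect(libs, libs[lib].deps[i], seen, order, count);
    return count;
}

size_t load_all(const dylib *libs, int nlibs, int main_exe, int *order)
{
    bool seen[MAX_LIBS] = { false };
    assert(nlibs <= MAX_LIBS);
    return collect(libs, main_exe, seen, order, 0);
}
```

For an app that links UIKit and an embedded MyKit, both of which depend on libSystem, the walk visits MyApp, UIKit, libSystem, MyKit, and libSystem is loaded only once.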

Fix-ups

Once all the dylibs are loaded, they are still independent of each other and must be bound together. This is called fix-ups. Because of code signing, we cannot modify instructions so that one dylib can call another. Instead, many layers of indirection have to be added.

Modern code generation produces dynamic PIC (position-independent code). Code can be loaded at any address and reaches other code and data indirectly. For a call, the generated code actually creates a pointer to the callee in the __DATA segment, then loads that pointer and jumps through it.

So what dyld has to do is fix up pointers and data. There are two kinds of fix-up: rebasing and binding.
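The indirection can be modeled in plain C with a function-pointer slot. This models only the idea: real Mach-O code goes through compiler-generated stubs and pointer sections. data_slot_square, call_square, fix_up_slot and square_impl are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*callee_fn)(int);

/* Stands in for a pointer slot in __DATA. The code never hard-codes the
 * callee's address, so code pages stay unmodified and signed. */
static callee_fn data_slot_square = NULL;

/* The call site: load the pointer from the slot, then call through it. */
int call_square(int x)
{
    callee_fn target = data_slot_square;
    return target(x);
}

/* What the loader does at launch: write the resolved address into the slot. */
void fix_up_slot(callee_fn resolved)
{
    data_slot_square = resolved;
}

/* A function that, in a real app, would live in another dylib. */
int square_impl(int x) { return x * x; }
```

Only the writable slot changes at load time. That is why this work touches __DATA pages and not __TEXT.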

Rebasing and Binding

Rebasing: adjusting pointers that point to something inside the image.

Binding: setting pointers that point to something outside the image.

The rebase and bind information can be viewed from the command line:

xcrun dyldinfo -rebase -bind -lazy_bind myapp.app/myapp

This command shows all the fix-ups. The rebase, bind, weak_bind and lazy_bind information is stored in the __LINKEDIT segment, and the LC_DYLD_INFO_ONLY load command records the offset and size of each kind of information.

MachOView makes this information more convenient and intuitive to browse.

Below is a brief look at how dyld performs rebasing and binding at the source-code level.

ImageLoader is a base class for loading executable images. It is responsible for linking images but does not care about the specific file format, leaving that to its subclasses. Each executable image corresponds to one ImageLoader instance. ImageLoaderMachO is the ImageLoader subclass that loads Mach-O files. ImageLoaderMachOClassic and ImageLoaderMachOCompressed both inherit from ImageLoaderMachO and load Mach-O files whose __LINKEDIT is in the traditional and the compressed format, respectively.

Because dylibs depend on each other, many of ImageLoader's operations recurse along the dependency chain. Rebasing and binding are no exception: they are handled by recursiveRebase() and recursiveBind(). The recursion works bottom-up, calling doRebase() and doBind() so that a dylib's dependencies are always rebased and bound before the dylib that depends on them. The arguments passed to doRebase() and doBind() include a LinkContext, which holds a set of state variables and related functions for the executable.

Rebasing and binding first check whether the image has been prebound. If prebinding is in place and still valid, the rebasing and binding fix-ups are not needed, because the image is already loaded at its prebound addresses.

There are five reasons an ImageLoaderMachO instance does not use prebinding:

    1. The MH_PREBOUND flag in the Mach-O header is 0

    2. The image's load address has slid (see Rebasing below)

    3. A dependent library has changed

    4. The image uses a flat namespace, so part of the prebinding is ignored

    5. An environment variable in the LinkContext disables prebinding

ImageLoaderMachO's doRebase() does the following:

    1. If prebinding is used, increment the fgImagesWithUsedPrebinding counter and return; otherwise go to step 2.

    2. If the MH_PREBOUND flag is 1 (the image can be prebound, but the prebinding is not being used) and the image is not in the shared cache, reset all the prebound lazy pointers. (If the image is in the shared cache, its lazy pointers will be bound later during binding, so there is no need to reset them.)

    3. If the image's load-address slide is 0, no rebasing is needed, so return; otherwise go to step 4.

    4. Call rebase(), the method that actually does the rebasing work. If the TEXT_RELOC_SUPPORT macro is enabled, rebase() is allowed to write to the __TEXT segment to fix it up, so in practice __TEXT is not strictly read-only.
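The four steps of doRebase() amount to a small decision procedure. The sketch below restates them in C. image_state, rebase_outcome, do_rebase_sketch and the field names are hypothetical. They mirror the conditions listed above and are not dyld's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the state doRebase() consults. */
typedef struct {
    bool prebinding_usable;    /* step 1: prebinding is valid */
    bool is_prebindable;       /* MH_PREBOUND flag is set */
    bool in_shared_cache;
    int64_t slide;             /* actual minus preferred load address */
} image_state;

typedef enum {
    REBASE_SKIPPED_PREBOUND,   /* step 1 */
    REBASE_SKIPPED_NO_SLIDE,   /* step 3 */
    REBASE_PERFORMED           /* step 4: rebase() would run */
} rebase_outcome;

/* Plays the role of the fgImagesWithUsedPrebinding statistic. */
static unsigned images_with_used_prebinding = 0;

rebase_outcome do_rebase_sketch(const image_state *img, bool *reset_lazy_pointers)
{
    *reset_lazy_pointers = false;
    if (img->prebinding_usable) {                        /* step 1 */
        images_with_used_prebinding++;
        return REBASE_SKIPPED_PREBOUND;
    }
    if (img->is_prebindable && !img->in_shared_cache)    /* step 2 */
        *reset_lazy_pointers = true;
    if (img->slide == 0)                                 /* step 3 */
        return REBASE_SKIPPED_NO_SLIDE;
    return REBASE_PERFORMED;                             /* step 4 */
}
```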

ImageLoaderMachOClassic and ImageLoaderMachOCompressed each implement their own doBind() method. They use the same logic to decide whether prebinding can be used, and check TEXT_RELOC_SUPPORT to decide whether the __TEXT segment may be written during the actual binding. Finally, they call setupLazyPointerHandler, which sets up the dyld entry points in the image; for the main executable, its last step is to set __dyld or __program_vars.

Rebasing

In the past, a dylib was loaded at its preferred address, so all its pointers and data were already correct and dyld had no fix-ups to do. Now, with ASLR, a dylib is loaded at a new random address (actual_address). This address differs from the old address (preferred_address) that its code and data point to. dyld must correct this difference, called the slide, by adding it to every pointer inside the dylib. The slide is computed as follows:

slide = actual_address - preferred_address

dyld then repeatedly adds this offset to every pointer in the __DATA segment that needs rebasing. This involves page faults and COW, which can make I/O a bottleneck. However, the rebase entries are sorted by address, so the kernel sees the work as sequential access and can read data ahead, which reduces the I/O cost.
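The rebase loop itself is short. Below is a minimal sketch, assuming the __DATA segment is modeled as an array of 64-bit slots and the rebase information as a list of slot indices sorted by address. rebase_segment is a hypothetical name. dyld reads this information from compressed rebase opcodes in __LINKEDIT instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the slide to every slot that holds an image-internal pointer.
 * Offsets are sorted, so the segment is walked front to back. */
void rebase_segment(uint64_t *data, const size_t *rebase_offsets, size_t count,
                    uint64_t preferred_address, uint64_t actual_address)
{
    /* slide = actual_address - preferred_address. If the image moved down,
     * the unsigned subtraction wraps, and adding it wraps back correctly. */
    uint64_t slide = actual_address - preferred_address;
    for (size_t i = 0; i < count; i++)
        data[rebase_offsets[i]] += slide;   /* first write to a page: fault + COW */
}
```

Slots that are not listed, such as plain integers, are left alone, which is why the rebase information has to say exactly which slots hold pointers.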

Binding

Binding handles pointers into other dylibs. These pointers are bound by symbol name, which is a string. __LINKEDIT also stores which pointers need binding and the symbol each pointer must point to. dyld has to find the implementation for each symbol, which takes a lot of computation in the form of symbol-table lookups. Once the symbol is found, its address is stored in the pointer in the __DATA segment. Binding looks more computationally expensive than rebasing, but it needs very little I/O, because rebasing has already paged in the data that binding touches.
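Binding can be sketched as a name lookup followed by a pointer store. The export_entry and bind_entry structs and bind_all are hypothetical. The linear strcmp search is only for clarity; dyld's compressed format uses an export trie for these lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical exported-symbol table of one dylib: name -> address. */
typedef struct {
    const char *name;
    uint64_t address;
} export_entry;

/* Hypothetical bind record: "slot N in __DATA must point at symbol S". */
typedef struct {
    size_t slot;
    const char *symbol;
} bind_entry;

/* Look each symbol up by name (the string-comparison work) and store the
 * address into its pointer slot. Returns the number of unresolved symbols. */
size_t bind_all(uint64_t *data, const bind_entry *binds, size_t nbinds,
                const export_entry *exports, size_t nexports)
{
    size_t missing = 0;
    for (size_t i = 0; i < nbinds; i++) {
        size_t j = 0;
        while (j < nexports && strcmp(exports[j].name, binds[i].symbol) != 0)
            j++;
        if (j == nexports) {
            missing++;          /* a real loader would report an error here */
            continue;
        }
        data[binds[i].slot] = exports[j].address;
    }
    return missing;
}
```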

ObjC Runtime

Many Objective-C data structures are fixed up by rebasing and binding, such as the pointer from a class to its superclass and pointers to methods.

ObjC is a dynamic language that can instantiate an object of a class given only the class's name. This means the ObjC runtime must maintain a global table mapping class names to classes. When a dylib is loaded, all the classes it defines must be registered in this global table.
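A toy version of that global table appears below, assuming a fixed-size array and a linear search (the real ObjC runtime uses hash tables). toy_class, register_image_classes and class_named are hypothetical names, with class_named playing the role of NSClassFromString().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLASSES 64

/* Hypothetical stand-in for a class object. */
typedef struct {
    const char *name;
    size_t instance_size;
} toy_class;

/* Global name -> class table, filled in as each image is loaded. */
static const toy_class *class_table[MAX_CLASSES];
static size_t class_count = 0;

/* Called once per image at load time with the classes it defines. */
int register_image_classes(const toy_class *classes, size_t n)
{
    if (class_count + n > MAX_CLASSES)
        return -1;
    for (size_t i = 0; i < n; i++)
        class_table[class_count++] = &classes[i];
    return 0;
}

/* Lookup by name, in the spirit of NSClassFromString(). */
const toy_class *class_named(const char *name)
{
    for (size_t i = 0; i < class_count; i++)
        if (strcmp(class_table[i]->name, name) == 0)
            return class_table[i];
    return NULL;
}
```

Every class in every loaded image costs a registration at launch, which is one reason the amount of ObjC metadata matters for startup time.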

C++ suffers from the fragile base class problem. ObjC does not, because the runtime dynamically fixes up the offsets of instance variables when an image is loaded.

In ObjC, categories let you change the definition of a class. Sometimes the class you add methods to lives in another dylib rather than in your image (that is, you are modifying a system class or someone else's class), and that also requires some fix-ups.

Selectors in ObjC must be unique, so selector references have to be uniqued as images are loaded.
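ObjC avoids the fragile base class problem because code reads ivar offsets from variables the loader can adjust, rather than having them baked into instructions. Below is a toy sketch of that fix-up, assuming the superclass only grows. toy_subclass_layout and fix_ivar_offsets are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a subclass's ivar layout as compiled. */
typedef struct {
    size_t superclass_size;    /* superclass size assumed at compile time */
    size_t ivar_offsets[4];    /* offsets read at runtime, not hard-coded */
    size_t ivar_count;
} toy_subclass_layout;

/* Load-time fix-up: shift every ivar by how much the superclass grew. */
void fix_ivar_offsets(toy_subclass_layout *cls, size_t superclass_size_at_runtime)
{
    assert(superclass_size_at_runtime >= cls->superclass_size);
    size_t delta = superclass_size_at_runtime - cls->superclass_size;
    for (size_t i = 0; i < cls->ivar_count; i++)
        cls->ivar_offsets[i] += delta;
    cls->superclass_size = superclass_size_at_runtime;
}
```

Suppose a superclass in a system framework gains 16 bytes of ivars in a new OS release. A subclass compiled against the old layout still works: its ivar at offset 16 is moved to 32 at load time.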
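Selector uniquing can be illustrated with a small interning table. Once every selector reference is replaced by one canonical pointer, message dispatch can compare selectors with a single pointer comparison. unique_selector and the fixed-size table are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SELECTORS 128

/* Toy table: each distinct selector name maps to one canonical pointer. */
static const char *selector_table[MAX_SELECTORS];
static size_t selector_count = 0;

/* Return the canonical pointer for `name`, adding it if it is new.
 * Loading an image would call this for each selector reference and fix
 * the reference up to the returned pointer. */
const char *unique_selector(const char *name)
{
    for (size_t i = 0; i < selector_count; i++)
        if (strcmp(selector_table[i], name) == 0)
            return selector_table[i];
    assert(selector_count < MAX_SELECTORS);
    selector_table[selector_count++] = name;
    return name;
}
```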

Initializers

C++ generates initializers for statically created objects. ObjC has the +load method, which is called at load time. Its use is discouraged, and +initialize is now recommended instead. See more: http://stackoverflow.com/questions/13326435/nsobject-load-and-initialize-what-do-they-do

Now there is a main executable and a set of dylibs whose dependencies form a large graph. In what order are the initializers run? From the bottom up! Following the dependencies, the leaf nodes are initialized first, then the intermediate nodes upward, and finally the root node. This order is safe: before a dylib is initialized, every dylib it depends on has already been initialized.

Finally, dyld calls main(), and main() calls UIApplicationMain().

Improving Startup Time
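That bottom-up order is a post-order depth-first traversal of the dependency graph. Below is a minimal sketch, assuming images are numbered nodes in a hypothetical image_node array. initializer_order only records the order in which initializers would run; it does not run real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IMAGES 16
#define MAX_IMAGE_DEPS 4

/* Hypothetical dependency graph: deps lists the images this image links. */
typedef struct {
    int deps[MAX_IMAGE_DEPS];
    int ndeps;
} image_node;

/* Post-order walk: run every dependency's initializers before the image's
 * own, and never run an image twice. */
static void run_initializers(const image_node *g, int img, bool *done,
                             int *order, size_t *n)
{
    if (done[img])
        return;
    done[img] = true;             /* mark first, so a cycle cannot recurse forever */
    for (int i = 0; i < g[img].ndeps; i++)
        run_initializers(g, g[img].deps[i], done, order, n);
    order[(*n)++] = img;          /* its (non-cyclic) dependencies are done by now */
}

size_t initializer_order(const image_node *g, int nimages, int root, int *order)
{
    bool done[MAX_IMAGES] = { false };
    size_t n = 0;
    assert(nimages <= MAX_IMAGES);
    run_initializers(g, root, done, order, &n);
    return n;
}
```

For MyApp (node 0) linking UIKit (1) and MyKit (2), both depending on libSystem (3), the order comes out as libSystem, UIKit, MyKit, MyApp.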

There is an animation between tapping the app icon and the app's launch screen appearing, and we want the app to finish launching faster than that animation. Launch times differ across devices, but it is best to keep launch under 400ms. Note that if launch takes longer than 20s, the system assumes the app is stuck in an infinite loop and kills the app process. Measure startup time on the slowest device the app supports. App launch is considered finished at applicationWillFinishLaunching.

Measuring Startup Time

Warm launch: the app and its data are already in memory.

Cold launch: the app is not in the kernel's buffer cache.

Cold launch time is the important number to measure. To measure it accurately, restart the device before measuring. The time before main() runs is hard to measure, but dyld has a built-in measurement: in Xcode, open Edit Scheme, select Run > Arguments, and set the environment variable DYLD_PRINT_STATISTICS to 1. The console output looks like this:

    Total pre-main time: 228.41 milliseconds
             dylib loading time:  82.35 milliseconds (36%)
            rebase/binding time:   6.12 milliseconds (2.6%)
                ObjC setup time:   7.82 milliseconds
               initializer time: 132.02 milliseconds
               slowest intializers :
                 libSystem.B.dylib : 122.07 milliseconds (53.4%)
                    CoreFoundation :   5.59 milliseconds (2.4%)

Optimize startup time

Each of the steps before main() can be optimized.

Load Dylib

As mentioned before, system dylibs load quickly because they are optimized. Loading embedded dylibs, however, takes time, so merge multiple embedded dylibs into one where possible, or use static archives. Lazy loading with dlopen() at runtime is not recommended: it can cause problems, and the overall cost ends up higher.

Rebase/binding

As mentioned earlier, rebasing spends a lot of its time on I/O, while the binding that follows needs little I/O and spends its time on computation. In practice, the costs of these two steps are intertwined.

As said before, the time for these steps can be reduced by reducing the number of pointers in the __DATA segment that need fix-ups. For ObjC, that means reducing the amount of metadata: classes, selectors and categories. Coding principles and design patterns encourage writing many small, focused classes and methods and splitting the parts of a class into separate categories, but this actually increases startup time. For C++, reduce the number of virtual methods, because virtual methods create vtables, which are also structures in the __DATA segment. C++ virtual methods cost less launch time than ObjC metadata, but the cost is still not negligible. Finally, using Swift structs is recommended, since they require fewer fix-ups.

ObjC Setup

There is little to do directly in this step. Almost all of its savings come from reducing the fix-ups needed in the rebasing and binding steps, since that earlier work also makes this step take less time.
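The same principle can be applied to plain C data. This is an illustrative assumption, not something the session prescribes. In position-independent code, an array of string pointers typically needs one rebase fix-up per entry, while a two-dimensional char array holding the same strings contains no pointers at all. The table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each entry is a pointer whose value depends on the load address, so it
 * lives in a data section and typically needs a rebase fix-up at launch. */
static const char *const names_with_pointers[] = { "red", "green", "blue" };

/* The same strings as a 2-D char array: no pointers, nothing to fix up,
 * and the bytes can stay in a read-only section as-is. */
static const char names_inline[][6] = { "red", "green", "blue" };

const char *color_name_ptr(size_t i)    { return names_with_pointers[i]; }
const char *color_name_inline(size_t i) { return names_inline[i]; }
```

The trade-off is that every row of the inline table is padded to the longest string, so this pays off for short strings of similar length.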

Initializers

Explicit initialization

    • Use +initialize instead of +load

    • Do not use __attribute__((constructor)) to mark a function as an explicit initializer. Instead, initialize lazily at the call site, for example with dispatch_once(), pthread_once() or std::call_once(). The initialization then happens on first use, which moves that work out of launch.
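Below is a minimal sketch of this lazy pattern with pthread_once(), the portable POSIX option among those listed above. lookup, build_lookup_table and the squares table are hypothetical stand-ins for real setup work.

```c
#include <assert.h>
#include <pthread.h>

/* Instead of an __attribute__((constructor)) function that runs during
 * launch, defer the setup to the first call with pthread_once(). */
static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int lookup_table[256];
static int init_runs = 0;   /* counts how many times the setup actually ran */

static void build_lookup_table(void)
{
    for (int i = 0; i < 256; i++)
        lookup_table[i] = i * i;   /* stand-in for expensive setup */
    init_runs++;
}

int lookup(unsigned char key)
{
    pthread_once(&table_once, build_lookup_table);  /* runs once, thread-safe */
    return lookup_table[key];
}
```

On Apple platforms, dispatch_once() plays the same role.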

Implicit initialization

For C++ static variables with complex (non-trivial) constructors:

    1. Move the initialization to the call site, so it happens on first use.

    2. Assign only plain old data (POD) values, so the static linker can precompute the contents of __DATA and no fix-up work is needed.

    3. Use the compiler warning flag -Wglobal-constructors to find implicit initialization code.

    4. Rewrite the code in Swift, which takes care of this for you; this is strongly recommended.

Do not call dlopen() in an initializer, because it hurts performance. While dyld runs before the app starts, the process is single-threaded, so the system does not need to take locks. dlopen(), by contrast, has to handle multithreading, so the system must take locks. This seriously hurts performance and can lead to deadlocks and other unexpected behavior. For the same reason, do not create threads in an initializer.
