Research on Quick Start Technology in Embedded Linux


The defining feature of an embedded Linux system is that a bootloader replaces the BIOS of a desktop system and that the system itself is trimmed down. However, limited hardware often makes startup slow, and users of embedded products are sensitive to boot time, so there is demand for faster startup of embedded Linux systems. This article discusses the operations performed at system startup and methods to shorten them.

1 Embedded Linux startup sequence

Hardware platforms and applications of embedded systems vary widely, but the overall startup process is the same. Here, system startup refers to the period from the user's power-on or reset to the point at which the system can provide the services the user expects. The typical power-on/reset sequence is shown in Table 1.

Table 1 Startup sequence of an embedded Linux system

2 Linux fast-boot methods

Some Linux distributions have already optimized startup speed. When a standard Linux is used for development, startup speed is improved mainly through kernel configuration and patches. The following describes some key fast-boot techniques.

2.1 Firmware and bootloader phases

Once the target board is chosen, the running time of the firmware cannot be changed, and the read/write speeds of flash and RAM are also fixed. However, if the firmware and bootloader can be bypassed on reset, that is, if the running kernel can load and run another kernel, startup time can be shortened. A typical implementation is kexec, which has two components: the user-space component kexec-tools and a kernel patch. Another method is to add reboot=soft to the kernel command line to skip the firmware; its disadvantage is that it cannot be invoked from user space.
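As a rough illustration of the kexec path, the sketch below builds the two kexec-tools invocations that load a second kernel and then jump into it without returning to the firmware and bootloader. The kernel path, the command line, and the helper names kexec_commands and soft_reboot are assumptions made for illustration; actually running the commands requires root privileges and a kernel built with kexec support.

```python
import subprocess

def kexec_commands(kernel, cmdline, initrd=None):
    """Return the argv lists for loading and then executing a new kernel."""
    load = ["kexec", "-l", kernel, f"--append={cmdline}"]
    if initrd is not None:
        load.append(f"--initrd={initrd}")
    # "kexec -e" jumps straight into the loaded kernel, skipping firmware.
    return [load, ["kexec", "-e"]]

def soft_reboot(kernel, cmdline, initrd=None, run=subprocess.run):
    """Run the commands in order; 'run' is injectable to allow a dry run."""
    for argv in kexec_commands(kernel, cmdline, initrd):
        run(argv, check=True)
```

Passing a different callable as run (for example one that only logs the argv lists) lets the sequence be checked on a development host before it is tried on the target.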

For a normal boot, you can choose a fast bootloader and trim the kernel. You can also use a high-speed image copy technique (such as DMA copy to RAM) to shorten the copy time. To shorten the decompression time, you can look for a more efficient compression algorithm. In general, however, the higher the compression ratio, the more complex the algorithm and the slower the decompression, so there is a trade-off between copy time (inversely proportional to the compression ratio) and decompression time (which generally grows with the compression ratio).
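The trade-off between copy time and decompression time can be made concrete with a small model. The sketch below is only an illustration: the Codec type, the function names, and every throughput figure are hypothetical, and real values would have to be measured on the target board.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Codec:
    name: str
    ratio: float           # uncompressed size / compressed size
    decompress_bps: float  # bytes of output produced per second

def load_time(image_bytes, flash_read_bps, codec):
    """Seconds to copy the compressed image from flash and decompress it."""
    copy_s = (image_bytes / codec.ratio) / flash_read_bps
    decompress_s = image_bytes / codec.decompress_bps
    return copy_s + decompress_s

def fastest(image_bytes, flash_read_bps, codecs):
    """Return the codec with the smallest total load time."""
    return min(codecs, key=lambda c: load_time(image_bytes, flash_read_bps, c))

# Hypothetical candidates; an uncompressed image needs no decompression.
CANDIDATES = [
    Codec("none", ratio=1.0, decompress_bps=float("inf")),
    Codec("fast", ratio=2.0, decompress_bps=8_000_000),
    Codec("dense", ratio=4.0, decompress_bps=2_000_000),
]
```

With a 4,000,000-byte image and a flash read rate of 2,000,000 bytes/s, these made-up figures give 2.0 s uncompressed, 1.5 s for the "fast" codec, and 2.5 s for the "dense" codec, so the highest compression ratio is not the quickest to boot.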

 

2.2 Kernel stage

During kernel initialization, the system time is synchronized with the real-time clock (RTC). This step can take up to 1 s and can be removed to save time, at the cost of the system time being up to 1 s off. If the system time is written back to the RTC at shutdown, the deviation accumulates across reboots. Systems that synchronize from an external clock source can safely skip this step.

A preset lpj value avoids the time spent calling calibrate_delay() on every boot to calibrate loops_per_jiffy. This cost is independent of the CPU frequency; in a typical embedded hardware environment it is measured in milliseconds. The lpj value is constant for a fixed hardware platform, so it needs to be computed only once. On subsequent boots, you can specify the lpj value in the boot parameters so that the actual calculation is skipped. To do this, record the "Calibrating delay" value from the kernel boot messages after a normal boot, and pass it in the boot parameters in the form "lpj=xxxxxx".
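The record-once, reuse-later procedure can be sketched as follows. The example assumes that the calibration line in the boot log includes an lpj= field, as many kernels print it; the sample log line, the command line, and the helper names are illustrative and not taken from a particular board.

```python
import re

_LPJ = re.compile(r"\blpj=(\d+)")

def extract_lpj(boot_log):
    """Return the lpj value from the calibration line of a boot log, or None."""
    for line in boot_log.splitlines():
        if "Calibrating delay" in line:
            match = _LPJ.search(line)
            if match:
                return int(match.group(1))
    return None

def with_preset_lpj(cmdline, lpj):
    """Return cmdline with any old lpj= option replaced by the given value."""
    kept = [opt for opt in cmdline.split() if not opt.startswith("lpj=")]
    return " ".join(kept + [f"lpj={lpj}"])
```

The value returned by extract_lpj after one normal boot would then be passed to with_preset_lpj to produce the command line stored in the bootloader configuration.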

By default, the console is opened to print boot messages, but the console, especially a frame-buffer console, slows down startup. Therefore, in embedded Linux, silence the console during boot by adding "quiet" to the kernel boot parameters.

Device probing and driver initialization are time-consuming. Decide at kernel build time which drivers are actually needed, so that the system does not probe for unnecessary devices, especially redundant IDE devices. If a device is not needed during startup, build its driver as a module and load it when the system is idle or when the device is first used, rather than loading everything during boot.

2.3 User-space stage

In traditional Linux, the init process (/sbin/init) is started once the kernel has booted, and the initialization scripts are executed by bash. init reads a text file (/etc/inittab) to set the runlevel. This file in turn invokes the rc script, which scans a directory such as /etc/rc.d/rc5.d/ and starts the system services that the links there point to.

For consumer-electronics Linux systems, the graphical interface and other necessary services must be started. An unoptimized system enables by default many services that are unused or not applicable, which costs a lot of time. The simplest optimization is to tailor the set of services by rewriting the service configuration files according to actual needs. In addition, init scripts are executed serially; when there are many scripts, the boot delay is high. Running services in parallel can therefore speed up startup. Several replacements for the init program now exist; the following describes initng and Upstart.

Initng (init next generation) starts services in parallel to complete initialization quickly. In initng, any service whose dependencies are satisfied may start. While one script is being loaded from external storage or is waiting for a hardware device, another script can run and start other services, so the system achieves a better balance between CPU and I/O. As a dependency-based solution, initng uses its own set of initialization scripts that encode the dependencies between services and daemons. If a service depends on other services (declared with the need keyword), initng makes sure that all of them are available before it starts. Services without dependencies are started concurrently; services with dependencies are started only once it is safe to do so.
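Dependency-driven parallel startup can be sketched in a few lines. This is not initng's implementation or its script syntax; it is a minimal model in which a dictionary plays the role of the need declarations and a caller-supplied function stands in for running a service's start script. The service names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def start_services(needs, start, max_workers=4):
    """Start every service once all services it needs have started.

    needs: dict mapping a service name to the set of services it depends on.
    start: callable taking a service name; stands in for its start script.
    Returns the order in which services finished starting.
    """
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises CycleError on circular dependencies
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            for name in sorter.get_ready():  # dependencies satisfied
                running[pool.submit(start, name)] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise a failed start
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)  # may unblock dependent services
    return finished
```

Services with no unmet dependencies are submitted together and run concurrently, while a dependent service is submitted only after everything it needs has reported completion.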

Upstart differs from initng in that it is event-based: a task or service is started or stopped depending on whether the event it is waiting for has occurred. Upstart defines events flexibly, including edge (simple) events, level (value) events, and temporal events. A trigger is described by start/stop, an event name, and an optional expected value. There are two ways to express dependencies between tasks. First, a task itself generates events: an event is emitted whenever a task starts or ends, and other tasks can wait for it. This works well for the basic tasks executed at boot. Second, for complex dependencies, the task's own shell script can be used.
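The event-driven idea can be illustrated with a toy dispatcher. This is not Upstart's code or configuration format; the EventInit class, the event strings, and the job names are assumptions chosen to show how a job's start can itself become the event that triggers other jobs.

```python
from collections import defaultdict, deque

class EventInit:
    """Toy event-based init: each job starts when its trigger event occurs."""

    def __init__(self):
        self._start_on = defaultdict(list)  # event name -> jobs waiting on it
        self.started = []                   # jobs in the order they started

    def job(self, name, start_on):
        """Register a job that starts when the event 'start_on' is emitted."""
        self._start_on[start_on].append(name)

    def emit(self, event):
        """Deliver an event and every follow-up event it causes."""
        queue = deque([event])
        while queue:
            current = queue.popleft()
            for name in self._start_on.get(current, []):
                if name not in self.started:
                    self.started.append(name)
                    # Starting a job is itself an event others can wait on.
                    queue.append(f"started {name}")
```

A job waiting on an event that never occurs, such as a device being plugged in, simply stays idle and costs no boot time.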

 

2.4 Readahead and prelink

Readahead preloads files (programs and libraries) into the RAM cache before they are used, so that no I/O is needed when they are accessed. If you know which files will be accessed next, you can read them fully or partially into the buffer ahead of time to speed up execution. In many cases the next step of an embedded system is predictable. For example, at boot the system always accesses the same executables and data files in the same order, and file blocks are usually accessed sequentially; when an application starts, it always accesses the same program segments, shared libraries, resources, or input files. Readahead can therefore be targeted precisely to improve execution speed.
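One way to request this kind of preloading from user space on Linux is posix_fadvise with the POSIX_FADV_WILLNEED hint, sketched below. The preload helper and the idea of feeding it a fixed list of boot-time files are assumptions for illustration; the hint is advisory, and the kernel decides how much is actually read.

```python
import os

def preload(paths):
    """Ask the kernel to pull each file into the page cache ahead of use.

    Returns the list of paths for which the hint was issued.
    """
    hinted = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # skip files missing on this target
        try:
            # Offset 0 with length 0 covers the file from start to end.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted.append(path)
        except OSError:
            pass  # the hint is optional; ignore files that reject it
        finally:
            os.close(fd)
    return hinted
```

Calling preload early in boot with the files known to be needed next lets the reads overlap with other initialization work.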

ELF (Executable and Linkable Format) is the standard binary format on Linux. Starting an ELF program involves mapping the shared libraries into the virtual address space, resolving symbol references, and initializing each ELF file. Because shared libraries are position-independent, part of the relocation processing and symbol lookup must be done at run time before control can jump to the program's entry point. This flexibility therefore slows down the startup of ELF programs; resolving symbol references is particularly expensive for large programs that use many shared libraries. However, in many embedded systems executables and shared libraries rarely change, and the linking work is identical every time the program runs.

Prelink exploits this: it modifies ELF shared libraries and binaries, adding link information to the files to simplify dynamic-link relocation and speed up program startup. Prelink first collects the ELF binaries to be prelinked and the shared libraries they depend on, assigns each library a unique position in the virtual address space, and relinks the shared library to that base address (when the dynamic linker loads the library, it maps it at the specified location as long as that address range is free). Prelink then resolves all relocations in the binary or library, stores the relocation information in the ELF object, and adds the list and checksums of all dependent libraries to the binary or library. For binaries, it also records all conflicts (symbols that resolve differently within the natural search scope of the shared library). At run time, the dynamic linker first checks that all dependent libraries are mapped at their assigned locations and that the library files have not changed. If so, it only has to handle the conflicts and does not need to relocate each library, which greatly speeds up program startup. Note that if a shared library changes, every program that uses it must be prelinked again; otherwise the program falls back to the time-consuming normal relocation.
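The run-time check described above can be modeled as a comparison between what was recorded at prelink time and what is actually loaded. The sketch below is a simplified model, not the dynamic linker's code: the LibRecord type, its fields, and the example addresses and checksums are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LibRecord:
    name: str
    base_address: int  # address assigned to the library at prelink time
    checksum: int      # checksum of the library file at prelink time

def prelink_still_valid(recorded, loaded):
    """True if every recorded library is loaded unchanged at its address.

    When this holds, only the recorded conflicts need processing; otherwise
    the program has to fall back to normal relocation.
    """
    by_name = {lib.name: lib for lib in loaded}
    return all(by_name.get(rec.name) == rec for rec in recorded)
```

A library that was updated (different checksum), mapped elsewhere (different base address), or is missing makes the function return False, which corresponds to the slow path.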

3 XIP and file system optimization

3.1 Code execution methods

In an embedded system, code is executed in three ways:

① Fully shadowed. At run time, all code is copied from non-volatile memory (flash, ROM, etc.) to RAM and executed there.

② Demand paging. Only part of the code is copied to RAM, and pages are swapped in and out of RAM as needed. A page fault is generated only when an access hits a page that is in virtual memory but not in physical RAM, at which point the code or data is mapped into RAM.

③ Execute in place (XIP). Code is executed directly from its location in non-volatile memory instead of being copied to RAM at startup. RAM holds only the data that changes, as shown in Figure 1. If the read speed of the non-volatile memory is close to that of RAM, XIP saves the copy and decompression time. NOR flash and ROM read fairly quickly (about 100 ns) and are suitable for XIP; NAND flash is read sector by sector and is relatively slow (on the order of μs), so XIP cannot be implemented on it.

Figure 1 Comparison between full shadowing and XIP

 

 

XIP can be divided into the following two types:

① Kernel XIP. The kernel runs directly from flash/ROM, which saves the time needed to copy and decompress the image. The Linux 2.6.10 kernel already includes XIP support.

② Application XIP. Application code is executed directly from where it is stored instead of being loaded into RAM, so the first execution of the application is faster. Application XIP requires a file system that supports it.

3.2 XIP file systems

Currently, there are two main XIP file system implementations: linear XIP cramfs and the Advanced XIP File System (AXFS).

Cramfs is a compressed read-only file system. It was originally used to boot desktop Linux systems, but with modifications it can support embedded systems and XIP. Linear XIP cramfs uses the sticky bit to mark each file it manages as either compressed (demand paged) or uncompressed (XIP). If a file is marked XIP, none of its pages are compressed and they must be stored contiguously in flash. When an XIP file is loaded, all of its page addresses are mapped directly. For demand-paged files, pages are decompressed into RAM when a page fault occurs.

To create a linear XIP cramfs image, first determine how frequently the executables and libraries are used. Frequently used files are suited to XIP, while other files should be compressed. Some tools (such as RAMUST and CFSST) can help decide which files need XIP and which do not. The following commands mark the XIP files and create the root file system image, using the mkfs.cramfs tool as an example (rootfs is assumed here to be the directory holding the root file system, and rootfs.bin the output image):

chmod +t filenames

mkfs.cramfs -x rootfs rootfs.bin
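Deciding which files to mark can be sketched as a simple selection by usage frequency. Because XIP files are stored uncompressed, the sketch also applies a flash budget. The greedy policy, the choose_xip helper, and the example figures are assumptions for illustration; access counts would come from profiling the target system.

```python
def choose_xip(files, xip_budget_bytes):
    """Pick files to mark XIP, most frequently used first, within a budget.

    files: iterable of (path, uncompressed_size_bytes, access_count).
    Returns (xip_paths, compressed_paths).
    """
    xip, compressed, used = [], [], 0
    for path, size, hits in sorted(files, key=lambda f: f[2], reverse=True):
        if hits > 0 and used + size <= xip_budget_bytes:
            xip.append(path)        # candidates for chmod +t
            used += size
        else:
            compressed.append(path)  # left compressed, demand paged
    return xip, compressed
```

The first returned list would be the set of files given the sticky bit before the image is built.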

In addition, the kernel configuration must be changed to support XIP: add rootfstype=cramfs to the default kernel command string in the boot options, select kernel XIP and set the XIP kernel physical address; enable XIP support in the MTD drivers; and enable linear XIP cramfs under file systems. An XIP image can then be generated.

One drawback of linear XIP cramfs is its per-file granularity: all pages of a file either use XIP or all use compression with demand paging, whereas in practice the usage frequency of different pages in the same file varies greatly. AXFS is a newer read-only file system developed by Intel. It inherits many techniques from linear XIP cramfs and improves on them. AXFS applies XIP at page granularity and comes with tools to determine which pages should be XIP and which should be compressed, giving a better balance between speed and RAM/flash usage.

 

 

3.3 Non-XIP file systems

XIP generally relies on NOR flash, which is relatively expensive. For applications with large amounts of data, a file system on NAND flash is often needed. Common non-XIP file systems are JFFS2 and YAFFS.

JFFS2 is a compressed file system. In multimedia applications where images, audio, and video are already compressed, JFFS2 imposes a second layer of compression and decompression on the CPU and reduces access speed. In such data-intensive applications, an uncompressed file system (such as YAFFS/YAFFS2) can speed up the system.

YAFFS/YAFFS2 is a log-structured file system designed for embedded systems that use NAND flash. Compared with JFFS2, YAFFS drops some features (for example, it does not support data compression), so it is faster, mounts more quickly, and uses less memory. YAFFS/YAFFS2 comes with its own NAND chip driver, so the file system can be operated directly without going through MTD or VFS. The main difference between YAFFS and YAFFS2 is that the former supports only small-page (512-byte) NAND flash, while the latter also supports large-page (2 KB) NAND flash and improves memory usage, garbage collection, and access speed.

Conclusion

Fast boot is one of the pressing requirements for embedded Linux systems. This article has analyzed the boot process and the main sources of delay in an embedded system, proposed corresponding solutions, and introduced XIP file systems. Because startup speed depends heavily on the hardware platform and some of the methods are mutually exclusive, the choice must be made with the specific application in mind.
