This article describes in detail the concept of the OS boot loader in embedded systems, the main tasks of its software design, and its structural framework.
1. Introduction
Running GNU/Linux systems on dedicated embedded boards has become increasingly popular. An embedded Linux system can be divided into four levels from the software perspective:
1. Boot loader. Includes the boot code stored in firmware (optional) and the boot loader itself.
2. Linux kernel. A kernel customized for the specific embedded board, together with its startup parameters.
3. File system. Includes the root file system and the file systems built on the flash memory device. A RAM disk is usually used as the root file system.
4. User applications. Applications specific to the user. Sometimes an embedded graphical user interface is included between the user applications and the kernel layer. Common embedded GUIs are Microwindows and MiniGUI.
The boot loader is the first piece of software that runs after the system is powered on. Recall the architecture of the PC: its boot process is handled by the BIOS (which is essentially a firmware program) together with the OS boot loader stored in the hard disk MBR (for example, LILO or GRUB). After the BIOS completes hardware detection and resource allocation, it reads the boot loader from the hard disk MBR into system RAM and hands control to it. The main task of that boot loader is to read the kernel image from the hard disk into RAM and then jump to the kernel entry point, that is, to start the operating system.
In embedded systems there is usually no firmware program like the BIOS (note: some embedded CPUs do embed a short boot program), so the boot loader alone is responsible for loading and starting the whole system. For example, in an embedded system based on the ARM7TDMI core, execution usually starts at address 0x00000000 on power-up or reset, and the boot loader is normally placed at this address.
This article covers four topics: the concept of the boot loader, its main tasks, its structural framework, and its installation.
2. Concepts of Boot Loader
Simply put, a boot loader is a small program that runs before the operating system kernel. It initializes hardware devices and builds a map of the memory space, bringing the system's hardware and software environment into a suitable state so that everything is ready for the final call to the operating system kernel. A boot loader generally depends heavily on the hardware, especially in the embedded world, so building a universal boot loader for embedded systems is almost impossible. Even so, we can summarize some general concepts that guide the design and implementation of a specific boot loader.
1. CPUs and embedded boards supported by the boot loader
Each CPU architecture has its own boot loaders. Some boot loaders support several architectures; for example, U-Boot supports both the ARM and the MIPS architectures. Besides depending on the CPU architecture, a boot loader also depends on the configuration of the board-level devices. That is, even if two embedded boards are built around the same CPU, a boot loader that runs on one board usually needs source changes before it runs on the other.
2. Installation medium of the boot loader
After power-on or reset, every CPU fetches its first instruction from an address fixed by the CPU manufacturer. For example, a CPU based on the ARM7TDMI core usually fetches its first instruction from address 0x00000000 on reset. Embedded systems built around such a CPU usually map some kind of solid-state storage device (such as ROM, EEPROM, or flash) to this preset address. Therefore, after the system is powered on, the CPU executes the boot loader first. Figure 1 shows a typical space allocation for a solid-state storage device that holds the boot loader, the kernel startup parameters, the kernel image, and the root file system image.
Figure 1. Typical space allocation structure of a solid-state storage device
3. The device or mechanism used to control the boot loader
A connection between the host and the target is generally established through a serial port. While running, the boot loader usually performs its I/O over this serial port, for example printing messages to it and reading user control characters from it.
4. Single-stage or multi-stage boot process
A multi-stage boot loader generally provides more complex functions and better portability. Most boot loaders started from solid-state storage devices use a two-stage boot process, that is, the boot process is divided into stage 1 and stage 2. The specific tasks completed in each stage are discussed below.
5. Operation modes of the boot loader
Most boot loaders have two operating modes: boot loading mode and downloading mode. The distinction only matters to developers. From the end user's perspective, the boot loader simply loads the operating system, and there is no visible difference between the two modes.
Boot loading mode: also called autonomous mode. The boot loader loads the operating system from a solid-state storage device on the target into RAM, with no user intervention during the whole process. This is the normal working mode of a boot loader, so a released embedded product must obviously run its boot loader in this mode.
Downloading mode: the boot loader on the target downloads files from the host over a serial or network connection, for example the kernel image and the root file system image. The boot loader usually stores the downloaded files in the target's RAM first and then writes them to the flash device on the target. This mode is typically used when the kernel and root file system are installed for the first time, and later system updates use it as well. In this mode the boot loader usually offers its user a simple command-line interface.
A powerful boot loader such as Blob or U-Boot usually supports both modes and allows the user to switch between them. For example, Blob starts in normal boot loading mode but waits 10 seconds for the user to press any key, which switches Blob to downloading mode. If no key is pressed within 10 seconds, Blob continues and boots the Linux kernel.
6. Communication devices and protocols used for file transfer between the boot loader and the host
Most commonly, the boot loader on the target transfers files to and from the host through the serial port, using one of the XMODEM, YMODEM, or ZMODEM protocols. Serial transfer is slow, however, so downloading files over an Ethernet connection with the TFTP protocol is a better choice.
The software used on the host must also be considered here. For example, when files are downloaded over Ethernet with TFTP, the host must run software that provides the TFTP service.
Having discussed these concepts, let us look at the tasks a boot loader must complete.
3. Main Tasks and typical structure framework of Boot Loader
Before continuing, let us assume that the kernel image and the root file system image are loaded into RAM to run. We state this assumption because in embedded systems the kernel image and the root file system image can also run directly from solid-state storage such as ROM or flash, although doing so undoubtedly costs running speed.
From the operating system's perspective, the overall goal of the boot loader is to call the kernel correctly. In addition, because the implementation of a boot loader depends on the CPU architecture, most boot loaders are divided into two parts, stage1 and stage2. Code that depends on the CPU architecture, such as device initialization code, is usually placed in stage1 and written in assembly language to keep it short and compact. Stage2 is usually written in C, which makes it possible to implement more complex functions with better readability and portability.
Stage1 of the boot loader generally includes the following steps (in order of execution):
1. Initialize the hardware devices.
2. Prepare RAM space for loading stage2 of the boot loader.
3. Copy stage2 of the boot loader into that RAM space.
4. Set up the stack.
5. Jump to the C entry point of stage2.
Stage2 of the boot loader usually includes the following steps (in order of execution):
1. Initialize the hardware devices to be used in this stage.
2. Detect the memory map of the system.
3. Read the kernel image and the root file system image from flash into RAM.
4. Set the startup parameters for the kernel.
5. Call the kernel.
3.1 stage1 of Boot Loader
3.1.1 basic hardware initialization
This is the first thing the boot loader does. Its purpose is to prepare a basic hardware environment for the execution of stage2 and, later, of the kernel. It usually includes the following steps (in order of execution):
1. Mask all interrupts. Servicing interrupts is normally the responsibility of the OS device drivers, so the boot loader does not need to respond to any interrupt during its whole execution. Interrupts can be masked by writing the CPU's interrupt mask register or status register (such as the CPSR register on ARM).
2. Set the CPU speed and clock frequency.
3. Initialize RAM. This includes correctly setting the function registers of the system's memory controller and the control registers of each memory bank.
4. Initialize the LED. A GPIO pin typically drives the LED, which indicates whether the system status is OK or in error. If the board has no LED, you can instead initialize the UART and print the boot loader's logo string to the serial port.
5. Disable the CPU's internal instruction and data caches.
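As an illustration of step 1 on ARM, the value written to the CPSR can be computed as plain bit arithmetic. This is only a sketch: the bit positions are those of the classic ARM CPSR (I bit 7, F bit 6, mode field in bits 4..0, SVC mode 0x13), the helper name cpsr_for_boot is hypothetical, and writing the value back requires an MSR instruction in the stage1 assembly code, which is not shown here.

```c
#include <stdint.h>

#define CPSR_IRQ_DISABLE 0x80u  /* I bit: set to mask IRQ */
#define CPSR_FIQ_DISABLE 0x40u  /* F bit: set to mask FIQ */
#define CPSR_MODE_MASK   0x1Fu  /* mode field, bits 4..0 */
#define CPSR_MODE_SVC    0x13u  /* supervisor (SVC) mode */

/*
 * Hypothetical helper: given the current CPSR value, return the value
 * stage1 would write back to mask IRQ and FIQ and to select SVC mode.
 * All other bits (condition flags and so on) are left unchanged.
 */
uint32_t cpsr_for_boot(uint32_t cpsr)
{
    cpsr &= ~CPSR_MODE_MASK;
    cpsr |= CPSR_MODE_SVC | CPSR_IRQ_DISABLE | CPSR_FIQ_DISABLE;
    return cpsr;
}
```

For example, a CPSR of 0x00000010 (user mode, interrupts enabled) becomes 0x000000D3.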
3.1.2 prepare Ram space for loading stage2
For faster execution, stage2 is usually loaded into RAM and run from there, so a usable range of RAM must be prepared for loading stage2 of the boot loader. Because stage2 is usually C code, the range must hold not only the stage2 executable image but also its stack. The size of the range is preferably a multiple of the memory page size (usually 4 KB). In general, 1 MB of RAM is enough. The address range can be chosen freely; for example, Blob places its stage2 executable image in the 1 MB of RAM starting at address 0xc0200000. However, placing stage2 in the top 1 MB of the whole RAM space, that is, from (RamEnd - 1 MB) to RamEnd, is the recommended approach.
For the convenience of later descriptions, the size of the reserved RAM range is written stage2_size (in bytes), and its start and end addresses are written stage2_start and stage2_end (both aligned to a 4-byte boundary). Therefore:
stage2_end = stage2_start + stage2_size
In addition, you must make sure that the chosen address range really is readable and writable RAM, so the range has to be tested. The test can follow Blob's approach, which checks whether the first two words of each memory page can be read and written. For later reference, we call this detection algorithm test_mempage. Its steps are as follows:
1. Save the contents of the first two words of the memory page.
2. Write arbitrary values to the two words. For example, write 0x55 to the first word and 0xaa to the second word.
3. Immediately read the two words back. We should read 0x55 and 0xaa respectively. If not, the address range occupied by this memory page is not valid RAM.
4. Write a different pair of values to the two words. For example, write 0xaa to the first word and 0x55 to the second word.
5. Immediately read the two words back. We should read 0xaa and 0x55 respectively. If not, the address range occupied by this memory page is not valid RAM.
6. Restore the original contents of the two words. The test is complete.
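The six steps above can be sketched in C as follows. This is a minimal sketch rather than Blob's actual code: the signature (a pointer to the start of the page, returning 1 for valid RAM and 0 otherwise) is an assumption made for illustration. The pointer is volatile so that the compiler cannot optimize away the read-back.

```c
#include <stdint.h>

/*
 * Sketch of test_mempage: returns 1 if the first two words of the page
 * behave like read/write RAM, 0 otherwise. The original contents of the
 * two words are written back before returning.
 */
int test_mempage(volatile uint32_t *page)
{
    const uint32_t saved0 = page[0];   /* step 1: save both words */
    const uint32_t saved1 = page[1];
    int ok = 1;

    page[0] = 0x55;                    /* step 2: first pattern */
    page[1] = 0xaa;
    if (page[0] != 0x55 || page[1] != 0xaa)   /* step 3: read back */
        ok = 0;

    if (ok) {
        page[0] = 0xaa;                /* step 4: swapped pattern */
        page[1] = 0x55;
        if (page[0] != 0xaa || page[1] != 0x55)   /* step 5: read back */
            ok = 0;
    }

    page[0] = saved0;                  /* step 6: restore */
    page[1] = saved1;
    return ok;
}
```

Writing the swapped pattern in the second pass makes sure every tested bit is seen both as 0 and as 1, which catches lines that are stuck at one value.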
To start from a clean state, we can also zero the reserved RAM range.
3.1.3 Copy stage2 to RAM
Two things must be determined for the copy: (1) the start and end addresses of the stage2 executable image on the solid-state storage device; (2) the start address of the RAM space.
3.1.4 Set the stack pointer sp
The stack pointer is set in preparation for executing C code. Normally sp can be set to (stage2_end - 4), that is, the top of the 1 MB RAM space reserved in section 3.1.2 (the stack grows downward). Before setting sp, you can also turn off the LED to tell the user that we are about to jump to stage2.
After these steps, the physical memory layout of the system should be as shown in Figure 2.
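The address arithmetic of sections 3.1.2 and 3.1.4 can be checked with a small worked example. This is only a sketch: the structure, the helper name plan_stage2, and the sample RAM end address used below are hypothetical, and ram_end is taken to be the first address past the end of RAM.

```c
#include <stdint.h>

#define STAGE2_SIZE (1024u * 1024u)   /* 1 MB, as suggested in section 3.1.2 */

struct stage2_layout {
    uint32_t start;   /* stage2_start */
    uint32_t end;     /* stage2_end = stage2_start + stage2_size */
    uint32_t sp;      /* initial stack pointer, stage2_end - 4 */
};

/*
 * Hypothetical helper: place stage2 in the top 1 MB of RAM.
 * ram_end is the first address past the end of RAM; it is rounded
 * down to a 4-byte boundary before use.
 */
struct stage2_layout plan_stage2(uint32_t ram_end)
{
    struct stage2_layout l;

    l.end = ram_end & ~3u;
    l.start = l.end - STAGE2_SIZE;
    l.sp = l.end - 4;
    return l;
}
```

With RAM ending at 0xC2000000 (an assumed value), stage2 occupies 0xC1F00000 to 0xC2000000 and the initial stack pointer is 0xC1FFFFFC.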
3.1.5 jump to the C entry point of stage2
After everything above is ready, you can jump to the stage2 of the boot loader to execute it. For example, in the arm system, this can be achieved by modifying the PC register to an appropriate address.
Figure 2 system memory layout when the stage2 executable image of bootloader is just copied to the ram Space
3.2 stage2 of Boot Loader
As mentioned above, stage2 is usually written in C so that it can implement more complex functions with better readability and portability. Unlike an ordinary C application, however, a program such as a boot loader cannot use any function from the glibc library when it is compiled and linked, because no operating system is running underneath it. This raises a question: from where do we jump into the main() function? Using the start address of main() as the entry point of the whole stage2 image is the most direct idea, but it has two disadvantages: (1) no arguments can be passed to main(); (2) the case where main() returns cannot be handled. A more elegant method is to use a trampoline: write a small trampoline program in assembly language and use it as the execution entry point of the stage2 executable image. The trampoline then uses a CPU branch instruction to jump into main(), and when main() returns, execution comes back to the trampoline. In short, the trampoline acts as an external wrapper around main().
The following is a simple example of a trampoline program (from Blob):
.text
.globl _trampoline
_trampoline:
    bl      main
    /* if main ever returns we just call it again */
    b       _trampoline
As you can see, when main() returns, a branch instruction re-executes the trampoline program, and main() is therefore executed again. This bouncing back is where the name trampoline comes from.
3.2.1 Initialize the hardware devices to be used in this stage
This usually includes: (1) initializing at least one serial port, used for I/O with the user; (2) initializing a timer.
Before initializing these devices, you can turn the LED on again to show that execution has entered main().
After device initialization is complete, you can print some information, such as the program name string and the version number.
3.2.2 Detect the system memory map
The memory map describes which address ranges, within the entire 4 GB physical address space, are allocated to address the system's RAM. For example, in the SA-1100 CPU, the 512 MB address space starting at 0xC0000000 is reserved for system RAM, and in a Samsung CPU, a 64 MB address space starting at 0x0C000000 is reserved for system RAM. Although the CPU usually reserves a large address range for system RAM, a specific embedded system does not necessarily implement all of it. That is, a specific embedded system maps only part of the reserved RAM address space to real RAM and leaves the rest unused. Because of this, before stage2 of the boot loader does anything with RAM (for example, reading the kernel image stored in flash into RAM), it must detect the memory map of the whole system: it must know which parts of the reserved RAM address space are really mapped to RAM and which are unused.
(1) Description of the memory map
The following data structure describes one contiguous address range in the RAM address space:
typedef struct memory_area_struct {
    u32 start; /* the base address of the memory region */
    u32 size;  /* the byte number of the memory region */
    int used;
} memory_area_t;
A contiguous address range in the RAM address space is in one of two states: (1) used = 1 means the range is implemented, that is, really mapped to RAM; (2) used = 0 means the range is not implemented by the system and is unused.
Based on the memory_area_t data structure, the whole RAM address space reserved by the CPU can be represented by an array of memory_area_t, as shown below (the range designator is a GNU C extension):
memory_area_t memory_map[num_mem_areas] = {
    [0 ... (num_mem_areas - 1)] = {
        .start = 0,
        .size = 0,
        .used = 0
    },
};
(2) Memory map detection
The following simple and effective algorithm detects the memory map of the whole RAM address space:
/* array initialization */
for (i = 0; i < num_mem_areas; i++)
    memory_map[i].used = 0;
/* first write a 0 to all memory locations */
for (addr = mem_start; addr < mem_end; addr += page_size)
    *(u32 *)addr = 0;
for (i = 0, addr = mem_start; addr < mem_end; addr += page_size) {
    /*
     * Check whether the page_size bytes starting at addr are valid RAM,
     * using the test_mempage() algorithm from section 3.1.2 (assumed
     * here to return non-zero for a valid RAM page).
     */
    if (!test_mempage((u32 *)addr)) {
        /* no RAM here */
        if (memory_map[i].used)
            i++;
        continue;
    }
    /*
     * The current page is a valid address range mapped to RAM, but it
     * may only be an alias of another page in the 4 GB address space.
     */
    if (*(u32 *)addr != 0) { /* alias? */
        /* this page is an alias of a page that was already seen */
        if (memory_map[i].used)
            i++;
        continue;
    }
    /*
     * The current page is valid RAM and is not an alias. Mark it with
     * its own address so that a later alias of it reads back non-zero.
     */
    *(u32 *)addr = addr;
    if (memory_map[i].used == 0) {
        memory_map[i].start = addr;
        memory_map[i].size = page_size;
        memory_map[i].used = 1;
    } else {
        memory_map[i].size += page_size;
    }
} /* end of for (...) */
After detecting the system memory map with this algorithm, the boot loader can also print the details of the memory map to the serial port.
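Printing the detected map could look like the following sketch. It formats the used areas into a text buffer with snprintf so that it can be tried on a host; on the target, the resulting text would be sent through the boot loader's own serial output routine instead. The function name format_memory_map and the line format are hypothetical, and the structure is redeclared with uint32_t only to keep the example self-contained.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct memory_area_struct {
    uint32_t start;   /* base address of the memory region */
    uint32_t size;    /* size of the memory region in bytes */
    int used;
} memory_area_t;

/*
 * Format every used area of map[0..n-1] into buf, one line per area.
 * Returns the number of areas written; stops early if buf is too small,
 * leaving only complete lines in buf.
 */
int format_memory_map(const memory_area_t *map, int n, char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (int i = 0; i < n; i++) {
        if (!map[i].used)
            continue;
        int w = snprintf(buf + off, len - off, "0x%08lx @ 0x%08lx (%lu MB)\n",
                         (unsigned long)map[i].size,
                         (unsigned long)map[i].start,
                         (unsigned long)(map[i].size >> 20));
        if (w < 0 || (size_t)w >= len - off) {
            buf[off] = '\0';   /* drop the truncated line */
            break;
        }
        off += (size_t)w;
        count++;
    }
    return count;
}
```

Keeping the formatting separate from the output routine makes the same code usable with a UART, a log buffer, or a host-side test.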
3.2.3 Load the kernel image and the root file system image
(1) Plan the memory layout
Two areas have to be planned: (1) the memory occupied by the kernel image; (2) the memory occupied by the root file system. For each of them, the main considerations are the base address and the size of the image.
The kernel image is generally copied into a memory range of about 1 MB starting at the base address mem_start + 0x8000 (an embedded Linux kernel is usually no larger than 1 MB). Why leave the 32 KB from mem_start to mem_start + 0x8000 free? Because the Linux kernel places some global data structures in this memory, such as the startup parameters and the kernel page tables.
The root file system image is generally copied to the address mem_start + 0x00100000. If a ramdisk is used as the root file system image, its size after decompression is generally 1 MB.
(2) Copy from flash
Embedded CPUs such as ARM usually address flash and other solid-state storage devices in the same unified memory address space as RAM, so reading data from flash is no different from reading it from RAM. A simple loop is enough to copy an image from the flash device (count must be a multiple of 4, otherwise the loop does not stop at zero):
while (count) {
    *dest++ = *src++; /* they are all aligned with word boundary */
    count -= 4;       /* byte number */
}
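The copy loop relies on count being a multiple of 4. A slightly more defensive variant, given here only as a sketch with a hypothetical function name, rounds the byte count up to whole words so that the loop always terminates. The caller must then make sure that both buffers extend to the next word boundary.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nbytes from src to dest one 32-bit word at a time. Both pointers
 * must be word aligned. A byte count that is not a multiple of 4 is
 * rounded up to the next whole word. Returns the number of words copied.
 */
size_t copy_words(uint32_t *dest, const uint32_t *src, size_t nbytes)
{
    size_t words = (nbytes + 3) / 4;

    for (size_t i = 0; i < words; i++)
        dest[i] = src[i];
    return words;
}
```

For example, a request to copy 10 bytes copies 3 words (12 bytes).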
3.2.4 Set the kernel startup parameters
Once the kernel image and the root file system image have been copied into RAM, the Linux kernel is almost ready to start. One more preparation step is needed before calling the kernel: setting the Linux kernel startup parameters.
Linux 2.4.x and later kernels expect the startup parameters to be passed as a tagged list. The list starts with an ATAG_CORE tag and ends with an ATAG_NONE tag. Each tag consists of a tag_header structure that identifies the parameter being passed, followed by the data structure holding the parameter values. The tag and tag_header structures are defined in the include/asm/setup.h header file of the Linux kernel source:
/* The list ends with an ATAG_NONE node. */
#define ATAG_NONE 0x00000000
struct tag_header {
    u32 size; /* note: the size is in units of words */
    u32 tag;
};
......
struct tag {
    struct tag_header hdr;
    union {
        struct tag_core core;
        struct tag_mem32 mem;
        struct tag_videotext videotext;
        struct tag_ramdisk ramdisk;
        struct tag_initrd initrd;
        struct tag_serialnr serialnr;
        struct tag_revision revision;
        struct tag_videolfb videolfb;
        struct tag_cmdline cmdline;
        /*
         * Acorn specific
         */
        struct tag_acorn acorn;
        /*
         * DC21285 specific
         */
        struct tag_memclk memclk;
    } u;
};
In embedded Linux systems, the startup parameters that the boot loader commonly has to set are ATAG_CORE, ATAG_MEM, ATAG_CMDLINE, ATAG_RAMDISK, and ATAG_INITRD. For example, the code for setting ATAG_CORE is as follows:
params = (struct tag *)boot_params;
params->hdr.tag = ATAG_CORE;
params->hdr.size = tag_size(tag_core);
params->u.core.flags = 0;
params->u.core.pagesize = 0;
params->u.core.rootdev = 0;
params = tag_next(params);
Here boot_params is the base address in memory where the kernel startup parameters begin, and params is a pointer of type struct tag. The tag_next() macro takes a pointer to the current tag and computes the start address of the tag that immediately follows it. Note that the device ID of the kernel's root file system is set here (the rootdev field).
The following sample code sets the memory map information:
for (i = 0; i < num_mem_areas; i++) {
    if (memory_map[i].used) {
        params->hdr.tag = ATAG_MEM;
        params->hdr.size = tag_size(tag_mem32);
        params->u.mem.start = memory_map[i].start;
        params->u.mem.size = memory_map[i].size;
        params = tag_next(params);
    }
}
As you can see, each valid memory segment in the memory_map[] array corresponds to one ATAG_MEM tag.
The Linux kernel can receive information in the form of command-line parameters at startup. This lets us give the kernel hardware parameters that it cannot detect by itself, or override information that it detects by itself. For example, the command-line string "console=ttyS0,115200n8" tells the kernel to use ttyS0 as the console, with the serial port set to 115200 bps, no parity, and 8 data bits. The following sample code sets the kernel command-line string:
char *p;
/* eat leading white space */
for (p = commandline; *p == ' '; p++)
    ;
/* skip non-existent command lines so the kernel will still
 * use its default command line.
 */
if (*p == '\0')
    return;
params->hdr.tag = ATAG_CMDLINE;
params->hdr.size = (sizeof(struct tag_header) + strlen(p) + 1 + 4) >> 2;
strcpy(params->u.cmdline.cmdline, p);
params = tag_next(params);
Note that when the size in tag_header is set in this code, it must include the string terminator '\0', and the byte count must be rounded up to a multiple of four bytes, because the size member of the tag_header structure is expressed in words.
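The size expression from the sample code can be checked with a short worked example. The sketch below repeats the same expression in a stand-alone function; the function name cmdline_tag_words is hypothetical and tag_header is redeclared with uint32_t only to keep the example self-contained.

```c
#include <stdint.h>
#include <string.h>

struct tag_header {
    uint32_t size;   /* size of the whole tag, in 32-bit words */
    uint32_t tag;
};

/*
 * Size, in 32-bit words, of an ATAG_CMDLINE tag holding cmdline,
 * computed with the same expression as the sample code:
 * header bytes + string bytes + terminator + 4, divided by 4.
 */
uint32_t cmdline_tag_words(const char *cmdline)
{
    return (uint32_t)((sizeof(struct tag_header) + strlen(cmdline) + 1 + 4) >> 2);
}
```

For "console=ttyS0,115200n8" (22 characters) the result is (8 + 22 + 1 + 4) >> 2 = 8 words, or 32 bytes, which covers the 8-byte header and the 23 bytes of string and terminator. Because the expression adds 4 rather than 3 before shifting, it reserves one extra word whenever the string plus terminator already fills a whole number of words.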
The following sample code sets ATAG_INITRD (using ATAG_INITRD2, the variant of the tag whose start field is a physical address). It tells the kernel where in RAM the initrd image (in compressed form) can be found and how large it is:
params->hdr.tag = ATAG_INITRD2;
params->hdr.size = tag_size(tag_initrd);
params->u.initrd.start = ramdisk_ram_base;
params->u.initrd.size = initrd_len;
params = tag_next(params);
The following sample code sets ATAG_RAMDISK. It tells the kernel how large the ramdisk is after decompression (in KB):
params->hdr.tag = ATAG_RAMDISK;
params->hdr.size = tag_size(tag_ramdisk);
params->u.ramdisk.start = 0;
params->u.ramdisk.size = ramdisk_size; /* note that the unit is KB */
params->u.ramdisk.flags = 1;           /* automatically load ramdisk */
params = tag_next(params);
Finally, set the ATAG_NONE tag to end the whole startup parameter list:
static void setup_end_tag(void)
{
    params->hdr.tag = ATAG_NONE;
    params->hdr.size = 0;
}
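To see how the individual tags fit together, the following sketch builds a minimal list (ATAG_CORE, one ATAG_MEM, ATAG_NONE) in an ordinary buffer and then walks it. It is a simplified, host-testable illustration and not the kernel header: the structures are cut down to the two tag types used, the function names build_tags and count_tags are hypothetical, and the macros are written out locally in the style of the kernel's tag_next() and tag_size().

```c
#include <stdint.h>

#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u

struct tag_header { uint32_t size; uint32_t tag; };
struct tag_core   { uint32_t flags; uint32_t pagesize; uint32_t rootdev; };
struct tag_mem32  { uint32_t size; uint32_t start; };

struct tag {
    struct tag_header hdr;
    union {
        struct tag_core core;
        struct tag_mem32 mem;
    } u;
};

/* Local equivalents of the kernel macros; sizes are counted in words. */
#define tag_next(t)    ((struct tag *)((uint32_t *)(t) + (t)->hdr.size))
#define tag_size(type) ((sizeof(struct tag_header) + sizeof(struct type)) >> 2)

/* Build CORE + one MEM + NONE in buf; returns the number of words used. */
uint32_t build_tags(uint32_t *buf, uint32_t ram_start, uint32_t ram_size)
{
    struct tag *params = (struct tag *)buf;

    params->hdr.tag = ATAG_CORE;
    params->hdr.size = tag_size(tag_core);
    params->u.core.flags = 0;
    params->u.core.pagesize = 0;
    params->u.core.rootdev = 0;
    params = tag_next(params);

    params->hdr.tag = ATAG_MEM;
    params->hdr.size = tag_size(tag_mem32);
    params->u.mem.start = ram_start;
    params->u.mem.size = ram_size;
    params = tag_next(params);

    params->hdr.tag = ATAG_NONE;
    params->hdr.size = 0;

    /* words before the end tag, plus the two words of its header */
    return (uint32_t)((uint32_t *)params - buf) + 2;
}

/* Walk the list and count the tags that precede ATAG_NONE. */
int count_tags(uint32_t *buf)
{
    struct tag *t = (struct tag *)buf;
    int n = 0;

    while (t->hdr.tag != ATAG_NONE) {
        n++;
        t = tag_next(t);
    }
    return n;
}
```

With these definitions the ATAG_CORE tag takes 5 words, the ATAG_MEM tag 4 words, and the end tag 2 words, so the whole list occupies 11 words.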
3.2.5 Call the kernel
The boot loader calls the Linux kernel by jumping directly to the kernel's first instruction, that is, to the address mem_start + 0x8000. When jumping, the following conditions must be met:
1. CPU register settings:
R0 = 0;
R1 = machine type ID (for the machine type numbers, see linux/arch/arm/tools/mach-types);
R2 = base address in RAM of the startup parameter tagged list;
2. CPU mode:
Interrupts (IRQs and FIQs) must be disabled;
The CPU must be in SVC mode;
3. Cache and MMU settings:
The MMU must be off;
The instruction cache may be on or off;
The data cache must be off;
If you use C, you can call the kernel as in the following sample code:
void (*theKernel)(int zero, int arch, u32 params_addr) =
    (void (*)(int, int, u32))kernel_ram_base;
......
theKernel(0, arch_number, (u32)kernel_params_start);
Note that the call to theKernel() should never return. If it does return, an error has occurred.
4. About serial port terminals
When designing and implementing a boot loader, nothing is more exciting than seeing the printed output arrive correctly on the serial terminal. Printing to the serial terminal is also an important and effective debugging technique. However, we often find that the serial terminal shows garbage or nothing at all. There are two main causes: (1) the boot loader initializes the serial port incorrectly; (2) the terminal emulation program running on the host has incorrect serial port settings, including baud rate, parity, data bits, and stop bits.
Sometimes we also run into this problem: the boot loader prints to the serial terminal correctly while it runs, but after it starts the kernel, the kernel's startup output cannot be seen. The cause can be investigated along the following lines:
(1) First, check that your kernel was built with serial console support and with the correct serial port driver configured.
(2) The boot loader's initialization of the serial port may be inconsistent with the kernel's. In addition, for a CPU such as the S3C44B0X, the CPU clock frequency setting also affects the serial port, so if the boot loader and the kernel set the CPU clock frequency differently, the serial terminal cannot display information correctly.
(3) Finally, make sure that the kernel base address used by the boot loader matches the run-time base address the kernel image was built with, especially for uClinux. Suppose your kernel image was built for the base address 0xc0008000 but your boot loader loads it at 0xc0010000: the kernel image then certainly cannot execute correctly.
5. Conclusion
Designing and implementing a boot loader is a very complicated process. Until the exciting kernel startup message "Uncompressing Linux ...... done, booting the kernel ......" arrives from the serial port, I am afraid no one can say: "Hi, my boot loader is finally up and running!"
About the author
Zhan Rongkai is interested in embedded Linux, the Linux kernel, drivers, and file systems. You can contact him at zhanrk@sohu.com.