This article was reproduced from: https://www.ibm.com/developerworks/cn/linux/l-btloader/
This article describes in detail the concept of the Boot Loader for embedded systems (the loader that boots the operating system), the main tasks of its software design, and its structural framework.
1. Introduction
Running GNU/Linux on dedicated embedded boards has become increasingly popular. From a software perspective, an embedded Linux system can usually be divided into four layers:
1. Boot loader. Consists of two parts: the boot code burned into firmware (optional) and the Boot Loader itself.
2. Linux kernel. A kernel customized for the embedded board, together with its startup parameters.
3. File system. Includes the root file system and any file systems built on Flash memory devices. A RAM disk is commonly used as the root file system.
4. User applications. Applications specific to the user. Sometimes an embedded graphical user interface sits between the user applications and the kernel layer. Common embedded GUIs include MicroWindows and MiniGUI.
The boot loader is the first piece of software that runs after the system powers up. Recall the PC architecture: there, the boot loader consists of the BIOS (which is essentially firmware) and the OS Boot Loader (for example, LILO or GRUB) located in the hard disk MBR. After completing hardware detection and resource allocation, the BIOS reads the Boot Loader from the hard disk MBR into system RAM and hands control to it. The Boot Loader's main job is then to read the kernel image from the hard disk into RAM and jump to the kernel entry point, which starts the operating system.
An embedded system usually has no firmware program like the BIOS (although some embedded CPUs do include a short built-in startup program), so the whole job of loading and starting the system falls to the Boot Loader. For example, in an embedded system based on the ARM7TDMI core, execution usually begins at address 0x00000000 on power-up or reset, and the Boot Loader is usually placed at this address.
This article discusses the embedded Boot Loader in four parts: the concept of the Boot Loader, its main tasks, its structural framework, and its installation.
2. The concept of Boot Loader
Simply put, the Boot Loader is a small program that runs before the operating system kernel. It initializes hardware devices and establishes the memory map, bringing the system's hardware and software environment into a suitable state so that everything is ready when the operating system kernel is finally called.
A Boot Loader typically depends heavily on the hardware, especially in the embedded world, so building a fully generic embedded Boot Loader is almost impossible. Nonetheless, we can generalize some common concepts to guide the design and implementation of a Boot Loader for a specific system.
1. CPUs and embedded boards supported by the Boot Loader
Each CPU architecture has its own Boot Loaders. Some Boot Loaders support several CPU architectures; U-Boot, for example, supports both ARM and MIPS. Besides depending on the CPU architecture, a Boot Loader also depends on the configuration of the specific board-level devices. This means that for two different embedded boards, even ones built around the same CPU, a Boot Loader that runs on one board usually needs changes to its source code before it will run on the other.
2. Boot Loader installation medium
After power-up or reset, a CPU usually fetches its first instruction from an address fixed in advance by the CPU manufacturer. For example, a CPU based on the ARM7TDMI core usually fetches its first instruction from address 0x00000000 after reset. Embedded systems built around such a CPU typically map some kind of solid-state storage device (such as ROM, EEPROM, or Flash) to this address. Therefore, after power-up, the CPU executes the Boot Loader first.
Figure 1 shows a typical layout of a solid-state storage device that holds the Boot Loader, the kernel boot parameters, the kernel image, and the root file system image.
Figure 1 Typical spatial allocation structure of solid state storage devices
3. Devices or mechanisms used to control the Boot Loader
The host and the target are generally connected through a serial port, and the Boot Loader usually performs its I/O over that serial port while it runs: for example, it prints output to the serial port and reads user control characters from it.
4. Single-stage or multi-stage boot process
A multi-stage Boot Loader can usually provide more complex functionality and better portability. Boot Loaders that start from solid-state storage mostly use a two-stage process, that is, startup is divided into stage 1 and stage 2. The tasks carried out in each stage are discussed below.
5. Boot Loader operation modes
Most Boot Loaders have two different operation modes, boot loading mode and downloading mode, a distinction that matters only to developers. From the end user's point of view, the Boot Loader simply loads the operating system, and there is no difference between the two modes.
Boot loading mode: also called autonomous mode. The Boot Loader loads the operating system into RAM from a solid-state storage device on the target, with no user involvement. This is the Boot Loader's normal mode, so it must work in this mode when the embedded product is released.
Downloading mode: the Boot Loader on the target downloads files, such as the kernel image and the root file system image, from the host over a serial or network connection. The Boot Loader usually saves the downloaded files in the target's RAM first and then writes them to Flash-type solid-state storage on the target. This mode is typically used when the kernel and root file system are installed for the first time; later system updates also use it. A Boot Loader working in this mode usually offers its user a simple command-line interface.
Powerful Boot Loaders such as Blob or U-Boot typically support both modes and allow the user to switch between them. Blob, for example, starts in normal boot loading mode but waits 10 seconds for the user to press a key, which switches it to downloading mode. If no key is pressed within 10 seconds, Blob goes on to boot the Linux kernel.
6. Communication devices and protocols used for file transfer between the Boot Loader and the host
Most commonly, the Boot Loader on the target transfers files to and from the host over the serial port, using one of the xmodem/ymodem/zmodem protocols. Serial transfer is slow, however, so connecting over Ethernet and downloading files with the TFTP protocol is a better choice.
The software on the host side must also be considered here. For example, when files are downloaded over Ethernet with TFTP, the host must run software that provides the TFTP service.
Having discussed these Boot Loader concepts, let us look at what a Boot Loader actually has to do.
3. Boot Loader's main tasks and typical structural framework
Before continuing, we assume that the kernel image and the root file system image are loaded into RAM and run from there. We state this assumption because in an embedded system the kernel image and root file system image can also run directly from solid-state storage such as ROM or Flash, although doing so undoubtedly sacrifices execution speed.
From the operating system's point of view, the overall goal of the Boot Loader is to correctly invoke the kernel to execute.
In addition, because the implementation of a Boot Loader depends on the CPU architecture, most Boot Loaders are divided into stage1 and stage2. Code that depends on the CPU architecture, such as device initialization code, is usually placed in stage1 and written in assembly language to keep it short and compact. Stage2 is usually written in C, which allows more complex functionality and makes the code more readable and portable.
The stage1 of Boot Loader usually consists of the following steps (in order of execution):
- Initialization of hardware devices.
- Prepare RAM space for loading the Boot Loader's stage2.
- Copy the Boot Loader stage2 into RAM space.
- Set up the stack.
- Jump to the C entry point of Stage2.
The stage2 of Boot Loader usually consists of the following steps (in order of execution):
- Initialize the hardware devices to be used at this stage.
- Detect the system memory map.
- Read the kernel image and root file system image from Flash to RAM space.
- Set the startup parameters for the kernel.
- Call the kernel.
3.1 Boot Loader Stage1
3.1.1 Basic Hardware Initialization
This is the first thing the Boot Loader does. Its purpose is to prepare a basic hardware environment for the execution of stage2 and, later, of the kernel. It usually consists of the following steps (in order of execution):
1. Mask all interrupts. Servicing interrupts is usually the responsibility of the OS device drivers, so the Boot Loader does not need to respond to any interrupt while it runs. Interrupts can be masked by writing to the CPU's interrupt mask register or status register (such as the ARM CPSR).
2. Set the CPU speed and clock frequency.
3. Initialize RAM. This includes correctly setting the system memory controller's function registers and the memory bank control registers.
4. Initialize the LEDs. An LED is typically driven through a GPIO to show whether the system state is OK or Error. If the board has no LED, the same can be achieved by initializing the UART and printing the Boot Loader's logo string to the serial port.
5. Disable the CPU's internal instruction and data caches.
3.1.2 Preparing RAM space for loading stage2
For faster execution, stage2 is usually loaded into RAM and run from there, so a usable range of RAM must be prepared for loading the Boot Loader's stage2.
Because stage2 is usually C code, the range must allow for stack space in addition to the size of the stage2 executable image. The size of the range should preferably be a multiple of the memory page size (usually 4KB). Generally, 1MB of RAM is sufficient. The address range itself can be chosen freely. Blob, for example, places its stage2 executable image in the 1MB of space starting at RAM address 0xc0200000. However, placing stage2 in the top 1MB of the whole RAM space, that is, from (RamEnd - 1MB) to RamEnd, is a recommended approach.
For the convenience of the discussion below, we call the size of this RAM range stage2_size (in bytes) and its start and end addresses stage2_start and stage2_end (both aligned on 4-byte boundaries). So:
stage2_end = stage2_start + stage2_size
In addition, you must make sure that the address range you have chosen really is writable RAM, so the range must be tested. The test can follow the method Blob uses: take the memory page as the test unit and check whether the first two words of each page are readable and writable. For convenience, we call this detection algorithm test_mempage. Its steps are as follows:
1. Save the contents of the first two words of the memory page.
2. Write arbitrary values to the two words. For example, write 0x55 to the first word and 0xaa to the second.
3. Immediately read the two words back. The values read should be 0x55 and 0xaa respectively. If they are not, the address range occupied by this memory page is not valid RAM.
4. Write arbitrary values to the two words again. For example, write 0xaa to the first word and 0x55 to the second.
5. Immediately read the two words back. The values read should be 0xaa and 0x55 respectively. If they are not, the address range occupied by this memory page is not valid RAM.
6. Restore the original contents of the two words. The test is complete.
To obtain a clean range of RAM, we can also zero out the planned RAM range.
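The test_mempage steps above can be sketched in C as follows. This is a minimal sketch and not Blob's actual code: the function name follows the text, while the return convention (0 for valid RAM, -1 otherwise) and the `volatile uint32_t` parameter type are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Test whether the first two words of a memory page can be written and
 * read back. Returns 0 if the page looks like RAM, -1 otherwise
 * (this return convention is an assumption of the sketch).
 */
int test_mempage(volatile uint32_t *page)
{
    /* Step 1: save the original contents of the first two words. */
    uint32_t saved0 = page[0];
    uint32_t saved1 = page[1];
    int result = 0;

    /* Steps 2-3: write the first pattern pair and read it back. */
    page[0] = 0x55;
    page[1] = 0xaa;
    if (page[0] != 0x55 || page[1] != 0xaa)
        result = -1;

    /* Steps 4-5: write the swapped patterns and read them back. */
    if (result == 0) {
        page[0] = 0xaa;
        page[1] = 0x55;
        if (page[0] != 0xaa || page[1] != 0x55)
            result = -1;
    }

    /* Step 6: restore the original contents. */
    page[0] = saved0;
    page[1] = saved1;
    return result;
}
```

On a real target the argument would be the physical address of the page cast to a pointer. The `volatile` qualifier keeps the compiler from optimizing away the read-backs.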
3.1.3 Copy Stage2 to RAM
To make the copy, two things must be determined: (1) the start and end addresses of the stage2 executable image on the solid-state storage device, and (2) the start address of the RAM space.
3.1.4 Set stack pointer sp
The stack pointer is set in preparation for executing C code. Usually sp can be set to (stage2_end - 4), that is, the top of the 1MB RAM space arranged in section 3.1.2 (the stack grows downward).
In addition, before setting the stack pointer you can turn off the LEDs to tell the user that we are about to jump to stage2.
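The placement recommended in section 3.1.2 and the stack pointer value given above come down to a few lines of address arithmetic. The sketch below assumes a 4KB page and a 1MB stage2 region. The type and function names (`stage2_region_t`, `stage2_region`) are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define MEM_PAGE_SIZE 0x1000u    /* 4KB memory page, as in section 3.1.2 */
#define STAGE2_SIZE   0x100000u  /* 1MB for the stage2 image plus its stack */

typedef struct {
    uint32_t start;  /* stage2_start */
    uint32_t end;    /* stage2_end = stage2_start + stage2_size */
    uint32_t sp;     /* initial stack pointer, stage2_end - 4 */
} stage2_region_t;

/* Place stage2 in the top 1MB of RAM, with the end aligned down to a page. */
stage2_region_t stage2_region(uint32_t ram_start, uint32_t ram_size)
{
    stage2_region_t r;
    uint32_t ram_end = ram_start + ram_size;

    r.end   = ram_end & ~(MEM_PAGE_SIZE - 1u);
    r.start = r.end - STAGE2_SIZE;
    r.sp    = r.end - 4u;
    return r;
}
```

For 32MB of RAM starting at 0xc0000000, this gives stage2_start = 0xc1f00000, stage2_end = 0xc2000000 and sp = 0xc1fffffc.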
After these steps, the physical memory layout of the system should be as shown in Figure 2.
3.1.5 Jump to the C entry point of Stage2
Once all of the above is ready, you can jump to the Boot Loader's stage2. On an ARM system, for example, this is done by setting the PC register to the appropriate address.
Figure 2 System memory layout just after the Boot Loader stage2 executable image has been copied to RAM
3.2 Boot Loader Stage2
As mentioned, stage2 code is typically written in C, which allows more complex functionality and gives better readability and portability. Unlike an ordinary C application, however, a program such as a Boot Loader cannot use any support functions from the glibc library when it is compiled and linked, for obvious reasons. This raises the question of where to jump into main() from. The most direct idea is to make the start address of main() the entry point of the whole stage2 executable image. This has two drawbacks: (1) no arguments can be passed to main(), and (2) there is no way to handle the case where main() returns. A more ingenious approach is to use a trampoline: write a small trampoline routine in assembly and make it the entry point of the stage2 executable image. The trampoline then jumps into main() with a CPU branch instruction, and when main() returns, execution comes back to the trampoline. In short, the trampoline routine acts as an external wrapper around main().
A simple example of a trampoline routine (from Blob) is given below:
    .text

    .globl _trampoline
    _trampoline:
        bl  main
        /* if main ever returns we just call it again */
        b   _trampoline
As you can see, when main() returns, a branch instruction re-executes the trampoline, and therefore main() as well. That is where the name trampoline comes from.
3.2.1 Initializing the hardware device to be used at this stage
This usually includes: (1) initializing at least one serial port for I/O with the end user, and (2) initializing timers and so on.
Before initializing these devices, you can also light the LEDs again to show that we have entered main().
After the devices are initialized, you can print some information, such as the program name string and version number.
3.2.2 Detecting the system memory map
The memory map describes which address ranges in the entire 4GB physical address space are allocated to address the system's RAM. For example, on the SA-1100 CPU the 512MB address space starting at 0xc000,0000 is used as the system's RAM address space, while on the Samsung S3C44B0X CPU the 64MB address space from 0x0c00,0000 to 0x1000,0000 is used for RAM. Although the CPU usually reserves a large address space for system RAM, a specific embedded system does not necessarily implement all of it. In other words, a specific system often maps only part of the reserved RAM address space to actual RAM and leaves the rest unused. For this reason, the Boot Loader's stage2 must detect the memory map of the whole system before doing anything else (for example, before reading the kernel image from Flash into RAM); that is, it must know which parts of the CPU's reserved RAM address space are really mapped to RAM and which are unused.
(1) Describing the memory map
The following data structure can describe one contiguous address range in the RAM address space:
    typedef struct memory_area_struct {
        u32 start; /* the base address of the memory region */
        u32 size;  /* the byte number of the memory region */
        int used;
    } memory_area_t;
Such a contiguous address range is in one of two states: (1) used=1 means the range is implemented, that is, really mapped to RAM; (2) used=0 means the range is not implemented by the system and is unused.
Based on the memory_area_t data structure, the whole RAM address space reserved by the CPU can be represented by an array of memory_area_t, as follows:
    memory_area_t memory_map[NUM_MEM_AREAS] = {
        [0 ... (NUM_MEM_AREAS - 1)] = {
            .start = 0,
            .size = 0,
            .used = 0
        },
    };
(2) Detecting the memory map
Here is a simple and effective algorithm for detecting the memory map of the whole RAM address space:
    /* array initialization */
    for (i = 0; i < NUM_MEM_AREAS; i++)
        memory_map[i].used = 0;

    /* first write a 0 to all memory locations */
    for (addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE)
        *(u32 *)addr = 0;

    for (i = 0, addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE) {
        /*
         * Check whether the PAGE_SIZE bytes starting at addr are valid
         * RAM, using the test_mempage algorithm from section 3.1.2
         * (assumed here to return non-zero on failure).
         */
        if (test_mempage((u32 *)addr) != 0) {
            /* no RAM here */
            if (memory_map[i].used)
                i++;
            continue;
        }

        /*
         * The page is valid RAM, but it may be only an alias of a page
         * seen earlier in the 4GB address space.
         */
        if (*(u32 *)addr != 0) { /* alias? */
            if (memory_map[i].used)
                i++;
            continue;
        }

        /*
         * Valid RAM and not an alias. Mark the page with its own address
         * so that a later alias of it reads back non-zero. (This store is
         * not in the listing as reproduced; without it the alias check
         * above can never trigger.)
         */
        *(u32 *)addr = addr;

        if (memory_map[i].used == 0) {
            memory_map[i].start = addr;
            memory_map[i].size = PAGE_SIZE;
            memory_map[i].used = 1;
        } else {
            memory_map[i].size += PAGE_SIZE;
        }
    } /* end of for (...) */
After detecting the system memory map with this algorithm, the Boot Loader can also print the details of the memory map to the serial port.
3.2.3 Loading kernel images and root file system images
(1) Planning the memory layout
This covers two things: (1) the memory range occupied by the kernel image, and (2) the memory range occupied by the root file system. When planning the layout, the main considerations are the base address and the size of each image.
The kernel image is generally copied into a range of about 1MB starting at base address (MEM_START + 0x8000); an embedded Linux kernel is generally no larger than 1MB. Why leave the 32KB from MEM_START to MEM_START + 0x8000 free? Because the Linux kernel places some global data structures there, such as the startup parameters and the kernel page tables.
The root file system image is generally copied to the address starting at MEM_START + 0x0010,0000. If a RAM disk is used as the root file system image, its uncompressed size is typically 1MB.
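The layout described above can be written down as a few address computations. This is a minimal sketch under the offsets given in the text (0x8000 for the kernel, 0x0010,0000 for the root file system image). The macro and function names are hypothetical.

```c
#include <stdint.h>

#define KERNEL_OFFSET  0x00008000u  /* first 32KB left free for kernel data */
#define RAMDISK_OFFSET 0x00100000u  /* root file system image at +1MB */

/* Base address in RAM to which the kernel image is copied. */
uint32_t kernel_ram_base(uint32_t mem_start)
{
    return mem_start + KERNEL_OFFSET;
}

/* Base address in RAM to which the root file system image is copied. */
uint32_t ramdisk_ram_base(uint32_t mem_start)
{
    return mem_start + RAMDISK_OFFSET;
}

/* Non-zero if a kernel of kernel_size bytes ends before the ramdisk base. */
int kernel_fits(uint32_t kernel_size)
{
    return kernel_size <= RAMDISK_OFFSET - KERNEL_OFFSET;
}
```

With RAM starting at 0xc0000000, the kernel lands at 0xc0008000 and the root file system image at 0xc0100000, which leaves 0xf8000 bytes (992KB) for the kernel image under this layout.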
(2) Copy from Flash
Because embedded CPUs such as ARM usually address solid-state storage devices such as Flash within the same unified memory address space, reading data from Flash is no different from reading it from RAM. A simple loop is enough to copy an image from the Flash device:
    while (count) {
        *dest++ = *src++; /* they are all aligned on word boundaries */
        count -= 4;       /* byte number */
    }
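The loop above only terminates cleanly when count is a multiple of 4. As a hedged variant, the sketch below rounds the byte count up to whole words first. The function name `copy_image` is hypothetical, and both buffers are assumed to be word-aligned and padded to a multiple of 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy nbytes from src to dest one 32-bit word at a time, rounding the
 * length up to a whole number of words. Assumes both buffers are
 * word-aligned and padded to a multiple of 4 bytes.
 */
void copy_image(uint32_t *dest, const uint32_t *src, size_t nbytes)
{
    size_t words = (nbytes + 3u) / 4u;

    while (words--)
        *dest++ = *src++;
}
```

Counting words instead of subtracting 4 from a byte count avoids the wrap-around that an unsigned count would suffer if the image length were not word-aligned.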
3.2.4 Setting the kernel's startup parameters
Once the kernel image and the root file system image have been copied into RAM, you are almost ready to start the Linux kernel. Before calling the kernel, however, one more preparation step is needed: setting the Linux kernel's startup parameters.
The Linux 2.4.x kernel expects startup parameters to be passed as a tagged list. The list starts with an ATAG_CORE tag and ends with an ATAG_NONE tag. Each tag consists of a tag_header structure that identifies the parameter being passed, followed by a data structure holding the parameter values. The tag and tag_header structures are defined in the include/asm/setup.h header of the Linux kernel source:
    /* The list ends with an ATAG_NONE node. */
    #define ATAG_NONE 0x00000000

    struct tag_header {
        u32 size; /* note: size is counted in words */
        u32 tag;
    };
    ...
    struct tag {
        struct tag_header hdr;
        union {
            struct tag_core      core;
            struct tag_mem32     mem;
            struct tag_videotext videotext;
            struct tag_ramdisk   ramdisk;
            struct tag_initrd    initrd;
            struct tag_serialnr  serialnr;
            struct tag_revision  revision;
            struct tag_videolfb  videolfb;
            struct tag_cmdline   cmdline;

            /*
             * Acorn specific
             */
            struct tag_acorn     acorn;

            /*
             * DC21285 specific
             */
            struct tag_memclk    memclk;
        } u;
    };
In embedded Linux systems, the startup parameters that the Boot Loader usually has to set are ATAG_CORE, ATAG_MEM, ATAG_CMDLINE, ATAG_RAMDISK, ATAG_INITRD, and so on.
For example, the code for setting up ATAG_CORE is as follows:
    params = (struct tag *)BOOT_PARAMS;

    params->hdr.tag = ATAG_CORE;
    params->hdr.size = tag_size(tag_core);

    params->u.core.flags = 0;
    params->u.core.pagesize = 0;
    params->u.core.rootdev = 0;

    params = tag_next(params);
Here BOOT_PARAMS is the base address in memory at which the kernel boot parameters start, and params is a pointer of type struct tag *. The tag_next() macro takes a pointer to the current tag and computes the start address of the tag immediately following it. Note that the device ID of the kernel's root file system is set here (the rootdev field).
The following sample code sets the memory map information:
    for (i = 0; i < NUM_MEM_AREAS; i++) {
        if (memory_map[i].used) {
            params->hdr.tag = ATAG_MEM;
            params->hdr.size = tag_size(tag_mem32);

            params->u.mem.start = memory_map[i].start;
            params->u.mem.size = memory_map[i].size;

            params = tag_next(params);
        }
    }
As you can see, each valid memory segment in the memory_map[] array corresponds to one ATAG_MEM tag.
The Linux kernel can receive information in the form of command-line arguments at boot time. We can use this to give the kernel hardware parameters that it cannot detect by itself, or to override information that the kernel detects itself. For example, the command-line string "console=ttyS0,115200n8" tells the kernel to use ttyS0 as the console, with the serial port set to 115200 bps, no parity, and 8 data bits. The following sample code sets the kernel command-line string:
    char *p;

    /* eat leading white space */
    for (p = commandline; *p == ' '; p++)
        ;

    /* skip non-existent command lines so the kernel will still
     * use its default command line
     */
    if (*p == '\0')
        return;

    params->hdr.tag = ATAG_CMDLINE;
    params->hdr.size = (sizeof(struct tag_header) + strlen(p) + 1 + 4) >> 2;

    strcpy(params->u.cmdline.cmdline, p);

    params = tag_next(params);
Note that in the code above, the size set in tag_header must include the string terminator '\0' and must be rounded up to a multiple of 4 bytes, because the size member of the tag_header structure counts words.
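The size computation can be checked with a short worked example. The sketch below isolates the expression from the code above. The function name `cmdline_tag_words` is hypothetical, and tag_header is redeclared locally with `uint32_t` fields so the block stands alone.

```c
#include <stdint.h>
#include <string.h>

struct tag_header {
    uint32_t size;  /* tag size in 32-bit words */
    uint32_t tag;
};

/* Size, in words, of an ATAG_CMDLINE tag that carries the given string. */
uint32_t cmdline_tag_words(const char *cmdline)
{
    /* header + characters + terminating '\0', rounded up by adding 4 */
    return (uint32_t)((sizeof(struct tag_header) + strlen(cmdline) + 1 + 4) >> 2);
}
```

For "console=ttyS0,115200n8" (22 characters) this gives (8 + 22 + 1 + 4) >> 2 = 8 words, or 32 bytes, which covers the 31 bytes actually needed. Because the expression adds 4 rather than 3, it reserves one spare word when the header plus string is already a multiple of 4 bytes; this is harmless.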
Here is the sample code for setting up ATAG_INITRD, which tells the kernel where in RAM to find the initrd image (in compressed format) and how large it is:
    params->hdr.tag = ATAG_INITRD2;
    params->hdr.size = tag_size(tag_initrd);

    params->u.initrd.start = RAMDISK_RAM_BASE;
    params->u.initrd.size = INITRD_LEN;

    params = tag_next(params);
Here is the sample code for setting up ATAG_RAMDISK, which tells the kernel how large the RAM disk is after decompression (in KB):
    params->hdr.tag = ATAG_RAMDISK;
    params->hdr.size = tag_size(tag_ramdisk);

    params->u.ramdisk.start = 0;
    params->u.ramdisk.size = RAMDISK_SIZE; /* note that the unit is KB */
    params->u.ramdisk.flags = 1;           /* automatically load the ramdisk */

    params = tag_next(params);
Finally, set the ATAG_NONE tag to end the whole list of startup parameters:
    static void setup_end_tag(void)
    {
        params->hdr.tag = ATAG_NONE;
        params->hdr.size = 0;
    }
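The tag_size() and tag_next() macros used throughout this section are small enough to show in full, together with a walk over a finished list. The sketch below declares only the two tag payloads it needs. The macro definitions and the ATAG_CORE and ATAG_MEM values follow the kernel's setup.h, while `count_tags` is a hypothetical helper added for illustration.

```c
#include <stdint.h>

typedef uint32_t u32;

#define ATAG_NONE 0x00000000
#define ATAG_CORE 0x54410001
#define ATAG_MEM  0x54410002

struct tag_header { u32 size; u32 tag; };
struct tag_core   { u32 flags; u32 pagesize; u32 rootdev; };
struct tag_mem32  { u32 size; u32 start; };

struct tag {
    struct tag_header hdr;
    union {
        struct tag_core  core;
        struct tag_mem32 mem;
    } u;
};

/* Size of a tag in 32-bit words: header plus payload. */
#define tag_size(type) ((sizeof(struct tag_header) + sizeof(struct type)) >> 2)
/* Address of the tag that follows t: advance by hdr.size words. */
#define tag_next(t)    ((struct tag *)((u32 *)(t) + (t)->hdr.size))

/* Hypothetical helper: count the tags that precede the ATAG_NONE terminator. */
unsigned count_tags(struct tag *t)
{
    unsigned n = 0;

    /* The size check guards against a malformed tag with size 0. */
    while (t->hdr.tag != ATAG_NONE && t->hdr.size != 0) {
        n++;
        t = tag_next(t);
    }
    return n;
}
```

An ATAG_CORE tag occupies (8 + 12) >> 2 = 5 words and an ATAG_MEM tag (8 + 8) >> 2 = 4 words, so a list with one of each followed by ATAG_NONE spans 9 words plus the 2-word terminator.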
3.2.5 Calling the kernel
The Boot Loader calls the Linux kernel by jumping directly to the kernel's first instruction, that is, to address MEM_START + 0x8000. At the time of the jump, the following conditions must hold:
1. CPU register settings:
- R0 = 0;
- R1 = machine type ID; for machine type numbers, see linux/arch/arm/tools/mach-types;
- R2 = base address in RAM of the startup parameter tagged list;
2. CPU mode:
- Interrupts (IRQs and FIQs) must be disabled;
- The CPU must be in SVC mode;
3. Cache and MMU settings:
- The MMU must be off;
- The instruction cache may be on or off;
- The data cache must be off;
In C, the kernel can be called as in the following sample code:
    void (*theKernel)(int zero, int arch, u32 params_addr) =
        (void (*)(int, int, u32))KERNEL_RAM_BASE;
    ......
    theKernel(0, ARCH_NUMBER, (u32)kernel_params_start);
Note that the call to theKernel() should never return. If it does return, an error has occurred.
4. About the serial terminal
When designing and implementing a Boot Loader, nothing is more exciting than seeing output arrive correctly on the serial terminal. Printing to the serial terminal is also a very important and effective debugging technique. However, we often find that the serial terminal shows garbage or nothing at all. There are two main causes: (1) the Boot Loader initializes the serial port incorrectly, or (2) the terminal emulator running on the host has incorrect serial port settings, including baud rate, parity, data bits, and stop bits.
Another common problem is that the Boot Loader prints to the serial terminal correctly while it runs, but once it starts the kernel, none of the kernel's boot output appears. Consider the following possible causes:
(1) First confirm that the kernel was configured at build time with serial terminal support and with the correct serial port driver.
(2) The Boot Loader's serial port initialization may be inconsistent with the kernel's. In addition, on CPUs such as the S3C44B0X the CPU clock frequency setting also affects the serial port, so if the Boot Loader and the kernel set the CPU clock frequency differently, the serial terminal will not display information correctly.
(3) Finally, confirm that the kernel base address used by the Boot Loader matches the run-time base address the kernel image was built for, especially for uClinux. If your kernel image was built for base address 0xc0008000 but the Boot Loader loads it at 0xc0010000, the kernel image will not execute correctly.
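The clock-frequency issue in point (2) can be made concrete with a little arithmetic. The sketch below assumes a UART whose baud rate equals clock / (16 * (divisor + 1)), a scheme used by a number of embedded UARTs. The exact formula for a given CPU must be taken from its data sheet, and the function names are hypothetical.

```c
#include <stdint.h>

/* Divisor register value for the requested baud rate, rounded to nearest. */
uint32_t uart_divisor(uint32_t clk_hz, uint32_t baud)
{
    return (clk_hz + 8u * baud) / (16u * baud) - 1u;
}

/* Baud rate actually produced by a divisor value at a given clock. */
uint32_t uart_actual_baud(uint32_t clk_hz, uint32_t divisor)
{
    return clk_hz / (16u * (divisor + 1u));
}
```

With a 60 MHz clock, uart_divisor(60000000, 115200) is 32 and the line runs at about 113636 baud, within 2% of nominal. If the kernel instead assumes a 66 MHz clock, it programs 35, and on the real 60 MHz clock the line runs at about 104166 baud. That is nearly 10% off, which is enough to turn the output into garbage.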
5. Concluding remarks
Designing and implementing a Boot Loader is a very complex process. Until you receive the exciting "Uncompressing Linux ... done, booting the kernel ..." kernel boot message from the serial port, I'm afraid no one can say: "Hey, my Boot Loader is up and running successfully!"
Embedded System Boot Loader Technology Insider (repost)