Linux Driver Development Essentials: The Linux Boot Process (reproduced)


Reproduced from: http://blog.chinaunix.net/uid-12461657-id-3199784.html


Linux Driver Development: A Must-Read

Demystifying the kernel. Reproduced from http://www.it168.com. Source: ChinaUnix. Author: ChinaUnix.

"IT168 Technical documents " before starting into the mysterious world of Linux device drivers, let's look at several kernel elements from the driver developer's perspective, and familiarize ourselves with some basic kernel concepts. We will learn about kernel timers, synchronization mechanisms, and memory allocation methods. However, we have to start from the beginning of this journey of exploration. Therefore, this chapter first takes a look at the startup information emitted by the kernel, and then explains some interesting points one by one.

  2.1 Startup Process

Figure 2-1 shows the boot sequence of a Linux system on an x86 computer. First, the BIOS loads the master boot record (MBR) from the boot device; the code in the MBR then examines the partition table and reads a boot loader such as GRUB, LILO, or SYSLINUX from the active partition. The boot loader loads the compressed kernel image and passes control to it. Once the kernel gains control, it decompresses itself and begins execution.

x86-based processors have two modes of operation: real mode and protected mode. In real mode, only 1 MB of memory is addressable and there is no memory protection. Protected mode is much more sophisticated and offers advanced features such as paging. During boot, the CPU must switch from real mode to protected mode. The switch is one-way: you cannot switch back from protected mode to real mode.

The first step of kernel initialization executes assembly code in real mode, followed by the start_kernel() function in the init/main.c file (the source file we modified in the previous chapter), which runs in protected mode. start_kernel() first initializes the CPU subsystem, then puts memory and process management in place, starts the external buses and I/O devices, and as its final step activates the init program, the parent of all Linux processes. init executes user-space scripts that start the necessary kernel services, and it ultimately spawns console terminal programs and displays the login prompt.

Figure 2-1 The Linux boot process on x86 hardware

Each level-3 heading in this section is a message taken from Figure 2-2, which shows the boot output of an x86 laptop running Linux. If you boot the kernel on another architecture, the messages and their semantics may differ.

  2.1.1 BIOS-provided physical RAM map

The kernel parses the system memory map read in from the BIOS and prints the following message early on:

BIOS-provided physical RAM map:

BIOS-e820: 0000000000000000 - 000000000009f000 (usable)

...

BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)

The real-mode initialization code obtains the system's memory map by using the BIOS int 0x15 service, invoking function number 0xe820 (hence the BIOS-e820 string above). The memory map contains both reserved and usable memory, and the kernel subsequently uses this information to create its pool of available memory. Section B.1 of Appendix B offers a deeper explanation of the BIOS-provided memory map.

Figure 2-2 Kernel boot information
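On a running system, you can inspect the kernel's view of this map through /proc/iomem. The short user-space program below is merely an illustrative sketch (not part of the kernel sources discussed here) that dumps the file:

/* iomem_dump.c: print the kernel's resource map derived from the
   BIOS-provided information. Run as root to see full address ranges. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("/proc/iomem", "r");
    char line[256];

    if (!fp) {
        perror("fopen /proc/iomem");
        return EXIT_FAILURE;
    }
    while (fgets(line, sizeof(line), fp))
        fputs(line, stdout);
    fclose(fp);
    return EXIT_SUCCESS;
}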

  2.1.2 758MB LOWMEM available

Regularly addressable memory within the first 896 MB is called low memory. The memory allocation function kmalloc() allocates memory from this region. Memory above 896 MB is called high memory and can be accessed only after it is mapped in a special way.

During boot, the kernel calculates and displays the total number of pages present in these memory regions.
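As a small illustration of allocating from low memory, here is a minimal kernel-module sketch (the module name and message are hypothetical) that obtains a buffer with kmalloc() and frees it on unload:

/* lowmem_demo.c: a minimal sketch of allocating from the low-memory zone. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>

static void *buf;

static int __init lowmem_demo_init(void)
{
    /* GFP_KERNEL allocations are served from low memory and may sleep */
    buf = kmalloc(1024, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;
    printk(KERN_INFO "lowmem_demo: allocated 1 KB at %p\n", buf);
    return 0;
}

static void __exit lowmem_demo_exit(void)
{
    kfree(buf);
}

module_init(lowmem_demo_init);
module_exit(lowmem_demo_exit);
MODULE_LICENSE("GPL");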

  2.1.3 Kernel command line: ro root=/dev/hda1

The boot loader usually passes a command line to the kernel. The arguments on the command line are used much like the argv[] list passed to main() in a C program, except that they are passed to the kernel instead. You can add command-line arguments in the boot loader's configuration file, or modify them from the boot loader's prompt at run time [1]. If you are using the GRUB boot loader, the configuration file is /boot/grub/grub.conf or /boot/grub/menu.lst, depending on the distribution. If you are using LILO, the configuration file is /etc/lilo.conf. Below is an example grub.conf file (with some comments added); once you read the line following title kernel 2.6.23, you will understand where the printed message above comes from.

default 0   #Boot the 2.6.23 kernel by default

timeout 5   #5 seconds to alter boot order or parameters

title kernel 2.6.23   #Boot option 1

#The boot image resides in the first partition of the first disk,
#under the /boot/ directory, and is named vmlinuz-2.6.23. 'ro'
#indicates that the root partition should be mounted read-only.

kernel (hd0,0)/boot/vmlinuz-2.6.23 ro root=/dev/hda1

#Look under the section "Freeing initrd memory: 387k freed"

initrd (hd0,0)/boot/initrd

#...

Command-line arguments affect the code path taken during startup. Suppose, for example, that one command-line argument of interest is called bootmode. If bootmode is set to 1, you want to print some debug messages during startup and switch to runlevel 3 at the end (wait for the init process's boot messages to learn the meaning of runlevels); if bootmode is set to 0, you prefer startup to be relatively terse and set the runlevel to 2. Since you are already familiar with init/main.c, add the following modifications to it:

static unsigned int bootmode = 1;

static int __init
is_bootmode_setup(char *str)
{
    get_option(&str, &bootmode);
    return 1;
}

/* Handle parameter "bootmode=" */
__setup("bootmode=", is_bootmode_setup);

if (bootmode) {
    /* Print verbose output */
    /* ... */
}

/* ... */

/* If bootmode is 1, choose an init runlevel of 3, else
   switch to a runlevel of 2 */
if (bootmode) {
    argv_init[++args] = "3";
} else {
    argv_init[++args] = "2";
}

/* ... */

Recompile the kernel with these changes and try them out.

  2.1.4 Calibrating delay loop... 1197.46 BogoMIPS (lpj=2394935)

During boot, the kernel calculates the number of times the processor can run an internal delay loop in one jiffy, the interval between two consecutive ticks of the system timer. As you would expect, this calculation has to be calibrated to the processing speed of your CPU. The result of the calibration is stored in a kernel variable called loops_per_jiffy. One place the kernel uses loops_per_jiffy is when a device driver desires small, microsecond-level delays.

To understand the delay-loop calibration code, let's look at the calibrate_delay() function defined in init/calibrate.c. This function cleverly uses integer arithmetic to achieve floating-point precision. The following fragment (with some comments added) shows the beginning of the function, which derives a coarse value of loops_per_jiffy:

loops_per_jiffy = (1 << 12);  /* Initial approximation = 4096 */
printk(KERN_DEBUG "Calibrating delay loop... ");
while ((loops_per_jiffy <<= 1) != 0) {
    ticks = jiffies;   /* As you will find out in the section "Kernel
                          Timers," the jiffies variable contains the
                          number of timer ticks since the kernel
                          started, and is incremented in the timer
                          interrupt handler */

    while (ticks == jiffies);  /* Wait until the start of the next jiffy */
    ticks = jiffies;
    /* Delay */
    __delay(loops_per_jiffy);
    /* Did the wait outlast the current jiffy? Continue if it didn't */
    ticks = jiffies - ticks;
    if (ticks) break;
}

loops_per_jiffy >>= 1;  /* This fixes the most significant bit and is
                           the lower bound of loops_per_jiffy */

The code above begins by assuming that loops_per_jiffy is greater than 4096, which translates to a processor speed of roughly one million instructions per second, or 1 MIPS. It then waits for a fresh jiffy to begin and runs the delay loop, __delay(loops_per_jiffy). If the delay loop outlasts one jiffy, the previous value of loops_per_jiffy (obtained by shifting the current value right by one bit) fixes its most significant bit; otherwise, the function keeps shifting loops_per_jiffy left, probing for its most significant bit. Once the kernel has pinned down the most significant bit, it goes on to compute the lower-order bits and fine-tune the precision:

loopbit = loops_per_jiffy;

/* Gradually work on the lower-order bits */
while (lps_precision-- && (loopbit >>= 1)) {
    loops_per_jiffy |= loopbit;
    ticks = jiffies;
    while (ticks == jiffies);  /* Wait until the start of the next jiffy */
    ticks = jiffies;

    /* Delay */
    __delay(loops_per_jiffy);

    if (jiffies != ticks)  /* longer than 1 tick */
        loops_per_jiffy &= ~loopbit;
}

The above code computes the lower-order bits of loops_per_jiffy, the final value at which the delay loop just crosses a jiffy boundary. The calibrated value is then used to derive BogoMIPS, which is in fact not a scientific measure of processor speed; you can use BogoMIPS only as a relative gauge of how fast a processor runs. On a 1.6 GHz Pentium M-based laptop, the loop calibration produced a loops_per_jiffy of 2394935, as the boot message above shows. BogoMIPS is obtained as follows:

BogoMIPS = loops_per_jiffy * (number of jiffies in 1 second) * (instructions consumed per delay-loop iteration) / 1 million
         = (2394935 * HZ * 2) / 1000000
         = (2394935 * 250 * 2) / 1000000
         = 1197.46 (matching the value in the boot message above)

jiffies, HZ, and loops_per_jiffy are described in more detail in Section 2.4.

 2.1.5 Checking HLT instruction

Because the Linux kernel supports many hardware platforms, the startup code checks for architecture-specific bugs. Verifying the halt (HLT) instruction is one such task.

The HLT instruction of the x86 processor puts the CPU into a low-power sleep state that lasts until the next hardware interrupt occurs. The kernel uses HLT when it wants to put the CPU in the idle state (see the cpu_idle() function defined in arch/x86/kernel/process_32.c). For problematic CPUs, the no-hlt kernel command-line argument can suppress use of the HLT instruction. If no-hlt is set, the kernel busy-waits while idle instead of resting the CPU with HLT.

This message is printed when the startup code in init/main.c calls check_bugs(), defined in include/asm-your-arch/bugs.h.

 2.1.6 NET: Registered protocol family 2

The Linux socket layer is a uniform interface through which user-space applications access various network protocols. Each protocol registers itself with a unique family number, defined in include/linux/socket.h. Family 2 in the message above stands for AF_INET (Internet Protocol).

Another protocol family commonly registered during boot is AF_NETLINK (family 16). Netlink sockets offer a channel of communication between user processes and the kernel. Capabilities accessible via netlink sockets include access to the routing table and the Address Resolution Protocol (ARP) table (include/linux/netlink.h lists the complete set of uses). For such tasks, netlink sockets are better suited than system calls because they are asynchronous, simpler to implement, and dynamically linkable.
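To get a feel for the interface, the following user-space sketch (illustrative only) opens a netlink socket of the NETLINK_ROUTE family, the one used for routing-table messages, and binds it to listen for IPv4 route changes:

/* netlink_open.c: open and bind an AF_NETLINK socket. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0) {
        perror("socket(AF_NETLINK)");
        return 1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_IPV4_ROUTE;  /* multicast group: IPv4 route updates */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return 1;
    }
    printf("netlink socket bound; fd = %d\n", fd);
    close(fd);
    return 0;
}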

Yet another protocol family often enabled in kernels is AF_UNIX, or UNIX-domain sockets. Programs such as X Windows use them for communication between processes on the same system.

  2.1.7 Freeing initrd memory: 387k freed

initrd is a memory-resident virtual disk image loaded by the boot loader. After the kernel boots, it mounts initrd as the initial root filesystem, which holds the dynamically loadable modules needed to mount the actual root partition. Because the kernel runs on a wide variety of systems with different storage controllers, it is not feasible to build every possible disk driver into the base kernel image; instead, the drivers for your particular system's storage devices are packed into the initrd, which is loaded before the kernel boots and mounts the actual root filesystem. Use the mkinitrd command to create an initrd image.

The 2.6 kernel provides a feature called initramfs that outshines initrd in several respects. Whereas initrd emulates a disk (and is hence called an initramdisk, or initrd), incurring the overhead of the Linux block I/O subsystem such as buffering, initramfs is brought into being essentially as a mounted filesystem, hence its name.

Unlike initrd, an initramfs is built on top of the page cache and, like the page cache, grows and shrinks dynamically, which reduces memory waste. In addition, initrd requires your kernel image to include support for the filesystem used by the initrd (for example, if the initrd is an EXT2 filesystem, the kernel must contain the EXT2 driver), whereas initramfs needs no filesystem support at all. Moreover, because initramfs is only a small layer on top of the page cache, its code footprint is small.

Users can either pack the initial root filesystem into a compressed cpio archive [1] and pass it to the kernel using the initrd= command-line argument, or build it into the kernel via the INITRAMFS_SOURCE option during kernel configuration. In the latter case, you may supply either the filename of a cpio archive or a directory tree containing the initramfs contents. During boot, the kernel extracts the archive into an initramfs root filesystem, and if it finds a top-level /init program, it executes it. Obtaining the initial root filesystem this way is especially useful for embedded systems, where system resources are precious. To build an initramfs image, use mkinitramfs; see the document Documentation/filesystems/ramfs-rootfs-initramfs.txt for more information.

In this case, we are passing the initial root filesystem as a cpio archive to the kernel via the initrd= command-line argument. After extracting the archive's contents into an initramfs root filesystem, the kernel frees the memory the archive occupied (387 KB here) and prints the message above. The freed pages are handed over to the rest of the kernel for allocation.

During embedded-system development, initrd and initramfs can sometimes serve as the actual root filesystem on the device.

  2.1.8 io scheduler anticipatory registered (default)

The main goal of an I/O scheduler is to increase system throughput by minimizing disk seeks, during which the disk head must move from its current position to the destination of interest, introducing latency. The 2.6 kernel provides four different I/O schedulers: Deadline, Anticipatory, Complete Fair Queuing, and Noop. The kernel message above shows that this system has set Anticipatory as the default I/O scheduler.

  2.1.9 Setting up standard PCI

The next phase of the boot process initializes I/O buses and peripheral controllers. The kernel probes PCI hardware by walking the PCI bus and then initializes other I/O subsystems. Figure 2-3 shows the boot messages announcing the initialization of the SCSI subsystem, the USB controller, the video chip (part of the 855 North Bridge chipset in this example), the serial port (an 8250 UART in this case), PS/2 keyboard and mouse, floppy drives, ramdisk, the loopback device, the IDE controller (part of the ICH4 South Bridge chipset in this example), the touchpad, the Ethernet controller (an e1000 in this case), and the PCMCIA controller. The identity (ID) of each I/O device is indicated in Figure 2-3.

Figure 2-3 Initializing buses and peripheral controllers during the boot process

This book discusses most of these driver subsystems in dedicated chapters. Note that some of these messages may appear only after the kernel has booted, if the corresponding driver is dynamically linked in as a module.

  2.1.10 EXT3-fs: mounted filesystem

EXT3 has become the de facto filesystem on Linux. It adds a journaling layer on top of the time-tested EXT2 filesystem to enable fast recovery of the filesystem after a crash. The goal is to regain a consistent filesystem state without the time-consuming filesystem check (fsck) pass. EXT2 remains the workhorse underneath, while the EXT3 layer logs file transactions before the actual changes hit the disk. EXT3 is backward compatible with EXT2, so you can add an EXT3 journal to an existing EXT2 filesystem or fall back from EXT3 to EXT2.

EXT3 starts a kernel helper thread named kjournald (the next chapter delves into kernel threads) to carry out the journaling. Once EXT3 is operational, the kernel mounts the root filesystem and gets ready for business:

EXT3-fs: mounted filesystem with ordered data mode.

kjournald starting. Commit interval 5 seconds

VFS: Mounted root (ext3 filesystem).

2.1.11 INIT: version 2.85 booting

init, the parent of all Linux processes, is the first program to run after the kernel completes its boot sequence. In the last few lines of init/main.c, the kernel searches several locations in its attempt to locate init:

if (ramdisk_execute_command) {  /* Look for /init in initramfs */
    run_init_process(ramdisk_execute_command);
}

if (execute_command) {  /* You may override init and ask the kernel
                           to execute a custom program using the
                           "init=" kernel command-line argument. If
                           you do that, execute_command points to the
                           specified program */
    run_init_process(execute_command);
}

/* Else, search for init or sh in the usual places */
run_init_process("/sbin/init");
run_init_process("/etc/init");
run_init_process("/bin/init");
run_init_process("/bin/sh");
panic("No init found. Try passing init= option to kernel.");

init takes its instructions from /etc/inittab. It first executes the system initialization script /etc/rc.sysinit; one of that script's most important responsibilities is activating the swap partition, which triggers the following boot message:

Adding 1552384k swap on /dev/hda6

Let's take a closer look at what this message means. Linux user processes own a virtual address space of 3 GB (see Section 2.7); of this, the pages constituting the working set are kept in RAM. However, when too many programs demand memory resources, the kernel frees up some used RAM pages by storing them in a disk partition called swap space. According to a rule of thumb, the size of the swap partition should be twice the amount of RAM. In this case, the swap space lives in the /dev/hda6 partition and is 1,552,384 KB in size.

Next, init runs the scripts in the /etc/rc.d/rcX.d/ directory, where X is the runlevel defined in inittab. A runlevel is an execution state entered according to the desired working mode. For example, multiuser text mode corresponds to runlevel 3, and X Windows corresponds to runlevel 5. So when you see the message INIT: Entering runlevel 3, init has begun executing the scripts in /etc/rc.d/rc3.d/. These scripts start udev, the dynamic device-naming subsystem (discussed in Chapter 4), and load the kernel modules for the drivers of network, audio, and storage devices:

Starting udev: [OK]

Initializing hardware ... network audio storage [done]

...

Finally, init starts virtual console terminals, and you can now log in.

2.2 Kernel Mode and User Mode

Operating systems such as MS-DOS run in a single CPU mode, but most UNIX-flavored operating systems use dual modes to implement effective time-sharing. On a Linux machine, the CPU is either in trusted kernel mode or in restricted user mode. All user processes execute in user mode, while the kernel itself executes in kernel mode.

Kernel-mode code has unrestricted access to the complete processor instruction set and the full memory and I/O space. If a user-mode process needs these privileges, it must channel its request through a device driver or other kernel-mode code via a system call. There is another difference as well: user-mode code is allowed to page fault, whereas kernel-mode code is not.
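To make the mode boundary concrete, here is a tiny user-space sketch (illustrative only): each system call below traps from restricted user mode into trusted kernel mode, which performs the privileged work and returns:

/* mode_demo.c: every system call crosses from user mode to kernel mode. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello from user mode\n";

    /* write() enters the kernel, which drives the terminal device */
    write(STDOUT_FILENO, msg, strlen(msg));

    /* getpid() enters the kernel, which reads this task's descriptor */
    printf("pid fetched in kernel mode: %d\n", getpid());
    return 0;
}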

In 2.4 and earlier kernels, only user-mode processes could be context-switched out and preempted by other processes. Kernel-mode code could monopolize the CPU unless one of the following occurred:

(1) it voluntarily relinquished the CPU;

(2) an interrupt or exception occurred.

Kernel preemption was introduced in the 2.6 kernel, and most kernel-mode code can now be preempted.

 2.3 Process Context and Interrupt Context

The kernel can be in one of two contexts: process context or interrupt context. Kernel code that enters kernel space via a system call runs in process context, executing on behalf of the user-space process that made the call. Interrupt handlers, invoked in response to asynchronously occurring interrupts, run in interrupt context. Interrupt context and process context cannot occur at the same time.

Kernel code running in process context is preemptible, but an interrupt context always runs to completion and is not preemptible. Because of this, the kernel restricts what can be done in interrupt context; code running there is not allowed to do any of the following:

(1) go to sleep or relinquish the CPU;

(2) acquire a mutex;

(3) perform time-consuming tasks;

(4) access user-space virtual memory.

Section 4.2 of this book discusses interrupt context in greater depth.
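A common idiom in driver code is to test the current context before doing anything that might sleep. The sketch below (a widely used kernel pattern, not code from this chapter) selects a non-sleeping allocation flag when running in interrupt context:

/* Pick an allocation strategy appropriate to the execution context. */
#include <linux/hardirq.h>
#include <linux/slab.h>

static void *context_aware_alloc(size_t size)
{
    /* GFP_KERNEL may sleep, so it is legal only in process context;
       GFP_ATOMIC never sleeps and is safe in interrupt context */
    if (in_interrupt())
        return kmalloc(size, GFP_ATOMIC);
    return kmalloc(size, GFP_KERNEL);
}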

 2.4 Kernel Timers

Many parts of the kernel lean heavily on the passage of time. The Linux kernel uses the various timers provided by hardware to supply time-dependent services such as busy-waiting and sleep-waiting. While busy-waiting, the CPU keeps spinning; while sleep-waiting, the process relinquishes the CPU. So busy-waiting should be considered only when sleep-waiting is not feasible. The kernel also provides facilities to schedule a function to run after a given delay.

Let's first discuss the meaning of some important kernel timer variables: jiffies, HZ, and xtime. Next, we will measure execution times on a Pentium-based system using the Pentium Time Stamp Counter (TSC). After that, we will analyze how Linux uses the Real Time Clock (RTC).

 2.4.1 HZ and Jiffies

System timers interrupt the processor at a programmable frequency. This frequency, the number of timer ticks per second, is held in the kernel variable HZ. Choosing a value for HZ is a trade-off: a large HZ gives a finer-grained timer and hence better scheduling precision, but it also means more processor cycles burned in the timer interrupt context, increasing overhead and power consumption. The value of HZ is architecture-dependent. On x86 systems, HZ defaulted to 100 in the 2.4 kernels, became 1000 in early 2.6 kernels, and was lowered to 250 as of 2.6.13. On ARM-based platforms, the 2.6 kernels set HZ to 100. In current kernels, you can choose an HZ value from the configuration menu when you compile the kernel; the default depends on your architecture and distribution.

The 2.6.21 kernel added support for tickless operation (CONFIG_NO_HZ), which triggers timer interrupts dynamically according to system load. Tickless implementation is beyond the scope of this chapter and is not discussed further.

The jiffies variable records the number of times the system timer has fired since the system booted. The kernel increments jiffies HZ times per second. Thus, on a system with an HZ of 100 a jiffy is 10 ms long, while with HZ set to 1000 a jiffy lasts only 1 ms.

For a better understanding of HZ and jiffies, consider the following code snippet, adapted from the IDE driver (drivers/ide/ide.c), which polls a disk drive for its busy status:

unsigned long timeout = jiffies + (3 * HZ);
while (hwgroup->busy) {
    /* ... */
    if (time_after(jiffies, timeout)) {
        return -EBUSY;
    }
    /* ... */
}
return SUCCESS;

If the busy condition clears within 3 seconds, the code above returns SUCCESS; otherwise it returns -EBUSY. 3*HZ is the number of jiffies in 3 seconds, so the computed deadline, jiffies + 3*HZ, is the value jiffies will hold once the 3-second timeout elapses. time_after() compares the current jiffies value with the requested timeout while accounting for wraparound; related functions include time_before(), time_before_eq(), and time_after_eq().

jiffies is declared volatile, which tells the compiler not to optimize accesses to the variable. This ensures that the value updated by the timer interrupt handler on each tick is actually re-read on every pass through loops like the one above.

For an example of converting jiffies to seconds, look at this code fragment from the USB host controller driver drivers/usb/host/ehci-sched.c:

if (stream->rescheduled) {
    ehci_info(ehci, "ep%d%s-iso rescheduled %lu times in %lu seconds\n",
              stream->bEndpointAddress, is_in ? "in" : "out",
              stream->rescheduled,
              ((jiffies - stream->start) / HZ));
}

This debug statement calculates the number of seconds within which a USB endpoint stream (see Chapter 11) was rescheduled stream->rescheduled times. jiffies - stream->start is the number of jiffies elapsed since the stream started; dividing by HZ converts it to seconds.
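Note that recent kernels also provide helpers in include/linux/jiffies.h, so driver code rarely needs to divide by HZ by hand. A brief illustrative sketch (the function names mark_start and elapsed_ms are hypothetical):

#include <linux/jiffies.h>

static unsigned long start;

static void mark_start(void)
{
    start = jiffies;
}

static unsigned int elapsed_ms(void)
{
    /* jiffies_to_msecs() hides the HZ arithmetic; msecs_to_jiffies()
       performs the reverse conversion for building timeouts */
    return jiffies_to_msecs(jiffies - start);
}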

Assuming HZ is 1000, the 32-bit jiffies variable overflows in approximately 50 days. Because systems can run much longer than that, the kernel provides a variable called jiffies_64 to hold jiffies as a 64-bit (u64) quantity; the linker places the lower 32 bits of jiffies_64 at the same address as jiffies. On a 32-bit machine, the compiler needs two instructions to copy one u64 variable into another, so reads of jiffies_64 are not atomic. For an example of dealing with this, look up cpufreq_stats_update(), defined in drivers/cpufreq/cpufreq_stats.c.
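To read the full 64-bit tick count safely from 32-bit code, the kernel exports get_jiffies_64(), which brackets the read with a sequence lock. A minimal sketch of its use:

#include <linux/jiffies.h>

static u64 ticks_since_boot(void)
{
    /* Returns a consistent 64-bit snapshot even on 32-bit machines,
       where a plain read of jiffies_64 could tear between halves */
    return get_jiffies_64();
}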

 2.4.2 Long Delays

In kernel terms, delays on the order of multiple jiffies count as long delays. A possible, but non-optimal, way to produce one is by busy-waiting; an implementation that busy-waits can be accused of being a dog in the manger: it neither uses the CPU for useful work nor lets anyone else use it. The following code hogs the CPU for one second:

unsigned long timeout = jiffies + HZ;

while (time_before(jiffies, timeout)) continue;
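The friendlier alternative, whenever the calling context is allowed to sleep, is to give up the CPU for the duration of the delay. A minimal sketch using schedule_timeout(), assuming it runs in process context:

#include <linux/sched.h>
#include <linux/jiffies.h>

/* Sleep-wait for one second: the process relinquishes the CPU and the
   scheduler wakes it up after the timeout elapses */
static void one_second_nap(void)
{
    set_current_state(TASK_INTERRUPTIBLE);
    schedule_timeout(HZ);
}

Helpers such as msleep(), declared in include/linux/delay.h, wrap this pattern for millisecond-valued arguments.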
