Getting Started with the Linux Kernel (IV): Kernel Assembly Language Rules

Source: Internet
Author: User
Tags: data structures, exception handling, volatile
In any operating system written in a high-level language, a small part of the kernel source is always written in assembly language. Readers who have read the Unix Sys V source know that about 2,000 of the roughly 30,000 lines of its core code are written in assembly language, spread over fewer than 20 files with the extensions .s and .m. Most of these are low-level routines for interrupt and exception handling, together with initialization code and common subroutines called from the core code.

Part of the core code is written in assembly language, generally for the following reasons:

The low-level routines in the operating system kernel deal directly with hardware and require specialized instructions that have no corresponding construct in C. For example, on the 386 architecture there are no C statements corresponding to the port input/output instructions such as inb and outb, so these low-level operations have to be written in assembly language. The same is true of operations on some CPU registers: setting a segment register, for example, also has to be written in assembly language.
Some special CPU instructions also have no corresponding C construct, such as disabling and enabling interrupts. In addition, different CPU chips of the same architecture, especially newly developed ones, often add new instructions. The Pentium, Pentium II and Pentium MMX all extended the original instruction set, and using these instructions requires assembly language.
Some procedures, program sections or functions in the kernel are invoked very frequently at run time, so their (time) efficiency becomes important. Given the same algorithm and data structures, code written in assembly language is usually more efficient than code written in a high-level language. In such routines the use of every assembly instruction is often weighed carefully. The entry to and return from system calls is a typical example. System call entry is a very frequently used path, possibly executed thousands of times per second, so its time efficiency matters. Moreover, system call entry involves switching between user space and system space, and some of the instructions used for that have no corresponding construct in C. System call entry and return therefore clearly have to be written in assembly language.
On some special occasions the space efficiency of a program is also very important. The operating system's boot program is an example: it usually must fit into the first sector of the disk. In that case even one byte over the limit is too much, so it can only be written in assembly language.

In the Linux kernel source, programs or program sections written in assembly language take several different forms:

The first is complete assembly code, kept in files with the suffix .s. In fact, even for "pure" assembly code, modern assembly tools have borrowed the strengths of C preprocessing: a preprocessing pass runs before assembly, and files that have not yet been preprocessed use the suffix .S. Such (.S) files can, like C programs, use #include, #ifdef and similar constructs, and data structures can be defined in .h files.

The second is assembly language fragments embedded in C programs. Although the ANSI C standard says nothing about assembly fragments, practically every C compiler in real use has an extension for them, and gcc, the GNU C compiler, provides a particularly strong one.

In addition, the kernel code contains a few assembly programs in Intel format that are used for system booting. Since we focus on the Linux kernel on the Intel i386 architecture, we will only introduce GNU's support for i386 assembly language.

For readers new to the Linux kernel source, even those quite familiar with i386 assembly language, understanding these two kinds of assembly programs or fragments can be difficult, and some may even be put off. The reasons are these: in the kernel's "pure" assembly code, GNU uses a syntax different from the usual 386 assembly language; and in fragments embedded in C programs there are additional language components that tell the tools how to allocate registers and how to combine them with variables defined in the C program. These components make the assembly fragments embedded in C programs effectively an intermediate language between 386 assembly and C.

We therefore first look at how 386 assembly language is used in the kernel in these two cases. Specific assembly code will also be explained later where it comes up.

1 GNU's 386 assembly language


In the DOS/Windows world, 386 assembly language uses the statement (instruction) format defined by Intel, which is also the format used in almost all textbooks and reference books on 386 assembly programming. In the Unix world, however, the format defined by AT&T is used. When AT&T ported UNIX to the 80386 processor, it defined this format according to the habits and needs of UNIX insiders. UNIX was originally developed on the PDP-11 and later ported to the VAX and the 68000 series of processors, whose assembly languages differ from Intel's in style and format. AT&T's 386 assembly language is relatively close to those assembly languages. This format was later retained in UnixWare. GNU is primarily active in the Unix world (although GNU stands for "GNU's Not Unix"). To stay as compatible as possible with earlier UNIX versions and tools, the system tools developed by GNU naturally inherited the AT&T 386 assembly format rather than the Intel format.

So how big is the gap between the two assembly languages? They are in fact largely the same. But small differences can matter, and overlooking them causes trouble. Specifically, the differences are as follows:

(1) The Intel format mostly uses uppercase letters, while the AT&T format uses lowercase.
(2) In the AT&T format, register names are prefixed with "%", while the Intel format uses no prefix.
(3) In AT&T's 386 assembly language, the order of the source and destination operands is exactly the opposite of Intel's. In the Intel format the destination comes first and the source second; in the AT&T format the source comes first and the destination second. For example, moving the contents of register EAX into EBX is "MOV EBX, EAX" in the Intel format and "movl %eax, %ebx" in the AT&T format. The designers of the Intel format were thinking "EBX = EAX", while the designers of the AT&T format were thinking "%eax -> %ebx".
(4) In the AT&T format, the operand size (width) of a memory-access instruction is determined by the last letter of the opcode name (that is, the opcode suffix). The suffixes are b (8-bit), w (16-bit) and l (32-bit). In the Intel format, the size is indicated by "BYTE PTR", "WORD PTR" or "DWORD PTR" in front of the memory operand. For example, loading the byte at memory location FOO into the 8-bit register AL is written as follows in the two formats:
    MOV AL, BYTE PTR FOO    (Intel format)
    movb FOO, %al           (AT&T format)
(5) In the AT&T format, an immediate operand is prefixed with "$", while the Intel format uses no prefix. So "PUSH 4" in the Intel format becomes "pushl $4" in the AT&T format.
(6) In the AT&T format, the operand of an absolute jump or call instruction (that is, the target address of the jump or call) is prefixed with "*" (which will probably remind the reader of pointers in C). The Intel format uses no prefix.
(7) The opcode names of the far jump and far call instructions are "ljmp" and "lcall" in the AT&T format, and "JMP FAR" and "CALL FAR" in the Intel format. When the target of the jump or call is an immediate operand, the two formats look like this:
    CALL FAR SECTION:OFFSET     (Intel format)
    JMP FAR SECTION:OFFSET      (Intel format)
    lcall $section, $offset     (AT&T format)
    ljmp $section, $offset      (AT&T format)
The corresponding far return instruction is:
    RET FAR STACK_ADJUST        (Intel format)
    lret $stack_adjust          (AT&T format)
(8) The general form of indirect addressing differs as follows:
    SECTION:[BASE + INDEX*SCALE + DISP]     (Intel format)
    section:disp(base, index, scale)        (AT&T format)

Note that the address calculation is implicit in the AT&T format. For example, when section, index and scale are all omitted, base is EBP, and disp (the displacement) is -4, the operand is written as follows:
    [ebp-4]         (Intel format)
    -4(%ebp)        (AT&T format)
If the parentheses in the AT&T format contain only a base, the commas can be omitted; otherwise they cannot. So (%ebp) is equivalent to (%ebp,,), which in turn is conceptually equivalent to (ebp, 0, 0). As another example, when index is EAX, scale is 4 (32 bits), disp is foo, and everything else is omitted, the operand is written as:
    [foo+EAX*4]     (Intel format)
    foo(,%eax,4)    (AT&T format)
This addressing mode is often used to access a field within a particular element of an array of data structures: base is the starting address of the array, scale is the size of each array element, and index is the subscript. If the array elements are data structures, disp is the displacement of the specific field within the structure.
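
The address arithmetic implied by disp(base, index, scale) can be written out by hand in C. The following is only an illustrative sketch: struct item, effective_address and field_address are hypothetical names introduced here, not kernel code. It computes base + index*scale + disp and uses it to locate one field of one element in an array of structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type, used only for this illustration. */
struct item {
    int id;
    int weight;
};

/* Address denoted by the AT&T operand disp(base, index, scale),
 * computed by hand as base + index*scale + disp. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   uintptr_t scale, uintptr_t disp)
{
    return base + index * scale + disp;
}

/* Address of table[i].weight: base is the start of the array, index is
 * the subscript, scale is the element size, and disp is the offset of
 * the field inside the structure. */
static int *field_address(struct item *table, size_t i)
{
    return (int *)effective_address((uintptr_t)table, i,
                                    sizeof(struct item),
                                    offsetof(struct item, weight));
}
```

For a table of struct item, field_address(table, 2) yields the same pointer as &table[2].weight. On real hardware the scale can only be 1, 2, 4 or 8, so a compiler can use this addressing mode directly only when the element size is one of those values.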

2 Assembly language embedded in C


When you need to embed a piece of assembly language in a C program, you can use the "asm" statement provided by gcc. Its format is as follows:

__asm__ ("assembly code snippet")
__asm__ __volatile__ (specify operation + "assembly code snippet")

Because the detailed assembly language rules are quite complex, we only cover the main rules relevant to the kernel source and describe them through a few examples. For other rules, refer to the relevant CPU manual.

Example 1: There is a line in include/asm-i386/io.h:
#define __SLOW_DOWN_IO __asm__ __volatile__ ("outb %al, $0x80")
This is an 8-bit output instruction. The b suffix indicates an 8-bit operation; 0x80 is a constant, that is, an immediate operand, so it takes the prefix "$", and the register name al takes the prefix "%".

Example 2: Several lines of assembly can also be inserted in a single asm statement. In the same file, under different conditions, __SLOW_DOWN_IO has a different definition:
#define __SLOW_DOWN_IO __asm__ __volatile__ ("\njmp 1f\n1:\tjmp 1f\n1:")
This is less intuitive. Three lines of assembly are inserted here; "\n" is the newline character and "\t" is the tab character, following the same rules as the format string of a printf statement:

    jmp 1f
1:  jmp 1f
1:

Here the target of the jump instruction, 1f, means: search forward (f = forward) for the first line labeled 1. Correspondingly, 1b would mean search backward. So this small piece of code simply makes the CPU execute two jump instructions to no purpose, consuming some time.
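
To see both directions of the numeric local labels at work, here is a small sketch of a counting loop. It is not kernel code: count_iterations is a hypothetical name, and the "+r" (read-write register) constraint and "cc" clobber are standard gcc features that this article does not otherwise cover. The assembly path is compiled only on x86 with gcc or a compatible compiler; elsewhere a plain C loop is used.

```c
/* Counts n iterations with a loop built from numeric local labels.
 * "1b" jumps back to the nearest preceding "1:", and "2f" jumps
 * forward to the nearest following "2:". */
static unsigned int count_iterations(unsigned int n)
{
    unsigned int done = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "testl %0, %0\n\t"   /* is n zero?                  */
        "jz 2f\n"            /* yes: skip the loop entirely */
        "1:\tincl %1\n\t"    /* done++                      */
        "decl %0\n\t"        /* n--                         */
        "jnz 1b\n"           /* repeat while n != 0         */
        "2:"
        : "+r" (n), "+r" (done)
        :
        : "cc");
#else
    /* Portable fallback so the sketch still builds on other CPUs. */
    while (n--)
        done++;
#endif
    return done;
}
```

Because the labels are numeric, the same numbers can be reused in other asm statements in the same file without clashing, which is why the kernel macros above can define "1:" twice.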

Example 3: Let's look at a piece of code from include/asm-i386/atomic.h.

static __inline__ void atomic_add(int i, atomic_t *v)
{
    __asm__ __volatile__(
        LOCK "addl %1,%0"
        : "=m" (v->counter)
        : "ir" (i), "m" (v->counter));
}

In general, inserting assembly language into C code is considerably more complex, because of the problem of how to assign registers and how to combine them with the variables in the C code. For this purpose the assembly language has to be extended with directions for the tools.
Below we introduce the general format of the assembly component in C code and explain it. We will also give hints later when we come across specific code.
An assembly-language fragment inserted into C code can be divided into four parts, separated by ":", in the general form:
instruction part : output part : input part : clobber part
Be careful not to confuse these ":" with program labels (such as the 1: above).
The first part is the assembly statements themselves, in essentially the same format as in an assembly program, with some differences that will be mentioned shortly. This part can be called the "instruction part" and is required. The other parts can be omitted depending on the situation, so in the simplest case the fragment looks the same as ordinary assembly statements, as in the previous two examples.
In the instruction part, numbers prefixed with %, such as %0, %1 and so on, denote placeholder ("template") operands that need to be bound to registers. The total number of such operands that can be used depends on the number of general-purpose registers of the specific CPU. When the instruction part uses several such operands, several variables have to be combined with registers, and gcc and gas substitute them appropriately at compile time according to the constraints that follow.
So how are the conditions for binding variables expressed? That is the role of the remaining parts. Immediately after the instruction part is the "output part", which specifies the constraints on how the output variables, that is, the target operands, are to be combined. There can be several constraints in the output part, separated by commas. Each output constraint begins with "=", followed by a letter describing the operand type, and then the variable it is bound to, in parentheses. For example:
: "=m" (v->counter)
Here there is only one constraint. "=m" indicates that the corresponding target operand (%0 in the instruction part) is a memory location, v->counter. Any register or operand combined with an operand described in the output part does not retain, after the embedded assembly code has executed, the content it held beforehand, which gives gcc a basis for scheduling the use of these registers.
The output part is followed by the "input part". The format of an input constraint is similar to that of an output constraint, but without the "=". In the previous example there are two constraints in the input part. The first is "ir" (i), which means that %1 in the instruction part can be an immediate operand or a register, and that the operand comes from the C variable i (the name in parentheses). The second constraint is "m" (v->counter), with the same meaning as in the output constraint.
Looking back at the % plus number in the instruction part: it is the operand's ordinal number, counted from the first constraint of the output part (number 0), with each constraint counted once in order.
In addition, for some operations that act on a single byte of an operand, you can state explicitly which byte is meant: inserting a "b" between the % and the number denotes the lowest byte, and inserting an "h" denotes the second-lowest byte.
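
The "b" and "h" modifiers can be tried out with a short sketch. The function names low_byte and second_byte are hypothetical, and the "Q" constraint (a register that has an addressable second byte, that is, a, b, c or d) is a gcc x86 constraint that is not in the list of common constraints in this article. The assembly path assumes gcc or a compatible compiler on x86; other targets fall back to plain shifts.

```c
/* Returns bits 0..7 of x, naming the low byte of the register with %b. */
static unsigned char low_byte(unsigned int x)
{
    unsigned char out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movb %b1, %b0" : "=q" (out) : "q" (x));
#else
    out = (unsigned char)x;
#endif
    return out;
}

/* Returns bits 8..15 of x, naming the second-lowest byte with %h. */
static unsigned char second_byte(unsigned int x)
{
    unsigned char out;
#if defined(__i386__) || defined(__x86_64__)
    /* "Q" keeps both operands in a/b/c/d, the only registers that
     * have a high-byte form such as %ah or %bh. */
    __asm__ ("movb %h1, %b0" : "=Q" (out) : "Q" (x));
#else
    out = (unsigned char)(x >> 8);
#endif
    return out;
}
```

If x ends up in EAX, "%b1" is emitted as %al and "%h1" as %ah. The rest of the template is unchanged, which is what makes these modifiers convenient for byte tests such as the testb instructions in Example 5.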

List of common constraints
"m", "v", "o"       -- a memory operand;
"r"                 -- any register;
"q"                 -- one of the registers eax, ebx, ecx, edx;
"i", "h"            -- an immediate operand;
"E", "F"            -- a floating-point number;
"g"                 -- "any";
"a", "b", "c", "d"  -- require the use of registers eax, ebx, ecx and edx, respectively;
"S", "D"            -- require the use of registers esi and edi, respectively;
"I"                 -- a constant (0 to 31).

Returning to the example above, the reader should now be able to see that the code adds the value of the parameter i to v->counter. The keyword LOCK in the code means that the system bus is locked while the addl instruction executes, guaranteeing the atomicity of the operation.
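
A self-contained version of this pattern, runnable in user space, might look like the sketch below. It assumes gcc or a compatible compiler. my_atomic_t and my_atomic_add are hypothetical stand-ins for the kernel's atomic_t and atomic_add, the lock prefix is written out instead of using the kernel's LOCK macro, and on non-x86 targets the sketch falls back to the compiler builtin __atomic_fetch_add.

```c
/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct {
    volatile int counter;
} my_atomic_t;

/* Adds i to v->counter as one locked read-modify-write instruction. */
static inline void my_atomic_add(int i, my_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "lock ; addl %1,%0"
        : "=m" (v->counter)              /* %0: output, memory operand  */
        : "ir" (i), "m" (v->counter));   /* %1: immediate or register,
                                            %2: same memory, as input   */
#else
    /* Fallback for other CPUs: compiler builtin, not inline assembly. */
    __atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
#endif
}
```

Listing v->counter a second time as an "m" input tells gcc that the old value in memory is read as well as written, so the store that initializes the counter cannot be dropped or moved after the asm statement.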

Example 4: Here is another piece of embedded assembly code, this time taken from include/asm-i386/bitops.h:

#ifdef CONFIG_SMP
#define LOCK_PREFIX "lock ; "
#else
#define LOCK_PREFIX ""
#endif

#define ADDR (*(volatile long *) addr)

static __inline__ void set_bit(int nr, volatile void * addr)
{
    __asm__ __volatile__( LOCK_PREFIX
        "btsl %1,%0"
        : "=m" (ADDR)
        : "Ir" (nr));
}

The instruction btsl sets one bit of a 32-bit operand to 1. The parameters nr and addr mean that bit nr of the 32-bit word at memory address addr is set to 1.
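
The same idea can be sketched as a stand-alone function. This is an illustration under stated assumptions, not the kernel's code: my_set_bit is a hypothetical name, nr is assumed to be in the range 0 to 31, the lock prefix is always emitted, and the memory operand is declared with "+m" (read and written) rather than the "=m" used above. Non-x86 targets use the compiler builtin __atomic_fetch_or instead.

```c
/* Sets bit nr (assumed 0..31) of the 32-bit word at addr. */
static inline void my_set_bit(int nr, volatile unsigned int *addr)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "lock ; btsl %1,%0"
        : "+m" (*addr)    /* %0: the word in memory, read and written    */
        : "Ir" (nr)       /* %1: constant 0..31 if known, else a register */
        : "cc");          /* btsl changes the carry flag                  */
#else
    /* Fallback for other CPUs: compiler builtin, not inline assembly. */
    __atomic_fetch_or(addr, 1u << nr, __ATOMIC_SEQ_CST);
#endif
}
```

With the "Ir" constraint, gcc can emit an immediate such as "btsl $3, ..." when nr is a compile-time constant in range, and otherwise loads nr into a register first.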

Example 5: A more complex but very important example, from include/asm-i386/string.h:

static inline void * __memcpy(void * to, const void * from, size_t n)
{
    int d0, d1, d2;
    __asm__ __volatile__(
        "rep ; movsl\n\t"
        "testb $2,%b4\n\t"
        "je 1f\n\t"
        "movsw\n"
        "1:\ttestb $1,%b4\n\t"
        "je 2f\n\t"
        "movsb\n"
        "2:"
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        : "0" (n/4), "q" (n), "1" ((long) to), "2" ((long) from)
        : "memory");
    return (to);
}

The __memcpy function here is the kernel's underlying implementation of the familiar memcpy function, which copies the contents of one region of memory to another. The parameter to is the destination address, from is the source address, and n is the length of the content to copy, in bytes. gcc generates the following code:

    rep ; movsl
    testb $2,%b4
    je 1f
    movsw
1:  testb $1,%b4
    je 2f
    movsb
2:

Here the output part has three constraints, binding the function's local variables d0, d1 and d2 to operands %0 to %2: d0 must be placed in register ECX, d1 in EDI, and d2 in ESI. The input part has four more constraints, corresponding to operands %3, %4, %5 and %6. Operand %3 uses the same register as %0, ECX, and holds the number of long words to copy, computed from the byte count (n/4). %4 is n itself, which may be placed in any register allowed by the "q" constraint. %5 and %6 are the parameters to and from, which use the same registers as %1 and %2 (EDI and ESI).
Now look at the instruction part. The first instruction is "rep", which is only a prefix: it says that the following instruction, movsl, is to be repeated, with register ECX decremented by 1 on each repetition until it reaches 0. So in this code it is executed n/4 times in total. movsl is an important complex instruction in the 386 instruction set: it copies one long word from the location ESI points to into the location EDI points to, and adds 4 to both ESI and EDI. Thus, when the movsl instruction has finished and the testb instruction is about to execute, all long words have been copied and at most three bytes remain. The three registers above are used implicitly in this process, which explains why the input and output parts must specify the registers these operands are kept in.
Next comes the handling of the remaining bytes (at most three). First, testb tests operand %4, checking bit 1 (mask $2) in the lowest byte of the copy length n. If this bit is 1, at least two bytes remain, so a short word is copied with movsw (ESI and EDI each advance by 2); otherwise it is skipped. Then testb tests bit 0 (mask $1) of operand %4. If this bit is 1, one byte still remains, so movsb copies one more byte; otherwise it is skipped. When label 2 is reached, execution is complete.
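
The three stages described above can be modeled in plain C. This is a sketch for understanding the control flow, not a replacement for the assembly: memcpy_model and copy_unit are hypothetical names, and the byte-by-byte helper only imitates what movsl, movsw and movsb each do in a single instruction.

```c
#include <stddef.h>

/* Copies width bytes and advances both pointers, imitating one
 * movsl (width 4), movsw (width 2) or movsb (width 1). */
static void copy_unit(unsigned char **dst, const unsigned char **src,
                      size_t width)
{
    for (size_t k = 0; k < width; k++)
        *(*dst)++ = *(*src)++;
}

/* Plain C model of the three stages of the assembly __memcpy. */
static void *memcpy_model(void *to, const void *from, size_t n)
{
    unsigned char *dst = to;            /* plays the role of EDI */
    const unsigned char *src = from;    /* plays the role of ESI */
    size_t longs = n / 4;               /* plays the role of ECX */

    /* rep ; movsl : copy n/4 long words */
    while (longs--)
        copy_unit(&dst, &src, 4);

    /* testb $2 ; movsw : bit 1 of n set, so two more bytes to copy */
    if (n & 2)
        copy_unit(&dst, &src, 2);

    /* testb $1 ; movsb : bit 0 of n set, so one last byte to copy */
    if (n & 1)
        copy_unit(&dst, &src, 1);

    return to;
}
```

For n = 7, for instance, the model copies one long word (4 bytes), then a short word (2 bytes, because bit 1 of 7 is set), then a single byte (because bit 0 is set), for 7 bytes in total.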
