Introduction:
Some time ago, I ported the uClinux-2.0.x and uClinux-2.4.x kernels. The port basically started from scratch: Linux had no code supporting the target machine, so the work amounted to adding support for a new target.
During this work, besides the operating system itself, I learned a lot about compiling, debugging, assembly, and linking. I will introduce these topics here, perhaps with more emphasis on the linker, because it is the most closely related to the operating system.
I hope to share my experience with you. If you find any mistakes, you are welcome to point them out. Making progress together is my motivation for writing these original posts.
"Programming is not a zero-sum game. Teaching something to a fellow programmer doesn't take it away from you. I'm happy to share what I can, because I'm in it for the love of programming."
-- John Carmack
Execution of user programs in uClinux
We start from user programs because applications are what we work with most, and moving from the application down to the operating system seems quite natural to me. The following simple example shows how a program runs in the operating system.
Suppose there is a C program:
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello world!\n");
    return 0;
}
This is the simplest program. Generally, a C program starts executing from main. So, is main different from other functions, with some special status?
No. main has the same status as any other function. In fact, we can make a C program start running from anywhere. The Linux kernel, for example, has no main function at all. As we all know, after the system executes a piece of startup assembly, it jumps to start_kernel in init/main.c and continues from there.
So why do all user programs start from main? The reason is the user-space C library. During C development you normally call some library functions. After the source is compiled into object files, the linker combines the binary code of those library functions with the program and produces a binary executable. During linking, the linker also places some initialization code in front of the user program. In uClinux this code is in crt0.s (I ported the uClibc library). Whatever form crt0.s takes on a given platform, the last few lines of the file must contain a jmp (or call, br, or other transfer instruction) to main (or __uClibc_main). This is why all your programs start from main. If you change the jump target to some other label, such as foo, and your program contains both main and foo, the program starts executing from foo. So main, like any other function, has no special status.
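The role of that final jump can be imitated in plain C. The sketch below is not the uClibc startup code; fake_crt0, app_main, and foo are hypothetical names chosen for illustration. It only shows that the startup stub decides which function runs first, and that the chosen function needs no special properties.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every candidate entry point. */
typedef int (*entry_fn)(int argc, char **argv);

/* Two ordinary functions; neither is special to the compiler. */
int app_main(int argc, char **argv) { (void)argv; return argc; }
int foo(int argc, char **argv)      { (void)argv; return argc + 100; }

/* Hypothetical stand-in for the last lines of crt0: after its own
 * initialization it transfers control to whatever label it names. */
int fake_crt0(entry_fn entry, int argc, char **argv)
{
    assert(entry != NULL);
    return entry(argc, argv);
}
```

Calling fake_crt0(app_main, ...) or fake_crt0(foo, ...) runs a different "first" function without any change to the functions themselves, which is the point made above about changing the jump label.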
In uClinux, how are the argc and argv parameters of main passed in? Take the flat format as an example. uClinux supports an executable file format named flat. The format is relatively simple, basically laid out flat, hence the name. It seems that the uClinux-2.4.x kernel can now also execute files in ELF format, but for simplicity I still use the flat format as the example. We will not analyze the flat file format for now and will focus on parameter passing. To develop a user program in uClinux, you first write the code and then compile it. The compiler produces a file in ELF format, so the elf2flt tool is used to convert the ELF file to flat. Assume this work has already been done.
Run foo x y in the uClinux shell. foo is the program name, and x and y are its parameters. Anyone who has learned C knows that x and y are passed to main as parameters, with argc = 3, argv[0] = "foo", argv[1] = "x", argv[2] = "y". How are these parameters passed in?
When you execute a program, the operating system calls do_execve(char *filename, char **argv, char **envp, struct pt_regs *regs). This call opens the file at the given path and loads it into memory. argv holds the command-line parameters, and envp holds the environment variables.
When the file is loaded, the system calls a different load handler depending on the file format. For the flat format, load_flat_binary() in fs/binfmt_flat.c is called. From the argv and envp passed down along the way, the number of parameters argc and the number of environment variables envc are computed first. The parameter table is then created in the create_flat_tables function. The entire function is as follows:
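The argv and envp arrays handed to do_execve are NULL-terminated, so their lengths can be found by walking each array to its terminator. The helper below is a user-space sketch with a hypothetical name, count_strings. The kernel's own routine must additionally fetch each pointer from user space and enforce limits, which is omitted here.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated array of strings,
 * the way argc and envc can be derived from argv and envp. */
int count_strings(char *const *vec)
{
    int n = 0;

    if (vec == NULL)
        return 0;
    while (vec[n] != NULL)
        n++;
    return n;
}
```

For the command foo x y, this gives 3 for the argument array; for an environment holding a single string, it gives 1.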
static unsigned long create_flat_tables(unsigned long pp, struct linux_binprm *bprm)
{
(1)   unsigned long *argv, *envp;
(2)   unsigned long *sp;
(3)   char *p = (char *) pp;
(4)   int argc = bprm->argc;
(5)   int envc = bprm->envc;
(6)   char dummy;
(7)   sp = (unsigned long *) ((-(unsigned long) sizeof(char *)) & (unsigned long) p);
(8)   sp -= envc + 1;
(9)   envp = sp;
(10)  sp -= argc + 1;
(11)  argv = sp;
(12)  flat_stack_align(sp);
(13)  if (flat_argvp_envp_on_stack()) {
(14)      --sp; put_user((unsigned long) envp, sp);
(15)      --sp; put_user((unsigned long) argv, sp);
(16)  }
(17)  put_user(argc, --sp);
(18)  current->mm->arg_start = (unsigned long) p;
(19)  while (argc-- > 0) {
(20)      put_user((unsigned long) p, argv++);
(21)      do {
(22)          get_user(dummy, p); p++;
(23)      } while (dummy);
(24)  }
(25)  put_user((unsigned long) NULL, argv);
(26)  current->mm->arg_end = current->mm->env_start = (unsigned long) p;
(27)  while (envc-- > 0) {
(28)      put_user((unsigned long) p, envp); envp++;
(29)      do {
(30)          get_user(dummy, p); p++;
(31)      } while (dummy);
(32)  }
(33)  put_user((unsigned long) NULL, envp);
(34)  current->mm->env_end = (unsigned long) p;
(35)  return (unsigned long) sp;
}
Lines (1)-(6) are variable declarations. argc and envc hold the previously computed numbers of parameters and environment variables. p = pp points to the area holding the parameter and environment strings. sp is the user-space stack of the program about to run, that is, the starting address of the user-space stack when foo is executed. Lines (8)-(11) adjust the stack. First sp moves down by envc + 1 units (one unit is one unsigned long). These envc + 1 slots hold the addresses of the envc elements envp[0] to envp[envc-1], and the one extra slot holds 0 to mark the end of the envp array. Then sp moves down by another argc + 1 units. These slots hold the addresses of the argc elements argv[0] to argv[argc-1], and the extra slot again holds 0 to mark the end of the argv array. After this adjustment, argv and envp point to their respective positions in the stack. If the initial stack value is called init_sp, then envp = init_sp - (envc + 1) and argv = envp - (argc + 1).
Line (12) does not matter here. Lines (13)-(17) adjust the stack again. At (14) sp moves down one more unit and envp (envp = init_sp - (envc + 1)) is written to that address. At (15) sp moves down another unit and argv is written. At (17) sp moves once more and argc is written to the stack as well.
Lines (18)-(35) write the addresses of argv[0] to argv[argc-1] (the strings that p points to) one after another into the stack area designated by argv, and then write the addresses of envp[0] to envp[envc-1] (also reached through p) into the stack area designated by envp. At the same time, the fields of the process control block, such as arg_start, arg_end, env_start, and env_end, are set.
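The table-building steps can be tried out in user space. The following is a simplified sketch, not the kernel function: build_tables is a hypothetical name, plain stores replace put_user and get_user, the alignment call at (12) is skipped, the branch at (13) is assumed to be taken, and sp is passed in directly instead of being derived from p as in line (7). The numbers in the comments refer to the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the argument table below sp.
 * sp: one past the highest free stack word (the stack grows downward)
 * p:  packed NUL-terminated strings, arguments first, then environment
 * Returns the final stack pointer, which points at argc. */
uintptr_t *build_tables(uintptr_t *sp, const char *p, int argc, int envc)
{
    uintptr_t *argv, *envp;

    assert(argc >= 0 && envc >= 0);

    sp -= envc + 1;  envp = sp;     /* (8)-(9)   slots for envp[] plus a 0 */
    sp -= argc + 1;  argv = sp;     /* (10)-(11) slots for argv[] plus a 0 */

    *--sp = (uintptr_t)envp;        /* (14) address of the envp array */
    *--sp = (uintptr_t)argv;        /* (15) address of the argv array */
    *--sp = (uintptr_t)argc;        /* (17) argc ends up on top */

    while (argc-- > 0) {            /* (19)-(24) one pointer per argument */
        *argv++ = (uintptr_t)p;
        p += strlen(p) + 1;         /* skip the string and its NUL */
    }
    *argv = 0;                      /* (25) terminate argv[] */

    while (envc-- > 0) {            /* (27)-(32) same for the environment */
        *envp++ = (uintptr_t)p;
        p += strlen(p) + 1;
    }
    *envp = 0;                      /* (33) terminate envp[] */

    return sp;                      /* (35) */
}
```

Counting in words from the returned pointer, the layout is argc, the address of argv[], the address of envp[], the argv entries and their terminating 0, then the envp entries and their terminating 0. The absolute addresses depend on where the caller places the stack and the strings.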
The following example illustrates the process. Suppose foo x y is executed, so argc = 3, argv[0] = "foo", argv[1] = "x", argv[2] = "y", envc = 1, and envp[0] = "PATH=/bin". Assume the user-space stack starts at sp = 0x1f0000 and pp = 0x1c0000. After processing, before foo is executed, its user-space stack frame is as follows:
--------------------------------
0x1f0000 | 0000 |
--------------------------------
0x1efffc | envp[0] = 0x1c0008 | -----> points to "PATH=/bin"
--------------------------------
0x1efff8 | 0000 |
--------------------------------
0x1efff4 | argv[2] = 0x1c0006 | -----> points to "y"
--------------------------------
0x1efff0 | argv[1] = 0x1c0004 | -----> points to "x"
--------------------------------
0x1effec | argv[0] = 0x1c0000 | -----> points to "foo"
--------------------------------
0x1effe8 | start addr of envp = 0x1efffc |
--------------------------------
0x1effe4 | start addr of argv = 0x1effec |
--------------------------------
0x1effe0 | argc = 3 |
--------------------------------
On my target machine, function parameters are passed in registers r2-r6. Of course, if there are more than five parameters, the stack has to be used.
Since main takes parameters, argc must be put in r2, argv in r3, and envp in r4 before main is called. As mentioned earlier, sp is the starting address of the user-space stack, so when the foo code starts executing, r0 = sp. In the example above, r0 equals 0x1effe0. The following pseudo-assembly then loads the parameters into the correct registers.
load r2, (r0)      /* r2 = argc */
load r3, (r0, 4)   /* r3 = argv */
load r4, (r0, 8)   /* r4 = envp */
call main          /* jump to the main function */
call _exit
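The same stub can be written as a C sketch for readers who prefer it to pseudo-assembly. start_from_stack and sample_main are hypothetical names. The sketch assumes the three-word layout described above (argc, then the address of argv[], then the address of envp[]) and returns the entry function's result instead of passing it to _exit.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef int (*entry_fn)(int argc, char **argv, char **envp);

/* Sample entry point for illustration: returns argc when argv[] is
 * properly NULL-terminated, and -1 otherwise. */
int sample_main(int argc, char **argv, char **envp)
{
    (void)envp;
    return (argv[argc] == NULL) ? argc : -1;
}

/* Hypothetical C rendering of the startup stub. */
int start_from_stack(const uintptr_t *sp, entry_fn entry)
{
    int    argc = (int)sp[0];        /* load r2, (r0)    */
    char **argv = (char **)sp[1];    /* load r3, (r0, 4) */
    char **envp = (char **)sp[2];    /* load r4, (r0, 8) */

    assert(argv != NULL && envp != NULL);
    return entry(argc, argv, envp);  /* call main */
}
```

Passing a different function as entry changes what runs first, which again shows that nothing about main itself is special.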
The code above is the simplest preprocessing before entering main. Of course, different systems handle files of different formats in different ways. The examples here are just the scenarios and solutions I have run into myself.
I have not finished the program example yet, for instance how printf is handled afterwards, but I am worn out, so the parameter passing of main will do for now. When I first learned C, I thought main was quite mysterious. Once I had worked on a system, I understood: there really is no difference between main and other functions :)
After writing for half a day, I feel dizzy. There must be mistakes or parts that are not polished enough. You are welcome to criticize. Not clear