Linux OS system analysis (2)-Process Creation and executable program loading

Last Update: 2018-12-04 Source: Internet

Author: User


Student ID: SA×××310, Name: ××Tao

Environment: Ubuntu 13.04, GCC 4.7.3

1. Process Management

Processes in Linux are managed mainly by the kernel. System calls are the way applications interact with the kernel. As an interface, system calls let an application enter the operating system kernel and use the resources the kernel provides, such as access to hardware, process switching and interrupt handling, and changes of privilege mode.

Common system calls: exit, fork, read, write, open, close, waitpid, execve, lseek, getpid...

User Mode and Kernel Mode

To provide a sound process abstraction, the operating system must restrict the instructions a program can execute and the parts of the address space it can access.

A processor usually provides this capability with a mode bit in a control register, which records the privilege level the process currently has. When the mode bit is set, the process runs in kernel mode and can execute any instruction in the instruction set and access any memory location in the system.

When the mode bit is not set, the process runs in user mode and cannot execute privileged instructions or directly reference code and data in the kernel area of the address space. Any such attempt causes a fatal protection fault. Instead, a user program must access kernel code and data indirectly, through the system call interface.

For a detailed analysis of fork, see this blog.

waitpid

First, let's look at zombie processes. When a process terminates for any reason, the kernel does not remove it from the system immediately. Instead, the process is kept in a terminated state until it is reaped by its parent. When the parent reaps the terminated child, the kernel passes the child's exit status to the parent and then discards the terminated process; at that point the process no longer exists. A process that has terminated but has not yet been reaped is called a zombie.

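If a parent terminates without reaping its zombie children, the kernel arranges for the init process (PID 1) to adopt and reap them. Long-running programs such as shells and servers should still always reap their children: although a zombie no longer runs, it still consumes system memory resources.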

A process can call the waitpid function to wait for its children to terminate or stop.

The function prototype is as follows:

```c
#include <sys/wait.h>

pid_t waitpid(pid_t pid, int *status, int options);
```

Returns: the PID of the child if OK; 0 if WNOHANG was specified and no child in the wait set has changed state yet; -1 on error.

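Let's look at an example of the waitpid function. Like the other examples in this article, it uses csapp.h, the helper header from Computer Systems: A Programmer's Perspective, which provides error-checking wrappers such as Fork and the error-reporting function unix_error.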

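```c
#include "csapp.h"
#include <errno.h>
#define N 5

int main()
{
    int status, i;
    pid_t pid;

    /* Parent creates N children, each exiting with a different status */
    for (i = 0; i < N; i++)
        if ((pid = Fork()) == 0)   /* child */
            exit(100 + i);

    /* Parent reaps the children in no particular order */
    while ((pid = waitpid(-1, &status, 0)) > 0) {
        if (WIFEXITED(status))
            printf("Child %d exited normally with status=%d!\n",
                   pid, WEXITSTATUS(status));
        else
            printf("Child %d terminated abnormally!\n", pid);
    }

    /* The only normal way out of the loop is running out of children */
    if (errno != ECHILD)
        unix_error("waitpid error");

    exit(0);
}
```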


If the first argument of waitpid, pid, is -1, the wait set consists of all of the parent's child processes. If pid is greater than 0, the wait set is the single child whose process ID equals pid.

If the third argument, options, is 0 (as in the example), waitpid suspends the calling process until a child in the wait set terminates. If a child in the set has already terminated when waitpid is called, waitpid returns immediately. In either case it returns the PID of the terminated child it has reaped.

When the program runs, the output shows that waitpid reaps the terminated children in no particular order.

The wait function is a simpler version of waitpid. Its prototype is:

```c
#include <sys/wait.h>

pid_t wait(int *status);
```

Calling wait(&status) is equivalent to calling waitpid(-1, &status, 0).

execve

In Linux, a process starts another program by calling a function from the exec family. The execve() system call replaces the current process image with a specified program. Its parameters are the file name (filename), the argument list (argv), and the environment variable list (envp). The exec family has several members that work in roughly the same way: execl, execlp, execle, execv, execve, and execvp. Only execve is a true system call; the others are C library functions built on top of it. Below I use only execve as an example; to learn how the other functions differ, see the man 3 exec manual page.

Once a process successfully calls an exec function, its old program is gone. The system replaces the code segment with the new program's code, discards the original data and stack segments, and allocates new data and stack segments for the new program. What stays the same is the process ID: to the system it is still the same process, but it is now running a different program. (Some exec functions, such as execv and execvp, which take no envp argument, let the new program inherit the caller's environment variables.)

The prototype is as follows:

```c
#include <unistd.h>

int execve(const char *filename, char *const argv[], char *const envp[]);
```

Returns: does not return if OK; returns -1 on error.

The execve function loads and runs the executable object file filename, with the argument list argv and the environment variable list envp. execve returns to the calling program only if an error occurs, for example if filename cannot be found. So unlike fork, which is called once but returns twice, execve is called once and never returns.

The argument list argv is organized in memory as a NULL-terminated array of pointers, each pointing to an argument string.

By convention, argv[0] is the name of the executable object file.

The environment list envp is organized in memory in a similar way: envp points to a NULL-terminated array of pointers, and each pointer points to an environment variable string, which is a name-value pair of the form "name=value".

The following program prints its command-line arguments and environment variables:

```c
#include "csapp.h"

int main(int argc, char *argv[], char *envp[])
{
    int i;

    printf("Command line arguments:\n");
    for (i = 0; argv[i] != NULL; i++)
        printf("argv[%2d]: %s\n", i, argv[i]);
    printf("\n");

    printf("Environment variables:\n");
    for (i = 0; envp[i] != NULL; i++)
        printf("envp[%2d]: %s\n", i, envp[i]);

    exit(0);
}
```

2. A Simple Shell

Combining fork, wait, and exec, we can implement a simple shell. First, we build the shell framework: read a command line from the user, then parse and evaluate it.

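```c
#include <stdio.h>
#include "csapp.h"
#define MAXARGS 128

/* Function prototypes */
void eval(char *cmdline);
int parseline(char *buf, char **argv);
int builtin_command(char **argv);

int main()
{
    char cmdline[MAXLINE];

    while (1) {
        /* Read */
        printf("> ");
        Fgets(cmdline, MAXLINE, stdin);
        if (feof(stdin))
            exit(0);

        /* Evaluate */
        eval(cmdline);
    }
}

/* If the first argument is a built-in command, run it and return 1 */
int builtin_command(char **argv)
{
    if (!strcmp(argv[0], "quit"))   /* quit command */
        exit(0);
    if (!strcmp(argv[0], "&"))      /* ignore a lone & */
        return 1;
    if (!strcmp(argv[0], "-help")) {
        printf("-help    help information.\n");
        printf("ls       list files and folders of current path.\n");
        printf("pwd      show current path.\n");
        return 1;
    }
    if (!strcmp(argv[0], "pwd")) {
        char *cwd = getcwd(NULL, 0);    /* glibc allocates the buffer */
        if (cwd != NULL) {
            printf("%s\n", cwd);
            free(cwd);
        }
        return 1;
    }
    return 0;                       /* not a built-in command */
}

/* eval - evaluate a command line */
void eval(char *cmdline)
{
    char *argv[MAXARGS];   /* argument list for execve() */
    char buf[MAXLINE];     /* holds the modified command line */
    int bg;                /* run the job in the background? */
    pid_t pid;

    strcpy(buf, cmdline);
    bg = parseline(buf, argv);
    if (argv[0] == NULL)
        return;            /* ignore empty lines */

    if (!builtin_command(argv)) {
        if ((pid = Fork()) == 0) {      /* child runs the user job */
            if (execve(argv[0], argv, environ) < 0) {
                printf("%s: Command not found.\n", argv[0]);
                exit(0);
            }
        }

        /* Parent waits for a foreground job to terminate */
        if (!bg) {
            int status;
            if (waitpid(pid, &status, 0) < 0)
                unix_error("waitfg: waitpid error");
        } else
            printf("%d %s", pid, cmdline);
    }
}

/* parseline - split the command line and build the argv array */
int parseline(char *buf, char **argv)
{
    char *delim;   /* points to the first space delimiter */
    int argc;      /* number of arguments */
    int bg;        /* background job? */

    buf[strlen(buf) - 1] = ' ';         /* replace trailing '\n' with a space */
    while (*buf && (*buf == ' '))       /* skip leading spaces */
        buf++;

    /* Build the argv list */
    argc = 0;
    while ((delim = strchr(buf, ' '))) {
        argv[argc++] = buf;
        *delim = '\0';
        buf = delim + 1;
        while (*buf && (*buf == ' '))   /* skip spaces */
            buf++;
    }
    argv[argc] = NULL;

    if (argc == 0)                      /* blank line */
        return 1;

    /* Should the job run in the background? */
    if ((bg = (*argv[argc - 1] == '&')) != 0)
        argv[--argc] = NULL;

    return bg;
}
```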

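Explanation of the code. The main functions are:

eval: evaluates the command line it receives. If the command is not built in, eval forks a child that runs the program with execve; for a foreground job, the parent then waits for the child with waitpid. Because execve does not search PATH, a program must be named by its path, for example /bin/ls.
parseline: splits the space-separated command line into arguments and builds the argv array that eval passes to execve. It returns a nonzero value if the last argument is &, meaning the job should run in the background.
builtin_command: checks whether the first argument is a built-in shell command. If so, it runs the command immediately and returns 1; otherwise it returns 0.

Next, we use some system calls to implement a few common Linux commands.

ls

ls displays the files and folders in the current path. C code implementation: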

```c
#include <dirent.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

void do_ls(char[]);
void dostat(char *);
void show_file_info(char *, struct stat *);
void mode_to_letters(int, char[]);
char *uid_to_name(uid_t);
char *gid_to_name(gid_t);

int main(int argc, char *argv[])
{
    (void)argv;
    if (argc == 1)
        do_ls(".");
    else
        printf("Error input\n");
    return 0;
}

void do_ls(char dirname[])
{
    DIR *dir_ptr;              /* directory stream */
    struct dirent *direntp;    /* next directory entry */

    if ((dir_ptr = opendir(dirname)) == NULL)
        fprintf(stderr, "ls: cannot open %s\n", dirname);
    else {
        while ((direntp = readdir(dir_ptr)) != NULL)
            dostat(direntp->d_name);
        closedir(dir_ptr);
    }
}

void dostat(char *filename)
{
    struct stat info;

    if (lstat(filename, &info) == -1)
        perror("lstat");
    else
        show_file_info(filename, &info);
}

void show_file_info(char *filename, struct stat *info_p)
{
    char modestr[11];

    mode_to_letters(info_p->st_mode, modestr);
    printf("%-12s", modestr);
    printf("%-4d", (int)info_p->st_nlink);
    printf("%-8s", uid_to_name(info_p->st_uid));
    printf("%-8s", gid_to_name(info_p->st_gid));
    printf("%-8ld", (long)info_p->st_size);

    time_t timelong = info_p->st_mtime;
    struct tm *htime = localtime(&timelong);
    /* tm_year counts years since 1900 */
    printf("%-4d-%02d-%02d %02d:%02d", htime->tm_year + 1900, htime->tm_mon + 1,
           htime->tm_mday, htime->tm_hour, htime->tm_min);
    printf(" %s\n", filename);
}

/* Convert the file type and permission bits to an "ls -l" style string */
void mode_to_letters(int mode, char str[])
{
    strcpy(str, "----------");
    if (S_ISDIR(mode)) str[0] = 'd';
    if (S_ISCHR(mode)) str[0] = 'c';
    if (S_ISBLK(mode)) str[0] = 'b';
    if (S_ISLNK(mode)) str[0] = 'l';   /* lstat reports symlinks themselves */

    if (mode & S_IRUSR) str[1] = 'r';
    if (mode & S_IWUSR) str[2] = 'w';
    if (mode & S_IXUSR) str[3] = 'x';
    if (mode & S_IRGRP) str[4] = 'r';
    if (mode & S_IWGRP) str[5] = 'w';
    if (mode & S_IXGRP) str[6] = 'x';
    if (mode & S_IROTH) str[7] = 'r';
    if (mode & S_IWOTH) str[8] = 'w';
    if (mode & S_IXOTH) str[9] = 'x';
}

/* Convert a uid to a user name, or to its number if there is no such user */
char *uid_to_name(uid_t uid)
{
    struct passwd *pw_str;
    static char numstr[16];

    if ((pw_str = getpwuid(uid)) == NULL) {
        snprintf(numstr, sizeof numstr, "%u", (unsigned)uid);
        return numstr;
    }
    return pw_str->pw_name;
}

/* Convert a gid to a group name, or to its number if there is no such group */
char *gid_to_name(gid_t gid)
{
    struct group *grp_ptr;
    static char numstr[16];

    if ((grp_ptr = getgrgid(gid)) == NULL) {
        snprintf(numstr, sizeof numstr, "%u", (unsigned)gid);
        return numstr;
    }
    return grp_ptr->gr_name;
}
```

Implementation idea: the core is the do_ls function. It opens the directory with opendir, calls readdir repeatedly to read each file or folder entry in it, and prints the information for each entry. Once compiled, the ls program can be run from the shell implemented above (for example, as ./ls).

3. Signals

A signal, also called a software interrupt, notifies a process that an asynchronous event has occurred. Processes can send signals to each other with kill, and the kernel can also send a signal to a process because of an internal event. Note that a signal only notifies a process that an event has occurred; it does not carry any data to the process.

A process can handle each signal it receives in one of three ways. First, much like an interrupt handler, the process can specify a handler function that is invoked to process the signal. Second, the process can ignore the signal, as if it had never happened. Third, the process can keep the system's default action for the signal; for most signals, the default action is to terminate the process. A process calls signal to specify how it handles a given signal.

For example, a process can forcibly terminate another process by sending it SIGKILL, and when a child process terminates or stops, the kernel sends SIGCHLD to the parent. There are many types of signals, each corresponding to a kind of system event.

The signal function, which installs a signal handler, has the following prototype:

```c
#include <signal.h>

typedef void (*sighandler_t)(int);

sighandler_t signal(int signum, sighandler_t handler);
```

Returns: a pointer to the previous handler if OK, SIG_ERR on error.

Let's look at an example of receiving signals:

```c
#include "csapp.h"

/* SIGINT handler */
void handler(int sig)
{
    return; /* Catch the signal and return */
}

unsigned int snooze(unsigned int secs)
{
    unsigned int rc = sleep(secs);

    printf("Slept for %u of %u secs.\n", secs - rc, secs);
    return rc;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <secs>\n", argv[0]);
        exit(0);
    }

    if (signal(SIGINT, handler) == SIG_ERR) /* Install SIGINT handler */
        unix_error("signal error\n");
    (void)snooze(atoi(argv[1]));
    exit(0);
}
```

Program analysis: the program takes one integer argument, the number of seconds to sleep. Normally it sleeps for that many seconds and then exits. Because a SIGINT handler is installed, pressing Ctrl+C does not terminate the program; instead, control jumps to handler, which simply returns. sleep then returns early with the number of seconds left unslept, so snooze reports how long the program actually slept.

4. Dynamic and Static Libraries

Libraries come in two types: dynamic (shared) and static. Dynamic libraries usually have the suffix .so, and static libraries the suffix .a, for example libhello.so and libhello.a.
To use different versions of a library on the same system, you can append the version number to the library file name, for example libhello.so.1.0. Because the linker looks for files with the .so suffix by default, symbolic links are usually created so that programs can use these libraries:
ln -s libhello.so.1.0 libhello.so.1
ln -s libhello.so.1 libhello.so
Using libraries
When a program is linked against a static library, the linker finds the functions the program needs and copies them into the executable. Because the copy is complete, the static library is no longer needed once linking succeeds. Dynamic libraries work differently: the linker leaves a marker in the executable indicating that the library must be loaded when the program runs. Because dynamic libraries save space, the Linux linker links against the dynamic library by default; that is, if both a static and a dynamic version of a library exist, the program is linked against the dynamic one.

5. The ELF File Format and the Process Address Space

The ELF file format is closely related to the layout of the process address space. From low addresses to high addresses, a process address space is typically laid out as follows: code segment, initialized data segment, uninitialized data segment (BSS), heap, stack, and then the command-line arguments and environment variables.
The heap grows toward higher memory addresses.
The stack grows toward lower memory addresses.

An ELF file generally contains the following sections:
.text: the machine instructions produced by compiling the source code; this section is read-only.
.data: initialized non-const global variables and static local variables.
.bss: uninitialized (or zero-initialized) non-const global variables and static local variables.
.rodata: read-only data, such as the format strings in printf calls.
Other sections are not covered here.

An ELF file describes its contents with two structures: sections and program headers. Generally, an ELF relocatable file uses sections, an ELF executable file uses program headers, and a shared object file uses both.
Loading a file is actually a simple process. The type field of a section or program header tells whether its contents need to be loaded, and the offset field locates the data in the file, which is then read (copied) to the corresponding memory location. For a program header, that location is given by its vaddr field; for a section, the loader can choose the location itself. The essence of dynamic linking is relocating ELF files and resolving symbols.
Relocation allows an ELF file to be executed at an arbitrary address (an ordinary program is given a fixed execution address when it is linked); symbol resolution allows an ELF file to reference dynamic data, that is, data that does not exist at link time.
As far as the process itself is concerned, only relocation is needed; symbol resolution is a branch of the relocation process.

6. References

Programmer's Self-Cultivation: Linking, Loading, and Libraries
Computer Systems: A Programmer's Perspective, 3rd Edition
Linux Kernel Programming
Understanding the Linux Kernel, 3rd Edition

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
