Linux Processes in Theory and Practice (1): Basic Concepts and Programming Overview (fork, vfork, COW)

Source: Internet
Author: User

Process and program

What is a program? A program is a set of instructions that accomplishes a specific task.

What is a process?
[1] From the user's perspective, a process is one execution of a program.
[2] From the operating system kernel's perspective, a process is the basic unit to which the operating system allocates resources such as memory and CPU time slices.
[3] A process is the smallest unit of resource allocation.
[4] Each process has its own independent address space and execution state.
[5] A multitasking operating system such as UNIX allows many programs to run at the same time. Each running program forms a process.

Data structures

A process consists of three parts: the PCB, the program (code) segment, and the data segment.
Process control block (PCB): describes the process and holds all the information required to control its execution.
Code segment: the program code that runs on the CPU when the scheduler dispatches the process.
Data segment: the data of the process, which may be the raw data processed by the program or the intermediate and final data produced while the program runs.

Differences between a process and a program

A process is dynamic and a program is static. (The PCB is the only sign that a process exists; the operating system uses the PCB to control the process.)
A process has a relatively short life cycle, while a program is permanent.
A process corresponds to only one program, while one program can correspond to multiple processes.

The three states of a process

A newly created process enters the ready state, and it starts running when the scheduler selects it. A running process returns to the ready state when its time slice is used up, and it becomes blocked when it issues an I/O request. Note that when the I/O completes, the blocked process cannot run directly: it must first return to the ready state.

Running: the process occupies the CPU and is executing on it.
Ready: the process meets all the conditions to run but has not yet been allocated the CPU.
Blocked: the process temporarily cannot run because it is waiting for some event to occur.

At any point in its life a process is in one of these three states. These are the three basic states, but the designer of a particular operating system can define different states to suit the implementation. Linux, for example, has the following states:
TASK_RUNNING: the process is running or is ready to run.
TASK_INTERRUPTIBLE: the process is sleeping and can be woken by a signal.
TASK_UNINTERRUPTIBLE: the process is sleeping and is not woken by signals.
TASK_STOPPED: execution of the process has been stopped.

Tasks of process scheduling:
Save the processor context.
Select a process according to some scheduling algorithm.
Assign the processor to that process.

Programming terminology

Process identifier: each process is assigned a unique number, called the process identifier or PID. It is a positive integer, traditionally in the range 2 to 32768 (the upper limit is configurable on Linux). When a process is started, it receives the next unused number in sequence as its PID.
PID 1 is a special process, init.
PID 0 is the idle process.

About processes 0 and 1:
Process 0: the first process created during Linux boot. After the system is loaded, it becomes the process responsible for scheduling, swapping, and memory management.
Process 1: the init process, created by process 0 to complete system initialization. It is the ancestor of all other user processes in the system.

The Linux kernel manages processes through a structure called task_struct, known as the process descriptor. This structure contains all the information a process requires.

Process creation

Different operating systems provide process creation primitives with different names and formats. However, the work done by the operating system once the primitive is executed is roughly the same:
(1) Assign an internal identifier to the newly created process and create a process structure in the kernel.
(2) Copy the environment of the parent process.
(3) Allocate resources to the process, including all elements required by the process image (program, data, user stack, etc.).
(4) Copy the contents of the parent's address space into the new process's address space.
(5) Set the process state to ready and insert it into the ready queue.

Process termination

When a process terminates, the operating system does the following:
(1) Disable soft interrupts: the process is about to terminate, so it no longer handles soft interrupt signals.
(2) Reclaim resources: release all resources allocated to the process, such as closing all open files and freeing the data structures of the process.
(3) Write accounting information: record the accounting data generated while the process ran (including various statistics) in a global accounting file.
(4) Set the process to the zombie state: send a soft interrupt signal to the parent process and place the termination status in a designated storage location.
(5) Invoke the scheduler: because the CPU has been released, the scheduler must assign it to another process.

The fork system call

fork copies a process image. A child created with fork inherits a copy of the parent's entire address space, including: process context, process stack, memory information, open file descriptors, signal handling settings, process priority, process group ID, current working directory, root directory, resource limits, controlling terminal, and so on.

Differences between the child and the parent:
1. Locks set by the parent are not inherited by the child.
2. Each has its own process ID, and their parent process IDs differ.
3. Pending alarms are cleared in the child.
4. The child's set of pending signals is initialized to the empty set.

The fork prototype:

#include <unistd.h>
pid_t fork(void);

fork creates a child process. Return value: on success, fork returns the child's PID in the parent and 0 in the child. A return value of -1 means the creation failed.

How should we understand that fork is called once but returns twice? The essence is that it returns once in each of the two processes, each within its own address space. The child and the parent have their own memory (fork duplicates the code segment, data segment, stack segment, and PCB). Changing count in the child does not affect the parent, because each process has its own data segment.

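// Example: data in the parent and in the child is independent (a change in one does not affect the other)
#include <unistd.h>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

// Report the failed call and terminate.
static void err_exit(const char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);   // do not leave zombie children behind
    int count = 10;

    pid_t pid = fork();
    if (pid == -1)
        err_exit("fork error");
    else if (pid == 0)          // child process
    {
        ++count;                // modifies only the child's copy
        cout << "In child: pid = " << getpid() << ", ppid = " << getppid() << endl;
        cout << "count = " << count << endl;
    }
    else                        // parent process
    {
        ++count;                // modifies only the parent's copy
        cout << "In parent: pid = " << getpid() << ", child pid = " << pid << endl;
        cout << "count = " << count << endl;
    }

    exit(0);
}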

 

 
// Example: why is "Hello World" printed eight times?
#include <unistd.h>
#include <csignal>
#include <cstdlib>
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);
    // Each fork creates a child that is a copy of its parent, and both continue
    // running from that point. The processes therefore form a binary tree:
    // three forks give four levels and 2^3 = 8 processes, so "Hello World"
    // is printed 8 times.
    fork();
    fork();
    fork();
    cout << "Hello World" << endl;
    exit(0);
}

 

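// Example: create N child processes
#include <unistd.h>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;

// Report the failed call and terminate.
static void err_exit(const char *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

int main(int argc, char *argv[])
{
    signal(SIGCHLD, SIG_IGN);
    int processCount;
    cin >> processCount;

    for (int i = 0; i < processCount; ++i)
    {
        pid_t pid = fork();
        if (pid < 0)
            err_exit("fork error");
        else if (pid == 0)
        {
            cout << "Child..." << endl;
            exit(0);            // the child exits here, so only the parent keeps forking
        }
    }

    exit(0);
}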

 

COW (copy-on-write)

In Linux, fork() produces a child that is a copy of the parent, but the child very often calls exec shortly afterwards. For efficiency, Linux uses the copy-on-write technique: the contents of a segment are copied for the child only when that segment is actually modified.

If the child has no code of its own in physical memory, how does it fetch the instructions that perform the exec call? Between fork and exec, the two processes use the same physical memory: the child's code segment, data segment, and stack all map to the parent's physical memory. In other words, the two processes have separate virtual address spaces but share the same physical memory. When the parent or the child modifies a segment, physical memory is allocated for the child's copy of that segment.

If no exec follows, the kernel allocates separate physical memory for the child's data and stack segments (each process has its own address space, and the two do not affect each other), while the code segment keeps sharing the parent's physical memory (the code of the two is identical). If exec is called, the two processes execute different code, so the child's code segment also receives its own physical memory.

COW in detail

Consider a parent process P1. It is an entity, so it has a "soul" and a "body": a virtual address space (represented by corresponding data structures) and physical memory. The virtual address space has four parts: text segment, data segment, heap, and stack. The kernel allocates physical blocks for these four parts: a text block, a data block, a heap block, and a stack block.

1. P1 calls fork() to create a child process P2. The kernel:
(1) Copies P1's text segment, data segment, heap, and stack for P2. Note that the contents are identical.
(2) Allocates physical blocks for these four parts of P2. The text segment does not actually receive a block of its own: P2's text segment points to P1's text block. The data segment, heap, and stack receive P2's own data block, heap block, and stack block, whose contents are copied from P1's blocks.

2. Copy-on-write: the kernel creates only the virtual address space structures for the new child. They are copied from the parent's structures, but no physical memory is allocated for the segments; they share the parent's physical memory. When the parent or the child modifies a segment, physical memory is allocated for the child's copy of that segment.

3. vfork(): this approach goes even further. The kernel does not even create a virtual address space structure for the child; the child directly shares the parent's virtual address space and therefore also its physical memory.

To restate the analogy: a process is an entity with a soul and a body, so the system must create both a virtual and a physical entity for it, and both have corresponding data structures in the system.

The traditional fork() system call directly copies all resources of the parent to the newly created process. This implementation is simple but inefficient, because much of the data it copies could have been shared. Worse, if the new process immediately executes a new image, all of the copying is wasted.

In Linux, fork() is implemented with copy-on-write pages. Copy-on-write is a technique that delays or even avoids copying data. Instead of duplicating the whole address space, the kernel lets the parent and the child share a single copy.
The data is copied only when it is written, and at that point each process receives its own copy. In other words, resources are duplicated only on write; until then they are shared read-only. This technique delays the copying of each page in the address space until the page is actually written. If a page is never written (for example, when exec() is called immediately after fork()), it never needs to be copied. The only real overhead of fork() is copying the parent's page tables and creating a unique process descriptor for the child. Because a new process usually runs a new executable right after being created, this optimization avoids copying a large amount of data that would never be used (an address space often contains tens of megabytes of data). Since Unix emphasizes fast process execution, this optimization is very important.

It is worth adding that COW in Linux is not inherently tied to exec, and that the same idea appears outside the kernel. For example:

string str1 = "hello world";
string str2 = str1;

and then:

str1[1] = 'q';
str2[1] = 'w';

With a copy-on-write string implementation, str1 and str2 store their data at the same address after the first two statements. After the modifications, the data address of str1 has changed while str2 still uses the original one. This is COW applied in C++. Note that this describes older std::string implementations (such as libstdc++ before GCC 5); C++11 and later effectively rule out copy-on-write for std::string, so modern implementations copy the data immediately.
