Memory Replication and Sharing in the fork Function

Source: Internet
Author: User


When I first started multi-process programming on Linux, the following code puzzled me:

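    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <stdarg.h>
    #include <errno.h>
    #define LEN 2
    void err_exit(const char *fmt, ...);
    int main(int argc, char *argv[])
    {
        pid_t pid;
        int loop;

        for (loop = 0; loop < LEN; loop++)
        {
            if ((pid = fork()) < 0)
                err_exit("[fork:%d]: ", loop);
            else if (pid == 0)
            {
                printf("Child process\n");   /* line 20 of the listing */
            }
            else
            {
                sleep(5);
            }
        }

        return 0;
    }

    /* The article only declares err_exit(); this minimal definition is an
       assumption, added so that the listing links. It prints the formatted
       message followed by the errno text, then exits. */
    void err_exit(const char *fmt, ...)
    {
        int saved_errno = errno;
        va_list ap;

        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        fprintf(stderr, "%s\n", strerror(saved_errno));
        exit(EXIT_FAILURE);
    }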

Why does this program create three child processes instead of two? And why are only two created once a return 0 is added after the printf on line 20? For a long time I did not understand it. Later I learned about the memory layout of a C program and the fact that the parent and child processes share the text segment (the code segment, CS) after fork, and it finally made sense. Let me walk through the principle step by step.

 

First, you need to understand the memory layout of a C program, shown in the following figure:

(The figure comes from Section 7.6 of Advanced Programming in the UNIX Environment.)

When a C program is executed, it is loaded into memory and laid out as follows: environment variables and command-line arguments, the stack, the heap, the data segments (initialized and uninitialized), and the text segment. Here is what each part holds:

Environment variables and command-line arguments: these are the environment variables of the Unix system (such as $PATH) and the arguments passed to the main function (the strings that argv points to).

Data segment: this holds the global variables defined in the C program. Uninitialized globals are placed in the uninitialized data segment, which exec sets to 0 before the program starts running. Initialized globals are placed in the initialized data segment, whose contents exec reads from the program file. (Anyone who knows assembly will recognize DS, the data segment in assembly language; it is the same thing.)

Heap: this region is used mainly for dynamic allocation. For example, memory requested with malloc in C is allocated from this region.

Text segment: C code is not executed directly. It is compiled into machine instructions, and those instructions are stored in this region (the code segment, CS, in assembly refers to it).

Stack: personally I think this is the most important part of a C program's memory layout. It is used mainly for function calls. At the start, the stack holds only the stack frame of the main function. If main calls a function func, then the return address (the point in main where execution resumes), the arguments to func, the local variables defined in func, and, depending on the calling convention, the return value of func are pushed onto the stack. Together they form the stack frame of func. When func returns, that frame is popped and only the main stack frame remains on the stack. (This region is the stack segment, SS, in assembly.)

That is the memory layout of a C program. One more point follows from it: global and static variables live in the data segment, while local variables live on the stack, and a function's stack data is gone once the call returns. That is why global variables have a longer lifetime than local variables.

 

Now that we know the memory layout of a C program, let's look at how fork replicates memory. One sentence is all we need: "The child process gets a copy of the parent's data space (data segment), stack, and heap, and the parent and child share the text segment." In other words, the child copies the program's data, but it does not copy the instructions; it shares them with the parent. Let's look at the following code (it is the code above with a little more output added):

    /*
     * This program creates three child processes. To understand why, keep one
     * sentence in mind: the child copies the parent's data segment, stack, and
     * heap, and shares the text segment with the parent.
     */
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <stdarg.h>
    #include <errno.h>
    #define BUFSIZE 512
    #define LEN 2
    void err_exit(const char *fmt, ...);
    int main(int argc, char *argv[])
    {
        pid_t pid;
        int loop;

        for (loop = 0; loop < LEN; loop++)
        {
            printf("Now is No.%d loop:\n", loop);

            if ((pid = fork()) < 0)
                err_exit("[fork:%d]: ", loop);
            else if (pid == 0)
            {
                printf("[Child process] P:%d C:%d\n", getpid(), getppid());
            }
            else
            {
                sleep(5);
            }
        }

        return 0;
    }

    /* The article only declares err_exit(); this minimal definition is an
       assumption, added so that the listing links. */
    void err_exit(const char *fmt, ...)
    {
        int saved_errno = errno;
        va_list ap;

        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        fprintf(stderr, "%s\n", strerror(saved_errno));
        exit(EXIT_FAILURE);
    }

Why does the code above create three child processes? Let's analyze its execution in detail:

First, the parent process enters the loop, creates a child process with fork, and then sleeps for 5 seconds.

Now look at the child process the parent just created; we call it child process 1. Child process 1 gets a complete copy of the parent's data, but note that its text segment is shared with the parent. This means child process 1 does not start executing at the opening brace of main. It continues from the point the parent had reached, that is, from the code right after fork. So child process 1 first prints its own ID and its parent's ID. It then goes on to the second iteration of the loop, where it creates a child of its own, which we call child process 11, and child process 1 starts to sleep.

Child process 11 picks up where child process 1 was (right after fork). It also prints its own ID and its parent's ID (that of child process 1). Its loop is then incremented to 2, the loop condition fails, and child process 11 returns directly.

When child process 1 finishes sleeping, its loop is also incremented to 2, so child process 1 returns as well.

Now back to the parent process. So far it has completed only one iteration. When its sleep ends, it starts the second iteration and creates another child process, which we call child process 2. The parent then sleeps again, and when that sleep ends the parent finishes too.

What about child process 2? It starts executing after fork, with loop equal to 1. After printing its own ID and its parent's ID, its loop ends and child process 2 exits.

That is how the code above runs. The relationships between the processes are as follows:

    parent process (loop = 0)
    +-- child process 1 (loop = 0)
    |   +-- child process 11 (loop = 1)
    +-- child process 2 (loop = 1)

Here, loop = %d is the value of loop when the process starts executing. The output of the code above is as follows:

    

Here, this 3498 process is our main process, and 3499 is the sub-process 11, is the sub-process 2.

 

Finally, let's answer the question raised at the beginning: why are only two child processes created when a return 0 is added to the child branch, "if (pid == 0)"? Because child process 1 ends right there. It never runs the second iteration of the loop, so child process 11 is never created, and we end up with just two child processes.
