Using the C language fork() function in concurrent programming under Linux

Last Update:2017-01-18 Source: Internet

Author: User


A new process created by fork is called a child process. The fork function is called once but returns twice: the return value in the child process is 0, and the return value in the parent process is the process ID of the new child. The child's process ID is returned to the parent because a process can have more than one child, and there is no function that lets a process obtain the process IDs of all its children. fork returns 0 to the child because a process has only one parent, so the child can always call getppid to get the process ID of its parent.
The two main reasons fork fails are that there are already too many processes in the system, or that the total number of processes for the real user ID exceeds the system limit.
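The three possible outcomes of fork can be seen in a minimal sketch. The function name check_fork_return_values is made up for this illustration; it lets the child confirm that getppid reports the parent's ID and has the parent use the returned child ID to wait for it.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns 1 if the child's getppid() matched the
 * parent's getpid(), 0 if it did not, and -1 if fork() or waitpid() failed. */
int check_fork_return_values(void)
{
    pid_t parent_pid = getpid();   /* remembered before the fork */
    pid_t pid = fork();

    if (pid < 0) {
        /* fork failed, for example because a process limit was reached */
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: fork() returned 0, so ask the kernel for the parent's ID */
        _exit(getppid() == parent_pid ? 0 : 1);
    }

    /* parent: fork() returned the child's process ID, which waitpid() needs */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling check_fork_return_values() from main should return 1 as long as the parent is still alive while the child runs, which waitpid guarantees here.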

fork has the following two uses:

(1) A parent process wants to duplicate itself so that the parent and child can execute different pieces of code at the same time. This is common in network servers: the parent waits for a service request from a client. When a request arrives, the parent calls fork and lets the child handle the request, while the parent goes back to waiting for the next request.

(2) A process wants to run a different program. This is the common case for a shell. Here the child calls exec immediately after it returns from fork.
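Both uses can be sketched in a few lines. The names serve_requests, handle_request and run_shell_command are hypothetical, and the "requests" are just integers standing in for what a real server would receive from a socket.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical request handler: stands in for the real per-request work. */
static int handle_request(int request_id)
{
    (void)request_id;
    return 0;                            /* 0 means the request was handled */
}

/* Use (1): the parent hands each request to a child and moves on.
 * Returns how many children reported success. */
int serve_requests(int request_count)
{
    int forked = 0;
    for (int id = 0; id < request_count; id++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* stop dispatching if fork fails */
        if (pid == 0)
            _exit(handle_request(id));   /* child handles exactly one request */
        forked++;                        /* parent continues with the next one */
    }

    int handled = 0;
    for (int i = 0; i < forked; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            handled++;
    }
    return handled;
}

/* Use (2): the child calls exec right after fork, as a shell does.
 * Returns the command's exit status, or -1 on error. */
int run_shell_command(const char *command)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                      /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch every child leaves through _exit or exec, so no child ever falls back into the parent's loop.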

Both uses come down to running work concurrently. Strictly speaking, fork creates separate processes rather than threads, and in C the programmer has to manage that concurrency explicitly, which is more involved than in Java.

Note: fork does create a child process that is a full copy of the parent, but the child starts executing at the instruction that follows the fork. The reason is logical: if the child also executed every instruction from the start of main, it would reach the fork call and create a child of its own, and so on. Such a small program could create countless processes and paralyze your computer, so the designers of fork would certainly not do that.
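One way to see this is to count how many processes execute a statement placed after the fork. In the sketch below (the function name and the use of a pipe as a counter are choices made for this illustration), each process that reaches the write call puts one byte into a pipe, and the parent counts the bytes.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns how many processes executed the code that
 * follows fork(), or -1 on error. */
int count_processes_after_fork(void)
{
    int fds[2];
    if (pipe(fds) < 0)            /* runs once, before any child exists */
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* From here on, both parent and child execute the same instructions. */
    char mark = 'x';
    ssize_t written = write(fds[1], &mark, 1);

    if (pid == 0)
        _exit(written == 1 ? 0 : 1);   /* child stops here */

    close(fds[1]);                     /* parent keeps only the read end */
    waitpid(pid, NULL, 0);

    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1)   /* one byte per process that wrote */
        count++;
    close(fds[0]);
    return written == 1 ? count : -1;
}
```

The pipe call runs in one process only, while the write after the fork runs in two, so the function returns 2.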

When I first started doing multi-process programming under Linux, I found the following code very puzzling:

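#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdarg.h>
#include <errno.h>
#define LEN 2
void err_exit (char *fmt, ...);
int main (int argc, char *argv[])
{
  pid_t pid;
  int loop;

  for (loop = 0; loop < LEN; loop++)
  {
    if ((pid = fork ()) < 0)          /* fork failed */
      err_exit ("[fork:%d]: ", loop);
    else if (pid == 0)                /* child */
    {
      printf ("Child process\n");
    }
    else                              /* parent */
    {
      sleep (5);
    }
  }

  return 0;
}

/* The article only declares err_exit; this minimal body is an assumption. */
void err_exit (char *fmt, ...)
{
  va_list ap;
  int saved_errno = errno;

  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fprintf (stderr, "%s\n", strerror (saved_errno));
  exit (EXIT_FAILURE);
}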

Why does this program create three child processes instead of two? And why does adding a return 0 after line 20 (the printf call in the child branch) reduce that to two? I was confused by this for a long time, and only understood the reason after learning about the storage layout of a C program and the fact that, after fork, parent and child share the text segment (the code segment, CS). Let me explain the principle step by step.

The first thing to understand is the storage layout of a C program, as shown in the following figure:

When a C program runs, it is loaded into memory. As shown above, its layout in memory is divided into environment variables and command-line arguments, the stack, the heap, the data segments (initialized and uninitialized), and the text segment. Each of these is described below:

Environment variables and command-line arguments: the environment variables on the UNIX system (such as $PATH) and the arguments passed to the main function (the content pointed to by argv).

Data segment: this holds the global variables defined in the C program. A global that is not initialized is stored in the uninitialized data segment and is set to 0 by exec before the program runs. Otherwise it is stored in the initialized data segment, whose contents exec reads from the program file. (Readers who know assembly will recognize this as the data segment, DS.)

Heap: this area is used mainly for dynamic allocation. For example, memory requested with malloc in C comes from this area.

Text segment: C code is not executed directly; it is compiled into machine instructions that the computer can execute, and those instructions are stored in this area (the code segment, CS, in assembly).

Stack: in my view this is the most important part of a C program's memory layout. It is used mainly for function calls. When the program starts, the stack holds only the contents of main (main's stack frame). If main calls a function func, then func's return address (the point in main to return to), func's arguments, the local variables defined in func, func's return value, and so on are pushed onto the stack, so the stack now also holds func's contents (func's stack frame). When func finishes, that content is popped off (func's stack frame is cleared), leaving only main's stack frame. (This area is the stack segment, SS, in assembly.)

That is the memory layout of a C program. It also explains another point: global and static variables are stored in the data segment, while local variables are stored on the stack, and stack data is gone once the function returns and its frame is popped. That is why global variables live longer than local variables.
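The difference in lifetime can be checked with a small sketch. All names below are made up for this illustration, and the comments give the usual placement of each variable on a typical Linux toolchain.

```c
#include <stdlib.h>

int initialized_global = 42;   /* initialized data segment */
int uninitialized_global;      /* uninitialized data segment, starts as 0 */

/* Returns calls * 10 + local, so the two lifetimes show up in one number. */
int count_calls(void)
{
    static int calls = 0;      /* kept with the program's data: survives between calls */
    int local = 0;             /* on the stack: recreated on every call */

    calls++;
    local++;
    return calls * 10 + local;
}

/* Heap: the returned memory stays valid until the caller passes it to free(). */
int *make_heap_value(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}
```

Successive calls to count_calls() return 11, 21, 31 and so on: the static counter keeps growing, while the local variable starts from 0 each time because its stack frame is rebuilt on every call.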

With the storage layout understood, we can look at how fork copies memory. One sentence is enough: "the child process copies the parent's data space (data segment), stack, and heap; parent and child share the text segment." In other words, the child gets its own copy of the program's data, but it does not copy the instructions; it shares them with the parent. Take a look at the code below (the code above with a few additions):
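Before that program, the copy rule itself can be checked in isolation. In this minimal sketch (names are hypothetical), the child overwrites a global, a stack variable and a heap value, and the parent then verifies that its own copies are untouched.

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static int global_value = 1;            /* data segment */

/* Hypothetical helper: returns 1 if the parent's data survived the child's
 * writes, 0 if not, and -1 on error. */
int parent_data_unchanged_by_child(void)
{
    int stack_value = 2;                /* stack */
    int *heap_value = malloc(sizeof *heap_value);   /* heap */
    if (heap_value == NULL)
        return -1;
    *heap_value = 3;
    global_value = 1;

    pid_t pid = fork();
    if (pid < 0) {
        free(heap_value);
        return -1;
    }
    if (pid == 0) {
        /* child: these writes only touch the child's own copies */
        global_value = 100;
        stack_value = 200;
        *heap_value = 300;
        _exit(0);
    }

    waitpid(pid, NULL, 0);              /* child has finished writing */
    int unchanged = global_value == 1 && stack_value == 2 && *heap_value == 3;
    free(heap_value);
    return unchanged;
}
```

The function returns 1 because each process works on its own copy of the data. The article's extended version of the earlier program follows.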

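/* This program creates 3 child processes. The key to understanding it:
 * the child copies the parent's data segment, stack and heap, and shares the text segment.
 * */
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdarg.h>
#include <errno.h>
#define BUFSIZE                       /* not used in this program */
#define LEN 2
void err_exit (char *fmt, ...);
int main (int argc, char *argv[])
{
  pid_t pid;
  int loop;

  for (loop = 0; loop < LEN; loop++)
  {
    printf ("Now is no.%d loop:\n", loop);

    if ((pid = fork ()) < 0)          /* fork failed */
      err_exit ("[fork:%d]: ", loop);
    else if (pid == 0)                /* child: print own ID, then parent's ID */
    {
      printf ("[Child process]p:%d c:%d\n", (int) getpid (), (int) getppid ());
    }
    else                              /* parent */
    {
      sleep (5);
    }
  }

  return 0;
}

/* The article only declares err_exit; this minimal body is an assumption. */
void err_exit (char *fmt, ...)
{
  va_list ap;
  int saved_errno = errno;

  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fprintf (stderr, "%s\n", strerror (saved_errno));
  exit (EXIT_FAILURE);
}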

Why does the code above create three child processes? Let's analyze its execution step by step:

First the parent process executes the first loop iteration, creates a child process through fork, and then sleeps for 5 seconds.

Now look at the child created by the parent, which we will call child process 1. Child process 1 gets a complete copy of the parent's data, but its text segment is shared with the parent. That means child process 1 does not start executing at the opening { of main; it starts from the point main had reached, which is the code right after the fork. So child process 1 first prints its own ID and its parent's ID. It then continues to the second loop iteration, where it creates a child of its own, which we will call child process 11, and then child process 1 goes to sleep.

Child process 11 picks up where child process 1 was executing (that is, right after the fork). It also prints its own ID and its parent's ID (child process 1). Then loop is incremented to 2, so child process 11 returns directly.

When child process 1 wakes from its sleep, its loop is also incremented to 2, so child process 1 returns as well.

Now back to the parent process. It has completed only one iteration; after its sleep it enters the second iteration and creates another child, which we will call child process 2. The parent then sleeps again and, when the sleep is over, exits.

And child process 2? It starts after the fork with loop equal to 1, prints its own ID and its parent's ID, finishes the loop, and exits directly.
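The walkthrough can be turned into a small counting model. This sketch does not call fork at all; the hypothetical function count_descendants only mirrors the rule that each new child resumes with the same loop value its parent had at the fork, then runs the remaining iterations itself.

```c
/* Hypothetical model: counts the processes created, directly or indirectly,
 * by a process that enters the loop with loop == start. No fork() is called. */
int count_descendants(int start, int len)
{
    int created = 0;
    for (int loop = start; loop < len; loop++) {
        /* Each iteration forks one child. That child leaves the if/else,
         * increments loop, and so enters the loop at loop + 1, where it may
         * fork further children of its own. */
        created += 1 + count_descendants(loop + 1, len);
    }
    return created;
}
```

With LEN equal to 2, count_descendants(0, 2) gives 3: child process 1, its child 11, and child process 2. A child that resumes with loop equal to 1, as child process 1 does, accounts for count_descendants(1, 2), which is 1.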

That is how the code above runs. The relationship between the processes is shown in the following illustration:

The loop=%d in the diagram above is the value of loop when the process begins executing. The output of the code above is shown below:

Here process 3498 is our main process, 3499 is child process 1, 3500 is child process 11, and 3501 is child process 2.

Finally, let's answer the question raised at the beginning: why does adding a return 0 at the end of the child branch (the "if (pid == 0)" block) result in only two child processes? Because child process 1 exits right there instead of going on to the second loop iteration, it never creates child process 11, so in the end only two child processes are created.
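Both variants can be measured with one sketch. The function below is hypothetical: it runs the article's loop in a separate process tree, has every newly created child write one byte to a pipe, and replaces the sleep (5) calls with wait so that the count is available as soon as all processes have finished.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns how many child processes the loop created,
 * or -1 on error. child_exits_early models the added "return 0". */
int count_forked_children(int len, int child_exits_early)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t root = fork();               /* root plays the article's parent */
    if (root < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (root == 0) {
        close(fds[0]);
        for (int loop = 0; loop < len; loop++) {
            pid_t pid = fork();
            if (pid < 0)
                _exit(1);
            if (pid == 0) {
                char mark = 'c';       /* one byte per child created */
                if (write(fds[1], &mark, 1) != 1)
                    _exit(1);
                if (child_exits_early)
                    _exit(0);          /* the fix: leave instead of looping on */
            }
        }
        while (wait(NULL) > 0)         /* reap this process's own children */
            ;
        _exit(0);
    }

    close(fds[1]);
    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1)   /* ends when every writer has exited */
        count++;
    close(fds[0]);
    waitpid(root, NULL, 0);
    return count;
}
```

With len equal to 2, the function returns 3 when children keep looping and 2 when each child exits in its own branch, matching the two cases discussed above.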

This article is an English version of an article originally written in Chinese on aliyun.com.
