System Call: sys_fork

Source: Internet
Author: User

1) Introduction to fork

The fork system call creates a child process of the current process. The child is essentially a copy of the parent; only its PID and a few other parameters differ. fork is one of the most important system calls in process management and is used constantly by shell command interpreters. If the call succeeds, fork returns the PID of the newly created child to the parent and returns 0 to the child. Otherwise, the cause of the error is stored in the variable errno and -1 is returned to the parent. There are two error causes:

EAGAIN indicates that fork could not allocate enough memory for the data items of the child's process control block (PCB), for example when a memory allocation fails while the parent's page tables are being copied.

ENOMEM indicates that fork could not allocate memory for its own use; there is not even enough memory to store the process control block.
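The return-value convention described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system; the helper name fork_and_wait and the exit code 7 are made up for illustration and do not come from the kernel source.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once, let the child exit with code 7,
 * and return the child's exit code to the caller (or -1 on error). */
int fork_and_wait(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        /* Failure: no child exists; errno holds the cause (e.g. EAGAIN, ENOMEM). */
        fprintf(stderr, "fork failed: %s\n", strerror(errno));
        return -1;
    }
    if (pid == 0) {
        /* Child: fork returned 0. */
        _exit(7);
    }

    /* Parent: fork returned the child's PID. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The same call returns twice, once in each process, and the two branches are told apart only by the returned value.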

2) Implementation of the fork Function

In "include/asm-i386/unistd.h", fork is declared as a system call that takes no parameters, so it uses the zero-argument macro:

static inline _syscall0(int, fork)

When fork is called, the macro _syscall0 expands to code that raises interrupt 0x80. The value in register eax is __NR_fork, which is the only parameter fork passes to int $0x80.

After "int $0x80" is raised, the assembly routine "system_call" multiplies the value of __NR_fork in eax (that is, 2) by 4 and uses the product as an offset into the system call table (sys_call_table), where it finds the entry:

.long SYMBOL_NAME(sys_fork)

As a result, control is transferred to the function sys_fork() ("arch/i386/kernel/process.c"):

asmlinkage int sys_fork(struct pt_regs regs)
{
    return do_fork(SIGCHLD, regs.esp, &regs);
}

SIGCHLD is a predefined signal number. Passed as the first argument, it tells do_fork() to create an ordinary child process whose parent is sent SIGCHLD when the child terminates. As mentioned above, the macro "SAVE_ALL" is invoked on kernel entry to save the general-purpose registers, which also provides a way to pass parameters. Here, sys_fork() passes the saved register set, a "struct pt_regs", to do_fork() as a parameter, along with the stack pointer field regs.esp.
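The dispatch path described above (system call number in eax, scaled by the entry size to index the system call table) can be mimicked in user space. This is only a sketch: the table, the handlers, and every demo_ name are hypothetical stand-ins, and only the number 2 and the scale factor 4 come from the text.

```c
#include <stddef.h>

#define DEMO_NR_FORK 2          /* value of __NR_fork given in the text */

typedef int (*demo_syscall_fn)(void);

/* Stand-in handlers; the real ones live in the kernel. */
static int demo_sys_ni(void)   { return -1; }
static int demo_sys_fork(void) { return 1234; /* pretend child PID */ }

/* Hypothetical table: one function pointer per system call number. */
static const demo_syscall_fn demo_call_table[] = {
    demo_sys_ni,      /* 0 */
    demo_sys_ni,      /* 1 */
    demo_sys_fork,    /* 2: __NR_fork */
};

/* Byte offset of an entry: call number times entry size (4 on i386). */
size_t demo_entry_offset(unsigned nr, size_t entry_size)
{
    return (size_t)nr * entry_size;
}

/* Index the table by the number that would be in eax and call the entry. */
int demo_dispatch(unsigned eax)
{
    size_t n = sizeof(demo_call_table) / sizeof(demo_call_table[0]);

    if (eax >= n)
        return -1;              /* unknown system call number */
    return demo_call_table[eax]();
}
```

With 4-byte entries, call number 2 sits at byte offset 8, which is the multiplication the text attributes to system_call.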

Control then enters the body of do_fork(), defined in "linux/kernel/fork.c". User processes are created by do_fork(), which does the real work of the fork system call. do_fork() finds a free slot in the task array, lets the child inherit the parent's existing resources, and initializes the process's clock, signal, time, and other data. The following section describes the general flow of the function.

The approximate process of the do_fork () function

do_fork() "plans for the worst" at the outset: it sets the error value that may be returned to -ENOMEM, which tells the system that memory is exhausted. Then the main flow begins.

First, do_fork() calls kmalloc to request memory for the new process. The GFP_KERNEL flag indicates that the caller may be put to sleep while the request is being satisfied. If the allocation fails, NULL is returned and do_fork() jumps to bad_fork, where it simply returns the error code, telling the system that memory is exhausted.

Then do_fork() calls the alloc_kernel_stack() macro to allocate a stack page for the process. Likewise, if the allocation fails, it executes the statement:

goto bad_fork_free_p;

Here, it is necessary to take a look at the program segment after the bad_fork_free_p label:

bad_fork_free_p:
    kfree(p);
bad_fork:
    return error;

As you can see, the further the initialization of the process has progressed, the more cleanup work is needed when an error occurs. The cleanup section of do_fork() therefore undoes the allocations in the reverse of the order in which the errors can occur, an interesting and neat symmetry.
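The cleanup pattern described above can be sketched in ordinary C. This is a user-space illustration under stated assumptions, not the kernel code: malloc and free stand in for kmalloc and kfree, the function name demo_create and the label bad_fork_free_stack are invented, and the fail_at parameter exists only to force each error path.

```c
#include <errno.h>
#include <stdlib.h>

/* fail_at: 0 = succeed, 1 = first allocation fails,
 * 2 = second allocation fails, 3 = a later step fails. */
int demo_create(int fail_at)
{
    int error = -ENOMEM;            /* assume the worst first */
    void *task = NULL;
    void *stack = NULL;

    task = (fail_at == 1) ? NULL : malloc(64);
    if (!task)
        goto bad_fork;              /* nothing to undo yet */

    stack = (fail_at == 2) ? NULL : malloc(4096);
    if (!stack)
        goto bad_fork_free_p;       /* undo the first allocation only */

    error = -EAGAIN;                /* memory obtained; later failures report EAGAIN */
    if (fail_at == 3)
        goto bad_fork_free_stack;   /* undo both allocations */

    /* Success. A kernel would keep both objects; the sketch releases them. */
    free(stack);
    free(task);
    return 0;

    /* Labels appear in reverse order of allocation, so each entry point
     * falls through every cleanup step that is still needed. */
bad_fork_free_stack:
    free(stack);
bad_fork_free_p:
    free(task);
bad_fork:
    return error;
}
```

Each later failure jumps to a label higher in the chain and falls through the rest, which is why the labels mirror the allocation order.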

Next, the following statement is executed:

error = -EAGAIN;

It indicates that the danger of "ENOMEM" has passed, but the danger of "EAGAIN" remains.

The next statement is:

*p = *current;

It copies the entire contents of the current process's task structure into the newly created one. At this point the child has inherited everything from the parent and, because pointer members are copied as they are, shares the structures they point to. This is of course not acceptable as a final state; the next step is to give the child its own characteristics.
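Why a plain structure assignment leaves parent and child sharing data can be seen in a small sketch. The structure below is a hypothetical, heavily reduced stand-in for task_struct; only the field name did_exec is borrowed from the text.

```c
/* Hypothetical stand-ins for kernel structures. */
struct demo_ftab {
    int count;
};

struct demo_task {
    int pid;
    int did_exec;
    struct demo_ftab *files;    /* pointer member: copied, not duplicated */
};

/* Copy the parent wholesale, then give the child its own identity. */
void demo_copy_task(struct demo_task *child,
                    const struct demo_task *parent, int new_pid)
{
    *child = *parent;           /* same idea as "*p = *current" */
    child->pid = new_pid;       /* fields that must differ are overwritten */
    child->did_exec = 0;
    /* child->files still points at the parent's table at this point. */
}
```

After the assignment the scalar fields are independent copies, but both tasks reference one demo_ftab until something allocates a separate one for the child.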

First, the use_count of the process's global execution domain structure is incremented to show that one more process now runs in that domain. Likewise, the use_count of the process's global executable file format structure is incremented.

Next, the parameters of the new process are set:

1. p->did_exec = 0, indicating that the process has not yet executed a new program;

2. p->swappable = 0, indicating that the process must not be swapped out for the time being, because it is still being created;

3. p->kernel_stack_page = new_stack; the physical page allocated for the kernel stack is stored in the kernel_stack_page data item;

4. The process state is set to TASK_UNINTERRUPTIBLE, indicating that the process will wait. Because its resources have not all been allocated yet, the process is made uninterruptible: it is woken up only once its resources are ready, and other processes cannot wake it with signals;

5. p->flags &= ~(PF_PTRACED | PF_TRACESYS | PF_SUPERPRIV);

p->flags |= PF_FORKNOEXEC;

These two statements ensure that the new process is not marked as traced or as having used superuser privileges, and set the PF_FORKNOEXEC bit, indicating that the new process has forked but not yet executed a program;

6. In the statement "p->pid = get_pid(clone_flags);", the get_pid() function first checks whether do_fork() was invoked on behalf of the clone system call, which is clearly not the case here (clone is introduced briefly in section 2.4). It then returns a process ID no greater than 0x8000 that differs from every process ID, process group ID, and session ID in use;

7. Because the state of the new process is still TASK_UNINTERRUPTIBLE, it is not put on the run queue: next_run and prev_run are set to NULL. Both the original-parent pointer and the parent pointer of the new process are set to the current process;

8. The wait queue on which the new process will wait for its own children is initialized;

9. "p->signal = 0;" indicates that the new process has no pending signals;

10. The timing data members are initialized:

init_timer(&p->real_timer);

p->real_timer.data = (unsigned long) p;

These two statements initialize real_timer, the timer_list structure that implements the process's real-time timer.

p->it_real_value = p->it_virt_value = p->it_prof_value = 0;

p->it_real_incr = p->it_virt_incr = p->it_prof_incr = 0;

These two statements initialize the data items used for process interval timing and set them to 0. it_real_value and it_real_incr follow the system timing variable jiffies, that is, real time. it_virt_value and it_virt_incr implement the virtual software timer, which advances only while the process is running; it is used for timing within the process and sends a signal when the time expires. The code can be found in the body of do_it_virt() in "/kernel/sched.c":

if (it_virt <= ticks) {
    it_virt = ticks + p->it_virt_incr;
    send_sig(SIGVTALRM, p, 1); /* send signal SIGVTALRM to the process */
}

it_prof_value and it_prof_incr are also used for virtual software timing, but they additionally count the time the operating system spends running on behalf of the process. In each pair, the former is the time value and the latter is the time increment. SIGPROF is sent when the time expires, which can be used to account for the system time a user consumes.

Process timers are used to control the running time of a process. They are set through another system call, setitimer. One of its parameters specifies the timer type: ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF.

p->utime = p->stime = 0;

p->cutime = p->cstime = 0;

.........................

p->start_time = jiffies;

These statements set the process's accumulated user-mode time, its accumulated kernel-mode time, its children's accumulated user-mode time, and its children's accumulated kernel-mode time to 0, and record the current system time, jiffies, as the creation time of the process.

11. The statement "SET_LINKS(p);" links the new process into the process list that starts at the initial process. "task[nr] = p;" puts it into the array of all current processes. "nr_tasks++;" records that the number of processes has increased by one.

After the 11 steps above, all the parameters of the newly created process have been set. Next, the new process is given its own memory for the structures that hold its file system information, memory pages, signal handlers, and so on. Here it is worth looking again at the handler invoked by the fork system call:

asmlinkage int sys_fork(struct pt_regs regs)
{
    return do_fork(SIGCHLD, regs.esp, &regs);
}

Here, the value of the macro SIGCHLD is 17. Now look at the "clone" flags defined in "sched.h":

#define CSIGNAL       0x000000ff /* signal to be sent when the process terminates */
#define CLONE_VM      0x00000100 /* child shares the parent's virtual memory */
#define CLONE_FS      0x00000200 /* child shares the parent's file system information */
#define CLONE_FILES   0x00000400 /* child shares the parent's open files */
#define CLONE_SIGHAND 0x00000800 /* child shares the parent's signal handlers */
#define CLONE_PID     0x00001000 /* child shares the parent's process ID */

It can be seen that in the clone_flags that fork() passes, only bits within CSIGNAL are non-zero; none of the sharing flags is set. The child must therefore have its own copies of these structures, including its own virtual memory structure.
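The bit arithmetic behind that observation is easy to check. The sketch below redefines the constants under DEMO_ names so that it does not depend on any system header; the values are the ones listed above, and the two helper functions are hypothetical.

```c
/* Values copied from the definitions quoted above. */
#define DEMO_CSIGNAL       0x000000ffUL
#define DEMO_CLONE_VM      0x00000100UL
#define DEMO_CLONE_FS      0x00000200UL
#define DEMO_CLONE_FILES   0x00000400UL
#define DEMO_CLONE_SIGHAND 0x00000800UL
#define DEMO_CLONE_PID     0x00001000UL
#define DEMO_SIGCHLD       17UL     /* value of SIGCHLD given in the text */

/* The exit signal is carried in the low eight bits of clone_flags. */
unsigned long demo_exit_signal(unsigned long clone_flags)
{
    return clone_flags & DEMO_CSIGNAL;
}

/* Non-zero if the given sharing flag is set in clone_flags. */
int demo_shares(unsigned long clone_flags, unsigned long flag)
{
    return (clone_flags & flag) != 0;
}
```

Since 17 is 0x11, it fits entirely inside the 0xff mask and leaves every CLONE_ bit clear, so each copy_ routine takes its "make a private copy" branch.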

Back in do_fork(), the function now allocates this memory. Take the first allocation as an example:

if (copy_files(clone_flags, p))
    goto bad_fork_cleanup;

That is, if the allocation fails, the non-zero return value sends control to the bad_fork_cleanup label. The allocation itself is done in the body of copy_files(). In that function, note the statements:

oldf = current->files;
if (clone_flags & CLONE_FILES) {
    oldf->count++;
    return 0;
}

Because the CLONE_FILES bit is not set, the function does not return at this point. Instead, it must allocate memory for the new process's file information structure:

newf = kmalloc(sizeof(*newf), GFP_KERNEL);

tsk->files = newf;

If the allocation fails, -1 is returned. This is the case described earlier in which fork cannot allocate memory for its data items, so EAGAIN is returned.

The next step is to copy the parent's pointers to its open files. The process control block task_struct contains a data item files, of type files_struct, whose structure is as follows:

struct files_struct {
    int count;
    fd_set close_on_exec;
    fd_set open_fds;
    struct file * fd[NR_OPEN];
};

Here, count is the number of processes that share this set of files; it is initialized when the structure is created and decremented by one each time one of those processes ends. The fd data item is an array of pointers to the files opened by the process. The array has "NR_OPEN" entries, and "NR_OPEN" is defined as 256 by a macro in "limits.h", so a process can have at most 256 files open.
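The share-or-copy decision made in copy_files() can be sketched in user space as follows. This is an illustration, not the kernel function: malloc replaces kmalloc, the structures are reduced stand-ins, the parent is passed explicitly instead of being read from current, and starting the copy's count at 1 is an assumption of the sketch.

```c
#include <stdlib.h>

#define DEMO_CLONE_FILES 0x00000400UL
#define DEMO_NR_OPEN     256

/* Reduced stand-in for files_struct. */
struct demo_ftable {
    int count;                      /* number of tasks sharing this table */
    void *fd[DEMO_NR_OPEN];         /* stand-in for the struct file pointers */
};

/* Reduced stand-in for the part of task_struct that matters here. */
struct demo_owner {
    struct demo_ftable *files;
};

/* Returns 0 on success, -1 if memory cannot be allocated. */
int demo_copy_files(unsigned long clone_flags, struct demo_owner *child,
                    const struct demo_owner *parent)
{
    struct demo_ftable *oldf = parent->files;
    struct demo_ftable *newf;

    if (clone_flags & DEMO_CLONE_FILES) {
        oldf->count++;              /* share: one more user of the same table */
        child->files = oldf;
        return 0;
    }

    newf = malloc(sizeof(*newf));
    if (!newf)
        return -1;
    *newf = *oldf;                  /* copy the descriptor pointers */
    newf->count = 1;                /* assumption: the copy starts with one user */
    child->files = newf;
    return 0;
}
```

For fork, clone_flags carries no CLONE_FILES bit, so the second branch runs and the child ends up with its own table holding the same descriptor pointers.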

After the files_struct node has been allocated, the following code runs:

if (copy_fs(clone_flags, p))
    goto bad_fork_cleanup_files;

Similarly, control passes to copy_fs(), which copies the parent's position in the VFS. In Linux, a process is itself also treated as a file. In the fs_struct structure, root points to the inode of the root directory and pwd points to the inode of the process's current working directory. count is the number of references, with an initial value of 1. umask is the default file creation mode, inherited from the parent.

Here is a brief introduction to the inode structure. In the Linux ext2 file system, an inode is the basic descriptor block of a file (or a directory; in Linux the two are treated alike). In general, it holds the key information about a file: device, type, size, time attributes, location on the device, user attributes, and so on.

Next, do_fork () executes the copy_sighand () function, which copies the structure of the parent process related to signal processing to the newly generated process.

Finally, the copy_mm() function is executed. It allocates new storage for the new process, copies the parent's data item "mm", of type mm_struct, into it, and then adjusts some fields for the child: for example, count is reinitialized and no flag bit is set in def_flags (def_flags holds flags for the virtual memory described by the mm_struct, such as VM_LOCKED).

The following statements allocate page tables for the new process:

if (new_page_tables(tsk)) {
    tsk->mm = NULL;
    exit_mmap(mm);
    goto free_mm;
}

If the page tables cannot be allocated, control goes to free_mm to release the storage previously allocated for mm, and the error is returned. If the allocation succeeds, dup_mmap() is called to allocate the vm_area_struct structures for the new process, and "build_mmap_avl(mm);" builds an AVL tree over them. The vm_area_struct structures describe all the virtual memory areas the process has mapped. The statement "flush_tlb_mm(current->mm);" then tells the system that the memory layout described by current->mm has changed and the TLB must be flushed.

In terms of resource sharing between processes, Linux adopts a "copy-on-write" policy: a shared resource is copied only when one of the two parties tries to modify it. The resources referred to here are memory pages.
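The visible effect of this policy is that a write in one process never shows up in the other. The sketch below, assuming a POSIX system, demonstrates only that isolation; it cannot show when the kernel actually copies the page. The function name and the encoding of the return value are made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int demo_shared_value = 1;

/* Fork, let the child overwrite the variable and report what it sees
 * through its exit code. Returns child_value * 10 + parent_value,
 * or -1 on error. */
int demo_cow(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        demo_shared_value = 2;      /* the write lands in the child's own copy */
        _exit(demo_shared_value);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    /* The parent still sees its original value. */
    return WEXITSTATUS(status) * 10 + demo_shared_value;
}
```

The child observes 2 while the parent keeps 1, so the combined result is 21.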

Fork return work: returning through system_call

This work is still done mainly inside do_fork(), but it is listed separately because it involves how the return path in "entry.S" handles the result of the system call.

The key step is:

copy_thread(nr, clone_flags, usp, p, regs);

This function is defined in "/arch/i386/kernel/process.c". It would seem to do nothing more than set up the TSS (Task State Segment) of the process, but the following statements are worth noting:

childregs = ((struct pt_regs *) (p->kernel_stack_page + PAGE_SIZE)) - 1;

p->tss.esp = (unsigned long) childregs;

These two statements point the child's stack pointer at the newly allocated stack.

p->tss.eip = (unsigned long) ret_from_sys_call;

*childregs = *regs;

childregs->eax = 0;

eip receives the entry address of ret_from_sys_call, so when the child is woken up it starts executing at ret_from_sys_call. Because eax is set to zero, the child, once successfully created, sees fork return 0.
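The pointer arithmetic that places the saved registers at the very top of the child's stack page, and the forced zero in eax, can be reproduced with ordinary memory. In this sketch the register structure is a hypothetical, shortened stand-in for struct pt_regs and the page size of 4096 is an assumption.

```c
#define DEMO_PAGE_SIZE 4096

/* Shortened, hypothetical stand-in for struct pt_regs. */
struct demo_regs {
    long ebx, ecx, edx, eax, eip, esp;
};

/* Put a copy of the parent's registers in the last sizeof(struct demo_regs)
 * bytes of the child's stack page and force the child's return value to 0.
 * stack_page must be suitably aligned, e.g. obtained from malloc. */
struct demo_regs *demo_setup_child(unsigned char *stack_page,
                                   const struct demo_regs *parent)
{
    /* One structure below the end of the page. */
    struct demo_regs *childregs =
        ((struct demo_regs *)(stack_page + DEMO_PAGE_SIZE)) - 1;

    *childregs = *parent;           /* child resumes with the parent's state... */
    childregs->eax = 0;             /* ...except that its fork result is 0 */
    return childregs;
}
```

The parent's own registers are untouched, so the parent's result can still be set to the child's PID, while the child's copy already holds 0.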

Finally, the following are executed:

1. "p->swappable = 1;". Note that do_fork() cleared this field at the beginning; it is now set.

2. "p->exit_signal = clone_flags & CSIGNAL;" stores the signal SIGCHLD passed in by the parent in exit_signal, to be sent when the child terminates (note that CSIGNAL masks the low eight bits, 0xff; see Section 2.3).

3. "p->counter = (current->counter >>= 1);" halves the parent's remaining time slice and gives the child the same halved value.

4. "wake_up_process(p);" wakes the new process up and puts it on the run queue to wait for scheduling; do_fork() then returns.
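The time-slice statement in step 3 is compact enough to be misread, so here is a small sketch of its effect, assuming the statement is the shift-assign form "current->counter >>= 1". The structure and function names are hypothetical.

```c
/* Hypothetical stand-in for the scheduling field of task_struct. */
struct demo_sched {
    long counter;                   /* remaining time slice, in ticks */
};

/* Halve the parent's remaining slice and give the child the same half. */
void demo_split_slice(struct demo_sched *parent, struct demo_sched *child)
{
    /* The shift updates the parent in place; its new value is then
     * assigned to the child. */
    child->counter = (parent->counter >>= 1);
}
```

A parent with 20 ticks left ends up with 10 and so does the child; with an odd value such as 7, both get 3, because the shift discards the remainder.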
