Research on how fork is implemented in Linux (II) [Repost]

Source: Internet
Author: User

Reposted from: http://www.aichengxu.com/linux/7166015.htm

This article is original work; when reprinting, please credit: http://www.cnblogs.com/tolimit/
Introduction

The previous article, "Research on how fork is implemented in Linux (I)", walked through the code that takes a call from user mode into the clone function by way of a software interrupt. This article analyzes copy_process, the function that does the real work of clone. In a Linux system the application layer can create two kinds of execution branches: child processes and child threads (lightweight processes). The kernel, however, does not draw a sharp distinction between the two. Both are represented by a task_struct structure (which is extremely complex and embeds many other data structures); the only difference is how the task_struct of a child process and that of a child thread are initialized. The task_struct identifies a process or thread and is the proof that it exists, and the scheduler uses task_struct to tell processes (threads) apart. It contains everything the process (thread) needs: memory descriptor, file descriptors, signal descriptor, signal handlers, scheduling priority, and so on. Besides its task_struct, every process (thread) also has its own kernel stack. On a process switch, part of the process context is saved on that process's kernel stack, and when an interrupt occurs the interrupt context is saved on the kernel stack of the process that was running. The way copy_process initializes the kernel stack is what makes the two return values of fork() differ (explained later). copy_process also performs many other operations, such as security checks on the new process (thread), PID allocation, relationship setup (parent/child, process group, namespaces, and so on), and initialization of memory structures; these are covered as they come up in the code below.
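To make the process/thread distinction visible from user space, here is a minimal sketch, assuming Linux with glibc. It starts a lightweight process with the clone() library wrapper and records the IDs the kernel reports inside it. The names thread_ids_demo, record_ids, struct ids and demo_stack are made up for this illustration. Running C code on a raw clone() thread without proper thread-local-storage setup is only acceptable for a tiny demo like this one; real programs should use pthread_create.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper type: IDs observed inside the new thread. */
struct ids {
    long tgid;  /* what getpid() reports: the thread group ID */
    long tid;   /* what gettid() reports: this task's own PID */
    int done;   /* set to 1 once the two fields above are valid */
};

/* Static stack for the demo thread; never freed, so it outlives the thread. */
static char demo_stack[64 * 1024] __attribute__((aligned(16)));

static int record_ids(void *arg)
{
    struct ids *out = arg;

    out->tgid = syscall(SYS_getpid);
    out->tid = syscall(SYS_gettid);
    __atomic_store_n(&out->done, 1, __ATOMIC_RELEASE);

    /* Exit only this thread (SYS_exit, not exit_group). */
    syscall(SYS_exit, 0);
    return 0;   /* not reached */
}

/* Returns 0 on success and fills in the IDs seen by the new thread. */
int thread_ids_demo(long *tgid_in_thread, long *tid_in_thread)
{
    struct ids out = { 0, 0, 0 };
    /* CLONE_THREAD needs CLONE_SIGHAND, which in turn needs CLONE_VM. */
    int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD;

    /* Assumes a downward-growing stack, so the top of the buffer is passed. */
    if (clone(record_ids, demo_stack + sizeof(demo_stack), flags, &out) == -1)
        return -1;

    /* A CLONE_THREAD task cannot be wait()ed for; poll the shared flag. */
    while (!__atomic_load_n(&out.done, __ATOMIC_ACQUIRE))
        sched_yield();

    *tgid_in_thread = out.tgid;
    *tid_in_thread = out.tid;
    return 0;
}
```

Inside the new thread, the tgid equals the creating process's PID while the tid is different, which is the user-visible effect of the kernel giving a thread its own task_struct but the thread group's tgid.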


copy_process

/* Code location: kernel/fork.c in the Linux source tree */

static struct task_struct *copy_process(unsigned long clone_flags,
                    unsigned long stack_start,
                    unsigned long stack_size,
                    int __user *child_tidptr,
                    struct pid *pid,
                    int trace)
{
    int retval;
    struct task_struct *p;

    /* CLONE_FS cannot be set together with CLONE_NEWNS or CLONE_NEWUSER */
    if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
        return ERR_PTR(-EINVAL);

    if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == (CLONE_NEWUSER|CLONE_FS))
        return ERR_PTR(-EINVAL);

    /* When creating a thread, signal handlers must be shared between threads */
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return ERR_PTR(-EINVAL);

    /*
     * Sharing signal handlers requires sharing the memory address space.
     * This is why, as the books say, a forked child has its own signal
     * handlers: its address space is different from the parent's.
     */
    if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
        return ERR_PTR(-EINVAL);

    /*
     * Prevent the creation of a sibling of the init process.
     * Only the init process has SIGNAL_UNKILLABLE set in signal->flags.
     * An exiting process first becomes a zombie and has to be reaped;
     * a sibling of init could never be reaped.
     */
    if ((clone_flags & CLONE_PARENT) &&
                current->signal->flags & SIGNAL_UNKILLABLE)
        return ERR_PTR(-EINVAL);

    /*
     * If the new process will be in a new user or PID namespace, it must
     * not share a thread group, signal handlers, or parent with the
     * forking process.
     */
    if (clone_flags & CLONE_SIGHAND) {
        if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
            (task_active_pid_ns(current) !=
                current->nsproxy->pid_ns_for_children))
            return ERR_PTR(-EINVAL);
    }

    /* Additional security check */
    retval = security_task_create(clone_flags);
    if (retval)
        goto fork_out;

    retval = -ENOMEM;
    /* Allocate a struct task_struct and a kernel stack for the new process */
    p = dup_task_struct(current);
    if (!p)
        goto fork_out;

    /* ftrace is used for kernel performance analysis and tracing */
    ftrace_graph_init_task(p);

    /*
     * Initialize the rt_mutex (priority inheritance) state used by PI
     * futexes; see http://blog.chinaunix.net/uid-7295895-id-3011238.html
     */
    rt_mutex_init_task(p);

#ifdef CONFIG_PROVE_LOCKING
    DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
    DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
#endif
    retval = -EAGAIN;
    /*
     * Check whether the number of processes owned by this user has reached
     * tsk->signal->rlim[RLIMIT_NPROC].rlim_cur; the rlim array holds the
     * limits of the corresponding resources.
     */
    if (atomic_read(&p->real_cred->user->processes) >=
            task_rlimit(p, RLIMIT_NPROC)) {
        /*
         * INIT_USER is the root user. The limit is not enforced for root
         * or for a parent holding CAP_SYS_RESOURCE or CAP_SYS_ADMIN.
         */
        if (p->real_cred->user != INIT_USER &&
            !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
            goto bad_fork_free;
    }
    current->flags &= ~PF_NPROC_EXCEEDED;

    /*
     * Copy the parent's credentials into the child's real_cred and cred.
     * struct cred is used for security operations.
     */
    retval = copy_creds(p, clone_flags);
    if (retval < 0)
        goto bad_fork_free;

    retval = -EAGAIN;
    /*
     * Check whether the number of processes exceeds the system-wide
     * maximum. The maximum depends on memory: as a rule, all kernel stacks
     * (8K each by default) together may not exceed 1/8 of total memory.
     * It can be overridden through /proc/sys/kernel/threads-max.
     */
    if (nr_threads >= max_threads)
        goto bad_fork_cleanup_count;

    /*
     * If the kernel functions implementing the new process's execution
     * domain and executable format live in a kernel module, increment
     * that module's use count.
     */
    if (!try_module_get(task_thread_info(p)->exec_domain->module))
        goto bad_fork_cleanup_count;

    delayacct_tsk_init(p);  /* Must remain after dup_task_struct() */

    /*
     * Clear PF_SUPERPRIV (the process used superuser privileges) and
     * PF_WQ_WORKER (the process is a workqueue worker)
     */
    p->flags &= ~(PF_SUPERPRIV | PF_WQ_WORKER);
    /* PF_FORKNOEXEC means the child has not yet called execve() */
    p->flags |= PF_FORKNOEXEC;

    /* Initialize the child's children list and sibling list as empty */
    INIT_LIST_HEAD(&p->children);
    INIT_LIST_HEAD(&p->sibling);
    /* See http://www.ibm.com/developerworks/cn/linux/l-rcu/ */
    rcu_copy_process(p);
    p->vfork_done = NULL;
    /*
     * Initialize the allocation lock, which protects allocated memory,
     * files, file system information, and so on
     */
    spin_lock_init(&p->alloc_lock);

    /* Initialize the signal list, which holds the pending signals */
    init_sigpending(&p->pending);

    /* Reset the execution time counters to 0 */
    p->utime = p->stime = p->gtime = 0;
    p->utimescaled = p->stimescaled = 0;
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
    p->prev_cputime.utime = p->prev_cputime.stime = 0;
#endif
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
    seqlock_init(&p->vtime_seqlock);
    p->vtime_snap = 0;
    p->vtime_snap_whence = VTIME_SLEEPING;
#endif

#if defined(SPLIT_RSS_COUNTING)
    memset(&p->rss_stat, 0, sizeof(p->rss_stat));
#endif
    /* Typically used by epoll and select; copied from the parent */
    p->default_timer_slack_ns = current->timer_slack_ns;

    /* Initialize the process I/O accounting structure */
    task_io_accounting_init(&p->ioac);
    acct_clear_integrals(p);

    /* Initialize the cputime_expires structure */
    posix_cpu_timers_init(p);

    /* Set the process creation time */
    p->start_time = ktime_get_ns();
    p->real_start_time = ktime_get_boot_ns();

    /* Clear io_context and audit_context */
    p->io_context = NULL;
    p->audit_context = NULL;
    /*
     * When creating a thread, the current process's thread group is about
     * to be modified, so take the thread group lock
     */
    if (clone_flags & CLONE_THREAD)
        threadgroup_change_begin(current);
    cgroup_fork(p);
#ifdef CONFIG_NUMA
    p->mempolicy = mpol_dup(p->mempolicy);
    if (IS_ERR(p->mempolicy)) {
        retval = PTR_ERR(p->mempolicy);
        p->mempolicy = NULL;
        goto bad_fork_cleanup_threadgroup_lock;
    }
#endif
#ifdef CONFIG_CPUSETS
    p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
    p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
    seqcount_init(&p->mems_allowed_seq);
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
    p->irq_events = 0;
    p->hardirqs_enabled = 0;
    p->hardirq_enable_ip = 0;
    p->hardirq_enable_event = 0;
    p->hardirq_disable_ip = _THIS_IP_;
    p->hardirq_disable_event = 0;
    p->softirqs_enabled = 1;
    p->softirq_enable_ip = _THIS_IP_;
    p->softirq_enable_event = 0;
    p->softirq_disable_ip = 0;
    p->softirq_disable_event = 0;
    p->hardirq_context = 0;
    p->softirq_context = 0;
#endif
#ifdef CONFIG_LOCKDEP
    p->lockdep_depth = 0;   /* no locks held yet */
    p->curr_chain_key = 0;
    p->lockdep_recursion = 0;
#endif

#ifdef CONFIG_DEBUG_MUTEXES
    p->blocked_on = NULL;   /* not blocked yet */
#endif
#ifdef CONFIG_BCACHE
    p->sequential_io = 0;
    p->sequential_io_avg = 0;
#endif

    /*
     * Initialize the child's scheduling priority and policy. The process
     * is not put on a run queue here; that happens after copy_process
     * returns.
     */
    retval = sched_fork(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_policy;

    /*
     * perf events are a performance tuning tool; see
     * http://blog.sina.com.cn/s/blog_98822316010122ex.html
     */
    retval = perf_event_init_task(p);
    if (retval)
        goto bad_fork_cleanup_policy;
    retval = audit_alloc(p);
    if (retval)
        goto bad_fork_cleanup_perf;
    /* Initialize the p->sysvshm.shm_clist list head */
    shm_init_task(p);

    /*
     * copy_semundo, copy_files, copy_fs, copy_sighand, copy_signal,
     * copy_mm, copy_namespaces and copy_io each copy (or share) the
     * corresponding state from the parent according to clone_flags
     */
    retval = copy_semundo(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_audit;
    retval = copy_files(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_semundo;
    retval = copy_fs(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_files;
    /*
     * If CLONE_SIGHAND is set (always the case for threads), increment the
     * reference count of the parent's sighand; otherwise (always a child
     * process) copy the parent's sighand_struct into the child.
     */
    retval = copy_sighand(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_fs;
    /*
     * If a thread is being created, return 0 immediately; if a process is
     * being created, set up the child's own signal_struct based on the
     * parent's.
     */
    retval = copy_signal(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_sighand;
    /*
     * For a process, copy the parent's mm_struct into the child and then
     * adjust what differs from the parent (such as the page directory).
     * For a thread, point the child's mm and active_mm at the mm_struct
     * the parent's mm pointer refers to.
     */
    retval = copy_mm(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_signal;
    retval = copy_namespaces(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_mm;
    retval = copy_io(clone_flags, p);
    if (retval)
        goto bad_fork_cleanup_namespaces;

    /*
     * Initialize the child's kernel stack and thread_struct.
     * On a process switch the hardware context is saved in three places:
     * tss_struct (kernel stack address, I/O permission bitmap),
     * thread_struct (most non-general-purpose registers), and the process
     * kernel stack (general-purpose registers).
     * copy_thread copies the parent's thread_struct and kernel stack data
     * to the child and sets the child's return value to 0. (On x86 the
     * return value is kept in eax, on ARM in r0, so the saved eax or r0 on
     * the kernel stack is set to 0.)
     * copy_thread also sets the child's saved eip to the address of
     * ret_from_fork(), so the first time the child is scheduled it starts
     * executing at the return path of the clone system call.
     * That is why, after the application calls fork(), the child returns 0
     * and the parent returns the child's process ID (returning the child's
     * ID is implemented in later code).
     */
    retval = copy_thread(clone_flags, stack_start, stack_size, p);
    if (retval)
        goto bad_fork_cleanup_io;

    /*
     * Unless the caller passed in the preallocated init_struct_pid
     * (used when forking the idle task), allocate a PID
     */
    if (pid != &init_struct_pid) {
        retval = -ENOMEM;
        /* Allocate the PID */
        pid = alloc_pid(p->nsproxy->pid_ns_for_children);
        if (!pid)
            goto bad_fork_cleanup_io;
    }

    /*
     * If CLONE_CHILD_SETTID is set, set_child_tid in task_struct points to
     * child_tidptr in user space; otherwise it is NULL
     */
    p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
    /*
     * If CLONE_CHILD_CLEARTID is set, clear_child_tid in task_struct points
     * to child_tidptr in user space; otherwise it is NULL
     */
    p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr : NULL;

#ifdef CONFIG_BLOCK
    p->plug = NULL;
#endif
#ifdef CONFIG_FUTEX
    p->robust_list = NULL;
#ifdef CONFIG_COMPAT
    p->compat_robust_list = NULL;
#endif
    INIT_LIST_HEAD(&p->pi_state_list);
    p->pi_state_cache = NULL;
#endif
    /*
     * If the VM is shared and this is not a vfork, clear the alternate
     * signal stack
     */
    if ((clone_flags & (CLONE_VM|CLONE_VFORK)) == CLONE_VM)
        p->sas_ss_sp = p->sas_ss_size = 0;

    /*
     * System call tracing and single-stepping are turned off in the child
     */
    user_disable_single_step(p);
    clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
#ifdef TIF_SYSCALL_EMU
    clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
#endif
    clear_all_latency_tracing(p);

    /*
     * Set the child's PID to the value allocated in the global namespace.
     * A process has a different PID in each namespace; p->pid stores the
     * one allocated in the global namespace.
     */
    p->pid = pid_nr(pid);
    if (clone_flags & CLONE_THREAD) {
        /* Creating a thread */
        p->exit_signal = -1;
        /* All threads of a thread group share the same group_leader */
        p->group_leader = current->group_leader;
        /*
         * All threads of a thread group share the same tgid; getpid()
         * returns the tgid
         */
        p->tgid = current->tgid;
    } else {
        /* Creating a child process */
        if (clone_flags & CLONE_PARENT)
            p->exit_signal = current->group_leader->exit_signal;
        else
            p->exit_signal = (clone_flags & CSIGNAL);
        p->group_leader = p;
        /*
         * tgid equals pid here, so threads created later will have the
         * same tgid as this main thread
         */
        p->tgid = p->pid;
    }

    /* Initialize the number of dirtied pages to 0 */
    p->nr_dirtied = 0;
    /*
     * Initialize the dirty page threshold; when the number of dirtied
     * pages reaches it, balance_dirty_pages() is called to write dirty
     * pages to disk
     */
    p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10);
    /* Time at which writing dirty pages to disk started */
    p->dirty_paused_when = 0;

    p->pdeath_signal = 0;
    /* Initialize the thread group list as empty */
    INIT_LIST_HEAD(&p->thread_group);
    p->task_works = NULL;

    /*
     * The process (thread) now exists in the system but cannot run yet;
     * the parent still has to finish setting it up. Take the lock here.
     */
    write_lock_irq(&tasklist_lock);

    if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
        /* Creating a sibling process or a thread in the same thread group */
        /* Its parent is the parent of the current process */
        p->real_parent = current->real_parent;
        /* Its parent_exec_id is the parent_exec_id of the current process */
        p->parent_exec_id = current->parent_exec_id;
    } else {
        /* Creating a child process */
        /* Its parent is the current process */
        p->real_parent = current;
        /* Its parent_exec_id is the self_exec_id of the current process */
        p->parent_exec_id = current->self_exec_id;
    }

    /*
     * Take the current process's sighand lock; signal handling is held
     * off from here on
     */
    spin_lock(&current->sighand->siglock);

    /*
     * seccomp is related to system security; see
     * http://note.sdo.com/u/634687868481358385/notecontent/m5cen~kkf9bfnm4og00239
     */
    copy_seccomp(p);

    /*
     * Before the fork, process group and session signals must be delivered
     * to the parent only; after the fork, to both the parent and the
     * child. If a signal arrives while the new process is being added to
     * the process group, and that pending signal would make the current
     * process exit, the child could slip out of being killed.
     * So make sure the parent has no pending signal here.
     */
    recalc_sigpending();
    if (signal_pending(current)) {
        /* A signal is pending: bail out with an error */
        spin_unlock(&current->sighand->siglock);
        write_unlock_irq(&tasklist_lock);
        retval = -ERESTARTNOINTR;
        goto bad_fork_free_pid;
    }

    if (likely(p->pid)) {
        /*
         * If the child is to be traced, assign current->parent to
         * p->parent and insert the child into the debugger's trace list
         */
        ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);

        /* p->pids[PIDTYPE_PID].pid = pid; */
        init_task_pid(p, PIDTYPE_PID, pid);

        /*
         * If this is a child process (the test is really whether
         * p->exit_signal is greater than or equal to 0; for a thread
         * exit_signal is -1)
         */
        if (thread_group_leader(p)) {
            /*
             * p->pids[PIDTYPE_PGID].pid = current->group_leader->pids[PIDTYPE_PGID].pid;
             * PGID is the process group ID, so the parent's PGID is copied directly
             */
            init_task_pid(p, PIDTYPE_PGID, task_pgrp(current));
            /*
             * p->pids[PIDTYPE_SID].pid = current->group_leader->pids[PIDTYPE_SID].pid;
             * SID is the session ID; unless setsid() is used, the child's
             * SID is the same as the parent's
             */
            init_task_pid(p, PIDTYPE_SID, task_session(current));

            /*
             * return pid->numbers[pid->level].nr == 1;
             * Checks whether the new process is the first one in a newly
             * created PID namespace (its PID in that namespace is 1)
             */
            if (is_child_reaper(pid)) {
                /* Make this new process the init process of its namespace */
                ns_of_pid(pid)->child_reaper = p;
                p->signal->flags |= SIGNAL_UNKILLABLE;
            }

            p->signal->leader_pid = pid;
            p->signal->tty = tty_kref_get(current->signal->tty);

            /* Add this process to its parent's list of children */
            list_add_tail(&p->sibling, &p->real_parent->children);
            /* Add this process's task_struct to the global task list */
            list_add_tail_rcu(&p->tasks, &init_task.tasks);
            /* Insert the new process descriptor's PGID structure into the PID hash */
            attach_pid(p, PIDTYPE_PGID);
            /* Insert the new process descriptor's SID structure into the PID hash */
            attach_pid(p, PIDTYPE_SID);
            /* Increment this CPU's process count */
            __this_cpu_inc(process_counts);
        } else {
            /* Creating a thread: it shares the thread group's signal structure */
            current->signal->nr_threads++;
            atomic_inc(&current->signal->live);
            atomic_inc(&current->signal->sigcnt);
            /*
             * Add the new thread's thread_group node to the thread_group
             * list of the thread group leader
             */
            list_add_tail_rcu(&p->thread_group,
                      &p->group_leader->thread_group);
            /* Add the new thread's thread_node to signal->thread_head */
            list_add_tail_rcu(&p->thread_node,
                      &p->signal->thread_head);
        }
        /* Insert the new process descriptor's PID structure into the PID hash */
        attach_pid(p, PIDTYPE_PID);
        /* Increment the number of tasks in the system */
        nr_threads++;
    }

    /* Increment the number of processes created so far */
    total_forks++;
    /* Release the current process's sighand lock */
    spin_unlock(&current->sighand->siglock);
    syscall_tracepoint_update(p);
    /* Release tasklist_lock */
    write_unlock_irq(&tasklist_lock);

    /* Report the fork through the process events connector */
    proc_fork_connector(p);
    cgroup_post_fork(p);
    /* If a thread was created, release the thread group lock */
    if (clone_flags & CLONE_THREAD)
        threadgroup_change_end(current);
    perf_event_fork(p);

    trace_task_newtask(p, clone_flags);
    uprobe_copy_process(p, clone_flags);

    /* Return the task_struct of the new process */
    return p;

    /* Error handling for failures during the steps above */
bad_fork_free_pid:
    if (pid != &init_struct_pid)
        free_pid(pid);
bad_fork_cleanup_io:
    if (p->io_context)
        exit_io_context(p);
bad_fork_cleanup_namespaces:
    exit_task_namespaces(p);
bad_fork_cleanup_mm:
    if (p->mm)
        mmput(p->mm);
bad_fork_cleanup_signal:
    if (!(clone_flags & CLONE_THREAD))
        free_signal_struct(p->signal);
bad_fork_cleanup_sighand:
    __cleanup_sighand(p->sighand);
bad_fork_cleanup_fs:
    exit_fs(p);     /* blocking */
bad_fork_cleanup_files:
    exit_files(p);  /* blocking */
bad_fork_cleanup_semundo:
    exit_sem(p);
bad_fork_cleanup_audit:
    audit_free(p);
bad_fork_cleanup_perf:
    perf_event_free_task(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
    mpol_put(p->mempolicy);
bad_fork_cleanup_threadgroup_lock:
#endif
    if (clone_flags & CLONE_THREAD)
        threadgroup_change_end(current);
    delayacct_tsk_free(p);
    module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:
    atomic_dec(&p->cred->user->processes);
    exit_creds(p);
bad_fork_free:
    free_task(p);
fork_out:
    return ERR_PTR(retval);
}
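The flag sanity checks at the top of copy_process can be mirrored in user space. The following is a minimal sketch, assuming the CLONE_* constants exported by the kernel header <linux/sched.h>. The function name check_clone_flags is hypothetical, and the sketch covers only the first four checks; the init-sibling and namespace checks depend on kernel state and are left out.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_* flag values as exported by the kernel */

/*
 * User-space mirror of the first four flag checks in copy_process.
 * Returns 0 if the combination passes them, -EINVAL otherwise.
 */
int check_clone_flags(unsigned long clone_flags)
{
    /* CLONE_FS cannot be combined with a new mount namespace ... */
    if ((clone_flags & (CLONE_NEWNS | CLONE_FS)) == (CLONE_NEWNS | CLONE_FS))
        return -EINVAL;

    /* ... or with a new user namespace. */
    if ((clone_flags & (CLONE_NEWUSER | CLONE_FS)) == (CLONE_NEWUSER | CLONE_FS))
        return -EINVAL;

    /* Threads of one thread group must share signal handlers. */
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return -EINVAL;

    /* Shared signal handlers require a shared address space. */
    if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
        return -EINVAL;

    return 0;
}
```

For example, check_clone_flags(CLONE_VM | CLONE_SIGHAND | CLONE_THREAD) passes, while CLONE_THREAD alone or CLONE_SIGHAND alone is rejected, which shows the implied chain: a thread needs shared signal handlers, and shared signal handlers need a shared address space.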


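The chain of bad_fork_* labels at the end of copy_process is a common C idiom: each failure jumps to the label that undoes exactly the steps completed so far, in reverse order. Below is a minimal user-space sketch of the same idiom. Everything in it (struct demo_task, demo_copy_process, the fail_at parameter, the demo_live_allocs counter) is invented for illustration and only imitates the shape of the kernel code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct with two owned resources. */
struct demo_task {
    void *files;
    void *mm;
};

/* Number of live allocations, so a caller can check that nothing leaked. */
int demo_live_allocs;

/* Allocates a block unless this is the step chosen to fail. */
static void *demo_alloc(int step, int fail_at)
{
    void *p = (step == fail_at) ? NULL : malloc(sizeof(struct demo_task));

    if (p)
        demo_live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    free(p);
    demo_live_allocs--;
}

/*
 * fail_at selects the step that fails (1 = task, 2 = files, 3 = mm);
 * 0 means every step succeeds. On failure, completed steps are undone
 * in reverse order by falling through the cleanup labels.
 */
int demo_copy_process(int fail_at, struct demo_task **out)
{
    int retval = -ENOMEM;
    struct demo_task *p;

    *out = NULL;

    p = demo_alloc(1, fail_at);
    if (!p)
        goto fork_out;
    p->files = demo_alloc(2, fail_at);
    if (!p->files)
        goto bad_fork_free;
    p->mm = demo_alloc(3, fail_at);
    if (!p->mm)
        goto bad_fork_cleanup_files;

    *out = p;
    return 0;

bad_fork_cleanup_files:
    demo_free(p->files);
bad_fork_free:
    demo_free(p);
fork_out:
    return retval;
}

/* Releases a task that demo_copy_process() built successfully. */
void demo_free_task(struct demo_task *p)
{
    demo_free(p->mm);
    demo_free(p->files);
    demo_free(p);
}
```

Because the labels fall through, a failure at a late step runs every earlier cleanup as well, and the success path never touches them.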

Flow chart



Summary

copy_process is the backbone of do_fork. Its flow is not complicated, but every initialization function it calls is subtle and involves a great deal of knowledge and code. For reasons of space the details are not analyzed further here; later articles will gradually fill in that material together with my own understanding. Taken as a whole, the core of copy_process is to initialize a task_struct for the new process (thread) and assign it a unique PID; the new task is put on a run queue only after copy_process returns. As for why a fork() call from the application layer returns twice, the answer lies in the kernel stack. In the parent, copy_thread copies the parent's kernel stack to the child, sets the first instruction the child executes once it is scheduled to the return path of do_fork(), and sets to 0 the saved value of the register that holds the return value. (The return value is normally kept in eax on x86 and r0 on ARM; the values of these general-purpose registers are saved on the kernel stack, and on a process switch the saved values are restored into the registers.) The child's return value is therefore 0, while the parent continues after copy_thread, finishes the initialization, and returns the child's PID (actually the tgid).
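The two return values can be observed directly from user space. This is a minimal sketch, assuming a POSIX system; the names fork_twice_demo and struct child_report are made up for the example. The child sends what it saw back to the parent through a pipe so that both return values end up in one place.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the child observed; sent back to the parent through a pipe. */
struct child_report {
    pid_t fork_ret;  /* fork()'s return value as seen in the child */
    pid_t self;      /* the child's own PID from getpid() */
};

/*
 * Calls fork() once and collects both return values.
 * Returns 0 on success, -1 on error.
 */
int fork_twice_demo(pid_t *ret_in_parent, struct child_report *from_child)
{
    int fds[2];
    pid_t ret;
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;

    ret = fork();
    if (ret == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (ret == 0) {
        /* Child: fork() returned 0 here. */
        struct child_report r = { ret, getpid() };

        n = write(fds[1], &r, sizeof(r));
        _exit(n == (ssize_t)sizeof(r) ? 0 : 1);
    }

    /* Parent: fork() returned the child's PID here. */
    close(fds[1]);
    n = read(fds[0], from_child, sizeof(*from_child));
    close(fds[0]);
    waitpid(ret, NULL, 0);

    *ret_in_parent = ret;
    return n == (ssize_t)sizeof(*from_child) ? 0 : -1;
}
```

After a successful call, from_child->fork_ret is 0 and *ret_in_parent equals from_child->self: one call, two returns, each with the value described above.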
