Linux Kernel namespace Mechanism Analysis

Source: Internet
Author: User

1. Linux kernel namespace Mechanism

The Linux namespace mechanism provides a way to isolate resources. System resources such as PIDs, IPC objects, and the network stack are no longer global; each belongs to a specific namespace. Resources in one namespace are invisible to processes in other namespaces. As a result, at the operating-system level several processes can have the same PID: a process with PID 1 or 2 can exist in more than one namespace at the same time, and because the processes belong to different namespaces they do not conflict. At the user level, you can only see resources in your own namespace; the ps command, for example, lists only the processes in your own namespace. In this way, each namespace looks like a separate Linux system.
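One way to see which namespaces a process belongs to is to read the symbolic links under /proc/self/ns. The following is a minimal sketch, assuming a kernel that exposes these links; the helper names ns_identity and print_namespaces are made up for this illustration. Two processes in the same namespace see the same identifier in the link target.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Copy the identity of one of the caller's namespaces into buf,
 * for example "pid:[4026531836]". `type` is a file name under
 * /proc/self/ns ("uts", "ipc", "mnt", "pid", "net").
 * Returns 0 on success, -1 on failure.
 */
int ns_identity(const char *type, char *buf, size_t len)
{
    char path[64];
    ssize_t n;
    int written;

    if (len == 0)
        return -1;
    written = snprintf(path, sizeof(path), "/proc/self/ns/%s", type);
    if (written < 0 || (size_t)written >= sizeof(path))
        return -1;

    /* readlink() does not terminate the string, so reserve one byte. */
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Print one line for each namespace type discussed in this article. */
void print_namespaces(void)
{
    static const char *types[] = { "uts", "ipc", "mnt", "pid", "net" };
    char id[64];
    size_t i;

    for (i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        if (ns_identity(types[i], id, sizeof(id)) == 0)
            printf("%-4s %s\n", types[i], id);
        else
            printf("%-4s (unavailable)\n", types[i]);
    }
}
```

Calling print_namespaces() from main() in two different processes and comparing the output shows whether they share a given namespace.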

2. namespace struct in Linux Kernel

The Linux kernel provides several kinds of namespace, including fs (mount), uts, network, sysvipc, and so on. A process can belong to several namespaces at once, one of each kind. Because namespaces are tied to processes, the task_struct structure contains a member for them: a pointer to a struct nsproxy.

struct task_struct {
	...
	/* namespaces */
	struct nsproxy *nsproxy;
	...
};

Let's look at how nsproxy is defined in include/linux/nsproxy.h. The structure holds five pointers, one for each type of namespace. Because several processes can use the same set of namespaces, an nsproxy can be shared; the count field is the reference count of the structure.

/*
 * 'count' is the number of tasks holding a reference.
 * The count for each namespace, then, will be the number
 * of nsproxies pointing to it, not the number of tasks.
 *
 * The nsproxy is shared by tasks which share all namespaces.
 * As soon as a single namespace is cloned or unshared, the
 * nsproxy is copied.
 */
struct nsproxy {
	atomic_t count;
	struct uts_namespace *uts_ns;
	struct ipc_namespace *ipc_ns;
	struct mnt_namespace *mnt_ns;
	struct pid_namespace *pid_ns_for_children;
	struct net *net_ns;
};

(1) The UTS namespace (struct uts_namespace) contains information such as the name, version, and underlying architecture type of the running kernel. UTS is short for UNIX Timesharing System.

(2) struct ipc_namespace stores all information related to inter-process communication (IPC).

(3) struct mnt_namespace provides the view of the mounted file systems.

(4) struct pid_namespace provides information about process IDs.

(5) struct net, referenced through net_ns, contains all network-related namespace parameters.
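The sharing rule stated in the nsproxy comment (tasks that share all namespaces share one nsproxy, and the nsproxy is copied as soon as one namespace is unshared) can be modelled in user space. The sketch below is a toy model, not kernel code: toy_ns, toy_nsproxy, toy_task and every function are hypothetical names, plain int counters stand in for atomic_t, and only two of the five namespace pointers are kept.

```c
#include <stdlib.h>

/* Toy stand-ins for kernel objects; none of these are kernel APIs. */
struct toy_ns      { int count; };   /* nsproxies pointing at this namespace */
struct toy_nsproxy { int count;      /* tasks holding a reference */
                     struct toy_ns *uts_ns, *net_ns; };
struct toy_task    { struct toy_nsproxy *nsproxy; };

static struct toy_ns *toy_ns_new(void)
{
    struct toy_ns *ns = malloc(sizeof(*ns));
    if (ns)
        ns->count = 1;
    return ns;
}

static void toy_ns_put(struct toy_ns *ns)
{
    if (--ns->count == 0)
        free(ns);
}

/* Drop one task reference; free the nsproxy when the last task is gone. */
void toy_nsproxy_put(struct toy_nsproxy *p)
{
    if (--p->count == 0) {
        toy_ns_put(p->uts_ns);
        toy_ns_put(p->net_ns);
        free(p);
    }
}

/* Give a task a fresh nsproxy with its own namespaces (like init_nsproxy). */
int toy_task_init(struct toy_task *t)
{
    struct toy_nsproxy *p = malloc(sizeof(*p));
    struct toy_ns *uts = toy_ns_new(), *net = toy_ns_new();

    if (!p || !uts || !net) {
        free(p); free(uts); free(net);
        return -1;
    }
    p->count = 1;
    p->uts_ns = uts;
    p->net_ns = net;
    t->nsproxy = p;
    return 0;
}

/* fork-like path: the child shares every namespace, so it shares the nsproxy. */
void toy_task_share(struct toy_task *child, const struct toy_task *parent)
{
    child->nsproxy = parent->nsproxy;
    child->nsproxy->count++;
}

/* unshare-like path: new UTS namespace, so the nsproxy must be copied. */
int toy_task_unshare_uts(struct toy_task *t)
{
    struct toy_nsproxy *old = t->nsproxy;
    struct toy_nsproxy *fresh = malloc(sizeof(*fresh));
    struct toy_ns *uts = toy_ns_new();

    if (!fresh || !uts) {
        free(fresh); free(uts);
        return -1;
    }
    fresh->count = 1;
    fresh->uts_ns = uts;
    fresh->net_ns = old->net_ns;
    fresh->net_ns->count++;     /* one more nsproxy points at the net namespace */
    t->nsproxy = fresh;
    toy_nsproxy_put(old);
    return 0;
}
```

After toy_task_share() both tasks point at one nsproxy whose count is 2; after toy_task_unshare_uts() on the child, each task has its own nsproxy with count 1, while the shared net namespace is referenced by two nsproxies.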

The system has a default nsproxy, init_nsproxy, which the initial task is set to use when it is initialized:

#define INIT_TASK(tsk) \
{ \
	... \
	.nsproxy = &init_nsproxy, \
	... \
}

init_nsproxy is defined as follows:

static struct kmem_cache *nsproxy_cachep;

struct nsproxy init_nsproxy = {
	.count = ATOMIC_INIT(1),
	.uts_ns = &init_uts_ns,
#if defined(CONFIG_POSIX_MQUEUE) || defined(CONFIG_SYSVIPC)
	.ipc_ns = &init_ipc_ns,
#endif
	.mnt_ns = NULL,
	.pid_ns_for_children = &init_pid_ns,
#ifdef CONFIG_NET
	.net_ns = &init_net,
#endif
};

Here mnt_ns is set to NULL, while the other namespaces point to their default initial instances.
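For a process that has never left the default namespaces, the UTS information comes from the default UTS namespace, init_uts_ns. From user space these fields can be read with uname(2). The following is a small sketch; describe_uts is a hypothetical helper name and the output format is chosen only for illustration.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Format the UTS fields visible to the calling process into `out`.
 * Returns 0 on success, -1 if uname() fails or `out` is too small.
 */
int describe_uts(char *out, size_t len)
{
    struct utsname u;
    int n;

    if (uname(&u) != 0)
        return -1;

    n = snprintf(out, len,
                 "sysname=%s nodename=%s release=%s version=%s machine=%s",
                 u.sysname, u.nodename, u.release, u.version, u.machine);
    if (n < 0 || (size_t)n >= len)
        return -1;   /* output was truncated */
    return 0;
}
```

A process placed in a new UTS namespace can change nodename without affecting what processes in the default namespace see through the same call.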
 

3. Use clone to create your own Namespace

To create your own namespace, you can use the clone() system call. Its user-space prototype is:

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg);

fn is a pointer to the function that the child process runs. child_stack points to the stack space allocated for the child process. flags describes which resources the child inherits from the parent process. arg is the argument passed to the child, that is, the argument of the function that fn points to. The values that flags can take include the following; only those related to namespaces are covered here.

CLONE_FS: the child process shares file system information with the parent process, including the root directory, the current working directory, and the umask.

CLONE_NEWNS: set this flag when the cloned child needs its own (mount) namespace. CLONE_NEWNS and CLONE_FS cannot be set at the same time.
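The restriction on combining CLONE_NEWNS and CLONE_FS can be checked before calling clone(). The sketch below mirrors that rule in user space; check_ns_clone_flags and count_new_namespaces are hypothetical helpers, and the flag names other than CLONE_NEWNS and CLONE_FS do not appear in this article: they are the standard Linux flags for the other four namespace types held in nsproxy. The constants are taken from the kernel UAPI header <linux/sched.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/sched.h>   /* CLONE_FS and the CLONE_NEW* flag constants */

/*
 * Reject the flag combination described above.
 * Returns 0 if the flags are acceptable, -EINVAL otherwise.
 */
int check_ns_clone_flags(unsigned long flags)
{
    if ((flags & (CLONE_NEWNS | CLONE_FS)) == (CLONE_NEWNS | CLONE_FS))
        return -EINVAL;
    return 0;
}

/* Count how many of the five namespace types the flags ask to create anew. */
int count_new_namespaces(unsigned long flags)
{
    static const unsigned long ns_flags[] = {
        CLONE_NEWNS,    /* mount namespace   (mnt_ns) */
        CLONE_NEWUTS,   /* UTS namespace     (uts_ns) */
        CLONE_NEWIPC,   /* IPC namespace     (ipc_ns) */
        CLONE_NEWPID,   /* PID namespace     (pid_ns_for_children) */
        CLONE_NEWNET,   /* network namespace (net_ns) */
    };
    int n = 0;
    size_t i;

    for (i = 0; i < sizeof(ns_flags) / sizeof(ns_flags[0]); i++)
        if (flags & ns_flags[i])
            n++;
    return n;
}
```

For example, check_ns_clone_flags(CLONE_NEWNS | CLONE_FS) returns -EINVAL, while either flag alone is accepted.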

clone() is a wrapper function defined in the libc library. It sets up the stack of the new lightweight process and hides the clone system call entry from the programmer. The sys_clone() service routine that implements the clone() system call has no fn or arg parameters. The wrapper stores the fn pointer in the child's stack at the position that corresponds to the wrapper's own return address, and stores the arg pointer right below fn. When the wrapper function returns in the child, the CPU fetches the return address from the stack and so executes fn(arg).

/* Prototype for the glibc wrapper function */

#include <sched.h>

int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );

/* Prototype for the raw system call */

long clone(unsigned long flags, void *child_stack,
           void *ptid, void *ctid,
           struct pt_regs *regs);
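As an illustration of the glibc wrapper, the sketch below starts a child in a new UTS namespace and lets it change its host name. It uses CLONE_NEWUTS, a standard flag that this article does not cover, because the effect is easy to observe; creating the namespace normally requires privilege (CAP_SYS_ADMIN), so the function reports when the demo could not run. The names uts_isolation_demo, child_fn and CHILD_STACK_SIZE are made up for this example.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs in the child: rename the host inside the new UTS namespace. */
static int child_fn(void *arg)
{
    const char *name = arg;
    return sethostname(name, strlen(name)) == 0 ? 0 : 1;
}

/*
 * Returns  1 if the child renamed its host and the parent's name is unchanged,
 *          0 if the demo could not run (clone() or sethostname() was refused),
 *         -1 on an unexpected failure or if the parent's host name changed.
 */
int uts_isolation_demo(const char *child_name)
{
    char before[256] = "", after[256] = "";
    char *stack;
    int status = 0;
    pid_t pid;

    if (gethostname(before, sizeof(before) - 1) != 0)
        return -1;
    stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                CLONE_NEWUTS | SIGCHLD, (void *)child_name);
    if (pid == -1) {
        free(stack);
        return 0;               /* typically EPERM without privilege */
    }
    if (waitpid(pid, &status, 0) != pid) {
        free(stack);
        return -1;
    }
    free(stack);

    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;               /* child could not set the host name */
    if (gethostname(after, sizeof(after) - 1) != 0)
        return -1;
    return strcmp(before, after) == 0 ? 1 : -1;
}
```

Note how fn and arg are handled entirely by the wrapper: the kernel only receives the flags and the stack pointer.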

The clone() that user programs see is the libc wrapper. In the Linux kernel, the file kernel/fork.c contains the following definitions, all of which end up calling the do_fork() function.

#ifdef __ARCH_WANT_SYS_CLONE
#ifdef CONFIG_CLONE_BACKWARDS
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
		 int __user *, parent_tidptr,
		 int, tls_val,
		 int __user *, child_tidptr)
#elif defined(CONFIG_CLONE_BACKWARDS2)
SYSCALL_DEFINE5(clone, unsigned long, newsp, unsigned long, clone_flags,
		 int __user *, parent_tidptr,
		 int __user *, child_tidptr,
		 int, tls_val)
#elif defined(CONFIG_CLONE_BACKWARDS3)
SYSCALL_DEFINE6(clone, unsigned long, clone_flags, unsigned long, newsp,
		 int, stack_size,
		 int __user *, parent_tidptr,
		 int __user *, child_tidptr,
		 int, tls_val)
#else
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
		 int __user *, parent_tidptr,
		 int __user *, child_tidptr,
		 int, tls_val)
#endif
{
	return do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr);
}
#endif
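The raw system call can also be invoked without the glibc wrapper. The sketch below does so through syscall(2) and behaves like fork(): no new stack is passed, so the child continues on a copy of the parent's stack. It assumes an architecture where clone_flags is the first argument (every variant above except CONFIG_CLONE_BACKWARDS2); raw_clone_exit_code is a hypothetical helper name.

```c
#include <signal.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a child with the raw clone system call and have it exit
 * immediately with `code`. Returns the exit code collected by the
 * parent, or -1 on failure.
 */
int raw_clone_exit_code(int code)
{
    int status = 0;
    /* flags = SIGCHLD only, newsp = 0: the same request fork() makes. */
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, NULL, NULL, 0UL);

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: there is no fn(arg) to run here */

    if (waitpid((pid_t)pid, &status, 0) != (pid_t)pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike the wrapper, the raw call returns twice, once in the parent with the child's PID and once in the child with 0, which is why the fn and arg parameters are not needed at this level.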

3.1 do_fork Function

The clone() system call hands the real work to the do_fork() function, which in turn calls copy_process() to create the new process.

long do_fork(unsigned long clone_flags,
	      unsigned long stack_start,
	      unsigned long stack_size,
	      int __user *parent_tidptr,
	      int __user *child_tidptr)
{
	struct task_struct *p;
	int trace = 0;
	long nr;

	/*
	 * Determine whether and which event to report to ptracer.  When
	 * called from kernel_thread or CLONE_UNTRACED is explicitly
	 * requested, no event is reported; otherwise, report if the event
	 * for the type of forking is enabled.
	 */
	if (!(clone_flags & CLONE_UNTRACED)) {
		if (clone_flags & CLONE_VFORK)
			trace = PTRACE_EVENT_VFORK;
		else if ((clone_flags & CSIGNAL) != SIGCHLD)
			trace = PTRACE_EVENT_CLONE;
		else
			trace = PTRACE_EVENT_FORK;

		if (likely(!ptrace_event_enabled(current, trace)))
			trace = 0;
	}

	p = copy_process(clone_flags, stack_start, stack_size,
			 child_tidptr, NULL, trace);
	/*
	 * Do this prior waking up the new thread - the thread pointer
	 * might get invalid after that point, if the thread exits quickly.
	 */
	if (!IS_ERR(p)) {
		struct completion vfork;
		struct pid *pid;

		trace_sched_process_fork(current, p);

		pid = get_task_pid(p, PIDTYPE_PID);
		nr = pid_vnr(pid);

		if (clone_flags & CLONE_PARENT_SETTID)
			put_user(nr, parent_tidptr);

		if (clone_flags & CLONE_VFORK) {
			p->vfork_done = &vfork;
			init_completion(&vfork);
			get_task_struct(p);
		}

		wake_up_new_task(p);

		/* forking complete and child started to run, tell ptracer */
		if (unlikely(trace))
			ptrace_event_pid(trace, pid);

		if (clone_flags & CLONE_VFORK) {
			if (!wait_for_vfork_done(p, &vfork))
				ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
		}

		put_pid(pid);
	} else {
		nr = PTR_ERR(p);
	}
	return nr;
}
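The first block of do_fork() decides which ptrace event, if any, a tracer should receive. That decision depends only on clone_flags, so it can be reproduced in user space. The sketch below is a simplified copy of that logic: pick_trace_event is a hypothetical name, and the kernel's additional ptrace_event_enabled() check (which resets the event to 0 when the tracer has not asked for it) is left out because it depends on kernel state.

```c
#include <linux/sched.h>   /* CSIGNAL, CLONE_VM, CLONE_VFORK, CLONE_UNTRACED */
#include <signal.h>        /* SIGCHLD */
#include <sys/ptrace.h>    /* PTRACE_EVENT_FORK, PTRACE_EVENT_VFORK, PTRACE_EVENT_CLONE */

/*
 * Return the ptrace event that matches the kind of fork requested by
 * clone_flags, or 0 when CLONE_UNTRACED suppresses reporting.
 */
int pick_trace_event(unsigned long clone_flags)
{
    if (clone_flags & CLONE_UNTRACED)
        return 0;
    if (clone_flags & CLONE_VFORK)
        return PTRACE_EVENT_VFORK;
    /* The low byte of the flags is the signal sent to the parent on exit. */
    if ((clone_flags & CSIGNAL) != SIGCHLD)
        return PTRACE_EVENT_CLONE;
    return PTRACE_EVENT_FORK;
}
```

A plain fork-style request, pick_trace_event(SIGCHLD), yields PTRACE_EVENT_FORK, while a thread-style request that uses a different exit signal yields PTRACE_EVENT_CLONE.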
