This article first covers some thread fundamentals: concurrency and parallelism, memory allocation, system calls, and POSIX threads. It then uses strace to analyze the difference between threads and processes. Finally, it looks at the threading models used by platforms such as Android and Golang.
Basics
1. What is concurrency (concurrent) and what is parallelism (parallel)?
Concurrency means multiple computing tasks make progress over the same period, typically simulated by switching between them in time slices.
Parallelism means multiple computing tasks are executing at the same instant, for example on different cores.
For details, see "Difference between concurrent programming and parallel programming" on Stack Overflow.
2. Memory allocation, user area and kernel area under OS
In a 32-bit Linux operating system, a process is given a 4G virtual address space when it starts. This space is divided in two: user space (0~3G) and kernel space (3G~4G). User space is where the program's own code runs, and it holds regions such as the stack, BSS (uninitialized data segment), data (initialized data segment), and text (binary code segment). Kernel space holds the mapping of the OS kernel and can only be touched in kernel mode, for example while a system call is executing.
Figure: 32-bit OS virtual memory
User mode executes user code, such as a C program running directly or a JVM virtual machine.
Kernel mode is mainly responsible for I/O (display, the network below layer three, file systems), memory (virtual memory, page replacement/cache), and process management (signals, thread/process management, CPU scheduling). It controls the CPU, memory, and other hardware directly, so its privileges are very high.
3. System call interface (SCI)
A system call is the stub between user space and the kernel: when code running in user mode needs to perform a high-privilege task, it has to switch into kernel mode through a system call to carry out the lowest-level work. For example, when getTime() is called in C, the approximate flow is as follows:
    1. app method (User Application)
         |
         | calls the standard C library
         |
    2. systemcall_stub (std libc)
         |
         | system call, enters kernel mode
         |
    3. system_call_table[call_number] (Kernel)
         |
         | looks up the table and calls the hardware function
         |
    4. hardware_call (Kernel)
- At the app level, developers do not need to write system calls themselves; the system provides a C standard library (SDK) for them to use. For example, when the developer calls getTime(), the call actually goes through the standard library header time.h.
- When the code executes, the OS automatically loads the standard library. For example, in Android's Bionic library, the gettime system call is actually executed by platform-specific assembly code, which passes the system call ID and the parameters into the kernel.
- The kernel uses the system call ID to index the table and find the real hardware call function.
- The hardware-related call is made.
Open Activity Monitor on a Mac, or run top in a terminal, to see how CPU usage splits between user and system.
User and Kernel CPU usage
4. POSIX threading Model
POSIX threads (pthreads) are the threading standard defined in IEEE P1003.1, and most current systems support it; on Windows it is available through compatibility layers. It provides a thread programming interface in user mode: when developing with threads, the developer only needs to include the header file pthread.h. At run time the program goes through system calls, and the threads themselves are implemented in the kernel. The interface has many functions, such as create, exit, join, and yield; for the exact definitions, look at the header file in the libc source code/SDK of each platform. For example, Android's Bionic libc ships its own pthread.h, where the header file is a wrapper over kernel threads.
The difference between a thread and a process
The following refers specifically to the POSIX model on a 32-bit Linux system using glibc, that is, the model as seen from user mode.
This test is based on Ubuntu 14.04 i386
1. Test code Design
1.1. Thread Test Code
    // Modified from https://computing.llnl.gov/tutorials/pthreads/samples/hello.c
    // TODO run:
    // clang -Wall -g pthread.c -o pthread.out -lpthread
    // strace -Cfo ./pthread.strace.log ./pthread.out
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    void *PrintHello(void *threadid)
    {
        long tid;
        tid = (long) threadid;
        printf("Hello World! It's me, thread #%ld!\n", tid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        pthread_t thread;
        int rc = 0;
        long t = 0;
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&thread, NULL, PrintHello, (void *) t);
        if (rc) {
            exit(-1);
        }
    }
1.2. Process Test Code
    // TODO run:
    // clang -Wall -g fork.c -o fork.out
    // strace -Cfo ./fork.strace.log ./fork.out
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        pid_t pid;
        pid = fork();
        if (pid < 0) {
            return -1;
        }
        return 0;
    }
2. Test results
After running the strace commands above, the results are as follows.
2.1. The strace output of the process is as follows
    19948 execve("./fork.out", ["./fork.out"], [/* vars */]) = 0
    19948 brk(0) = 0x9bc000
    19948 open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    19948 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\37\2\0\0\0\0\0"..., 832) = 832
    .....
    19948 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5adac4ca10) = 19949
    ....
    19949 +++ exited with 0 +++
2.2. The strace output of the thread is as follows
    21958 execve("./pthread.out", ["./pthread.out"], [/* vars */]) = 0
    21958 open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
    .....
    21958 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
    21958 open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    21958 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\37\2\0\0\0\0\0"..., 832) = 832
    21958 fstat(3, {st_mode=S_IFREG|0755, st_size=1845024, ...}) = 0
    21958 mmap(NULL, 3953344, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f34229e4000
    ....
    21958 clone(child_stack=0x7f34229e2fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f34229e39d0, tls=0x7f34229e3700, child_tidptr=0x7f34229e39d0) = 21959
    ....
    21958 +++ exited with 0 +++
3. Test conclusion
From the traces above, we can see that both programs go through the libc library under x86_64-linux-gnu and then use the system call clone() to have the kernel create the new task. The main difference is in the flags argument: the CLONE_ flags specify which resources are shared.
    // clone flags of thread vs. process
    // Note: the `CLONE_` prefix is omitted
    // flags argument for the process
    flags=CHILD_CLEARTID|CHILD_SETTID|SIGCHLD
    // flags argument for the thread
    flags=VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID
Looking up clone in the man pages (man clone) gives the following.
Explanation of the flags used for the process:
- CLONE_CHILD_CLEARTID: Erase the child thread ID at location ctid in child memory when the child exits, and do a wakeup on the futex at that address.
- CLONE_CHILD_SETTID: Store the child thread ID at location ctid in child memory.
- SIGCHLD: Not a CLONE_ flag; it is the termination signal sent to the parent when the child exits.
Explanation of the flags used for the thread:
- CLONE_VM: The calling process and the child process run in the same memory space, including mappings made with mmap(). (To say a little more: the stack of a thread is private, and its size is set on the pthread_attr_t attribute object with the function pthread_attr_setstacksize. The default may be 8MB, although in practice mostly only a few KB of stack are used. Heap memory is shared and is not discussed here.)
- CLONE_FS: Share file system information; calls to chroot(2), chdir(2), or umask(2) by one side affect the other.
- CLONE_FILES: Share the file descriptor table.
- CLONE_SIGHAND: Share the table of signal handlers.
- CLONE_THREAD: Place the child in the same thread group as the caller; they have the same PID and independent TIDs.
- CLONE_SYSVSEM: Share a single list of System V semaphore undo values. (I have not looked into this one in detail yet.)
- CLONE_SETTLS: Set the thread local storage (TLS) area; note that this is not portable.
- CLONE_PARENT_SETTID: Store the child thread ID at location ptid in parent and child memory.
- CLONE_CHILD_CLEARTID: Erase the child thread ID at location ctid in child memory when the child exits, and do a wakeup on the futex at that address.
Combining this with some textbooks gives the following comparison:
| | Process | Thread |
|---|---|---|
| User-level function | fork() | pthread_create() |
| Kernel implementation | clone() | clone() |
| Memory | Newly copied memory (copy-on-write), an independent 4G (1G+3G) | Shares the 4G of memory: a stack of about 8M is private and its size can be set by a parameter; heap memory is shared |
| Creation cost | Fewer sharing flags, so more has to be copied; time-consuming | Low |
| Context-switch cost | The memory address space has to be switched | Almost only the cost of entering the kernel |
| Internal communication | IPC | Shared memory area (simpler) |
How high-level languages wrap kernel threads
Besides using the POSIX standard directly, a high-level language can build its own thread implementation on top of kernel threads through system calls. There are mainly three approaches.
1. Pure kernel thread implementation (1:1)
This threading model maps kernel threads to app threads one to one and can be seen as a simple mapping. It is represented by the POSIX threading model (pthread) and by the Java and Ruby (1.9+) threading models, which rely on the pthread standard library.
Taking thread creation under Android/ART (JVM) as an example, the call stack is as follows:
    java.lang.Thread
        |
    POSIX thread (user mode) {
        0. art.runtime.Thread::CreateNativeThread (cpp, in jvm)
        1. pthread_create (pthread.h, standard library header file)
        2. the .so file of the bionic standard library makes the system call (libc)
        3. user mode traps into kernel mode
    }
        |
    Kernel thread (kernel mode)
As can be seen, the JVM implementation is mainly a wrapper and mapping over POSIX threads and does only a little work itself. Its features are as follows:
- Portability is poor, since it has to adapt to the various libc libraries. But because the threads are managed directly by the OS, the kernel's efficient scheduler can be relied on to distribute tasks, enabling efficient use of physical cores and real parallelism.
- Switching between user mode and kernel mode has a certain cost.
2. Pure user-State implementation (1:N)
Here thread scheduling is implemented in user mode, which is also known as green threads. The runtime writes its own scheduling algorithm and can map one native thread to multiple app threads (here also just called threads). Representatives are Ruby (1.8-) and old versions of Java. The features are as follows:
- Good portability; there is no switch into the kernel, so no cost for mapping to kernel threads
- The runtime has to maintain its own scheduler
- Using multiple cores is difficult, because the kernel does not know the scheduling details
3. Hybrid implementations (M:N)
In this model, M kernel threads are run to manage N app threads at the same time, as in Golang. GOMAXPROCS sets the number of native threads that run Go code at once, and the go keyword creates an app thread (goroutine). Its features are as follows:
- The scheduler is harder to implement
- Concurrent programming is simplified through syntax sugar and channels, and switching cost is low
- Part of the scheduling is cooperative: in some cases a task has to release its time slice itself
    golang threading model (N)
        ↓
        ↓ goroutine
        ↓
    Kernel thread model (M)
See Libtask and Xu Xiwei's "Go Language Programming"
Summary
- Concurrency makes progress on multiple tasks by time slicing, while parallelism executes multiple tasks at the same instant
- After a program starts, its work is split between user mode and kernel mode, and the high-privilege tasks in the kernel are executed through system calls
- POSIX is a threading standard, or an interface, implemented by the libc library
- The biggest difference between a thread and a process is the flags passed to clone, which cause different resources to be shared. As a result, creation time and switching time differ, as do memory allocation and the complexity of internal communication.
- In Java, java.lang.Thread corresponds to a kernel thread one to one; in some older languages, one kernel thread carries multiple high-level threads; and in Golang, goroutines map M kernel threads to N high-level threads