A complex explanation of processes and threads

Source: Internet
Author: User
Tags: switches

I was asked about processes and threads in two interviews and failed to answer both times. After the first interview I read a little about the concepts, but since I never fully understood them I didn't retain them, so when I was asked the second time I still couldn't answer o(╯□╰)o.

So this time, let's solve the problem once and for all.

The title imitates Ruan Yifeng's "A Simple Explanation of Processes and Threads" (see http://www.ruanyifeng.com/blog/2013/04/processes_and_threads.html); his metaphor is very vivid.

    • Definitions

A process (English: process) is an entity representing a running program on a computer. The process is the basic unit of execution in a time-sharing system. In process-oriented systems (such as early versions of UNIX, and Linux 2.4 and earlier), a process is the basic execution entity of a program; in thread-oriented systems (such as most modern operating systems, including Linux 2.6 and later), the process itself is not the basic unit of execution but a container for threads. A program by itself is merely a description of instructions, data, and their organization; a process is a real running instance of that program (those instructions and data). Several processes may be associated with the same program, and each process can run independently, either sequentially or concurrently. A modern computer system can load multiple programs into memory as processes during the same period of time and, through time sharing (time-division multiplexing), give the appearance of simultaneous (parallel) execution on a single processor. Similarly, on operating systems or architectures that support multithreading (where each thread represents an independent execution context within a process), parallel threads of the same program can truly run simultaneously on different CPUs of a multi-CPU machine.

A thread (English: thread) is the smallest unit of execution that the operating system can schedule. It is contained within a process and is the actual unit of execution in the process. A thread is a single sequential flow of control within a process; a process can contain multiple concurrent threads, each performing a different task in parallel. In UNIX System V and SunOS, threads are also called lightweight processes (LWPs), although that term more often refers to kernel threads, while user-scheduled threads are simply called threads.

Threads are the basic unit of independent scheduling and dispatch. A thread can be a kernel thread scheduled by the operating system kernel (such as a Win32 thread), a user thread scheduled by a user process (such as a POSIX thread on Linux), or scheduled jointly by the kernel and user processes (such as a Windows 7 thread).

Multiple threads in the same process share all of that process's system resources, such as the virtual address space, file descriptors, and signal handlers. However, each thread in a process has its own call stack, its own register context, and its own thread-local storage.
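This split — shared address space, private stacks — can be sketched in Python with the standard threading module. This is an illustrative analogue, not from the original post; the names `counter`, `worker`, and `mine` are mine:

```python
import threading

counter = 0              # shared: lives in the process's single address space
lock = threading.Lock()  # synchronization is needed precisely because it is shared

def worker(n):
    global counter
    mine = 0             # private: `mine` lives in this thread's own stack frame
    for _ in range(n):
        mine += 1        # no lock needed; other threads cannot see `mine`
        with lock:
            counter += 1  # lock needed; every thread sees `counter`

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 40000
```

The lock around `counter` is exactly the "resource scrambling" problem discussed later; the loop-local `mine` never needs one.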

A process can have many threads, each performing a different task in parallel.

The benefits of multithreaded programming are obvious on multi-core or multi-CPU machines, or on CPUs that support hyper-threading: it increases the program's execution throughput. Even on a single-core, single-CPU computer, multithreading can separate the frequently blocked parts of a program, those responsible for I/O and human-computer interaction, from the compute-intensive parts, by dedicating worker threads to the dense computation, thereby improving the program's efficiency.

The above is from Wikipedia.

    • Understanding

At the macro level, a process is best understood through the link between program and process: in a sense, a process is the course of an application's execution on a processor, a dynamic concept. A process is a running activity of a program with independent functionality over some collection of data. It can request and own system resources; it is a dynamic concept, an active entity. It is not just the program's code: it also includes the current activity, represented by the value of the program counter and the contents of the processor's registers. A process is "a program in execution." A program is a lifeless entity; only when the processor breathes life into it does it become an active entity, and that is what we call a process.

A thread, in turn, is understood through its relationship with the process: a thread is part of a process, and a process can contain multiple running threads. Typically a process includes several threads, which can make use of the resources the process owns. In operating systems that introduce threads, the process is usually the basic unit of resource allocation, while the thread is the basic unit of independent execution and scheduling.

As noted above, a modern computer system can load multiple programs into memory as processes in the same period of time and use time sharing to give the appearance of simultaneous execution on a single processor; with multithreading, parallel threads of the same program can truly run simultaneously on different CPUs. Not every operating system supports threads, however, and this brings us to the reason threads appeared at all.

Let's look at an answer to the question "What is multithreading for?"

Answer by Pansz. Link: https://www.zhihu.com/question/19901763/answer/13299543. Copyright belongs to the author; contact the author for permission to reprint. The question can be explained like this:
1. Single-process, single-threaded: one person eating at one table.
2. Single-process, multithreaded: several people eating together at the same table.
3. Multi-process, single-threaded: several people each eating at their own table.

The problem with multithreading is that when many people eat the same dish at the same time, contention arises easily: for example, two people reach for the same dish at once, and one has just extended his chopsticks only to find the food already taken by the other. He has to wait for the other person to finish his bite before returning to the dish. In other words, sharing resources causes conflicts.


1. On Windows, "setting up a table" (creating a process) is expensive, so Windows encourages everyone to eat at one table. The focus of multithreaded programming on Windows is therefore dealing with resource contention and synchronization.


2. On Linux, "setting up a table" is very cheap, so Linux encourages everyone to set up their own table. This brings a new problem: people sitting at two different tables find it inconvenient to talk. The learning focus on Linux is therefore inter-process communication.

--
Addendum: some people were curious about the cost of setting up a table, so let me expand on that point.

Setting up a table means creating a process. The overhead here mainly refers to time overhead.
You can do an experiment: create a process, have it write some data to memory, read the data back, and then exit. Repeat this 1000 times, which amounts to creating and destroying a process 1000 times. The results on my machine were:
Ubuntu Linux: 0.8 seconds
Windows 7: 79.8 seconds
The two differ in cost by roughly a factor of 100.
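The experiment can be sketched in Python on a POSIX system, where `os.fork` is available (it does not exist on Windows, where `CreateProcess` would be measured instead). The function name `spawn_cost` and the scaled-down count of 200 iterations are my own choices, and absolute timings will of course differ from the author's numbers:

```python
import os
import time

def spawn_cost(n=200):
    """Create and destroy n child processes, timing the whole loop."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:              # child: write some data to memory, read it, exit
            data = bytearray(4096)
            data[0] = 1
            _ = data[0]
            os._exit(0)
        os.waitpid(pid, 0)        # parent: reap the child before the next iteration
    return time.perf_counter() - start

elapsed = spawn_cost()
print(f"{elapsed:.3f} seconds for 200 create/destroy cycles")
```

Note that `fork` on Linux uses copy-on-write, which is a large part of why process creation there is so cheap relative to Windows.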

This means that on Windows the overhead of process creation cannot be ignored. In other words, frequent process creation is not recommended in Windows programming; if your program's architecture requires creating a large number of processes, it is best to switch to Linux.

There are two typical examples of heavy process creation. One is the GNU Autotools toolchain, used to build a great deal of open-source code, which compiles very slowly under Windows, so it is best for those developers to avoid Windows. The other is servers: some server frameworks rely on creating many processes, even one per user request, and these run inefficiently under Windows. This is "possibly" also why Linux servers far outnumber Windows servers worldwide.

--
A further note: if you are writing server-side applications, the cost of setting up a table is actually negligible under current network service models, because the popular approach is to create as many processes or threads as there are CPU cores and keep that number fixed afterwards; within each process or thread, coroutines or asynchronous I/O handle the many concurrent connections. The creation overhead of processes and threads can therefore be ignored.

This puts a different kind of overhead on the agenda: core-switching overhead.

In modern systems the CPU generally has multiple cores, and multiple cores can run different threads or processes at the same time.

When each CPU core runs its own process, there is no shared context to worry about when work moves between cores, because each process's resources are independent.

When each CPU core runs a thread, the threads share resources, so data may have to move from one core to another for execution to continue, which costs extra overhead. In other words, on multi-core CPUs, multithreading can perform worse than multi-process.

Therefore, in today's server-side programming for multicore machines, one should get used to multi-process rather than multithreaded designs.

From this answer I learned two things: 1. On Windows, process creation is expensive, so Windows encourages everyone to eat at one table; the focus of multithreaded learning on Windows is resource contention and synchronization. 2. On Linux, process creation overhead is minimal, so Linux encourages everyone to set up their own table; the new problem is that talking across tables is inconvenient, so the learning focus on Linux is inter-process communication. (These are exactly the two points the original poster summarized at the start.)

The relationship between processes and threads is now basically clear.

    • Differences

The next thing to answer is the differences. I could never retain what I had memorized by rote; only by understanding the underlying principles can they be remembered easily. So let's keep looking.

A summary of the differences between threads and processes:

a. Address space and other resources: processes are independent of one another, while these are shared among the threads of a single process. Threads inside one process are not visible to other processes.

b. Communication: processes communicate via IPC; threads can communicate by directly reading and writing the process's data segment (e.g., global variables), which requires synchronization and mutual-exclusion mechanisms to guarantee data consistency.
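The contrast in point b can be sketched in Python: processes do not share memory after `fork`, so the parent and child below must talk through a pipe (explicit IPC), whereas threads could simply read a shared global. This is an illustrative POSIX-only sketch, not from the original post:

```python
import os

# Parent and child share nothing after fork; they communicate through a pipe.
r, w = os.pipe()
pid = os.fork()
if pid == 0:                        # child: write a message into the pipe and exit
    os.close(r)
    os.write(w, b"hello from child")
    os._exit(0)
os.close(w)                         # parent: read what the child sent
msg = os.read(r, 1024)
os.close(r)
os.waitpid(pid, 0)
print(msg.decode())  # → hello from child
```

Pipes, message queues, shared memory, and sockets are the usual IPC mechanisms; the data crosses an address-space boundary, which is exactly what thread communication avoids.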

c. Scheduling and switching: thread context switches are much faster than process context switches.

d. In a multithreaded OS, the process is not an executable entity.

Now to make sense of this summary. In point a, "address space and other resources" left me unclear: what exactly are the other resources? By the table-eating analogy, the dishes on the table can be thought of as the address space and other resources, but what the actual entities are I didn't yet know.

And I found this.

From http://laiyuanyuan7.blog.163.com/blog/static/15274321201241191321666/

The environment shared by threads includes: the process's code segment, the process's public data (this shared data makes it easy for threads to communicate with one another), the file descriptors the process has opened, the signal handlers, the process's current directory, and the process's user ID and process group ID.

Threads have much in common, yet each has its own individuality. It is this individuality that lets threads achieve concurrency. It includes:
1. Thread ID
Each thread has its own thread ID, unique within the process. The process uses it to identify the thread.
2. Register values
Because threads run concurrently, each thread has its own execution state; when switching from one thread to another, the register state of the outgoing thread must be saved so that it can be restored when the thread is later switched back in.
3. The thread's stack
A stack is necessary for a thread to run independently.
A thread function can call other functions, and those calls can nest arbitrarily deep, so the thread must have its own function call stack for calls to execute normally, unaffected by other threads.
4. Error return code
Since many threads run concurrently in the same process, one thread may set errno after a system call, and before it has handled the error the scheduler may run another thread, which could overwrite the value.
Therefore, different threads should each have their own error return code variable.
5. The thread's signal mask
Because each thread is interested in different signals, the signal mask should be managed by the thread itself. All threads, however, share the same signal handlers.
6. Thread priority
Because threads must be scheduled like processes, there must be a parameter available for scheduling: the thread's priority.

When it comes to multithreaded routines, baffling things often happen: a variable allocated on the heap versus the stack can produce unexpected results in later execution, and the cause is usually that memory is being accessed illegally, changing the contents of that memory.

Two basic concepts explain this phenomenon: the threads of a process share the heap, while each thread maintains its own stack.

One mechanism: if you declare a local variable such as char name[200], then when the enclosing call ends, the address of name on the stack is released. If instead it is char *name = new char[200];, the situation is completely different: unless delete is called explicitly, the address name points to is never freed.

In function B, allocating a variable in the stack area versus the heap area gives different results. If you use the stack, and the variable occupies the address range am1-am2, that range is released when B returns; function C may then overwrite that memory, so that when D executes, the contents it reads from am1-am2 have changed.

If you allocate with new (on the heap) instead, this does not happen: as long as no explicit delete has been issued, the memory stays allocated, and because the heap is shared among the threads, two threads can both see what one thread has allocated on the heap, so no spurious overwrite occurs.
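Python has no direct analogue of the C++ dangling-stack pitfall (the runtime keeps objects alive while referenced), but the shared-heap half of the point — one thread's heap allocation being visible to every other thread — can be sketched directly. This is an illustrative example of mine, not from the original text:

```python
import threading

shared = []   # the list object lives on the heap, which all threads share

def producer():
    # An object allocated by this worker thread...
    shared.append("made by the worker thread")

t = threading.Thread(target=producer)
t.start()
t.join()
# ...is visible to the main thread after the join, with no copying.
print(shared[0])  # → made by the worker thread
```

Each thread's local variables, by contrast, live in that thread's own stack frames and vanish when the frame returns, which is the Python-level echo of the stack/heap distinction above.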

This is a problem I ran into during a company internship; having only just started multithreaded programming, such a simple operating-systems topic plagued me for a long time. I hope it helps readers new to multithreaded C++! (2) If two threads share a heap and both may allocate and free memory on it, those operations must be protected with synchronization, regardless of the classes involved. Alternatively, consider having the two threads each use their own heap.

On platforms such as Windows, different threads use the same heap by default, so allocating memory with malloc (or Windows' GlobalAlloc) already applies synchronized protection internally. Without it, a race condition would occur whenever two threads performed memory operations at the same time, corrupting the heap's internal bookkeeping: for example, two threads being handed the same block of memory, or a free-list pointer going wrong.

Symbian threads generally use separate heap spaces, so each thread allocates and frees directly in its own heap, avoiding the overhead that synchronization introduces. When a thread exits, the system reclaims its heap space directly, and memory allocated within the thread does not leak into the process.

However, when two threads use a common heap, allocation must be protected with a critical section or mutex; otherwise the program will sooner or later crash. If your threads need to allocate and free objects of arbitrary number and type on the shared heap without any pattern, you can write a custom allocator that applies synchronization internally, and have the threads allocate through it; this is equivalent to implementing your own malloc and free. It is worth reviewing your design first, though, because most of this is unnecessary: with good design, thread-local heaps should satisfy most objects' needs. If one class of objects genuinely needs to be created and shared on the shared heap, that is a reasonable requirement, and you can implement the synchronized protection in that class's new and delete operators.
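The idea of a synchronized allocator can be sketched with a toy free-list in Python; the lock here plays the role of malloc's internal protection. The class name `PoolAllocator` and all block sizes are my own illustrative choices:

```python
import threading

class PoolAllocator:
    """Toy fixed-block allocator; its lock stands in for malloc's internal sync."""

    def __init__(self, nblocks):
        self._free = list(range(nblocks))   # indices of free blocks
        self._lock = threading.Lock()

    def alloc(self):
        with self._lock:                    # without this, two threads could pop
            return self._free.pop() if self._free else None  # the same block

    def free(self, block):
        with self._lock:
            self._free.append(block)

pool = PoolAllocator(64)
grabbed = []                                # list.append is atomic under CPython's GIL

def worker():
    for _ in range(8):
        block = pool.alloc()
        if block is not None:
            grabbed.append(block)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 4 threads x 8 allocations = 32 blocks, and no block is ever handed out twice.
```

Removing the lock from `alloc` is precisely the "two threads allocated the same block" corruption described above, just made visible at Python level.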

So, what do threads share within a process that remains independent between processes? The address space, the process code segment, the data segment (global variables), the file descriptors the process has opened, the signal handlers, the process's current directory, the process user ID, the process group ID, and so on.

Point b is then easy to understand: communication between threads goes through the process's data segment, so it raises synchronization and mutual-exclusion issues (the producer-consumer problem, etc.). Communication between processes is IPC (inter-process communication), of which Linux offers many mechanisms.

Point c is also clear: thread scheduling and switching is much faster than for processes. This, obviously, is one reason threads exist.

Point d need not worry us: it just means that in a multithreaded operating system, the thread is the basic unit of scheduling and dispatch.

Then I found an even better explanation.

From Cloud Wind: http://www.cnblogs.com/way_testlife/archive/2011/04/16/2018312.html

A process executes linearly; although it may be interrupted or paused along the way, the resources it owns serve only that linear execution. When a process context switch occurs, these resources are protected. This is the macroscopic picture of process execution. Processes come in two kinds: single-threaded and multithreaded.

A process consists of three parts: the process control block (PCB), the relevant program segments, and the data structures those segments operate on. A single-threaded process executes linearly at the macro level and has only a single flow of execution at the micro level. A multithreaded process also executes linearly at the macro level, but at the micro level it can have multiple execution activities (threads), for example over different code segments and their associated data structures.

A thread switch represents only a change in the CPU's execution flow, without any change in the resources the process owns. Apart from the CPU, the computer's hardware and software resources are not allocated to threads; a thread can only share the resources of the process it belongs to. Analogous to the process control table and PCB, each thread has its own thread control block (TCB), which holds far less state information than the PCB: mainly pointers, the stacks (system stack and user stack), and register state.

A process has a complete, thread-independent virtual address space. A thread, by contrast, is part of a process, has no address space of its own, and shares all the resources allocated to the process with the other threads in that process.

Threads can effectively improve system efficiency, but they are not applicable to all computer systems, for example certain real-time systems that rarely perform process scheduling and switching. Threads pay off when multiple tasks need the processor: they reduce switching time, and creating and ending a thread costs far less than creating and ending a process. The systems best suited to threading are multiprocessor systems and network or distributed systems.

----------------------------------

1. Execution characteristics of threads.

A thread has only three basic states: ready, running, blocked.

Threads have five basic operations that toggle their state: spawn, block, activate, schedule, terminate.

2. Process communication.

In a single-machine system, process communication takes four forms: master-slave, conversational, message or mailbox mechanisms, and shared storage.

A typical master-slave example: a terminal control process and a terminal process.

A typical conversational example: communication between a user process and a disk-management process.
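The fourth form, shared storage, can be sketched in Python with an anonymous shared mapping created before `fork`: both processes then read and write the same physical memory with no copying. This is a POSIX-only illustrative sketch of mine (`mmap.mmap(-1, …)` creates an anonymous mapping, and MAP_SHARED is its default on Unix):

```python
import mmap
import os

# A shared anonymous mapping set up before fork is visible to parent and child.
buf = mmap.mmap(-1, 16)
pid = os.fork()
if pid == 0:
    buf[:5] = b"reply"    # child writes directly into the shared region
    os._exit(0)
os.waitpid(pid, 0)
data = bytes(buf[:5])     # parent sees the child's write; no data was copied
buf.close()
print(data)  # → b'reply'
```

Unlike the pipe example earlier, nothing travels through the kernel per message here, which is why shared memory is the fastest IPC form, at the price of needing its own synchronization.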

----------------------------------

Well, that's about all.

One remaining question.

Some say the point is that the CPU runs very fast and switches between different tasks in a very short time; because switching between threads costs less than switching between processes, multithreading is used. That is, it makes the most of a single CPU.

Others say it is to make full use of multiple CPUs.

Which is it? In other words: does thread switching happen only on a single CPU, or does it also happen when there are multiple CPUs?

Now it's exam time. If I were asked again about the difference between processes and threads, what would I say?

1. A process is a program in its running state; it is dynamic.

2. A process has a context, and the overhead of creating processes and switching their contexts is large; hence threads.

3. So, to make full use of the CPU, threads exist.

4. What threads share includes the process code segment, the shared data segment (global variables, etc.), the file descriptors the process has opened, the signal handlers, the process's current directory, and the process user ID and process group ID...
