Connections and differences between processes and threads


1. Definition

A process is a single execution activity of a program, with some independent function, on a particular data set. A process is the independent unit of resource allocation and scheduling in the system.

A thread is an entity within a process and the basic unit of CPU scheduling and dispatch; it is a unit that can run independently and is smaller than a process. A thread itself owns essentially no system resources, only the few resources essential to its execution (such as a program counter, a set of registers, and a stack). However, it shares all the resources owned by the process with the other threads that belong to the same process.


2. Relationship

One thread can create and cancel another thread, and multiple threads within the same process can execute concurrently.

Compared with a process, a thread is a concept closer to an execution body: it can share data with the other threads in its process, but it has its own stack space and an independent sequence of execution.


3. Differences

The main difference between processes and threads is that they are different ways for the operating system to manage resources. A process has its own address space; after one process crashes, it does not affect other processes in protected mode. A thread is just one execution path within a process: it has its own stack and local variables, but threads have no separate address spaces, so one thread dying kills the whole process. Multi-process programs are therefore more robust than multithreaded ones, but process switching costs more resources and is less efficient. For concurrent operations that must run simultaneously and share variables, only threads can be used, not processes.

1) In short, a program has at least one process, and a process has at least one thread.

2) Threads are a finer-grained unit of division than processes, which gives multithreaded programs higher concurrency.

3) In addition, a process has its own memory space during execution, while the threads of a process share its memory, which greatly improves a program's running efficiency.

4) Threads still differ from processes during execution. Each independent thread has its own entry point, sequential execution order, and exit point. However, a thread cannot execute on its own; it depends on the application program, and the application provides the control over the execution of its multiple threads.

5) From a logical point of view, multithreading means that within one application there are multiple parts of execution that can run concurrently. The operating system, however, does not treat the multiple threads as separate applications when it schedules, manages, and allocates resources to processes. This is an important difference between processes and threads.

4. Pros and cons

Threads and processes each have advantages and disadvantages in use: a thread's execution overhead is small, but it is not conducive to the management and protection of resources; for processes it is the opposite. At the same time, threads are well suited to running on SMP machines, while processes can be migrated across machines.

Process Concepts

A process is the basic unit of resource allocation and the basic unit of scheduling. For example, when a user runs a program, the system creates a process and allocates resources to it, including internal tables, memory space, disk space, I/O devices, and so on. Then the process is placed in the ready queue. When the process scheduler selects it, it is allocated the CPU and other related resources, and the process actually runs. The process is therefore the unit of concurrent execution in the system.

In microkernel-based operating systems such as Mach and Windows NT, the role of the process has changed: it is only the unit of resource allocation, not the unit of scheduling. In a microkernel system, the basic unit that is actually scheduled to run is the thread. Therefore, the unit that realizes concurrency is the thread.

Threading Concepts

A thread is the smallest unit of execution within a process, that is, the basic unit of processor scheduling. If a process is understood as a task that the operating system completes logically, then a thread represents one of the many possible subtasks that complete that task. For example, if a user starts a database application in a window, the operating system represents that invocation of the database as a process. Suppose the user wants to generate a payroll report from the database and write it to a file; that is one subtask. While the payroll report is being generated, the user can also issue a database query request, which is another subtask. The operating system represents each request, the payroll report and the new query, as a separate thread within the database process. Threads can be scheduled to execute independently on a processor, so several threads can run simultaneously on separate processors in a multiprocessor environment. The operating system provides threads precisely for the convenience and effectiveness of this kind of concurrency.

Benefits of introducing threading

(1) Easier scheduling.

(2) Improved concurrency. Concurrency can be achieved easily and efficiently through threads: a process can create multiple threads to execute different parts of the same program.

(3) Less overhead. Creating a thread is faster than creating a process and requires little overhead.

(4) Full use of multiprocessor capability. By creating a multithreaded process (that is, a process with two or more threads), each thread can run on its own processor, achieving concurrency within the application and keeping every processor busy.

Relationship of processes and threads

(1) A thread can belong to only one process, while a process may have multiple threads, but at least one. A thread is the smallest execution and scheduling unit the operating system can recognize.

(2) Resources are allocated to processes, and all the threads of a process share all of that process's resources. The threads of one process share its code segment (code and constants), data segment (global and static variables), and heap. However, each thread has its own stack segment, also called the runtime stack, which holds all of its local and temporary variables.

(3) The processor is allocated to threads; it is threads that actually run on the processor.

(4) Threads need to synchronize during execution. Threads of different processes synchronize by means of message communication.

Processor management is one of the basic management functions of the operating system; it is concerned with how the processor is allocated, in other words, how CPU (central processor) time is assigned to programs. A program that is ready to enter memory is usually called a job; once the job enters memory, we call it a process.

Since the 1960s, the process has been present in operating systems as the basic unit capable of running independently. It was not until the mid-1980s that people proposed a basic unit smaller than the process that could also run independently, the thread, and tried to use it to increase the speed of concurrent program execution and further improve system throughput. In recent years the thread concept has been widely applied: not only do most new operating systems introduce threads, but new database management systems and other application software introduce threads to improve performance as well.

If the purpose of introducing processes into the operating system was to allow multiple programs to execute concurrently, improving resource utilization and system throughput, then the purpose of introducing threads is to reduce the time and space overhead that program concurrency incurs, making the operating system more efficient at concurrency. To illustrate this, we first review the two basic properties of a process:

(1) A process is an independent unit that can own resources;

(2) A process is also a basic unit that can be independently scheduled and dispatched. It is because a process has these two basic attributes that it can run independently, and this forms the basis for concurrent process execution.

However, for programs to execute concurrently, the system must also perform the following series of operations:

(1) Creating a process. When the system creates a process, it must allocate all the resources the process requires except the processor, such as memory space and I/O devices, and construct the corresponding PCB (process control block).

(2) Destroying a process. When a process is destroyed, the system must reclaim its resources before discarding its PCB.

(3) Process switching. When switching processes, considerable processor time is spent saving the current process's CPU context and setting up the CPU context of the newly selected process.

In short, because a process is a resource owner, the system must pay a significant space-time overhead for process creation, destruction, and switching. For this reason, the number of processes in a system should not be too large and the frequency of process switching should not be too high, but this also limits how far concurrency can be improved.

How to let more programs execute concurrently while minimizing system overhead has become an important goal of operating system design in recent years. Many operating system researchers therefore proposed separating the two attributes of the process and letting the operating system handle them independently: treat one entity as the basic unit of scheduling and dispatch, but not as the unit of independent resource allocation, so that it can run lightly; and let the unit that owns resources avoid frequent switching. It was under the guidance of this idea that the thread concept was born.

In an operating system that introduces threads, a thread is an entity within a process and the basic unit that the system independently schedules and dispatches. A thread itself does not own system resources; it has only the few resources essential to running (such as a program counter, a set of registers, and a stack), but it shares all the resources owned by the process with the other threads of the same process. One thread can create and cancel another thread, and multiple threads in the same process can execute concurrently. Because threads constrain one another, a thread's execution is also intermittent. Accordingly, a thread also has the three basic states of ready, blocked, and running, and in some systems a terminated state as well.

Thread-to-process comparison

Threads have many of the characteristics of traditional processes and are also called lightweight processes (light-weight processes) or process elements; a traditional process, accordingly called a heavyweight process (heavy-weight process), is equivalent to a task with only a single thread. In operating systems that introduce threads, a process usually has several threads and must have at least one. Below, we compare threads and processes in terms of scheduling, concurrency, overhead, resource ownership, and so on.

1. Scheduling

In traditional operating systems, the basic unit that owns resources and the basic unit of scheduling and dispatch are both the process. In operating systems that introduce threads, the thread is the basic unit of scheduling and dispatch, while the process remains the basic unit of resource ownership. Separating the two attributes of the traditional process lets threads run lightly and significantly improves the system's degree of concurrency. Within the same process, switching threads does not cause a process switch; only switching from a thread in one process to a thread in another process causes a process switch.

2. Concurrency

In an operating system that introduces threads, not only can processes execute concurrently, but the multiple threads within one process can also execute concurrently, giving the operating system better concurrency, using system resources more efficiently, and improving system throughput. For example, in a single-CPU operating system without threads, if only one file service process is set up, then when that process blocks for some reason there is no other file service process to provide service. In an operating system with threads, multiple service threads can be set up within a single file service process: when the first thread is waiting, the second can continue running; when the second blocks, a third can run; and so on, significantly improving the quality of the file service and the system throughput.

3. Resource ownership

Whether in a traditional operating system or one with threads, the process is the independent unit that owns resources. A thread generally does not own system resources (it has only a few essential ones), but it can access the resources of the process it belongs to: the process's code segment, data segment, and system resources such as open files and I/O devices can be shared by all the other threads of the process.

4. System overhead

When a process is created or destroyed, the system must allocate or reclaim resources for it, such as memory space and I/O devices, so the operating system pays significantly more than it does when creating or destroying a thread. Similarly, a process switch involves saving the entire CPU context of the current process and setting up the CPU context of the newly scheduled process, whereas a thread switch only needs to save and set a small number of registers and does not involve memory-management operations. The cost of a process switch is thus much greater than that of a thread switch. In addition, because the threads of one process share the same address space, synchronization and communication between them are much easier to implement; in some systems, thread switching, synchronization, and communication can even be done without kernel intervention.

fork() produces a child process that is a copy of the parent, but the child will usually then issue an exec system call. For efficiency, Linux introduces copy-on-write: the contents of the parent's address space are copied to the child only when a segment of the process space is about to change. Between fork and exec the two processes use the same physical memory: the child's code segment, data segment, and stack all point to the parent's physical pages, so the two virtual address spaces are different but map to the same physical space. When the parent or child writes to a segment, the kernel allocates physical space for the corresponding segment of the child. If this happens without an exec, the kernel allocates physical space for the child's data and stack segments (so each process has its own space and they no longer affect each other), while the code segment continues to share the parent's physical pages (the code is identical). If the child calls exec, its code segment is also given separate physical space, because the two processes now execute different code.
After fork, the kernel often puts the child at the front of the run queue so that the child executes first: if the child is going to call exec immediately, letting the parent run first would trigger copy-on-write copies that the exec then throws away, wasting work.

At fork, the child obtains copies of the parent's data space, heap, and stack, so the addresses of its variables (virtual addresses, of course) are the same as in the parent.

Each process has its own virtual address space, and the same virtual address in different processes can obviously map to different physical addresses. So it is no surprise that the same (virtual) address holds different values in the two processes.
The exact process is this:
fork completely replicates the parent's stack space and copies the page table, but does not copy the physical pages, so the virtual addresses are the same and, initially, so are the physical addresses. However, the pages shared by parent and child are marked read-only (like a private mmap). As long as neither process writes, they keep sharing each page; as soon as either process wants to write to a shared page, the kernel copies the physical page for that process and updates its page table, while the original page is marked writable again and left for the other process to use.

This is called "copy-on-write". Since fork uses this copy-on-write mechanism, which of the two processes does the kernel schedule first after fork? Typically the child, because in many cases the child immediately calls exec, discarding the stack, heap, and other space it shares with the parent and loading a new code image; scheduling it first avoids the pointless page copies that copy-on-write would otherwise perform. If the parent ran first, it would very likely write to a shared page and trigger copy-on-write for nothing. That is why the child is usually scheduled first.


Stack data and global data are independent between the child and the parent; only the code is shared.

pid = fork() returns twice; here it returned first in the parent, then in the child:

$ ./test
This is the parent process: 4016
This is the child process: 4017

Note: the parent's PID is not guaranteed to be printed first; the actual order depends on the operating system's process scheduling algorithm.


Why fork() appears to return twice

The operating system creates a new (child) process and establishes a new entry for it in the process table. The child and the parent run the same executable program; most of the child's context and data are copies of the parent's, but they are two independent processes! At this moment, the program counter (PC) in both the parent's and the child's context indicates that each process is currently about to return from the fork call (the child does not yet occupy the CPU; its PC is not really in a register, but saved as process context in its process-table entry). The question is how the return paths diverge in parent and child.

The parent process continues execution. The operating system's implementation of fork makes the call return the PID (a positive integer) of the newly created child in the parent, so neither the pid < 0 branch nor the pid == 0 branch of the following if statement executes, and the parent prints "I am the parent process ...".

The child process is scheduled at some later time: its context is swapped in, it occupies the CPU, and the operating system's implementation of fork makes the fork call in the child return 0. So in this process (note: not the parent; it is the same program, but a separate execution of it, represented by another process and independent of the parent) pid == 0. As this process continues, pid < 0 is not satisfied in the if statement, but pid == 0 is true, so it prints "I am the child process ...".

Why do two seemingly mutually exclusive branches of the program both execute? In a single execution of a program this is certainly impossible, but the two output lines you see come from two processes: two executions of the same program.

After fork, the operating system has replicated a child process that is exactly like the parent. Although the relationship is parent-child, from the operating system's point of view they are more like siblings. The two processes share the code space, but their data spaces are independent: the content of the child's data space is a complete copy of the parent's, and the instruction pointers are identical. The only small difference is the return value: if fork succeeds, it returns 0 in the child and the child's process number in the parent; if fork fails, the parent receives an error.

You can imagine the two processes running in unison up to the fork; afterwards they do different jobs, that is, the execution bifurcates. This is why fork is called fork.

After fork() is used in a program, the program splits into two processes. Which one runs first depends on the system's scheduling algorithm.

If the parent and child processes need to coordinate, this can be resolved by means of synchronization primitives.


Why does the parent process create a child process?

As we said earlier, Linux is a multi-user operating system, and many users compete for system resources at the same time. Sometimes a process creates child processes to compete for resources in order to complete a task earlier. Once a child is created, parent and child both continue executing from the fork, competing for system resources. Sometimes we want the child to continue while the parent blocks until the child finishes its task; in that case we can call the wait or waitpid system call.

Note that fork returns 0 to the child process, but the child's PID is definitely not 0; fork can return 0 to it because the child can call getpid() at any time to obtain its own PID.

After fork, it is not determined whether the parent or the child runs first, or which finishes first, unless synchronization is used. It is wrong to assume that the parent returns from fork only after the child has finished; that is the behavior of vfork, not fork.

Why does fork return 0 in the child?

First, be clear that a function's return value is stored in the register EAX.
Second, when fork returns in the new process, it returns 0 because EAX was set to 0 when the child's task structure was initialized.
Inside fork, the child process is added to the run queue, to be dispatched by the process scheduler at an appropriate time. From this point on, the current process has split into two concurrent processes.
Whichever process is scheduled, it continues executing the remaining code of the fork function and returns its own value when done.

The registers after fork()

"NOTE5"
For fork, the parent-child process shares the same code space, so it feels like there are two returns, in fact, for the parent process calling fork, if the fork out of the child process is not dispatched, then the parent process from the fork system call back, while the analysis sys_fork know, The fork returns the ID of the child process. Look at the fork out of the sub-process, as can be seen by the copy_process function, the return address of the child process is ret_from_fork (and the parent process is returned on the same code point), the return value is directly set to 0. So when the child process is dispatched, it is also returned from fork, with a return value of 0.
Key note Two points: 1.fork returns the execution position of the parent or child process. (The value of the current process EAX is first returned as a return value) 2. Two times the position of the PID to be returned. (in EAX)

The process calls copy_process and obtains the value of last_pid (in EAX; when fork returns normally in the parent, last_pid is what is returned).
The EAX of the child's task state segment (TSS) is set to 0. In fork.c:
p->tss.eax = 0;
(If running the child requires a process switch, then when that switch occurs the EAX value in the child's TSS is loaded into the EAX register, and the first thing the child does is take that EAX content as its return value.)
When the child starts executing, the value "returned" by copy_process is whatever is in EAX.
After fork() there are two tasks running: the parent uses its own TSS and the child uses its own TSS; on each switch, each process gets its own EAX value.

So, "two times a call back" is 2 different processes!
Look at this sentence: Pid=fork ()
When this sentence is executed, the current process enters fork (), at which point the fork () is used for system invocation with an embedded assembly: int 0x80 (see the 133-line _syscall0 function for the kernel version 0.11 unistd.h file). Entering the kernel will then run the Sys_fork system call based on the system call function number that was written to EAX earlier. Then, Sys_fork first invokes the C function find_empty_process produces a new process, and then calls the C function copy_process copies the contents of the parent process to the child process. However, the EAX value in TSS in the subprocess is assigned a value of 0 (which is why the child process returns 0), and when the assignment is complete, copy_process returns the PID of the new process (the subprocess), which is saved to eax. At this point the child process is generated, the child process has the same code space as the parent process, the program pointer register EIP points to the same next instruction address, when Fork returns to its parent process normally, because the value in EAX is the newly created child process number, fork () returns the child process number, Execute else (pid&gt;0); When a process switch runs a subprocess, first the runtime of the child process is restored and the TSS task status segment of the child process is loaded, where the EAX value (0 in copy_process) is also loaded into the EAX register, so When the child process runs, fork returns 0 to execute if (pid==0).

