Plain UNIX-process/threading model

Source: Internet
Author: User

The UNIX tradition is to hand a task to a process. But a task is not necessarily a single thread of execution. Think of the members of a company: everyone works on the same job, each handles only part of it, and once the work is divided finely enough the parts can proceed at the same time, with everyone sharing all of the resources. That is where threads come from: a thread is one of several execution flows that share resources. The semantics of threads differ from those of the plain UNIX process.

0. The original process model: the famous fork call

The plain UNIX process rests on the famous fork call. It is fork that makes the UNIX process so different from the Windows process, and because of fork the two models leave little room for compatibility. The call has a long history: it existed in large operating systems before UNIX. When UNIX first appeared in 1969 it had no fork call at all, only two fixed processes, each attached to one of two terminals. Once fork was introduced the number of processes grew quickly. Note that there was still no exec call at this point!
Before looking at the philosophy behind fork, look at what fork is. A fork is literally a fork: from a single handle the tines branch out again and again. It resembles the Daoist saying "the Dao begets one, one begets two, two begets three, and three begets all things". With fork, in theory, any number of processes can be generated, and all of them can be traced back to the same root. Why did UNIX adopt this model? The first thing to understand is what a process meant when the concept of an "executable file" did not exist.
Imagine how programs originally got into a computer. Today they sit on disk as a matter of course, and the "executable file" is taken for granted. In the 1950s and 1960s, however, programs were loaded on the spot from punched paper tape or heavy reels of magnetic tape. There was no notion of a file system: whatever was on the tape was the program the computer would run, and running a different program meant changing the medium. People write a program in order to run it more than once, so if several "processes" can execute the program on the tape at the same time, system throughput improves greatly. Note that all of these processes execute the same program! This is the simplest process model of a time-sharing system, and it is the fork of the Berkeley time-sharing system. Fork provides a way to replicate the current execution flow, so every child process it produces naturally executes the same code.
This fork call deeply shaped how people understood time-sharing systems, and it was natural for it to be brought into plain UNIX in the early 1970s. Fork is called famous because it has stayed with UNIX (and UNIX-like systems such as Linux) to this day and directly shapes the UNIX process model. To summarize why UNIX builds processes with fork: going from 0 to 1 is hard, going from 1 to 2 is easier but still fairly hard, and going from 2 to 3 and beyond is easy. This is "the Dao begets one ... three begets all things" again. UNIX in 1969 already had two processes, and fork made it very simple to get from two to three and from three to everything. Perhaps by coincidence, the fork of the earlier Berkeley time-sharing system was ready to hand, and Ken Thompson brought it into UNIX.
Why is it three that begets all things and not two? Getting from the Dao to one is the hardest step, as everyone knows. 0 and 1 are both extremely special numbers, 0 even more so. 2 is also somewhat special, but 3 is entirely general. Why is 2 special? Without resorting to game theory, consider an example. When two people are together and there is a smell of a fart, each of them knows for certain who did it: if it was me, I know; if it was not me, it must have been the other person (though both may have, of course). With three people together, the two who did not do it cannot tell which of the other two did. That is the essential difference between 3 and 0, 1, 2, and it is why three begets all things.

1. UNIX process model

In early UNIX the concept of a process matched that of its predecessors. The file system was still immature, and programmers cared about executing a task that had been very hard to write rather than about writing the task itself (demand was low to begin with, storing information was itself a problem, and there was no internet; compare today's App Store ...). The fork call directly organizes UNIX processes into a tree, so:
1. Process 0 (swap/sched) and process 1 (init) have a special status;
2. A "whoever forks, waits and reaps" model emerges, which matters a great deal in a tree organization because it makes resources easy to reclaim;
3. If a parent process exits first, all of its children are handed over to init, which is why init must always exist and cannot exit. In short, no process can detach itself from the process tree.
In short, a plain UNIX process is an executable object at a node of a tree. Note that it is an executable object.
The UNIX process model is built on these basic principles. On the periphery, UNIX carried on the shell idea from the Multics project and opened a shell for each terminal. The shell is the second important feature of UNIX (leaving aside the file abstraction!): it has to fork a process and have that process exec a new, different execution flow. As the history of fork and exec above shows, the two were separate from the outset, and together they make up the complete UNIX process model: fork+exec.
Now look at what the UNIX process model can build. Early UNIX processes were organized, and together with the concept of the terminal UNIX introduced the concepts of the process group and the session.
A process group is a set of related processes, for example the individual commands connected by pipes. More precisely, what relates them is defined by the user. A session is a set of process groups; its purpose is to let several process groups conveniently share access to a terminal in some form. A person sits in front of the terminal, and every time that person does something the question is which process the action is addressed to. The user can create a session, create several process groups within it, and rotate the groups into the foreground in whatever way is convenient in order to interact with them. Sessions and process groups can be understood as an operator-controlled time-sharing system in which the scheduler is no longer the operating system but the operator at the terminal. Just as each CPU can run only one process at a time, each terminal session can have only one foreground process group at a time.
As we can see, the process organization of the UNIX process model naturally forms a layered time-sharing scheduling hierarchy. At the bottom is the process, scheduled by the operating system kernel. Above it is the process group, which organizes several processes that cooperate on one task and is scheduled by the operator who created the session it belongs to. Beneath this hierarchy, all processes are organized into a tree. That is the complete picture the UNIX process model builds. What makes such a picture possible is the fork+exec principle: the gap between fork and exec gives the process more room to control itself, so that which group or session it joins is decided by the process itself rather than by its caller. For the opposite approach, see CreateProcess in the Win32 API. Now trouble arrives: threads have appeared, so what should be done? If you want to know how Linux handles this, skip to the end.
I have not mentioned how any particular UNIX version implements this picture, because the idea matters far more than the implementation, and an implementation can drag you into building a new model. At the end of this article I show how Linux reconciles the semantics of the different process models, which also demonstrates how advanced the UNIX process model is.

2. Process model for providing a resource environment

Although Windows NT borrows many ideas from UNIX, it takes a different approach to the process model. Windows NT was born in the 1990s, when applications had begun to flourish and file systems were mature. The concept of the executable file carried over from the MS-DOS era. (In fact UNIX V6 also had executable files: once the exec call was introduced, an executable was simply a backing resource for a process.) One could develop a large number of different programs on the Win32 API and run each of them separately; to run a program several times, you simply clicked it several times.
In this era, as noted at the start of this article, the granularity of execution was refined to the inside of a program. An application that must do several different things to accomplish a task may need to do them at the same time, much like the overall planning method in mathematics. In Windows NT a process can be regarded as a collection of named resources derived from an executable file; it is no longer suitable as the executable object, and the actual executable object becomes the thread. The process now simply provides a resource environment, and threads use these shared resources to get specific things done together. I call this kind of process model, which provides a resource environment, the resource model.
I use Windows NT as the example in this section only because it represents this model in its purest form. In fact many UNIX versions also try to merge the fork model with the resource model, both to inherit UNIX semantics and to support multi-thread scheduling.

3. Harmonization of two models

First, the conflict between the fork model and the resource model is obvious. It typically shows up in two areas:
1. Signals: exactly which thread handles a signal;
2. fork semantics: suppose several threads are already running and one of them calls fork; which execution flow does the fork apply to?
The first is the easier one to solve. The rule is that a signal not caused by an exception in a particular thread may be handled by any thread, while a signal caused by such an exception is handled by the thread that raised it. The second problem is tricky, and the tricky part lies in how some UNIX versions implement the process model.
They keep a linked list in the process structure or the u area that stores pointers to thread control blocks! Oh, no! What is going on here? How could UNIX forget that the executable object is the process? Does the process now become a container for threads? That flips straight over to the resource model, yet these systems are truly pure UNIX! Is designing LWPs (lightweight processes) a good plan? Perhaps, but it introduces many high-level abstractions and looks complicated; what happens if yet another kind of process has to be introduced a few years later? In short, any approach that modifies the plain UNIX process model is not a good one. What about user-level library threads? Those are not part of the kernel, but they only expose the kernel's inability to do the job.
Set the implementation aside and return to the idea. Look again at the relationship between processes, process groups, and sessions. The most basic executable object is the process; the process group and the session above it are organizational wrappers around sets of processes, and each set has resources shared by the processes in it, such as the environment variables of a session or the command-line variables of a process group. So what is a thread? Is a group of threads not simply a set of execution flows that share a memory address space? Do you see it now? If not, turn the process of the UNIX process model into a scheduling entity and extend the picture down by one layer; threads are then supported naturally:
threads, thread sets, process groups, sessions ...
Expressed in terms of scheduling entities, this becomes:
scheduling entities, scheduling entity groups, process groups, sessions ...
Just as a process group may contain only one process, in which case the group ID equals the process ID, a process may contain only one thread, in which case the thread ID equals the process ID. Everything is unified within the picture of the UNIX process model. If a thread set has only one thread, we call it a process; if it has more than one thread, we still call the set a process, and its elements are threads. At this point it hardly matters what you call them.
What is still missing? A way to make a thread set share a memory address space. The traditional UNIX fork cannot do this, because it takes no parameters that could request such behavior. The fork semantics need a small modification: a clone call whose parameters the user can control:

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ... /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */);


The user can control not only the location of the user stack but also many flags. To share the caller's memory the CLONE_VM flag is certainly required. Creating a real thread with clone takes more than this one flag; the details are not covered here, see the latest NPTL documentation.

4. Linux implementation of the UNIX process model

The Linux implementation of thread support is very elegant: it barely touches the existing task_struct structure and does not change the existing fork semantics. It simply introduces a PID type called TGID, the thread group ID. The executable object in Linux is the task_struct, and only the task_struct. Each task_struct has more than one ID, and depending on how these IDs are interpreted the task_struct is identified as a process or as a thread of a process. The ID types are as follows:

enum pid_type { PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };


where:
PIDTYPE_PID: the scheduling entity ID. If the task_struct is a thread of a process, it is the thread ID; if the process has only one thread, it is also the process ID;
PIDTYPE_TGID: the thread set ID. If the task_struct belongs to a process with multiple threads, it is the process ID; if there is only one thread, it is equal to PIDTYPE_PID;
PIDTYPE_PGID: the process group ID. Needs no explanation;
PIDTYPE_SID: the session ID. Needs no explanation.
By this interpretation, whether a process has one thread or many, its process ID is always the PIDTYPE_TGID ID, while the PIDTYPE_PID ID is interpreted differently depending on the situation. The concrete implementation is as follows:
1. Each task_struct has an ID that is unique within its PID namespace; at initialization this value is assigned as both the process ID and the thread ID;
2. If the task_struct is the first thread of a process, that is, it was created by a standard fork call, the values initialized in step 1 stay unchanged;
3. If the task_struct is not the first thread of a process, that is, it was created by a clone call with CLONE_THREAD (which in turn requires CLONE_VM), the new task_struct's PIDTYPE_TGID ID is overwritten with the caller's PIDTYPE_TGID ID;
4. The process group ID and the session ID are set through dedicated system calls such as setpgid, setpgrp, and setsid; the implementation is similar to that for processes and threads above;
5. Each task_struct holds 4 PID structures, and linked lists connect these PID structures rather than the task_structs themselves; the links indicate which task is the process, which tasks are threads of that process, which are members of a group, and which is the group leader ...
In short, in Linux both threads and processes are represented by the task_struct structure, and the values of its PID types determine how the picture of the UNIX process model is built. This really is elegant. I find a diagram of how things are linked far more intuitive, since prose is weak at expressing this:


[Figure: Pid.jpg, http://s3.51cto.com/wyfs02/M02/4C/9E/wKiom1RBxlWAU1nUAApC91id4Ak872.jpg]


If you understand the diagram above, you will see how elegantly Linux implements the UNIX process model. A model this streamlined is matched exactly by Linux's streamlined implementation, whereas the traditional UNIX approaches somehow ended up so complicated ... The Linux implementation clearly grasps the hierarchy of the UNIX process model, namely the three levels of process, process group, and session, and then goes one level further down, placing the task_struct at the bottom. That is essentially what the diagram shows.

5. A poetic word

Looking back on the history of UNIX, Dennis Ritchie closed his account with a passage that could have come from a poet. Only someone with a poet's temperament expresses himself this way, and it shows how special Ritchie's feelings for UNIX were:
One of the comforting things about old memories is their tendency to take on a rosy glow. The programming environment provided by the early versions of Unix seems, when described here, to be extremely harsh and primitive. I am sure that if forced back to the PDP-7 I would find it intolerably limiting and lacking in conveniences. Nevertheless, it did not seem so at the time; the memory fixes on what was good and what lasted, and on the joy of helping to create the improvements that made life better. In ten years, I hope we can look back with the same mixed impression of progress combined with continuity.

