4 The Process
In this chapter, we discuss one of the most fundamental abstractions that the operating system provides to users: the process. Informally, the definition is simple: a process is a running program [V+65, BH70]. The program itself is a lifeless thing: it just sits there on the disk, a bunch of instructions (and perhaps some static data), waiting for the action to begin. It is the operating system that takes these bytes and gets them running, transforming the program into something useful. It turns out that people often want to run more than one program at once; for example, on your desktop or laptop you might like to run a web browser, a mail program, a game, a music player, and so forth. In fact, a typical system may seemingly be running tens or even hundreds of processes at the same time. Doing so makes the system easy to use, as one never need be concerned with whether a CPU is available; one simply runs programs. Hence our challenge:
The crux of the problem: how to provide the illusion of many CPUs?
Although there are only a few physical CPUs available, how can the OS provide the illusion of many CPUs?
The operating system creates this illusion by virtualizing the CPU. By running one process, then stopping it and running another, and so forth, the OS can promote the illusion that many virtual CPUs exist when in fact there is only one physical CPU (or a few). This basic technique, known as time sharing of the CPU, allows users to run as many concurrent processes as they would like; the potential cost is performance, as each process will run more slowly if the CPU(s) must be shared.
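To make the idea concrete, here is a toy sketch (our own illustration, not an OS implementation): a single loop stands in for the one physical CPU, and each iteration hands one time slice to one of two "processes" in turn, so both make progress concurrently, each at reduced speed.

```c
#define NPROC 2

// Simulate time sharing: the single loop (the "physical CPU") gives
// each of NPROC virtual CPUs one time slice in turn.
void run_slices(int total_slices, int progress[NPROC]) {
    for (int t = 0; t < total_slices; t++)
        progress[t % NPROC]++;  // this slice goes to process t % NPROC
}
```

After 10 slices, each process has made 5 slices of progress: both ran "at the same time", but at half speed, which is exactly the performance cost mentioned above.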
To implement virtualization of the CPU well, the OS needs both low-level machinery (mechanisms) and high-level intelligence (policies).
Mechanisms are low-level methods or protocols that implement a needed piece of functionality. For example, we will later learn how to implement a context switch, which gives the OS the ability to stop running one program and start running another on a given CPU; this time-sharing mechanism is employed by all modern operating systems.
Policies are algorithms for making some kind of decision within the OS. For example, given a number of possible programs to run on a CPU, which program should the OS run? A scheduling policy in the OS makes this decision, likely using historical information (e.g., which program has run more over the last minute?), workload knowledge (e.g., what types of programs are run?), and performance metrics (e.g., is the system optimizing for interactive performance, or throughput?) to make its decision.
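As a hypothetical sketch of what a policy looks like in code (the names and the "least recent CPU use" rule here are made up for illustration; real schedulers come later in the book), a policy is just a decision procedure over the set of ready processes:

```c
// Hypothetical policy: among the ready processes, pick the one that
// has used the least CPU recently (a crude fairness heuristic).
struct proc_info {
    int pid;        // process identifier
    int recent_cpu; // time slices used recently
};

int pick_next(struct proc_info ready[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (ready[i].recent_cpu < ready[best].recent_cpu)
            best = i;
    return ready[best].pid;  // the pid the policy decides to run next
}
```

Swapping in a different comparison here changes the policy without touching the mechanism that actually switches between processes; that separation is the point of the mechanism/policy split.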
Tip: Use time sharing (and space sharing)
Time sharing is a basic technique used by an OS to share a resource. By allowing the resource to be used by one entity for a little while, then by another for a little while, and so forth, the resource in question (e.g., the CPU, or a network link) can be shared by many. The counterpart of time sharing is space sharing, where a resource is divided (in space) among those who wish to use it. For example, disk space is naturally a space-shared resource: once a block is assigned to a file, it is normally not assigned to another file until the user deletes the original file.
4.1 Process
The abstraction the OS provides for a running program is what we call a process. As stated above, a process is simply a running program; at any instant in time, we can summarize a process by taking an inventory of the different pieces of the system it accesses or affects during the course of its execution.
To understand what constitutes a process, we have to understand its machine state: what a program can read or update when it is running. At any given time, what parts of the machine are important to the execution of this program?
One obvious component of the machine state that comprises a process is its memory. Instructions lie in memory; the data that the running program reads and writes sits in memory as well. Thus the memory that the process can address (known as its address space) is part of the process.
Also part of the process's machine state are registers; many instructions explicitly read or update registers, and thus they are important to the execution of the process.
Note that some particularly special registers form part of this machine state. For example, the program counter (PC), sometimes called the instruction pointer (IP), tells us which instruction of the program is currently being executed; similarly, a stack pointer and associated frame pointer are used to manage the stack for function parameters, local variables, and return addresses.
Finally, programs often access persistent storage devices as well. Such I/O information might include a list of the files the process currently has open.
4.2 Process API
While we defer discussion of a real process API until a subsequent chapter, here we first need some idea of what must be included in any such interface of an operating system. These APIs, in some form, are available on any modern operating system.
- Create: An operating system must include some method to create new processes. When you type a command into the shell, or double-click on an application icon, the OS is invoked to create a new process to run the program you have indicated.
- Destroy: As there is an interface for process creation, systems also provide an interface to destroy processes forcefully. Of course, many processes will run and simply exit by themselves when complete; when they don't, however, the user may wish to kill them, and thus an interface to halt a runaway process is quite useful.
- Wait: Sometimes it is useful to wait for a process to stop running; thus some kind of waiting interface is often provided.
- Miscellaneous control: Other than killing or waiting for a process, there are sometimes other controls that are possible. For example, most operating systems provide some way to suspend a process (stop it from running for a while) and then resume it (continue it running).
- Status: There are usually interfaces to get some status information about a process as well, such as how long it has run for, or what state it is in.
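As a preview of how this interface looks in UNIX systems (the real API is the subject of the next chapter, so treat this as a hedged sketch), creation, waiting, and status checking map onto fork(), waitpid(), and the exit-status macros:

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

// Create a child process, wait for it to finish, and report its
// exit status -- three of the interface pieces listed above.
int create_and_wait(void) {
    pid_t pid = fork();        // Create: a new process comes into being
    if (pid == 0)
        exit(7);               // child: finish immediately with status 7
    int status;
    waitpid(pid, &status, 0);  // Wait: until the child stops running
    // Status: did the child exit normally, and with what code?
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The "destroy" operation corresponds to kill(), which forcefully delivers a signal to a runaway process; we omit it here since the child exits on its own.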
4.3 Creation of the process
How are programs transformed into processes? Specifically, how does the OS get a program up and running? How does process creation actually work?
The first thing the OS must do to run a program is to load its code and any static data (e.g., initialized variables) into memory, into the address space of the process. Programs initially reside on disk (or, in some modern systems, flash-based SSDs) in some kind of executable format; the process of loading a program and static data into memory thus requires the OS to read those bytes from disk and place them in memory (Figure 4.1).
In early (or simple) operating systems, the loading process was done eagerly, i.e., all at once before running the program; modern OSes perform the process lazily, i.e., by loading pieces of code or data only as they are needed during program execution. To truly understand how lazy loading of code and data works, you'll have to understand more about the machinery of paging and swapping, topics we'll cover when we discuss the virtualization of memory. For now, just remember that before running anything, the OS clearly must do some work to get the important program bytes from disk into memory.
Once the code and static data are loaded into memory, there are a few other things the OS needs to do before running the process. Some memory must be allocated for the program's run-time stack (or just stack). As you should likely already know, C programs use the stack for local variables, function parameters, and return addresses; the OS allocates this memory and gives it to the process. The OS will also likely initialize the stack with arguments; specifically, it will fill in the parameters to the main() function, i.e., argc and the argv array.
The OS may also allocate some memory for the program's heap. In C programs, the heap is used for explicitly requested dynamically-allocated data; programs request such space by calling malloc() and free it explicitly by calling free(). The heap is needed for data structures such as linked lists, hash tables, trees, and other interesting structures. The heap will be small at first; as the program runs and requests more memory via the malloc() library API, the OS may get involved and allocate more memory to the process to help satisfy such calls.
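A small sketch of explicit heap use in C: each push() below asks malloc() for space for one linked-list node, and free_all() releases it; behind the scenes the OS may grow the process's heap to satisfy these requests.

```c
#include <stdlib.h>

// A linked-list node lives on the heap: it must outlive the function
// that creates it, so the stack will not do.
struct node {
    int value;
    struct node *next;
};

// Allocate a node on the heap and prepend it to the list.
struct node *push(struct node *head, int value) {
    struct node *n = malloc(sizeof(struct node));  // heap allocation
    if (n == NULL)
        return head;                               // allocation failed
    n->value = value;
    n->next = head;
    return n;
}

// Walk the list, explicitly returning each node to the heap.
void free_all(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

Note the pairing: every successful malloc() is matched by exactly one free(), the explicit-release discipline the text describes.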
The OS will also do some other initialization tasks, particularly as related to input/output (I/O). For example, in UNIX systems, each process by default has three open file descriptors, for standard input, output, and error; these descriptors let programs easily read input from the terminal and print output to the screen. We'll learn more about persistence, I/O, file descriptors, and the like in the third part of the book.
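Because the OS opens these descriptors during process creation, a program can use them immediately, with no open() call of its own; a minimal illustration using the UNIX write() call (descriptor numbers 0, 1, and 2 are the standard UNIX convention):

```c
#include <unistd.h>
#include <string.h>

// Descriptors 0 (stdin), 1 (stdout), and 2 (stderr) are already open
// when main() begins; write() to fd 1 prints straight to the screen.
ssize_t say_hello(void) {
    const char *msg = "hello, terminal\n";
    return write(STDOUT_FILENO, msg, strlen(msg));  // fd 1
}
```

write() returns the number of bytes written, so a successful call here returns the full length of the message.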
By loading the code and static data into memory, by creating and initializing a stack, and by doing other work related to I/O setup, the OS has now (finally) set the stage for program execution. It thus has one last task: to start the program running at the entry point, namely main(). By jumping to the main() routine (through a specialized mechanism that we will discuss in the next chapter), the OS transfers control of the CPU to the newly-created process, and thus the program begins its execution.
4.4 Process Status
Now that we have some idea of what a process is (though we will continue to refine this notion), and (roughly) how it is created, let us talk about the different states a process can be in at a given time. The notion that a process can be in one of these states arose in early computer systems [DV66, V+65]. In a simplified view, a process can be in one of three states:
- Running: In the running state, the process is running on the processor. This means that it is executing instructions.
- Ready: The process is ready to run, but for some reason the operating system has chosen not to run it at this time.
- Blocked: In the blocked state, a process has performed some kind of operation that makes it not ready to run until some other event takes place. A common example: when a process initiates an I/O request to a disk, it becomes blocked, and thus some other process can use the processor.
If we were to map these states to a graph, we would arrive at the diagram in Figure 4.2. As you can see in the diagram, a process can be moved between the ready and running states at the discretion of the OS. Being moved from ready to running means the process has been scheduled; being moved from running to ready means the process has been descheduled. Once a process has become blocked (e.g., by initiating an I/O operation), the OS will keep it as such until some event occurs (e.g., I/O completion); at that point, the process moves to the ready state again (and may run again immediately, if the OS so decides).
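The transitions in Figure 4.2 form a small state machine, which we can sketch directly (this is our own minimal illustration, not code from any OS; transitions that are not legal from a given state simply leave it unchanged):

```c
// The three simplified process states from Figure 4.2.
enum proc_state { READY, RUNNING, BLOCKED };

// Ready -> Running: the OS schedules the process.
enum proc_state on_scheduled(enum proc_state s) {
    return s == READY ? RUNNING : s;
}
// Running -> Ready: the OS deschedules the process.
enum proc_state on_descheduled(enum proc_state s) {
    return s == RUNNING ? READY : s;
}
// Running -> Blocked: the process initiates I/O and must wait.
enum proc_state on_io_start(enum proc_state s) {
    return s == RUNNING ? BLOCKED : s;
}
// Blocked -> Ready: the I/O completes; the process may run again.
enum proc_state on_io_done(enum proc_state s) {
    return s == BLOCKED ? READY : s;
}
```

Note that there is no direct blocked-to-running edge: a process whose I/O completes becomes ready, and must be scheduled again before it runs.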
Let's look at an example of how two processes might move through some of these states. First, imagine two processes running, each of which only uses the CPU (they do no I/O). In this case, a trace of the state of each process might look like Figure 4.3.
In the next example, the first process issues an I/O after running for some time. At that point, the process becomes blocked, giving the other process a chance to run. Figure 4.4 shows a trace of this scenario. More specifically, Process 0 initiates an I/O and becomes blocked waiting for it to complete; processes become blocked, for example, when reading from a disk or waiting for a packet from the network. The OS recognizes that Process 0 is not using the CPU and starts running Process 1. While Process 1 is running, the I/O completes, moving Process 0 back to the ready state. Finally, Process 1 finishes, and Process 0 then runs and finishes.
Note that there are many decisions the OS must make, even in this simple example. First, when Process 0 issued the I/O, the system had to decide to run Process 1; doing so improves resource utilization by keeping the CPU busy. Second, the system decided not to switch back to Process 0 when its I/O completed; it is not clear whether this is a good decision. What do you think? These types of decisions are made by the OS scheduler, a topic we will discuss a few chapters in the future.
4.5 Data Structures
The OS is a program, and like any program, it has some key data structures that track various relevant pieces of information. To track the state of each process, for example, the OS will likely keep some kind of process list for all processes that are ready, as well as some additional information about which process is currently running. The OS must also track, in some way, blocked processes; when an I/O event completes, the OS should make sure to wake the correct process and ready it to run again.
Figure 4.5 shows what type of information an OS needs to track about each process in the xv6 kernel [CK+08]. Similar process structures exist in "real" operating systems such as Linux, macOS, or Windows; look them up and see how much more complex they are. From the figure, you can see a couple of important pieces of information the OS tracks about a process. The register context will hold, for a stopped process, the contents of its registers. When a process is stopped, its registers are saved to this memory location; by restoring these registers (i.e., placing their values back into the actual physical registers), the OS can resume running the process. We'll learn more about this technique, known as a context switch, in future chapters.
// the registers xv6 will save and restore
// to stop and subsequently restart a process
struct context {
  int eip;
  int esp;
  int ebx;
  int ecx;
  int edx;
  int esi;
  int edi;
  int ebp;
};

// the different states a process can be in
enum proc_state { UNUSED, EMBRYO, SLEEPING,
                  RUNNABLE, RUNNING, ZOMBIE };

// the information xv6 tracks about each process
// including its register context and state
struct proc {
  char *mem;                   // Start of process memory
  uint sz;                     // Size of process memory
  char *kstack;                // Bottom of kernel stack
                               // for this process
  enum proc_state state;       // Process state
  int pid;                     // Process ID
  struct proc *parent;         // Parent process
  void *chan;                  // If non-zero, sleeping on chan
  int killed;                  // If non-zero, has been killed
  struct file *ofile[NOFILE];  // Open files
  struct inode *cwd;           // Current directory
  struct context context;      // Switch here to run process
  struct trapframe *tf;        // Trap frame for the
                               // current interrupt
};
Figure 4.5: The xv6 proc structure
You can also see from the figure that there are some other states a process can be in, beyond running, ready, and blocked. Some systems have an initial state that the process is in while it is being created. Also, a process could be placed in a final state where it has exited but has not yet been cleaned up (in UNIX-based systems, this is called the zombie state). This final state can be useful, as it allows other processes (usually the parent that created the process) to examine the return code of the process and see whether the just-finished process executed successfully (typically, in UNIX-based systems, programs return zero when they have completed successfully, and non-zero otherwise). When finished, the parent will make one final call (e.g., wait()) to wait for the completion of the child, and also to indicate to the OS that it can clean up any relevant data structures that referred to the now-extinct process.
Side note: Data structure-List of processes
Operating systems are replete with important data structures that we will discuss in these asides. The process list is the first such structure. It is one of the simpler ones, but certainly any OS that can run multiple programs at once will have something akin to it in order to keep track of all the running programs in the system. Sometimes people refer to the individual structure that stores information about a process as a Process Control Block (PCB).
4.6 Summary
We have introduced the most basic abstraction of the OS: the process. It is quite simply viewed as a running program. With this conceptual view in mind, we will now move on to the nitty-gritty: the low-level mechanisms needed to implement processes, and the higher-level policies required to schedule them intelligently. By combining mechanisms and policies, we will build up our understanding of how an operating system virtualizes the CPU.