[Linux] System call Understanding (2)


This article introduces the concept of a process under Linux and explains in detail four important system calls related to Linux process management: getpid, fork, exit and _exit. Short example programs illustrate their characteristics and usage.

Some necessary knowledge about the process

Let's start with the standard definition of a process from a university textbook: "A process is the execution of a concurrently executable program on a data set." This definition is rigorous but hard to understand. If it does not make sense to you, consider the author's own, less rigorous explanation. As we all know, an executable file on a hard disk is usually called a program. In Linux, once a program starts executing, the part of it that resides in memory, from the moment execution begins until it exits, is called a process.

Of course, this explanation is not complete, but it has the advantage of being easy to understand. In the rest of this article, we will build up a more complete picture of processes.

Introduction to the Linux process

Linux is a multitasking operating system, which means that multiple processes can execute at the same time. Readers with some knowledge of computer hardware will know that a common single-CPU computer can actually execute only one instruction at any given moment. So how does Linux make multiple processes run simultaneously? It uses a technique called "process scheduling". First, each process is assigned a certain amount of run time, usually very short, on the order of milliseconds. Then, according to certain rules, one process is picked to run while the others wait. When the running process uses up its time, finishes and exits, or is suspended for some other reason, Linux reschedules and picks the next process to run. Because each time slice is so short, from the user's point of view it looks as if multiple processes are running at the same time.

In Linux, each process is assigned a data structure when it is created, called the process control block (PCB). The PCB contains a lot of important information used by the system for scheduling and by the process itself while it runs. The most important item is the process ID (PID), also called the process identifier: a non-negative integer that uniquely identifies a process in the Linux operating system. On the i386 architecture that we use most often (that is, the architecture used by PCs), this integer ranges from 0 to 32767, which covers every process ID we are likely to encounter. As the name suggests, the process ID is the identity card number of a process: just as no two people have the same identity card number, no two processes that exist at the same time have the same process ID.

One or more processes can be combined to form a process group, and one or more process groups can be combined to form a session. This lets us operate on processes in bulk, for example sending a signal to every process in a group by sending the signal to the process group.

Finally, let's use the ps command to see how many processes are currently running on your system:

$ ps -aux

(The following is the result on my computer; your results will probably differ.)

USER       PID %CPU %MEM   VSZ   RSS TTY   STAT START   TIME COMMAND
root         1  0.1  0.4  1412   520 ?     S    May15   0:04 init [3]
root         2  0.0  0.0     0     0 ?     SW   May15   0:00 [keventd]
root         3  0.0  0.0     0     0 ?     SW   May15   0:00 [kapm-idled]
root         4  0.0  0.0     0     0 ?     SWN  May15   0:00 [ksoftirqd_CPU0]
root         5  0.0  0.0     0     0 ?     SW   May15   0:00 [kswapd]
root         6  0.0  0.0     0     0 ?     SW   May15   0:00 [kreclaimd]
root         7  0.0  0.0     0     0 ?     SW   May15   0:00 [bdflush]
root         8  0.0  0.0     0     0 ?     SW   May15   0:00 [kupdated]
root         9  0.0  0.0     0     0 ?     SW<  May15   0:00 [mdrecoveryd]
root        13  0.0  0.0     0     0 ?     SW   May15   0:00 [kjournald]
root       132  0.0  0.0     0     0 ?     SW   May15   0:00 [kjournald]
root       673  0.0  0.4  1472   592 ?     S    May15   0:00 syslogd -m 0
root       678  0.0  0.8  2084  1116 ?     S    May15   0:00 klogd -2
rpc        698  0.0  0.4  1552   588 ?     S    May15   0:00 portmap
rpcuser    726  0.0  0.6  1596   764 ?     S    May15   0:00 rpc.statd
root       839  0.0  0.4  1396   524 ?     S    May15   0:00 /usr/sbin/apmd -p
root       908  0.0  0.7  2264  1000 ?     S    May15   0:00 xinetd -stayalive
root       948  0.0  1.5  5296  1984 ?     S    May15   0:00 sendmail: accepti
root       967  0.0  0.3  1440   484 ?     S    May15   0:00 gpm -t ps/2 -m /d
wnn        987  0.0  2.7  4732  3440 ?     S    May15   0:00 /usr/bin/cserver
root      1005  0.0  0.5  1584   660 ?     S    May15   0:00 crond
wnn       1025  0.0  1.9  3720  2488 ?     S    May15   0:00 /usr/bin/tserver
xfs       1079  0.0  2.5  4592  3216 ?     S    May15   0:00 xfs -droppriv -da
daemon    1115  0.0  0.4  1444   568 ?     S    May15   0:00 /usr/sbin/atd
root      1130  0.0  0.3  1384   448 tty1  S    May15   0:00 /sbin/mingetty tt
root      1131  0.0  0.3  1384   448 tty2  S    May15   0:00 /sbin/mingetty tt
root      1132  0.0  0.3  1384   448 tty3  S    May15   0:00 /sbin/mingetty tt
root      1133  0.0  0.3  1384   448 tty4  S    May15   0:00 /sbin/mingetty tt
root        11  0.0  0.3  1384   448 tty5  S    May15   0:00 /sbin/mingetty tt
root      1135  0.0  0.3  1384   448 tty6  S    May15   0:00 /sbin/mingetty tt
root      8769  0.0  0.6  1744   812 ?     S    00:08   0:00 in.telnetd: 192.1
root      8770  0.0  0.9  2336  1184 pts/0 S    00:08   0:00 login -- lei
lei         87  0.1  0.9  2432  1264 pts/0 S    00:08   0:00 -bash
lei       8809  0.0  0.6  2764   808 pts/0 R    00:09   0:00 ps -aux

  

Apart from the header row, each row represents a process. The PID column shows the process ID, and the COMMAND column shows the name of the process or the command line used to start it from the shell. I will not explain the other columns here; interested readers can refer to the relevant books.

getpid

In the 2.4.4 kernel, getpid is system call number 20, and its prototype in the Linux library is:

#include <sys/types.h> /* provides the definition of type pid_t */
#include <unistd.h>    /* provides the function declaration */

pid_t getpid(void);

  

The role of getpid is simple: it returns the process ID of the current process. Let's look at the following example:

/* getpid_test.c */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* getpid */

int main(void)
{
    printf("The current process ID is %d\n", getpid());
    return 0;
}

  

The attentive reader may notice that this program does not include the header file sys/types.h. That is because we do not use the pid_t type, which is the type of a process ID, anywhere in the program. In fact, on the i386 architecture (the architecture of an ordinary PC), pid_t is fully compatible with int, so we can treat pid_t data as an ordinary integer, for example by printing it with "%d".

Compile and run the program getpid_test.c:

$ gcc getpid_test.c -o getpid_test
$ ./getpid_test
The current process ID is 1980

(Your own result will probably differ from this number, which is perfectly normal.)

  

Run it again:

$ ./getpid_test
The current process ID is 1981

  

As we can see, even though it is the same application, each run is assigned a different process identifier.

fork

In the 2.4.4 kernel, fork is system call number 2, and its prototype in the Linux library is:

#include <sys/types.h> /* provides the definition of type pid_t */
#include <unistd.h>    /* provides the function declaration */

pid_t fork(void);

  

From the name alone, few people would guess what fork is for. The fork system call duplicates a process: when a process calls it, the call completes with two almost identical processes, so we get a new process. It is said that the name comes from the resulting flow of execution, which resembles the shape of a fork.

In Linux, there is only one way to create a new process, and that is the fork call introduced here. Other library functions, such as system(), appear to create new processes too, but if you look at their source code you will find that they actually call fork internally. The same applies when you run an application from the command line: the new process is created by the shell calling fork. fork has some very interesting properties; let's explore them with a small program.

/* fork_test.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    /* at this point there is only one process */
    pid = fork();
    /* at this point there are already two processes running at the same time */
    if (pid < 0)
        printf("Error in fork!");
    else if (pid == 0)
        printf("I am the child process, my process ID is %d\n", getpid());
    else
        printf("I am the parent process, my process ID is %d\n", getpid());
    return 0;
}

  

Compile and run:

$ gcc fork_test.c -o fork_test
$ ./fork_test
I am the parent process, my process ID is 1991
I am the child process, my process ID is 1992

  

To understand this program, you must first grasp one concept: before the statement pid=fork(), only one process is executing the code, but after it, two processes are. The code of the two processes is exactly the same, and the next statement each of them executes is if (pid<0) ....

Of the two processes, the one that already existed is called the "parent process" and the new one is called the "child process". Apart from their process IDs, parent and child differ in the value of the variable pid, which holds the return value of fork. A remarkable feature of fork is that it is called once but returns twice, and it has three possible kinds of return value:

    1. In the parent process, fork returns the process ID of the newly created child process;
    2. In a child process, fork returns 0;
    3. If an error occurs, fork returns a negative value;

There are two possible reasons for fork to fail: (1) the current number of processes has reached the system-imposed limit, in which case errno is set to EAGAIN; (2) the system is short of memory, in which case errno is set to ENOMEM. (For the meaning of errno, please refer to the first article in this series.)

The chance of a fork system call failing is very small, and when it does fail, it is usually for the first reason. The second reason means that the system has no memory left to allocate and is on the verge of crashing, which is rare on Linux.

By now, the clever reader has probably understood the rest of the code in fork_test.c. If pid is less than 0, an error occurred. If pid==0, fork returned 0, which means the current process is the child, so it executes printf("I am the child process..."). Otherwise (else), the current process is the parent, and it executes printf("I am the parent process..."). Perfectionists may find this wasteful, because each of the two processes carries a statement it will never execute. There is no need to worry: many years ago, the founders of Unix wrote programs this way on computers with unimaginably little memory, so with today's "massive" memory we can forget about these few bytes entirely.

At this point, some readers may still have a question: if the child process is almost exactly the same as its parent, and the only way to create a new process is fork, wouldn't every process in the system be the same? Then what do we do when we want to run a new application? From experience with Linux systems, we know that this problem does not exist. How it is solved is a question we leave for a later discussion.

exit

In the 2.4.4 kernel, exit is system call number 1, and its prototype in the Linux library is:

#include <stdlib.h>

void exit(int status);

  

exit is not as hard to understand as fork. As its name suggests, this system call terminates a process. Wherever it appears in the program, as soon as the exit system call is executed, the process stops all remaining operations, clears its various data structures including the PCB, and terminates. Take a look at the following program:

/* exit_test1.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("This process will exit!\n");
    exit(0);
    printf("Never be displayed!\n");
}

  

Run after compilation:

$ gcc exit_test1.c -o exit_test1
$ ./exit_test1
This process will exit!

  

As we can see, the program does not print the following "Never be displayed!\n" because before that, the process terminates when it executes to exit (0).

The exit system call takes an integer parameter, status, which we can use to report the state of the process when it ends, for example whether it ended normally or unexpectedly. In general, 0 indicates a normal end, and any other value indicates that an error occurred and the process did not end normally. In real programs, we can use the wait system call to receive the return value of a child process and handle the different cases accordingly. We will cover wait in detail later.

exit and _exit

As system calls, _exit and exit are twins. How alike they are can be seen from the Linux source:

#define __NR__exit __NR_exit /* excerpt from file include/asm-i386/unistd.h, line 334 */

"__NR_" is the prefix given to each system call in the Linux source. Note that there are two underscores before the first exit and only one before the second.

At this point, anyone who knows C and has a clear head will say that there is no difference between _exit and exit. However, there is a difference worth discussing, and it shows up mainly in how the two are defined in the function library. The prototype of _exit in the Linux library is:

#include <unistd.h>

void _exit(int status);

  

Unlike exit(), which is declared in stdlib.h, _exit() is declared in unistd.h. From the names, stdlib.h seems a bit more high-level than unistd.h, so what is the difference between the two functions? Let's take a look at the flowchart, which gives a more intuitive picture of how these two calls execute.

As you can see, the _exit() function is the simpler of the two: it stops the process immediately, frees the memory it uses, and destroys its various data structures in the kernel. The exit() function is a wrapper around this that adds several steps before the process exits. For this reason, some people consider exit not to be a pure system call.

The most important difference between exit() and _exit() is that exit() checks which files are open and writes the contents of their buffers back to the files before invoking the exit system call. This is the "clean up I/O buffers" item in the diagram.

The Linux standard library contains a set of functions known as "high-level I/O", including the familiar printf(), fopen(), fread() and fwrite(). They are also called "buffered I/O". Their defining feature is that each open file has a buffer in memory. Each read fetches more records than requested, so that the next read can be served directly from the memory buffer. Each write goes only to the memory buffer, and the buffer's contents are written to the file in one go once a certain condition is met (a certain amount of data has accumulated, or a specific character such as the newline \n or the end-of-file marker EOF is encountered). This greatly speeds up file reads and writes, but it also makes programming a little trickier. If some data that we believe has been written to the file is in fact still sitting in the buffer because the condition has not been met, and we then terminate the process directly with _exit(), the data in the buffer is lost. Conversely, to guarantee the integrity of the data, you must use the exit() function.

Take a look at the following routines:

/* exit2.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("output begin\n");
    printf("content in buffer");
    exit(0);
}

  

Compile and run:

$ gcc exit2.c -o exit2
$ ./exit2
output begin
content in buffer

/* _exit1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("output begin\n");
    printf("content in buffer");
    _exit(0);
}

  

Compile and run:

$ gcc _exit1.c -o _exit1
$ ./_exit1
output begin

  

In Linux, standard input and standard output are both handled as files. Although they are a special kind of file, from the programmer's point of view they are no different from ordinary data files on the hard disk. Like all other files, they have their own buffers once they are opened.

Readers are invited to combine this with the earlier discussion and work out why the two programs produce different results. If you have understood what I said above, the conclusion should come easily.

To be continued

In this article, we gained a preliminary understanding of Linux process management and, on that basis, learned four system calls: getpid, fork, exit and _exit. In the next article, we will look at other system calls related to Linux process management and discuss them in more depth. See you next time.

