Multi-process programming in Linux

Source: Internet
Author: User
Article Abstract:
The concept of multithreaded program design was proposed as early as the 1960s, but it was not until the mid-1980s that a multithreading mechanism was introduced in Unix systems. Today multithreaded programming is widely used because of its many advantages. This article introduces some preliminary knowledge about writing multi-process and multi-threaded programs in Linux.

--------------------------------------------------------------------------------

Body:

1 Introduction
For those who have never worked with Unix/Linux operating systems, fork is one of the most difficult concepts to understand: it is called once but returns twice. The fork function is one of the most outstanding achievements of Unix. It came out of long and painstaking theoretical and practical exploration by developers in the early 1970s. On the one hand, it minimizes the cost of process management for the operating system; on the other hand, it gives programmers a simple and clear way to write multi-process programs. Unlike DOS and early versions of Windows, Unix/Linux systems are true multitasking operating systems. It is fair to say that without multi-process programming, one has not really programmed in a Linux environment.
The concept of multithreaded programming was proposed as early as the 1960s, but it was not until the mid-1980s that a multithreading mechanism was introduced in Unix systems. Today multithreaded programming is widely used because of its many advantages.
The following sections introduce some preliminary knowledge about writing multi-process and multi-threaded programs in Linux.

2 Multi-process programming
What is a process? The concept of a process belongs to the system rather than to the user; what the user sees is a program. When you type a command to run a program, the system starts a process. Unlike a program, however, a running process may need to start one or more further processes to complete several independent tasks. Multi-process programming mainly covers process control and inter-process communication. Before studying these, we must first understand the structure of a process.

2.1 Structure of processes in Linux
A Linux process has three parts of data in memory: the "code segment", the "stack segment", and the "data segment". Anyone who has learned assembly language knows that a typical CPU provides segment registers for these three parts, which makes things easier for the operating system. These three parts are also what is needed to form a complete execution sequence.
The "code segment", as its name implies, stores the program code. If several processes on the machine run the same program, they can share the same code segment. The "stack segment" stores subroutine return addresses, subroutine parameters, and the program's local variables. The "data segment" stores the program's global variables, constants, and dynamically allocated data space (for example, space obtained with functions such as malloc). There are many details here that we will not go into. If the system runs several instances of the same program at the same time, they cannot share the same stack segment and data segment.

2.2 Process Control in Linux
In the traditional Unix environment, two basic operations are used to create and modify processes: the fork() function creates a new process that is almost a complete copy of the current process, and the exec() function family starts another program in place of the one currently running in the process. Process control in Linux is basically the same as in traditional Unix, with some differences in detail. For example, in early Linux kernels calling vfork was exactly the same as calling fork, whereas in some versions of Unix (and in later Linux kernels) vfork behaves differently. Since these differences hardly affect most of our programming, we will not consider them here.
2.2.1 fork()
"Fork" is the English word for a fork in a road. Why this name? Because when a running process calls fork, another process is generated, so the process "forks"; the name is very apt. The following program demonstrates the basic framework for using fork:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    int i;
    if (fork() == 0) {
        /* Child process */
        for (i = 1; i < 1000; i++) printf("this is child process\n");
    } else {
        /* Parent process */
        for (i = 1; i < 1000; i++) printf("this is parent process\n");
    }
    return 0;
}
After the program runs, the screen shows roughly a thousand lines printed by the child process and roughly a thousand printed by the parent process. While the program is still running, the ps command shows two instances of it in the system.
So what happens when fork is called? The fork function starts a new process. As mentioned earlier, this process is almost a copy of the current process: the child and the parent use the same code segment, and the child receives a copy of the parent's stack and data segments. In this way all of the parent's data is passed on to the child. However, once the child starts running, the two sets of data are separate even though the child inherited everything from the parent; they no longer affect each other, that is, they no longer share any data. When the processes need to interact, they can only do so through inter-process communication, which is covered later.
Since the two processes are so similar, how can they be told apart? By the return value of the function. In the parent process fork returns the process ID of the child, and in the child process fork returns zero. The ps command shows the different process IDs. The parent's own process ID was assigned when the parent itself was created; the child's process ID is the value that fork returns to the parent. Both the parent and the child continue executing the code that follows the fork() call, so we use an if...else statement on the return value of fork() to make the parent and the child do different things, just as in the example above. The two messages in that example are printed interleaved in no fixed order. This is the result of the parent and child executing independently, even though the code looks no different from serial code.
Readers may ask: if a large program is running and its data and stack segments are large, and every fork copies them, isn't the overhead of fork very high? In fact, Unix has its own solution. Memory is generally managed in units of "pages", each of which maps to a piece of actual physical memory; on Intel CPUs, for example, a page is usually 4096 bytes, and both the data segment and the stack segment consist of many pages. When fork copies these two segments, the copy is only "logical", not "physical". That is, right after fork the data and stack segments of the two processes are still shared in physical memory. Only when one of the processes writes to a page, so that the data of the two processes differs, does the system physically separate that page. This keeps the space overhead of fork to a minimum.
The following small program is enough to "kill" Linux. The source code is very simple:
#include <unistd.h>

/* Warning: do not run this on a system you care about */
int main(void)
{
    for (;;) fork();
}
This program does nothing but fork in an endless loop. As a result, it keeps producing processes, and those processes keep producing more processes. Soon the system's process table is full and the system is overwhelmed by the constantly multiplying processes. Of course, if the system administrator has set a limit on the number of processes each user can run, this malicious program cannot achieve its goal.
2.2.2 exec() function family
Next let's look at how a process can start the execution of another program. In Linux this is done with the exec function family. The system call execve() replaces the current process image with a specified program. Its parameters are the file name (filename), the argument list (argv), and the environment variables (envp). The exec family has more than one member, but they all work in roughly the same way. In Linux they are execl, execlp, execle, execv, execve, and execvp. Below I take only execlp as an example; to learn how the other functions differ from it, use the man exec command.
Once a process calls an exec function, the original program is "dead": the system replaces the code segment with the code of the new program, discards the original data segment and stack segment, and allocates new data and stack segments for the new program. The only thing that stays the same is the process ID. In other words, to the system it is still the same process, but it is now running a different program. (However, some exec functions also let the new program inherit information such as environment variables.)
So what if my program wants to start another program but also wants to keep running itself? The answer is to combine fork with exec. The following code starts other programs:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

char command[256];

int main(void)
{
    int rtn; /* wait status of the child process */
    while (1) {
        /* Read the command to be executed from the terminal */
        printf(">");
        fflush(stdout);
        if (fgets(command, sizeof(command), stdin) == NULL) break;
        command[strcspn(command, "\n")] = 0; /* strip the trailing newline */
        if (fork() == 0) {
            /* Child process: execute the command */
            execlp(command, command, (char *)NULL);
            /* If exec returns, the command could not be run; print the error */
            perror(command);
            exit(errno);
        } else {
            /* Parent process: wait for the child to end and print its status */
            wait(&rtn);
            printf("child process return %d\n", rtn);
        }
    }
    return 0;
}

The program reads commands from the terminal and executes them. After each command finishes, the parent process goes back to waiting for the next command from the terminal. If you are familiar with DOS and Windows system calls, you know that DOS/Windows also has exec functions, used in a similar way. DOS/Windows additionally has the spawn family of functions: because DOS is a single-tasking system, it can only keep the "parent process" resident in memory while it executes the "child process", and that is what the spawn functions do. Win32 is a multitasking system, but it retains the spawn functions. Win32 implements them much as the Unix code above does: after the child process is started, the parent waits until the child ends and then continues running. Unix was a multitasking system from the beginning, so from the kernel's point of view the spawn functions are not needed.
This section also touches on the system() and popen() functions. The system() function first calls fork(), then calls exec() to run a shell (/bin/sh), which locates the command's executable file and parses its arguments; finally, it uses one of the wait() family of functions to wait for the child process to finish. The popen() function is similar to system(), except that it also calls pipe() to create a pipe connected to the program's standard input or standard output. These two functions are conveniences for less diligent programmers and have considerable drawbacks in efficiency and security. They should be avoided whenever possible.

2.3 Inter-process communication in Linux
It is impossible to cover inter-process communication in detail here, and the author cannot confidently claim any great mastery of this material. So at the start of this section I would like to recommend the famous work by Richard Stevens, Advanced Programming in the UNIX Environment; its Chinese translation has been published by China Machine Press. The original is brilliant and the translation is faithful. If you are really interested in programming in Linux, put this book next to your desk or computer right away. It is hard to hold back my admiration, but let's get down to business. This section introduces only the most preliminary and simplest knowledge and concepts of inter-process communication.
First, inter-process communication can be achieved, at the very least, through ordinary files: different processes pass information through one or more files, and many application systems actually work this way. However, the term inter-process communication (IPC) generally does not include this seemingly low-level method. Unix systems offer many ways to implement inter-process communication, but unfortunately very few of them are portable across all Unix systems (the only one is the half-duplex pipe, which is also the most primitive method). Linux, as a newer operating system, supports almost all the IPC methods common in Unix: pipes, message queues, shared memory, semaphores, and sockets. We introduce them one by one below.

2.3.1 Pipes
The pipe is the oldest form of inter-process communication. There are unnamed pipes and named pipes. The former are used for communication between parent and child processes; the latter are used for communication between any two processes running on the same machine.
An unnamed pipe is created by the pipe() function:
#include <unistd.h>
int pipe(int filedes[2]);
The filedes parameter returns two file descriptors: filedes[0] is open for reading and filedes[1] is open for writing. The output of filedes[1] is the input of filedes[0]. The following example demonstrates communication between a parent process and a child process.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define INPUT 0
#define OUTPUT 1

int main(void) {
    int file_descriptors[2];
    /* Process ID of the child */
    pid_t pid;
    char buf[256];
    int returned_count;
    /* Create an unnamed pipe */
    pipe(file_descriptors);
    /* Create a child process */
    if ((pid = fork()) == -1) {
        printf("error in fork\n");
        exit(1);
    }
    if (pid == 0) {
        /* Child process */
        printf("in the spawned (child) process...\n");
        /* The child writes data to the parent, so it closes the read end of the pipe */
        close(file_descriptors[INPUT]);
        write(file_descriptors[OUTPUT], "test data", strlen("test data"));
        exit(0);
    } else {
        /* Parent process */
        printf("in the spawning (parent) process...\n");
        /* The parent reads the data written by the child, so it closes the write end of the pipe */
        close(file_descriptors[OUTPUT]);
        returned_count = read(file_descriptors[INPUT], buf, sizeof(buf) - 1);
        if (returned_count < 0) returned_count = 0;
        buf[returned_count] = '\0'; /* terminate the data before printing it with %s */
        printf("%d bytes of data received from spawned process: %s\n",
               returned_count, buf);
    }
    return 0;
}
In Linux, a named pipe can be created in two ways: with the mkfifo function or with the mknod command on the command line. Both of the following create a named pipe called myfifo in the current directory:
Method 1: mkfifo("myfifo", 0666);
Method 2: mknod myfifo p
Once a named pipe has been created, you can operate on it with the usual file I/O functions such as open, close, read, and write. The following is a simple example, assuming we have already created a named pipe called myfifo.
/* Process 1: read from the named pipe */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *in_file;
    size_t count;
    char buf[80];
    in_file = fopen("myfifo", "r");
    if (in_file == NULL) {
        printf("error in fopen.\n");
        exit(1);
    }
    /* Leave room for the terminating null so buf can be printed with %s */
    while ((count = fread(buf, 1, sizeof(buf) - 1, in_file)) > 0) {
        buf[count] = '\0';
        printf("received from pipe: %s\n", buf);
    }
    fclose(in_file);
    return 0;
}
/* Process 2: write to the named pipe */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *out_file;
    char buf[80];
    out_file = fopen("myfifo", "w");
    if (out_file == NULL) {
        printf("error opening pipe.");
        exit(1);
    }
    sprintf(buf, "this is test data for the named pipe example\n");
    /* Write only the bytes that make up the message */
    fwrite(buf, 1, strlen(buf), out_file);
    fclose(out_file);
    return 0;
}

2.3.2 Message queues
Message queues are used for communication between processes running on the same machine and are similar to pipes. In fact, they are a communication method that is gradually being phased out, and stream pipes or sockets can be used instead. We therefore do not explain this method here and recommend that readers skip it.

2.3.3 Shared memory
Shared memory is the fastest way to communicate between processes on the same machine, because the data does not need to be copied between processes. Typically one process creates a shared memory area and other processes read and write it. There are two ways to obtain shared memory: mapping the /dev/mem device, and memory-mapped files. The first method adds no extra overhead to the system but is rarely used in practice, because it gives access to actual physical memory; under Linux it can only be made to work by restricting the memory that the Linux system itself uses, which is of course not practical. The usual approach is to use shared storage through the shmXXX function family.
The first function to use is shmget, which obtains a shared memory identifier.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmget(key_t key, size_t size, int flag);
This function is somewhat like the familiar malloc: the system allocates a region of memory of the requested size as shared memory. In the Linux kernel, every IPC structure has a non-negative integer identifier, so that, for example, sending a message to a message queue only requires referring to the queue's identifier. The kernel derives this identifier from the key of the IPC structure, and that key is the key parameter of the function above. The data type key_t is defined in the header file sys/types.h and is an integer type. We will meet this key again in later sections.
After the shared memory has been created, other processes can call shmat() to attach it to their own address space.
void *shmat(int shmid, const void *addr, int flag);
shmid is the shared memory identifier returned by shmget. The addr and flag parameters determine how the attach address is chosen. The return value is the actual address at which the segment is attached in the process's address space, and the process can then read and write it.
When using shared memory for inter-process communication, pay attention to synchronizing data access: make sure that the data a process wants has actually been written by the time it reads. Semaphores are usually used to synchronize access to shared memory. In addition, the shmctl function can be used to set certain properties of a shared memory segment, such as SHM_LOCK and SHM_UNLOCK.

2.3.4 Semaphores
Semaphores are used to coordinate access to data objects shared between different processes. Their most important application is with the shared memory method of the previous section. In essence, a semaphore is a counter that records the availability of a resource (such as shared memory). Generally, to obtain a shared resource a process must perform the following operations:
(1) Test the semaphore that controls the resource.
(2) If the semaphore value is positive, the resource can be used. The process decrements the semaphore by 1.
(3) If the semaphore value is 0, the resource is currently unavailable, and the process sleeps until the semaphore value is greater than 0. The process is then awakened and returns to step (1).
(4) When the process no longer needs a resource controlled by the semaphore, it increments the semaphore value by 1. If any process is sleeping while waiting for this semaphore, it is awakened.
The state of a semaphore is maintained by the Linux kernel rather than by user processes. The structures the kernel uses to maintain semaphore state are defined in the file /usr/src/linux/include/linux/sem.h. Semaphores come in sets, and each element of a set can be used separately. The first function to call is semget, which obtains a semaphore set ID.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
int semget(key_t key, int nsems, int flag);
key is the IPC key discussed earlier; it determines whether a new semaphore set is created or an existing one is referenced. nsems is the number of semaphores in the set. If you create a new set (typically in a server), you must specify nsems; if you reference an existing set (typically in a client), you can specify nsems as 0.
The semctl function performs control operations on semaphores.
int semctl(int semid, int semnum, int cmd, union semun arg);
Different operations are selected through the cmd parameter. Seven different commands are defined in the header file sem.h; consult it when you actually program.
The semop function atomically executes an array of operations on a semaphore set.
int semop(int semid, struct sembuf semoparray[], size_t nops);
semoparray points to an array of semaphore operations, and nops specifies the number of operations in the array.
Next, let's look at a concrete example. It creates a key for a specific IPC structure and a semaphore set, sets the value of the semaphore at index 0, modifies that value, and finally removes the semaphore set. In the code below, the ftok function generates the unique IPC key mentioned above.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the calling program must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main(void) {
    key_t unique_key; /* an IPC key */
    int id;
    struct sembuf lock_it;
    union semun options;
    int i;

    unique_key = ftok(".", 'a'); /* generate a key; the character 'a' is an arbitrary seed */
    /* Create a new semaphore set */
    id = semget(unique_key, 1, IPC_CREAT | IPC_EXCL | 0666);
    printf("semaphore id = %d\n", id);
    options.val = 1; /* the value to set */
    semctl(id, 0, SETVAL, options); /* set the semaphore at index 0 */

    /* Print the semaphore value */
    i = semctl(id, 0, GETVAL, 0);
    printf("value of semaphore at index 0 is %d\n", i);

    /* Now lock (decrement) the semaphore */
    lock_it.sem_num = 0; /* which semaphore in the set */
    lock_it.sem_op = -1; /* the operation */
    lock_it.sem_flg = IPC_NOWAIT; /* do not block */
    if (semop(id, &lock_it, 1) == -1) {
        printf("can not lock semaphore.\n");
        exit(1);
    }

    i = semctl(id, 0, GETVAL, 0);
    printf("value of semaphore at index 0 is %d\n", i);

    /* Remove the semaphore set */
    semctl(id, 0, IPC_RMID, 0);
    return 0;
}

2.3.5 Sockets
Socket programming is one of the main ways of implementing inter-process communication on Linux and most other operating systems. The well-known WWW, FTP, and Telnet services are all built on socket programming. Besides communication with processes on remote computers, sockets also work for inter-process communication within the same local computer. The classic textbook on sockets is also by Richard Stevens: "UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI". Tsinghua University Press has published a photocopied edition of this book. It too is one of the essential books for Linux programmers.
For details on this topic, see the author's article "design your own network ant", which describes several commonly used socket functions and gives sample programs. This is perhaps the most important and most attractive part of Linux inter-process communication programming. After all, the Internet is developing at an incredible speed, and a programmer who does not consider the network when designing and writing his next program can hardly expect the design to succeed.

3 Comparison between Linux processes and Win32 processes/threads
Anyone familiar with Win32 programming knows that Win32 process management is very different from that of Linux. Unix has only the concept of a process, whereas Win32 also has the concept of a "thread". So how do Linux and Win32 differ here?
Processes and threads in Win32 are inherited from OS/2. In Win32, a "process" refers to a program, and a "thread" is one thread of execution within a process. At the core, multi-processing in Win32 is not very different from that in Linux: a Win32 thread corresponds to a Linux process, in that it is the entity actually executing code. However, in Win32 the threads of one process share the data segment. This is the biggest difference from Linux processes.
The following program shows how a Win32 process starts a thread.

#include <stdio.h>
#include <windows.h>

int g; /* shared by both threads */

DWORD WINAPI ChildProcess(LPVOID lpParameter) {
    int i;
    for (i = 1; i < 1000; i++) {
        g++;
        printf("this is child thread: %d\n", g);
    }
    ExitThread(0);
}

int main(void)
{
    DWORD threadId;
    int i;
    g = 0;
    CreateThread(NULL, 0, ChildProcess, NULL, 0, &threadId);
    for (i = 1; i < 1000; i++) {
        g++;
        printf("this is parent thread: %d\n", g);
    }
    return 0;
}

In Win32, the CreateThread function creates a thread. Unlike process creation in Linux, the new Win32 thread does not continue from the point of creation; instead, CreateThread is given a function, and the thread starts running from that function. Like the earlier Unix program, this one has two threads of execution each printing roughly a thousand lines. threadId is the thread ID of the child thread. Note also that the global variable g is shared between the child thread and the parent thread; this is the biggest difference from Linux. As you can see, Win32 processes/threads are more complex than those of Linux. It is not difficult to implement something similar to Win32 threads in Linux: after fork, the child process simply calls the thread function, and a shared data area is set up for the global variables. Fork-like functionality, on the other hand, cannot be implemented in Win32. So although the C library functions supplied by compilers on Win32 are compatible with most Linux/Unix library functions, fork is still not among them.
For a multitasking system, shared data areas are necessary, but they are also an easy source of confusion. In Win32 a programmer can easily forget that data is shared between threads: one thread modifies a variable, another thread modifies it again, and the program misbehaves. In Linux, because variables are not shared by default, the programmer explicitly specifies the data to be shared, which makes the program clearer and safer.
As for the Win32 concept of a "process", it means an "application", which corresponds to exec in Unix.
Linux also has its own multithreading facility, pthread, which differs from both Linux processes and Win32 processes. An introduction to pthread and to writing multithreaded programs in Linux is given in another article, "Multi-threaded programming in Linux".

4 Thanks
This article refers to "multi-process programming in Linux" at www.lisoleg.org. The author is Yu Lei.
