Multi-process programming in Linux


1 Introduction
For those who have never worked with Unix/Linux operating systems, fork is one of the hardest concepts to grasp: it is called once but returns twice. The fork function is one of the most remarkable achievements of Unix. Developed in the early 1970s after long theoretical and practical exploration, it minimizes the cost of process management for the operating system while giving programmers a simple, clear way to write multi-process programs. Unlike DOS and early versions of Windows, Unix/Linux is a true multi-tasking operating system; it is fair to say that programming in a Linux environment cannot be considered complete without multi-process programming.
The concept of multi-threaded programming was proposed as early as the 1960s, but the multi-threading mechanism was not introduced into Unix systems until the mid-1980s. Today, thanks to its many advantages, multi-threaded programming is widely used.
Next, we will introduce some preliminary knowledge about writing multi-process and multi-threaded programs in Linux.
2 Multi-process programming
What is a process? The concept of a process belongs to the system, not to the user; from the user's point of view, the corresponding concept is the program. When you type a command to run a program, the system starts a process for it. Unlike a program, however, a single piece of work may require the system to start one or more further processes to carry out several independent tasks. The main topics of multi-process programming are process control and inter-process communication, but before studying those we first need to know how a process is structured.
2.1 Structure of processes in Linux
A process in Linux keeps three kinds of data in memory: the "code segment", the "stack segment", and the "data segment". Anyone who has studied assembly language knows that a typical CPU provides segment registers for exactly these three purposes, which makes things convenient for the operating system. These three parts are also what is needed to form a complete execution sequence.
The "code segment", as the name implies, stores the program's code. If several processes on the machine run the same program, they can share the same code segment. The "stack segment" stores subroutine return addresses, subroutine parameters, and the program's local variables. The data segment stores global variables, constants, and dynamically allocated space obtained by the program (for example, memory obtained with functions such as malloc). There are many details here that we will not go into. Note, however, that if the system runs several copies of the same program at the same time, they cannot share a stack segment or a data segment.
2.2 Process Control in Linux
In the traditional Unix environment there are two basic operations for creating and modifying processes: the fork() function creates a new process that is almost an exact copy of the current one, and the exec() family of functions starts another program in place of the currently running process. Process control in Linux is basically the same as in traditional UNIX, with differences only in minor details. For example, in Linux calling vfork behaves essentially like fork, while in some versions of Unix vfork behaves differently. Since these differences hardly affect ordinary programming, we will not consider them here.
2.2.1 fork()
In English, "fork" means a fork in a road. Why this name? Because when a running process calls fork, a second process is produced: the flow of execution forks in two, so the name fits well. The following program demonstrates the basic framework for using fork:
#include <stdio.h>
#include <unistd.h>

int main()
{
    int i;
    if (fork() == 0) {
        /* child process */
        for (i = 1; i < 1000; i++) printf("This is child process\n");
    } else {
        /* parent process */
        for (i = 1; i < 1000; i++) printf("This is parent process\n");
    }
    return 0;
}

fork() returns 0 in the child process and returns the child's process ID in the parent, which is how the two branches above end up running in different processes. The next example puts fork together with exec and wait:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
    char command[256];
    int rtn;   /* return value of the child process */
    while (1) {
        /* read the command to be executed from the terminal */
        printf(">");
        fflush(stdout);
        fgets(command, 256, stdin);
        command[strlen(command) - 1] = 0;
        if (fork() == 0) {
            /* the child process executes the command */
            execlp(command, command, (char *)NULL);
            /* if an exec function returns, the command was not executed; print the error */
            perror(command);
            exit(errno);
        } else {
            /* the parent waits for the child to finish and prints its return value */
            wait(&rtn);
            printf("child process return %d\n", rtn);
        }
    }
}
This program reads commands from the terminal and executes them; after each command finishes, the parent process goes back to waiting for the next command. Readers familiar with DOS and Windows system calls will know that DOS/Windows also has exec-family functions used in much the same way, but it additionally has the spawn family of functions. Because DOS is a single-task system, it can only keep the "parent process" resident while the "child process" executes, which is exactly what the spawn functions do. Win32 is a multi-tasking system, yet it still keeps the spawn functions; in Win32 they are implemented much as in the Unix example above: the parent starts a child process and continues running only after the child has finished. UNIX has been a multi-tasking system from the start, so from the kernel's point of view spawn-style functions are simply not needed.
In this section we should also mention the system() and popen() functions. system() first calls fork(), then calls exec() to run the user's login shell, which searches for the executable command and parses its arguments; finally it uses one of the wait() family of functions to wait for the child process to finish. popen() is similar to system(), except that it also calls pipe() to create a pipe connected to the program's standard input or standard output. These two functions were designed for less diligent programmers; they have considerable drawbacks in efficiency and security and should be avoided whenever possible.
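As a quick illustration of what popen() does behind the scenes (fork, exec of the shell, a pipe, and a wait), here is a minimal sketch; the command "ls -l" is just an arbitrary example:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char line[256];
    /* popen() forks, runs the command through the shell, and connects
       the command's standard output to a FILE stream we can read. */
    FILE *fp = popen("ls -l", "r");
    if (fp == NULL) {
        perror("popen");
        exit(1);
    }
    while (fgets(line, sizeof(line), fp) != NULL)
        printf("got: %s", line);
    /* pclose() waits for the child and returns its termination status. */
    printf("command status: %d\n", pclose(fp));
    return 0;
}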
2.3 inter-process communication in Linux
It is impossible to cover inter-process communication in full detail here, and the author would not dare claim mastery of this topic, so at the start of this section I would like to recommend Richard Stevens's classic "Advanced Programming in the UNIX Environment"; its Chinese translation has been published by the Mechanical Industry Publishing House. The original is brilliant and the translation is faithful. If you are serious about programming in Linux, put this book next to your desk or computer. Enough admiration; let us get back to the subject. In this section we introduce only the most elementary knowledge and concepts of inter-process communication.
First, inter-process communication can obviously be achieved simply through files: different processes exchange information through one or more shared files, and many application systems actually work this way. In general, however, inter-process communication (IPC, InterProcess Communication) does not include this seemingly low-level method. There are many ways to implement IPC on Unix systems, and unfortunately very few of them are portable across all UNIX systems (the only one is the half-duplex pipe, which is also the most primitive method). Linux, as a relatively new operating system, supports almost all of the IPC mechanisms commonly found in UNIX: pipes, message queues, shared memory, semaphores, and sockets. We introduce them one by one below.
2.3.1 Pipes
A pipe is the oldest form of inter-process communication. There are two kinds: anonymous pipes and named pipes. The former is used for communication between a parent process and its child; the latter can be used between any two processes running on the same machine.
An anonymous pipe is created with the pipe() function:
#include <unistd.h>
int pipe(int filedes[2]);
The filedes parameter returns two file descriptors: filedes[0] is open for reading and filedes[1] is open for writing, so whatever is written to filedes[1] becomes the input of filedes[0]. The following example demonstrates how a parent and a child process can communicate through an anonymous pipe:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define INPUT  0
#define OUTPUT 1

int main()
{
    int file_descriptors[2];
    /* process ID of the child */
    pid_t pid;
    char buf[256];
    int returned_count;
    /* create the anonymous pipe */
    pipe(file_descriptors);
    /* create the child process */
    if ((pid = fork()) == -1) {
        printf("Error in fork\n");
        exit(1);
    }
    /* code executed by the child process */
    if (pid == 0) {
        printf("in the spawned (child) process...\n");
        /* the child writes to the parent, so it closes the read end of the pipe */
        close(file_descriptors[INPUT]);
        write(file_descriptors[OUTPUT], "test data", strlen("test data"));
        exit(0);
    } else {
        /* code executed by the parent process */
        printf("in the spawning (parent) process...\n");
        /* the parent reads what the child wrote, so it closes the write end of the pipe */
        close(file_descriptors[OUTPUT]);
        returned_count = read(file_descriptors[INPUT], buf, sizeof(buf) - 1);
        buf[returned_count] = '\0';   /* terminate the received data */
        printf("%d bytes of data received from spawned process: %s\n",
               returned_count, buf);
    }
    return 0;
}
In Linux a named pipe can be created in two ways: from the command line with the mknod command, or from a program with the mkfifo() function. Both of the following create a named pipe called myfifo in the current directory:
Method 1 (in a program): mkfifo("myfifo", 0644);
Method 2 (from the shell): mknod myfifo p
Once the named pipe has been created, it can be operated on with the ordinary file I/O functions such as open, close, read, and write. Below is a simple example; assume we have already created a named pipe called myfifo.
/* process 1: read from the named pipe */
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *in_file;
    int count = 1;
    char buf[80];

    in_file = fopen("myfifo", "r");
    if (in_file == NULL) {
        printf("Error in fopen.\n");
        exit(1);
    }
    while ((count = fread(buf, 1, 80, in_file)) > 0)
        printf("received from pipe: %s\n", buf);
    fclose(in_file);
    return 0;
}
/* process 2: write to the named pipe */
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *out_file;
    char buf[80];

    out_file = fopen("myfifo", "w");
    if (out_file == NULL) {
        printf("Error opening pipe.\n");
        exit(1);
    }
    sprintf(buf, "this is test data for the named pipe example\n");
    fwrite(buf, 1, 80, out_file);
    fclose(out_file);
    return 0;
}
2.3.2 Message Queue
Message queues are used for communication between processes running on the same machine. They are somewhat similar to pipes, but they are a mechanism that is gradually being phased out and can be replaced by stream pipes or by sockets. We therefore will not explain them further, and we suggest that readers simply skip this method.
2.3.3 shared memory
Shared memory is the fastest way to communicate between processes running on the same machine, because the data does not need to be copied between processes. A shared memory area is usually created by one process, and other processes then read from and write to it. There are two ways of obtaining shared memory: mapping the /dev/mem device, and mapped memory obtained through the shm functions. The first method adds no extra overhead, but it is rarely used in practice because it accesses actual physical memory; under Linux this can only be done by restricting the memory Linux itself uses, which is of course impractical. The common approach is to obtain shared memory through the shmXXX family of functions.
The first function to call is shmget, which obtains a shared memory identifier.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmget(key_t key, int size, int flag);
This function is somewhat like the familiar malloc: the system allocates size bytes of memory to be used as shared memory. In the Linux kernel, every IPC structure has a non-negative integer identifier, so that, for example, sending a message to a message queue only requires referencing that identifier. The identifier is derived from the key of the IPC structure in the kernel; this key is the first argument, key, of the function above. The data type key_t is defined in the header file sys/types.h as a long integer. We will meet this key again in later sections.
After the shared memory has been created, other processes can call shmat() to attach it to their own address space.
void *shmat(int shmid, void *addr, int flag);
shmid is the shared memory identifier returned by shmget; the addr and flag arguments determine how the attach address is chosen. The function's return value is the actual address at which the segment is attached in the process's address space, and the process can then read and write that memory directly.
When shared memory is used for inter-process communication, pay attention to synchronizing access to the data: make sure that the data a process wants to read has already been written. Semaphores are normally used to synchronize access to shared memory. In addition, the shmctl function can be used to set flags on the shared memory segment, such as SHM_LOCK and SHM_UNLOCK.
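A minimal sketch of the calls described above, with a parent and a child sharing one segment; IPC_PRIVATE is used only to keep the sketch self-contained, and wait() stands in for the semaphore-based synchronization a real program would use:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    /* create a 4096-byte shared memory segment */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }
    if (fork() == 0) {
        /* child: attach the segment and write into it */
        char *area = shmat(shmid, NULL, 0);
        strcpy(area, "written by the child process");
        shmdt(area);
        return 0;
    }
    /* parent: wait for the child, then attach and read what it wrote */
    wait(NULL);
    char *area = shmat(shmid, NULL, 0);
    printf("parent read: %s\n", area);   /* no data was copied between the processes */
    shmdt(area);
    shmctl(shmid, IPC_RMID, NULL);       /* remove the segment */
    return 0;
}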
2.3.4 semaphores
Semaphores, sometimes called semaphore flags, are used to coordinate access to data objects between different processes; their most important application is the shared memory form of inter-process communication described in the previous section. In essence, a semaphore is a counter that records accesses to a resource such as shared memory. In general, to obtain a shared resource a process performs the following steps:
(1) Test the semaphore that controls the resource.
(2) If the value of the semaphore is positive, the resource may be used; the process decrements the semaphore by 1.
(3) If the semaphore is 0, the resource is currently unavailable and the process sleeps until the semaphore becomes greater than 0; it is then woken up and returns to step (1).
(4) When a process is finished with a resource controlled by a semaphore, it increments the semaphore by 1. If another process is sleeping on this semaphore, it is woken up.
The semaphore state is maintained by the Linux kernel, not by user processes. The structures the kernel uses to maintain semaphore state can be seen in the file /usr/src/linux/include/linux/sem.h. Semaphores come in sets, and each element of a set can be used individually. The first function to call is semget, which obtains a semaphore identifier.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
int semget(key_t key, int nsems, int flag);
The key argument is the IPC key discussed earlier; it determines whether a new semaphore set is created or an existing one is referenced. nsems is the number of semaphores in the set. When creating a new set (typically in the server), nsems must be specified; when referencing an existing set (typically in the client), nsems can be 0.
The semctl function is used to operate on semaphores.
int semctl(int semid, int semnum, int cmd, union semun arg);
Different operations are selected through the cmd argument; seven different operations are defined in the header file sem.h, and you can consult it when programming.
The semop function atomically executes an array of operations on a semaphore set.
int semop(int semid, struct sembuf semoparray[], size_t nops);
semoparray is a pointer to an array of semaphore operations, and nops specifies the number of operations in the array.
Next, let's look at a concrete example. It creates an IPC key, creates a semaphore set containing one semaphore, sets the value of the semaphore at index 0, modifies that value with semop, and finally removes the semaphore set. In the code below, the ftok function generates the unique IPC key mentioned above.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* on Linux the program must define union semun itself */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

int main()
{
    key_t unique_key;          /* the IPC key */
    int id;
    struct sembuf lock_it;
    union semun options;
    int i;

    unique_key = ftok(".", 'a');   /* generate a key; 'a' is an arbitrary project identifier */

    /* create a new semaphore set containing one semaphore */
    id = semget(unique_key, 1, IPC_CREAT | IPC_EXCL | 0666);
    printf("semaphore id = %d\n", id);

    options.val = 1;                     /* the value to set */
    semctl(id, 0, SETVAL, options);      /* set the semaphore at index 0 */

    /* print the value of the semaphore */
    i = semctl(id, 0, GETVAL, 0);
    printf("value of semaphore at index 0 is %d\n", i);

    /* now reset the semaphore */
    lock_it.sem_num = 0;                 /* which semaphore in the set */
    lock_it.sem_op = -1;                 /* the operation: subtract 1 */
    lock_it.sem_flg = IPC_NOWAIT;        /* do not block if the value is 0 */
    if (semop(id, &lock_it, 1) == -1) {
        printf("can not lock semaphore.\n");
        exit(1);
    }

    i = semctl(id, 0, GETVAL, 0);
    printf("value of semaphore at index 0 is %d\n", i);

    /* remove the semaphore set */
    semctl(id, 0, IPC_RMID, 0);
    return 0;
}
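The example above performs a single non-blocking decrement (IPC_NOWAIT). In practice, steps (1)-(4) from the list above are usually expressed as a blocking P operation before a critical section and a V operation after it. A minimal sketch of that pattern follows, using a single private semaphore initialized to 1 so the sketch is self-contained:

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* as above, the program supplies the union semun definition itself */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main()
{
    /* one private semaphore, initialized to 1: "resource available" */
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0666);
    union semun arg;
    arg.val = 1;
    semctl(id, 0, SETVAL, arg);

    struct sembuf p = { 0, -1, 0 };   /* P: take the resource, sleep while it is 0 */
    struct sembuf v = { 0, +1, 0 };   /* V: release the resource, wake a sleeper   */

    semop(id, &p, 1);                 /* steps (1)-(3) of the list above */
    printf("inside the region protected by the semaphore\n");
    semop(id, &v, 1);                 /* step (4) */

    semctl(id, 0, IPC_RMID, 0);       /* remove the semaphore set */
    return 0;
}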
2.3.5 Sockets
Socket programming is one of the main ways to implement inter-process communication on Linux and on most other operating systems. Well-known services such as WWW, FTP, and Telnet are implemented on top of socket programming. Besides communicating with processes on remote computers, sockets can also be used for inter-process communication within the same local machine. The classic textbook on sockets is also by Richard Stevens: "UNIX Network Programming". Tsinghua University Press has published a reprint of this book, and it too belongs next to every Linux programmer's desk.
For more details the reader can consult the author's earlier article "design your own network ant", which describes several commonly used socket functions with sample programs. This is probably the most important and most attractive part of inter-process communication programming under Linux; after all, the Internet is growing at an amazing rate, and a program designed and written today without the network or the Internet in mind can hardly be expected to succeed.
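Network programming proper is beyond the scope of this section, but as a small taste of sockets used purely for local inter-process communication, here is a sketch using socketpair(), which creates a connected pair of UNIX-domain sockets (unlike pipe(), the connection is full duplex):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    int fds[2];
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) {
        perror("socketpair");
        return 1;
    }
    if (fork() == 0) {
        /* child: use one end of the pair */
        close(fds[0]);
        write(fds[1], "hello from the child", strlen("hello from the child") + 1);
        close(fds[1]);
        return 0;
    }
    /* parent: use the other end */
    close(fds[1]);
    int n = read(fds[0], buf, sizeof(buf));
    printf("parent received: %s\n", n > 0 ? buf : "(nothing)");
    close(fds[0]);
    wait(NULL);
    return 0;
}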
3 Comparison between Linux processes and Win32 processes/threads
Anyone familiar with Win32 programming knows that Win32 manages processes very differently from Linux. Under UNIX there is only the concept of a process, while Win32 also has the concept of a "thread". So what exactly is the difference between Linux and Win32 here?
The process/thread model in Win32 is inherited from OS/2. In Win32, a "process" is a running program, and a "thread" is an independent line of execution within a process. At the kernel level, Win32 multi-process programming is not very different from Linux: a Win32 thread corresponds roughly to a Linux process, in that it is the entity that actually executes code. The crucial difference is that threads in the same Win32 process share the process's data segment; this is the biggest difference from Linux processes.
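A small sketch of this point: after fork() each Linux process has its own copy of the data segment, so a change made in the child is invisible to the parent, whereas two Win32 threads in one process would both see the change:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 1;   /* global variable: lives in the data segment */

int main()
{
    if (fork() == 0) {
        counter = 100;                                   /* modifies only the child's copy */
        printf("child  sees counter = %d\n", counter);
        return 0;
    }
    wait(NULL);
    printf("parent sees counter = %d\n", counter);       /* still 1 */
    return 0;
}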

 

 

1. Differences between Linux and MS-DOS

It is common to run Linux and MS-DOS on the same system, but there are many differences between them.

In terms of processor capability, MS-DOS does not exploit the full power of the x86 processor, whereas Linux runs entirely in the processor's protected mode and exploits all of its features. Linux can directly access all of the available memory in the computer and provides a complete UNIX-style interface, while MS-DOS supports only part of such an interface.

In terms of cost, Linux and MS-DOS are two entirely different entities. Compared with other commercial operating systems, MS-DOS is inexpensive and has a large share among PC users; it is hard for any other commercial PC operating system to match the popularity of MS-DOS, because the cost of other systems is no small burden for most PC users. Linux, however, is free: users can obtain it from the Internet or by other means and use it as they wish, without worrying about cost.

MS-DOS is a single-task operating system: once a user runs an MS-DOS application, it monopolizes the system's resources and the user cannot run other applications at the same time. Linux is a multi-task operating system in which users can run several applications simultaneously.

2. Differences between Linux, OS/2, and Windows

In terms of development background, Linux differs from other operating systems in that it grew out of a mature operating system, UNIX, whereas other systems (such as Windows NT and Windows 2000) were built from scratch with no corresponding predecessor. This difference lets Linux users benefit enormously from the contributions of the UNIX community. UNIX is one of the most widely used and most mature operating systems in the world today; it is a multi-task system whose development began in the mid-1970s. Although its interfaces are sometimes messy and it lacks a single concentrated standard, it has steadily grown into one of the most widely used operating systems.

Both UNIX developers and UNIX users believe that only UNIX is a real operating system. Many computer systems, from personal computers to supercomputers, have UNIX versions, and UNIX users can obtain support and help in many ways. Therefore, as a clone of UNIX, Linux receives the same kind of support and help, and it directly inherits UNIX's solid standing among users.

From the standpoint of use, Linux differs from other operating systems in that it is open and free, while other operating systems are closed and must be paid for. This difference allows us to obtain many Linux distributions, and the applications developed for them, without spending money. When we go on the Internet we find that almost all available free software can run on Linux. Different software vendors implement UNIX in different ways; UNIX developers and vendors promote standardization in the form of open systems, but no single company controls the design.

Therefore, any software vendor (or pioneer) can implement these standards in its own version of Unix. Operating systems such as OS/2 and Windows are proprietary products: their interfaces and designs are controlled by a single company, only that company has the right to implement the design, and they are developed in a closed environment.
