The difference between multi-process and multi-threading (reprint)


Reprint Address: http://blog.csdn.net/hairetz/article/details/4281931/

One. Why do we need multiple processes (or multiple threads), and why do we need concurrency?

The question may seem trivial. But readers who have not done much process programming may not yet appreciate the appeal and necessity of concurrency.

As long as you write more than code that starts and ends inside a single int main(), you will sooner or later run into a program that does not respond quickly enough, and that is when you taste the sweetness of concurrent programming. Picture a fast-food waiter who must take orders from customers at the front desk and also answer the phone for delivery orders. Without the ability to be in two places at once, the waiter will be overwhelmed. Fortunately, there is a technique that lets you, like the Monkey King sending his spirit out of his body, handle every situation with ease: multi-process/multi-threading technology.

Concurrency is a technique that lets you carry out multiple tasks at the same time. Your code no longer runs only from top to bottom along a single line; it runs along several lines at once. One line, in the main function, talks to the customers in front of you, while another line has already delivered the takeout to other customers.

So why do we need concurrency? Because we need programs that do more and serve more requests at once, and that makes concurrency essential.

Two. Multi-process

What is a process? The most tangible sign of one is its PID. The textbook definition is: a process is one execution of a program on a computer.

Put simply, consider the following code:

    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        printf("pid is %d\n", getpid());
        return 0;
    }

Entering the main function starts a process, which prints its PID and then runs to return. Since main is the only thing this process executes, the process exits when main returns.

Now for multiple processes. On Linux, the call that creates a child process is fork().

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void print_exit(void)
    {
        printf("The exit pid:%d\n", getpid());
    }

    int main(void)
    {
        pid_t pid;
        atexit(print_exit);    // register the callback to run when the process exits
        pid = fork();
        if (pid < 0)
            printf("Error in fork!");
        else if (pid == 0)
            printf("I am the child process and my process ID is %d\n", getpid());
        else
        {
            printf("I am the parent process, my process ID is %d\n", getpid());
            sleep(2);
            wait(NULL);    // reap the child so it does not stay a zombie
        }
        return 0;
    }

I am the child process and my process ID is 15806
The exit pid:15806
I am the parent process, my process ID is 15805
The exit pid:15805

This is the output of a test run of the program compiled with GCC.

The fork function creates a child process. As stated earlier, a process is a program in execution.

A successful call to fork returns twice. In one return the value is 0, and the code that follows runs in the child process. In the other the value is the PID of the child, and the code that follows runs in the parent process.

(Why does the parent need the PID of the child? There are several reasons. One of them concerns the wait at the end of the example: the parent waits for the child to finish so that the kernel can release the child's task_struct; otherwise the child lingers as a zombie process. Calls such as waitpid use the PID to pick which child to wait for. That is a digression; search for "zombie process" if you are interested.)

If fork fails, it returns -1.

In addition, the argument to atexit(print_exit) must be the address of the function to call.

Is print_exit here a function name or a function pointer? It is used as a function pointer; the function name by itself is only an identifier.

A rule from a textbook: when a function name is used for anything other than a call, it is equivalent to a pointer to that function.

So a child process is just one more process. How is it related to its parent, and how does it differ?

I recommend reading an annotated edition of the Linux kernel source (it is interesting and gives a fundamental understanding). In short, after fork the child gets a copy of the parent's task_struct, and a physical page is allocated for the child's stack. In theory the child receives a complete copy of the parent's heap, stack, and data space, while the two share the text (code) segment.

About copy-on-write: because fork is usually followed by exec, fork is now implemented with copy-on-write. As the name implies, the data segment, heap, and stack are not copied at first; parent and child share them, and the kernel marks the pages read-only. Only when the parent or the child tries to write to one of these regions does the kernel make a copy of the page being modified. This makes fork more efficient.

Three. Multithreading

A thread is a schedulable unit of executable code. The name comes from the notion of a "thread of execution". In a thread-based multitasking environment, every process has at least one thread, but it can have several. This means a single program can carry out two or more tasks concurrently.

In short, a thread divides a process into several slices, each of which can run independently. This is clearly different from multiple processes: fork copies a process, whereas threads merely split one river into many streams. No copying cost is added; the existing river is turned into many small streams at almost no cost, and the strength of multithreading lies in this low system overhead. (Of course, that strength brings reentrancy problems with it, which are compared later.)

Let's look at the thread-creation interface (POSIX threads) available on Linux:

    int pthread_create(pthread_t *restrict tidp,
                       const pthread_attr_t *restrict attr,
                       void *(*start_rtn)(void *),
                       void *restrict arg);

    Returns: 0 if OK, error number on failure

The first parameter is a pointer to the thread identifier.
The second parameter sets the thread attributes.
The third parameter is the address of the function the thread starts running.
The last parameter is the argument passed to that function.

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>

    void *task1(void *);
    void *task2(void *);
    void usr(void);
    int p1, p2;

    int main(void)
    {
        usr();
        getchar();
        return 0;
    }

    void usr(void)
    {
        pthread_t pid1, pid2;
        pthread_attr_t attr;
        void *p;
        int ret = 0;

        pthread_attr_init(&attr);    // initialize the thread attribute structure
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);    // mark attr as detached
        // create a thread: its ID is stored in pid1, its attributes come from attr,
        // its entry function is task1, and its argument is NULL
        pthread_create(&pid1, &attr, task1, NULL);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        pthread_create(&pid2, &attr, task2, NULL);
        // front desk work
        ret = pthread_join(pid2, &p);    // wait for pid2 to finish; its return value is stored in p
        printf("after pthread2:ret=%d,p=%d\n", ret, (int)(long)p);
    }

    void *task1(void *arg1)
    {
        printf("task1\n");
        // hard and unpredictable work, run as a detached thread and left on its own
        pthread_exit((void *)1);
    }

    void *task2(void *arg2)
    {
        int i = 0;
        printf("thread2 begin.\n");
        // keep delivering the takeout
        pthread_exit((void *)2);
    }

This multithreaded example should be clear: the main thread does its own work and creates 2 child threads. task1 is detached and left to finish on its own, while task2 keeps delivering takeout and must be waited for. (Recall the zombie process from earlier: threads also need to be waited for. If you do not want to wait, create the thread as detached.)

Also, when compiling code that uses threads on Linux, remember to link against the pthread library. Compile as follows:

    gcc -o pthrea -pthread pthrea.c

Four. Comparisons and considerations

1. Having read this far, you should have an intuitive picture of multi-process and multithreading. Asked to summarize the difference, you could say that the former has high overhead and the latter low overhead. That is indeed the most basic difference.

2. Reentrancy of thread functions:

On reentrant functions and thread safety, I will be lazy and quote some summaries from the web.

Thread safety: this concept is fairly intuitive. Generally speaking, a function is called thread-safe if it always produces correct results when called repeatedly by multiple concurrent threads.

Reentrancy: there is no single complete formal definition, but the requirement is stricter than thread safety. In practice, the common case of "re-entry" is this: the program is executing a function foo() when a signal arrives; the running function is paused and the signal handler runs; and the handler itself calls foo(). That is re-entry. If that inner call of foo() runs correctly, and the paused foo() also runs correctly once the handler finishes, then foo() is reentrant.

Conditions for thread safety:

To make a function thread-safe, the main thing to consider is the variables shared between threads. Threads of the same process share the global data area and the heap of the process's address space, while each thread privately owns its stack and registers. So for threads of the same process, each thread's local variables are private, while global variables, static local variables, and heap-allocated variables are shared. Access to these shared variables must be protected with a lock to be thread-safe.

Conditions for reentrancy:

For a function to be reentrant, it must meet several conditions:

1. Do not use static or global data inside the function.
2. Do not return static or global data; all data is provided by the caller.
3. Use local data, or protect global data by working on a local copy of it.
4. Do not call non-reentrant functions.

Reentrancy is not the same as thread safety. In general, a reentrant function is thread-safe, but a thread-safe function is not necessarily reentrant.

For example, strtok is neither reentrant nor thread-safe; strtok wrapped in a lock is thread-safe but still not reentrant; and strtok_r is both reentrant and thread-safe.

If a thread function is not thread-safe, the consequences of calling it from multiple threads are clear: the value of a shared variable may change unpredictably as different threads access it, leading to program errors or even crashes.

3. About IPC (interprocess communication)

Because multiple processes have to coordinate and synchronize with one another, communication between them is unavoidable.

Here is a brief overview of the main IPC mechanisms on Linux:

  1. Pipe and named pipe (FIFO): a pipe can be used for communication between related processes; a named pipe overcomes the limitation that a pipe has no name, so in addition to everything a pipe does, it allows communication between unrelated processes.
  2. Signal: a signal is a relatively complex means of communication, used to notify the receiving process that some event has occurred. Besides being sent between processes, a signal can also be sent by a process to itself. In addition to the signal function with early UNIX semantics, Linux supports sigaction, whose semantics follow the POSIX.1 standard (this function is in fact based on BSD, which, in order to provide a reliable signal mechanism while keeping a uniform external interface, reimplemented signal on top of sigaction).
  3. Message queue: a message queue is a linked list of messages; there are POSIX message queues and System V message queues. A process with sufficient permission can add messages to the queue, and a process granted read permission can read messages from it. Message queues overcome the drawbacks that signals carry little information and that pipes carry only unformatted byte streams with a limited buffer size.
  4. Shared memory: lets multiple processes access the same region of memory and is the fastest form of IPC available. It was designed to address the inefficiency of the other communication mechanisms. It is often combined with other mechanisms, such as semaphores, to achieve synchronization and mutual exclusion between processes.
  5. Semaphore: mainly a means of synchronization between processes, and between threads of the same process.
  6. Socket: a more general inter-process communication mechanism that also works between processes on different machines. It was originally developed by the BSD branch of UNIX but has since been ported to other Unix-like systems: both Linux and System V variants support sockets.

You may wonder how threads should communicate with one another. As mentioned, threads mostly live within the same process and share that process's global variables, so global variables can be used for inter-thread communication. If the two threads belong to different processes, use the inter-process communication mechanisms directly.

4. About the stack of threads

A few words on a thread's own stack.

After a child thread is created, it gets its own stack inside the process's address space as its nominally private space. (Why only nominally? Because the threads belong to the same process, so any other thread can freely access variables on your nominally private stack once it has a pointer to them. Note that this does not work across processes, because in different processes the same virtual address generally does not map to the same physical address.)

5. Calling fork in a child thread

I have been asked several times: why does calling system or fork inside a thread function go wrong, and does the child process created by fork fully replicate the parent process?

In my tests, as long as the thread function meets the requirements above, it works normally.

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <pthread.h>

    void *task1(void *arg1)
    {
        printf("task1\n");
        system("ls");
        pthread_exit((void *)1);
    }

    int main(void)
    {
        int ret = 0;
        void *p;
        int p1 = 0;
        pthread_t pid1;
        pthread_create(&pid1, NULL, task1, NULL);
        ret = pthread_join(pid1, &p);
        printf("end main\n");
        return 1;
    }

The code above invokes the ls command normally.

However, when multiple processes and multiple threads are combined (for example, when a thread function is also called in the child process), the function body may well deadlock.

A concrete example can be found in this article:

http://www.cppblog.com/lymons/archive/2008/06/01/51836.aspx
