Understanding multi-threading and multi-processing

Source: Internet
Author: User

Reference https://www.liaoxuefeng.com/

A thread is the smallest unit of execution, and a process consists of at least one thread. How processes and threads are scheduled is entirely up to the operating system; a program cannot decide when it runs or for how long. Multi-process and multi-threaded programs involve synchronization and data sharing, which makes them more complex to write.

On Unix/Linux, you can use the fork() call to create multiple processes. For cross-platform multi-process code, use the multiprocessing module. Processes communicate with each other through Queue, Pipes, and similar mechanisms.

The biggest difference between multithreading and multiprocessing is this: with multiple processes, each process has its own copy of every variable, so the copies do not affect one another; with multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. The biggest danger of sharing data between threads is therefore that several threads change a variable at the same time and corrupt its contents.

Python threads are real threads, but the interpreter that executes the code has a GIL (Global Interpreter Lock). Before any Python thread can execute, it must first acquire the GIL. The interpreter then releases the GIL automatically at regular intervals (after every 100 bytecode instructions in older CPython versions; since Python 3.2, after a time-based switch interval), giving other threads a chance to run. The GIL effectively locks the executing code of all threads, so threads in Python can only run alternately: even 100 threads on a 100-core CPU will use only one core. There is no need to worry too much, though: Python cannot use multithreading for multi-core work, but it can use multiple processes. Each Python process has its own independent GIL, and they do not affect each other.

Summary: Multi-threaded programming has a complex model and is prone to conflicts. Shared data must be protected with locks, and you must also watch out for deadlocks.
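As a minimal sketch of the point about locks, the following example increments one shared counter from several threads. The function name run_counter and its parameters are made up for this illustration and do not come from the original article. With the lock held around each update, the final count is always num_threads * increments; without it, updates may be lost, although whether that actually happens on a given run depends on the interpreter version and timing.

```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads; return the final value."""
    state = {"count": 0}      # shared by every thread
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                with lock:    # only one thread at a time performs the update
                    state["count"] += 1
            else:
                state["count"] += 1   # unprotected read-modify-write

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print("with lock:   ", run_counter(4, 100_000, use_lock=True))
    print("without lock:", run_counter(4, 100_000, use_lock=False))
```

Using the lock as a context manager (with lock:) guarantees it is released even if the protected code raises an exception, which removes one common cause of deadlocks.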
The Python interpreter was designed with the GIL, a global lock, so multi-threaded code cannot take advantage of multiple cores. True multi-threaded parallelism remains a beautiful dream in Python.

In a multi-threaded environment, each thread has its own data. It is better for a thread to use its own local variables than global variables, because local variables are visible only to the thread itself and do not affect other threads, whereas changes to global variables must be protected with locks. Although a ThreadLocal variable (threading.local in Python) is a global variable, each thread can only read and write its own independent copy, so threads do not interfere with one another. ThreadLocal solves the problem of passing parameters between the functions within a thread.

Compute-intensive vs. IO-intensive

The second consideration for multitasking is the type of task. Tasks can be divided into compute-intensive and IO-intensive.

Compute-intensive tasks are characterized by a large amount of computation that consumes CPU resources, such as calculating pi or decoding HD video; they depend entirely on the computing power of the CPU. Such tasks can also be run with multitasking, but the more tasks there are, the more time is spent switching between them and the less efficiently the CPU does real work. For the most efficient use of the CPU, the number of compute-intensive tasks running at the same time should equal the number of CPU cores.

Because compute-intensive tasks mainly consume CPU resources, the efficiency of the code itself is critical. A scripting language like Python runs inefficiently and is generally unsuitable for compute-intensive tasks; for these, it is best to write in C.

The second type of task is IO-intensive. Tasks involving network or disk IO are IO-intensive: they consume little CPU, and most of the time the task is waiting for an IO operation to complete (because IO is far slower than the CPU and memory).
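To illustrate why IO-intensive work still benefits from threads despite the GIL, here is a small sketch that uses time.sleep as a stand-in for a blocking network or disk request. The helper names fake_io, run_sequential, and run_threaded are invented for this example. A sleeping thread releases the GIL, much as a thread blocked on real IO does, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    """Stand-in for a network or disk request: the thread simply waits."""
    time.sleep(delay)   # the GIL is released while the thread sleeps
    return task_id


def run_sequential(n):
    """Run n fake IO tasks one after another; return (results, seconds)."""
    start = time.perf_counter()
    results = [fake_io(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    """Run n fake IO tasks in n threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_io, range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_time = run_sequential(5)
    _, threaded_time = run_threaded(5)
    print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The sequential version waits for each task in turn, while the threaded version waits for all of them at roughly the same time. For compute-intensive work the same trick would not help, because only one thread can hold the GIL while executing Python bytecode.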
For IO-intensive tasks, the more tasks there are, the higher the CPU efficiency, up to a limit. Most common tasks are IO-intensive, such as web applications.

While an IO-intensive task runs, most of the time is spent on IO and very little on the CPU, so replacing a slow scripting language such as Python with fast-running C yields almost no improvement in running efficiency. For IO-intensive tasks, the most suitable language is the one with the highest development efficiency (the least code): scripting languages are the first choice, and C is the worst.

Asynchronous IO

In Python, the single-threaded asynchronous programming model is called a coroutine. With coroutine support, efficient multitasking programs can be written in an event-driven style. We will discuss how to write coroutines later.

Distributed processes

Between threads and processes, processes should be preferred, because a process is more stable and processes can be distributed across multiple machines, whereas threads can at most be distributed across the CPUs of a single machine.

Summary: Python's distributed process interfaces are simple and well packaged, and are suitable for environments where heavy tasks need to be distributed across multiple machines.

Note that the Queue is used to transfer tasks and receive results, so the data describing each task should be as small as possible. For example, when sending a task to process a log file, do not send the log file itself, which may be hundreds of megabytes; send the full path where the log file is stored, and let the worker process read the file from a shared disk.
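The advice about keeping task descriptions small can be sketched on a single machine as follows. This is only an illustration under stated assumptions: the names count_lines and worker and the temporary stand-in "log files" are invented here, and a plain multiprocessing.Queue replaces the network-exposed queue that a real multi-machine setup would need (for instance one served through multiprocessing.managers, with the files on a shared disk).

```python
import multiprocessing as mp
import os
import tempfile


def count_lines(path):
    """Worker-side job: open the file at `path` and count its lines."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def worker(tasks, results):
    """Take file paths from `tasks` until a None sentinel arrives."""
    for path in iter(tasks.get, None):
        results.put((path, count_lines(path)))


if __name__ == "__main__":
    # Create two small stand-in "log files" (hypothetical data).
    paths = []
    for n in (3, 5):
        fd, path = tempfile.mkstemp(suffix=".log")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("line\n" * n)
        paths.append(path)

    tasks, results = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(tasks, results))
    proc.start()
    for path in paths:
        tasks.put(path)      # only the short path string crosses the queue
    tasks.put(None)          # sentinel: no more work
    for _ in paths:
        print(results.get(timeout=10))
    proc.join()
    for path in paths:
        os.remove(path)
```

Everything placed on a multiprocessing queue is pickled and copied between processes, so sending a path costs a few bytes no matter how large the file is. The worker then reads the file itself.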
