The core idea of parallel programming is divide and conquer. There are two common models:
1. MapReduce: divide a task into multiple subtasks that can run in parallel, then merge the results after every subtask finishes.
Example: counting the number of each kind of shape.
Map the work into a number of counting subtasks, then use reduce to combine the partial counts (see the map/reduce sketch after this list).
2. Pipelining (producer-consumer): divide a task into serial stages, with each stage itself running in parallel.
Example:
Multiple producers run in parallel and multiple consumers run in parallel. Producers put items into a queue; whenever there is something in the queue, consumers can take it out, so the two sides have little direct dependency on each other (see the producer-consumer sketch after this list).
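A minimal map/reduce-style sketch of the shape-counting example, using only the standard library. The shape data, chunk count, and function name are made up purely for illustration:

from multiprocessing import Pool
from collections import Counter

def count_shapes(chunk):
    # "map" step: count the shapes inside one chunk of the data
    return Counter(chunk)

if __name__ == "__main__":
    shapes = ["circle", "square", "circle", "triangle"] * 1000   # hypothetical input data
    chunks = [shapes[i::4] for i in range(4)]                    # split into 4 subtasks
    pool = Pool(4)
    partial_counts = pool.map(count_shapes, chunks)              # run subtasks in parallel
    pool.close()
    pool.join()
    total = sum(partial_counts, Counter())                       # "reduce" step: merge results
    print(total)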
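And a minimal producer-consumer sketch built on a thread-safe Queue; the numbers of producers and consumers and the sleep times are arbitrary, just to show the pattern:

from threading import Thread
from Queue import Queue    # use "from queue import Queue" on Python 3
import time

q = Queue()

def producer(name):
    for i in range(5):
        q.put("%s-item-%d" % (name, i))   # put work into the shared queue
        time.sleep(0.1)                   # pretend producing takes some time

def consumer():
    while True:
        item = q.get()                    # blocks until an item is available
        print("consumed " + item)
        q.task_done()

if __name__ == "__main__":
    for i in range(2):                    # two consumers running in parallel
        t = Thread(target=consumer)
        t.setDaemon(True)                 # let the program exit once all work is done
        t.start()
    producers = [Thread(target=producer, args=("p%d" % i,)) for i in range(3)]
    for p in producers:                   # three producers running in parallel
        p.start()
    for p in producers:
        p.join()
    q.join()                              # wait until every queued item has been consumed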
Why parallel programming?
Multi-core CPUs and cloud computing make it easier to meet the hardware conditions for parallel programming.
Big data (more data to process), machine learning (more complex computation), and high concurrency make parallel programming necessary.
Why is it seldom used?
Task partitioning, access to shared data, deadlocks, mutexes, semaphores, communication through pipes and queues, and the management of threads and processes.
These concepts make parallel programming look difficult to implement.
How to learn parallel programming?
Libraries:
threading: implements multithreading.
multiprocessing: implements multiple processes.
Parallel Python (pp): implements distributed computing, which helps when CPU and network resources are the bottleneck.
Celery + RabbitMQ/Redis: a distributed task queue; together with Django it can be used to implement asynchronous task queues.
gevent: enables efficient asynchronous I/O (see the sketch after this list).
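As a taste of the last item, a minimal gevent sketch; gevent.sleep stands in for a real network call, and the task count is arbitrary:

import gevent

def fetch(n):
    print("task %d starts" % n)
    gevent.sleep(1)        # stands in for network I/O; yields control to other greenlets
    print("task %d done" % n)

# all three greenlets finish in about 1 second because their waits overlap
gevent.joinall([gevent.spawn(fetch, n) for n in range(3)])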
2. Processes and Threads
A CPU core can only run one process at a time; memory is independent between processes, while the threads of a process share its memory (a short demo of this follows the list below).
The main problems we solve are:
Inter-process communication issues;
Inter-thread synchronization issues
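A small sketch to illustrate the memory point above: a change made by a thread is visible to the main program, while a child process only changes its own copy. The variable name and values are made up for illustration:

from threading import Thread
from multiprocessing import Process

counter = 0

def increment():
    global counter
    counter += 100

if __name__ == "__main__":
    t = Thread(target=increment)
    t.start()
    t.join()
    print("after the thread: counter = %d" % counter)    # 100: threads share memory

    p = Process(target=increment)
    p.start()
    p.join()
    print("after the process: counter = %d" % counter)   # still 100: the child changed its own copy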
Example: count down from 100,000,000 to 0, first with multithreading and then with multiprocessing, and compare how long each takes.
# -*- coding: utf-8 -*-
# CopyRight by Heibanke
import time
from threading import Thread
from multiprocessing import Process


def countdown(n):
    while n > 0:
        n -= 1

COUNT = 100000000   # 100 million


def thread_process_job(n, thread_process, job):
    """
    n: number of threads or processes
    thread_process: the Thread or Process class
    job: the countdown task
    """
    local_time = time.time()
    # instantiate the threads or processes; note the pattern of passing the
    # class in and collecting the n created objects in a list
    threads_or_processes = [thread_process(target=job, args=(COUNT // n,))
                            for i in xrange(n)]
    for t in threads_or_processes:
        t.start()    # start() must be called to launch each thread or process
    for t in threads_or_processes:
        t.join()     # join() blocks until it finishes, so the print below only
                     # runs after all threads or processes are done
    print n, thread_process.__name__, "run job need", time.time() - local_time


if __name__ == "__main__":
    print "Multi Threads"
    for i in [1, 2, 4]:
        thread_process_job(i, Thread, countdown)
    print "Multi Process"
    for i in [1, 2, 4]:
        thread_process_job(i, Process, countdown)
Output Result:
From the results: with multithreading, the time increases as the number of threads grows, while with multiprocessing, the time decreases as the number of processes grows. The reason is Python's GIL mechanism.
GIL
With multiple threads there is no true parallel execution; there is in fact a lock, and whichever thread acquires it gets to run.
Python's reference interpreter, CPython, has the GIL (Global Interpreter Lock). While Python code is being interpreted, this mutex restricts the threads' access to the interpreter's shared state, and it is released only when the interpreter hits an I/O operation or has executed a certain number of instructions.
So although CPython's thread library directly wraps the operating system's native threads, CPython as a whole is one process, and at any moment only the thread holding the GIL runs while the others wait. As a result, even on a multi-core CPU, multithreading amounts to time-sliced switching.
Therefore Python multithreading is better suited to I/O-intensive tasks than to CPU-intensive tasks.
However, the multiprocessing module makes writing multi-process Python code almost as simple as multithreading. (Link: https://www.zhihu.com/question/23474039/answer/35418893)
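To illustrate why threads still help for I/O-bound work (the GIL is released while a thread waits on I/O), here is a small sketch that simulates I/O with time.sleep; the task count and sleep time are arbitrary:

import time
from threading import Thread

def fake_io_task():
    time.sleep(1)      # simulates waiting on I/O; the GIL is released while sleeping

if __name__ == "__main__":
    start = time.time()
    threads = [Thread(target=fake_io_task) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # roughly 1 second rather than 4: the waits overlap because no thread holds the GIL while sleeping
    print("4 fake I/O tasks with threads took %.2f s" % (time.time() - start))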
[Figure: two threads running. Execution is serial rather than parallel; the red segments mark the thread requesting the CPU.]
[Figure: four threads running.]
The reason the processes are fast while the threads are slow is that the machine has multiple cores: the processes can run in parallel, whereas in Python the threads still run serially and additionally spend time acquiring the CPU.
Python Parallel programming