Concurrent Programming in Python using multithreading and multi-processor

Source: Internet
Author: User

In Python coding, we often discuss how to optimize the performance of simulation execution. Although NumPy, SciPy, and pandas are very useful when considering quantization codeEvent-drivenThese tools cannot be effectively used in the system. Is there any other way to accelerate our code? The answer is yes, but pay attention to it!

In this article, we will look at a different model-Concurrency, We can introduce it into our Python program. This model works very well in simulation and does not need to be shared. The Monte Carlo simulator can be used to simulate various parameters such as Option Pricing and algorithm trading.

We will consider the Threading library and Multiprocessing library in particular.

Python concurrency

When Python beginners explore multi-threaded code for computing-intensive optimization, one of the most asked questions is: "Why is my program slow when I use multiple threads? "

On multi-core machines, we expect multi-threaded code to use additional cores to improve overall performance. Unfortunately, the main Python interpreter CPython) is not really multi-threaded internally, but is handled by a global interpretation lock GIL.

GIL is required because the Python interpreter is NOT thread-safe. This means that there will be a global forced lock when the Python object is securely accessed from the thread. At any time, only one single thread can obtain Python objects or C APIs. The Python command interpreter of every 100 bytes gets the lock again, which is potentially blocked by the I/0 operation. Because of the lock, CPU-intensive code will not improve the performance when using the thread library, but when it uses multiple processing libraries, the performance can be improved.

Parallel library implementation

Now we will use the two libraries mentioned above to optimize the concurrency of a "small" problem.

Thread Library

As mentioned above: Python running the CPython interpreter does not support multi-threaded Multi-core processing. However, Python does have a thread library. So if we may) cannot use multiple cores for processing, what are the benefits of using this library?

Many programs, especially those related to network communication or data input/output (I/O), are often subject to network performance or input/output (I/O) performance constraints. In this way, the Python interpreter waits for function calls to read and write data from "remote" data sources such as network addresses or hard disks. Therefore, such data access is much slower than reading data from the local memory or CPU buffer.

Therefore, if many data sources are accessed in this way, one way to improve the performance of such data access is to generate a thread for each data item to be accessed.

For example, assume that a piece of Python code is used to obtain URLs of many sites. Assuming that the time required for downloading each URL is much longer than the processing time of the computer's CPU, using only one thread will be greatly limited by the input/output (I/O) performance.

By generating a new thread for each download resource, this code will download multiple data sources in parallel and combine the results when all downloads are complete. This means that each subsequent download will not wait until the previous page is downloaded. In this case, this code is restricted by the bandwidth of the Client/Server.

However, many financial-related applications are limited by CPU performance because such applications are highly centralized in processing numbers. Such applications perform Large-Scale Linear Algebra calculation or random numerical statistics, such as Monte Carlo simulation statistics. So as long as you use Python and GIL for such an application, using the Python thread library will not improve the performance.

Python implementation

The following code adds numbers to the list in sequence, which illustrates the implementation of multithreading. Each thread creates a new list and randomly adds some numbers to the list. The selected "Toys" example consumes a lot of CPU.

The following Code outlines the interface of the thread library, but it will not be faster than we can achieve with a single thread. When we use multiple processing libraries for the following code, we can see that it significantly reduces the overall running time.

Let's check how the code works. First, import the threading library. Then we create a function list_append with three parameters. The first parameter count defines the size of the list to be created. The second parameter id is the ID of "work" (used to output debug information to the console. The third parameter out_list is the list of append random numbers.

The _ main _ function creates a 107 size and uses two threads for execution. Then, a jobs list is created to store the separated threads. The threading. Thread object uses the list_append function as a parameter and attaches it to the jobs list.

Finally, jobs start and "joined" respectively ". The join () method blocks the calling thread, such as the main Python interpreter thread) until the thread ends. Before printing the complete information to the console, confirm that all threads have been executed.

 
 
  1. # thread_test.pyimport randomimport threadingdef list_append(count, id, out_list):  
  2.     """  
  3.     Creates an empty list and then appends a   
  4.     random number to the list 'count' number  
  5.     of times. A CPU-heavy operation!  
  6.     """ 
  7.     for i in range(count):  
  8.         out_list.append(random.random())if __name__ == "__main__":  
  9.     size = 10000000   # Number of random numbers to add  
  10.     threads = 2   # Number of threads to create  
  11.  
  12.     # Create a list of jobs and then iterate through  
  13.     # the number of threads appending each thread to  
  14.     # the job list   
  15.     jobs = []  
  16.     for i in range(0, threads):  
  17.         out_list = list()  
  18.         thread = threading.Thread(target=list_append(size, i, out_list))  
  19.         jobs.append(thread)  
  20.  
  21.     # Start the threads (i.e. calculate the random number lists)  
  22.     for j in jobs:  
  23.         j.start()  
  24.  
  25.     # Ensure all of the threads have finished  
  26.     for j in jobs:  
  27.         j.join()  
  28.  
  29.     print "List processing complete." 

We can call the following command time code in the console

 
 
  1. time python thread_test.py 

The following output is generated:

 
 
  1. List processing complete.  
  2. real    0m2.003s 
  3. user    0m1.838s 
  4. sys     0m0.161s 

Note that the sum of user time and sys time is roughly equal to the real time. This indicates that we have not achieved performance improvement using the thread library. We expect a significant decrease in real time. These concepts of concurrent programming are called CPU time and wall-clock time)

Multi-process processing database
 

To fully utilize the multiple cores that all modern processors can provide, we need to use a multi-process processing library. It works in completely different ways from the thread library, but the syntax of the two libraries is very similar.

The multi-process processing database generates multiple operating system processes for each parallel task. By assigning a separate Python interpreter and a separate global interpretation lock (GIL) to each process, the problem caused by a global interpretation lock is cleverly avoided. In addition, each process can occupy a single processor core and reorganize the results when all processes are finished.

However, there are also some defects. Generation of many processes will cause many I/O management problems, because the processing of data by multiple processors will cause data confusion. This will increase the overall running time. However, if the data is restricted within each process, the performance may be greatly improved. Of course, the increase will not exceed the limit set by amdal's law.

Python implementation

To use Multiprocessing, you only need to modify the import line and multiprocessing. Process line. Here, parameters are separately transmitted to the target function. Except for this, the code is almost the same as that implemented using Threading:

 
 
  1. # multiproc_test.pyimport randomimport multiprocessingdef list_append(count, id, out_list):  
  2.     """  
  3.     Creates an empty list and then appends a   
  4.     random number to the list 'count' number  
  5.     of times. A CPU-heavy operation!  
  6.     """ 
  7.     for i in range(count):  
  8.         out_list.append(random.random())if __name__ == "__main__":  
  9.     size = 10000000   # Number of random numbers to add  
  10.     procs = 2   # Number of processes to create  
  11.  
  12.     # Create a list of jobs and then iterate through  
  13.     # the number of processes appending each process to  
  14.     # the job list   
  15.     jobs = []  
  16.     for i in range(0, procs):  
  17.         out_list = list()  
  18.         process = multiprocessing.Process(target=list_append,   
  19.                                           args=(size, i, out_list))  
  20.         jobs.append(process)  
  21.  
  22.     # Start the processes (i.e. calculate the random number lists)        
  23.     for j in jobs:  
  24.         j.start()  
  25.  
  26.     # Ensure all of the processes have finished  
  27.     for j in jobs:  
  28.         j.join()  
  29.  
  30.     print "List processing complete." 

Console Test Run Time:

 
 
  1. time python multiproc_test.py 

The output is as follows:

 
 
  1. List processing complete.  
  2. real    0m1.045s 
  3. user    0m1.824s 
  4. sys     0m0.231s 

In this example, we can see that the user and sys time are basically the same, and the real time is nearly two times lower. This is because we use two processes. Expand to four processes or cut the list length by half. The result is as follows. Assume that your computer is at least quad-core ):

 
 
  1. List processing complete.  
  2. real    0m0.540s 
  3. user    0m1.792s 
  4. sys     0m0.269s 

The speed of using four processes is almost 3.8 times faster. However, be careful when spreading this rule to a larger scope and more complex programs. Data conversion, hardware cacha levels, and other problems can reduce the speed.

In the next article, we will parallelize Event-Driben Basketer to improve its ability to run multi-dimensional parameter optimization.

Related reading:

  • Cholesky Decomposition in Python and NumPy

  • European Vanilla Call-Put Option Pricing with Python

  • Advanced Method in Python and NumPy

  • LU Decomposition in Python and NumPy

  • Options Pricing in Python

  • QR Decomposition with Python and NumPy

  • Quick-Start Python Quantitative Research Environment on Ubuntu 14.04

Parallelising Python with Threading and Multiprocessing

Http://www.oschina.net/translate/parallelising-python-with-threading-and-multiprocessing.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.