Introduction and example application of Python concurrent programming

Source: Internet
Author: User
Tags: generator in python

This article gives a fairly complete introduction to concurrent programming in Python. Readers who are learning Python may find it a useful reference.

Python Concurrency Overview

We call a running program a process. Each process has its own system state, which includes its memory, its list of open files, a program counter that tracks the instruction being executed, and a call stack that holds local variables. Normally, a process executes statements one after another in a single sequence of control flow, which is called the main thread of the process. At any given moment, the program is doing only one thing.

A program can create new processes using library functions in Python's os or subprocess modules (for example, os.fork() or subprocess.Popen()). These processes, known as child processes, run independently, each with its own private system state and main thread. Because a child process is independent, it executes concurrently with the original process. This means that the original process can go on to do other work after creating the child.
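As a minimal sketch of the idea, the snippet below starts a child process with subprocess.Popen() and lets the parent keep working while the child runs. The one-line child program and the variable names are assumptions made for illustration only.

```python
import subprocess
import sys

# Start a child process that runs a tiny Python program.
# sys.executable is the path of the interpreter running this script.
child = subprocess.Popen(
    [sys.executable, "-c", "print('hello from the child')"],
    stdout=subprocess.PIPE,
    text=True,
)

# The parent is free to do other work while the child runs.
parent_work = sum(range(1000))

# communicate() collects the child's output and waits for it to exit.
child_output, _ = child.communicate()
exit_code = child.returncode

print(parent_work)
print(child_output.strip())
print(exit_code)
```

The child has its own interpreter and its own state, so nothing it does can change the parent's variables directly.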

Although processes are isolated from one another, they can communicate through a mechanism called interprocess communication (IPC). The most common form is based on message passing. A message can be thought of simply as a buffer of raw bytes, and primitives such as send() and recv() transmit or receive messages over an I/O channel such as a pipe or a network socket. Other IPC mechanisms rely on memory mapping (see the mmap module): processes create a shared region of memory, and modifications to that region are visible to every process that maps it.
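The message-passing pattern can be sketched with two pipes between a parent and a child process. In this illustrative example, writing to the child's standard input plays the role of send() and reading its standard output plays the role of recv(). The child program in CHILD_CODE is hypothetical and simply echoes the message back in upper case.

```python
import subprocess
import sys

# Hypothetical child program: read raw bytes from stdin, reply in upper case.
CHILD_CODE = (
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stdout.buffer.write(data.upper())\n"
)

child = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Send a message down one pipe and receive the reply from the other.
# The message is just a buffer of bytes; its meaning is up to the two programs.
reply, _ = child.communicate(b"ping")
print(reply)
```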

Multiple processes can be used when an application needs to perform several tasks at the same time, with each process responsible for part of the work. Another way to divide work into tasks is to use threads. Like a process, a thread has its own control flow and execution stack, but it runs inside the process that created it and shares all of that process's data and system resources. Threads are useful when an application needs to perform tasks concurrently and those tasks must share a large amount of system state.
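A small sketch of this sharing: the worker threads below write into a list created by the main thread, and the main thread reads the results once the workers have finished. The names shared_results and worker are made up for this example.

```python
import threading

# A list created by the main thread; worker threads see the very same object.
shared_results = [None, None, None]

def worker(slot):
    # Each thread writes only to its own slot, so no two threads
    # ever modify the same element.
    shared_results[slot] = slot * slot

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading the results

print(shared_results)
```

No copying or message passing is needed here, which is the convenience threads offer. It works safely only because each thread touches a different element and the main thread reads after join().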

When multiple processes or threads are in use, the operating system is responsible for scheduling them. It does so by giving each process (or thread) a small time slice and rapidly cycling between all active tasks, so that the available CPU time is divided into small fragments shared among the tasks. For example, if 10 active processes are running on a system, the operating system allocates approximately one-tenth of the CPU time to each and cycles rapidly between them. On a system with more than one CPU core, the operating system can schedule processes onto different cores, keeping every core busy so that processes execute in parallel.

Programs that use concurrent execution have to deal with some complex issues. The primary source of complexity is synchronizing access to shared data. In general, when multiple tasks try to update the same data structure at the same time, the result can be corrupted data and an inconsistent program state (formally known as a race condition). To solve this problem, you need to identify the critical sections of your program and protect them with mutexes or similar synchronization primitives. For example, if several threads try to write to the same file at the same time, you need a mutex so that the writes happen one at a time: while one thread is writing, the others must wait until it releases the resource.
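As an illustration, the sketch below protects a shared counter with threading.Lock, which acts as the mutex. The SharedCounter class and add_many function are hypothetical names chosen for this example.

```python
import threading

class SharedCounter:
    """A counter that several threads can increment safely."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock turns the read-modify-write below into a critical
        # section: only one thread at a time can be inside the with block.
        with self._lock:
            self.value += 1

def add_many(counter, times):
    for _ in range(times):
        counter.increment()

counter = SharedCounter()
threads = [
    threading.Thread(target=add_many, args=(counter, 10000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Without the lock, two threads could read the same old value and both write back the same new one, losing an update.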

Concurrent Programming in Python

Python has long supported concurrent programming in several ways, including threads, subprocesses, and other approaches built on generator functions.
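The generator-based approach can be sketched as a tiny round-robin scheduler: each task is a generator that yields whenever it is willing to let another task run. This is only an illustrative sketch; countdown and run_round_robin are hypothetical names, not part of any library.

```python
from collections import deque

def countdown(name, n, log):
    # A task: record one step, then yield control back to the scheduler.
    while n > 0:
        log.append((name, n))
        n -= 1
        yield

def run_round_robin(tasks):
    # Run each task up to its next yield, then move it to the back of the line.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # the task has finished; do not reschedule it
        ready.append(task)

log = []
run_round_robin([countdown("a", 2, log), countdown("b", 3, log)])
print(log)
# [('a', 2), ('b', 3), ('a', 1), ('b', 2), ('b', 1)]
```

The tasks interleave in a single thread, so no locks are needed, but a task that never yields would block all the others.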

Python supports both message passing and thread-based concurrent programming on most systems. Although most programmers are more familiar with the threading interface, Python's threading mechanism has significant limitations. To remain thread-safe, the Python interpreter uses an internal global interpreter lock (GIL), which allows only one thread to execute at any given moment. As a result, a Python program is largely restricted to running on a single processor, even on a multi-core system. Although the GIL has been debated at length in the Python community, it is not likely to be removed in the foreseeable future.

Python provides several useful tools for managing concurrent operations based on threads and processes. Even simple programs can use these tools to run tasks concurrently and speed up their work.

The subprocess module provides an API for creating child processes and communicating with them. It is especially well suited to running text-oriented programs, because the API supports passing data through the standard input and output channels of the new process.

The signal module exposes the signal mechanism of UNIX systems, which can be used to pass event notifications between processes. Signals are handled asynchronously, and the arrival of a signal usually interrupts whatever the program is currently doing. Signals can be used to implement coarse-grained messaging, but other interprocess communication techniques are more reliable and can deliver more complex messages.

The threading module provides a set of high-level, object-oriented APIs for concurrent operations. Thread objects run concurrently within a single process and share its memory. Threads are a good way to scale I/O-bound tasks.

The multiprocessing module is similar to the threading module, but it operates on processes. Each Process object is a real operating system process and shares no memory with the others, although the multiprocessing module provides mechanisms for sharing data and passing messages between processes. In general, converting a thread-based program to a process-based one is simple and often requires changing only a few import statements.
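One way to see how small the change between threads and processes can be is the pool interface, sketched below under the assumption that the work is a simple function applied to independent chunks. multiprocessing.dummy offers a thread-backed Pool with the same interface as the process-backed multiprocessing.Pool, so switching is a one-line import change. The helper names sum_range and parallel_sum are made up for this example.

```python
from multiprocessing.dummy import Pool  # thread-backed pool
# from multiprocessing import Pool      # process-backed pool, same interface
# (The process-backed version should be started from inside an
#  `if __name__ == "__main__":` block on platforms that spawn new interpreters.)

def sum_range(bounds):
    low, high = bounds
    return sum(range(low, high))

def parallel_sum(chunks, workers=2):
    # map() hands each chunk to a worker and returns the results in order.
    with Pool(workers) as pool:
        return sum(pool.map(sum_range, chunks))

total = parallel_sum([(0, 500000), (500000, 1000000)])
print(total)
```

Because of the GIL, the thread-backed version is unlikely to speed up this CPU-bound sum; the process-backed version can use several cores, at the cost of copying arguments and results between processes.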

Threading Module Example

Taking the threading module as an example, consider a simple problem: how to add up a large range of numbers by splitting it into segments that are summed in parallel.

    import threading

    class SummingThread(threading.Thread):
        def __init__(self, low, high):
            super(SummingThread, self).__init__()
            self.low = low
            self.high = high
            self.total = 0

        def run(self):
            for i in range(self.low, self.high):
                self.total += i

    thread1 = SummingThread(0, 500000)
    thread2 = SummingThread(500000, 1000000)
    thread1.start()  # This actually causes the thread to run
    thread2.start()
    thread1.join()   # This waits until the thread has completed
    thread2.join()
    # At this point, both threads have completed
    result = thread1.total + thread2.total
    print(result)


Custom Threading Class Library

I've written a small Python library that makes threads easier to use. It contains some useful classes and functions.
Key functions and classes:

* do_threaded_work: Distributes a series of work items to the given handler function (the order in which items are processed is not guaranteed).

* ThreadedWorker: A class that creates a thread which pulls work items from a synchronized work queue and writes the processing results to a synchronized result queue.

* start_logging_with_thread_info: Adds the thread ID to all log messages (depends on the logging environment).

* stop_logging_with_thread_info: Removes the thread ID from all log messages (depends on the logging environment).

    import logging
    import queue
    import threading


    def do_threaded_work(work_items, work_func, num_threads=None,
                         per_sync_timeout=1, preserve_result_ordering=True):
        """Executes work_func on each work_item.

        Note: execution order is not preserved, but output ordering is (optionally).

        Parameters:
        - num_threads               Default: len(work_items)  --- Number of threads used to process items in work_items.
        - per_sync_timeout          Default: 1                --- Each synchronized operation can optionally time out.
        - preserve_result_ordering  Default: True             --- Reorders result items to match the original work_items ordering.

        Return:
        --- List of results from applying work_func to each work_item. Order is optionally preserved.

        Example:

        def process_url(url):
            # TODO: do some work with the url
            return url

        urls_to_process = ["http://url1.com", "http://url2.com", "http://site1.com", "http://site2.com"]

        # process urls in parallel
        result_items = do_threaded_work(urls_to_process, process_url)

        # print results
        print(repr(result_items))
        """
        if not num_threads:
            num_threads = len(work_items)

        work_queue = queue.Queue()
        result_queue = queue.Queue()

        index = 0
        for work_item in work_items:
            if preserve_result_ordering:
                # tag each item with its position so results can be reordered later
                work_queue.put((index, work_item))
            else:
                work_queue.put(work_item)
            index += 1

        if preserve_result_ordering:
            # wrapped function returns (index, result) for an (index, item) pair
            wrapped_work_func = lambda work_item: (work_item[0], work_func(work_item[1]))

        start_logging_with_thread_info()

        # spawn a pool of threads, and pass them queue instance
        for _ in range(num_threads):
            if preserve_result_ordering:
                t = ThreadedWorker(work_queue, result_queue,
                                   work_func=wrapped_work_func,
                                   queue_timeout=per_sync_timeout)
            else:
                t = ThreadedWorker(work_queue, result_queue,
                                   work_func=work_func,
                                   queue_timeout=per_sync_timeout)
            t.daemon = True
            t.start()

        work_queue.join()
        stop_logging_with_thread_info()

        logging.info('work_queue joined')

        result_items = []
        while not result_queue.empty():
            result = result_queue.get(timeout=per_sync_timeout)
            logging.info('found result[:500]: ' + repr(result)[:500])
            if result:
                result_items.append(result)

        if preserve_result_ordering:
            # results arrive in completion order, so sort by the original index
            # before stripping it off
            result_items.sort(key=lambda pair: pair[0])
            result_items = [work_item for index, work_item in result_items]

        return result_items


    class ThreadedWorker(threading.Thread):
        """Generic threaded worker.

        Input to work_func: item from work_queue

        Example usage:

        import queue

        urls_to_process = ["http://url1.com", "http://url2.com", "http://site1.com", "http://site2.com"]

        work_queue = queue.Queue()
        result_queue = queue.Queue()

        def process_url(url):
            # TODO: do some work with the url
            return url

        def main():
            # populate queue with data first: by default a worker stops
            # as soon as it finds the work queue empty
            for url in urls_to_process:
                work_queue.put(url)

            # spawn a pool of threads, and pass them queue instance
            for i in range(3):
                t = ThreadedWorker(work_queue, result_queue, work_func=process_url)
                t.daemon = True
                t.start()

            # wait on the queue until everything has been processed
            work_queue.join()

            # print results
            while not result_queue.empty():
                print(result_queue.get())

        main()
        """

        def __init__(self, work_queue, result_queue, work_func,
                     stop_when_work_queue_empty=True, queue_timeout=1):
            threading.Thread.__init__(self)
            self.work_queue = work_queue
            self.result_queue = result_queue
            self.work_func = work_func
            self.stop_when_work_queue_empty = stop_when_work_queue_empty
            self.queue_timeout = queue_timeout

        def should_continue_running(self):
            if self.stop_when_work_queue_empty:
                return not self.work_queue.empty()
            else:
                return True

        def run(self):
            while self.should_continue_running():
                try:
                    # grabs item from work_queue
                    work_item = self.work_queue.get(timeout=self.queue_timeout)
                except queue.Empty:
                    logging.warning('ThreadedWorker queue was empty or queue.get() timed out')
                    continue

                try:
                    # works on item
                    work_result = self.work_func(work_item)

                    # place work_result into result_queue
                    self.result_queue.put(work_result, timeout=self.queue_timeout)
                except queue.Full:
                    logging.warning('ThreadedWorker queue was full or queue.put() timed out')
                except Exception:
                    logging.exception('Error in ThreadedWorker')
                finally:
                    # signals to work_queue that the item is done; called only
                    # for items that were actually retrieved
                    self.work_queue.task_done()


    def start_logging_with_thread_info():
        try:
            formatter = logging.Formatter('[thread %(thread)-3s] %(message)s')
            logging.getLogger().handlers[0].setFormatter(formatter)
        except Exception:
            logging.exception('Failed to start logging with thread info')


    def stop_logging_with_thread_info():
        try:
            formatter = logging.Formatter('%(message)s')
            logging.getLogger().handlers[0].setFormatter(formatter)
        except Exception:
            logging.exception('Failed to stop logging with thread info')


Usage Example

    from test import ThreadedWorker  # assumes the library above is saved as test.py
    from queue import Queue

    urls_to_process = ["http://facebook.com", "http://pypix.com"]

    work_queue = Queue()
    result_queue = Queue()

    def process_url(url):
        # TODO: do some work with the url
        return url

    def main():
        # populate queue with data first: by default a worker stops
        # as soon as it finds the work queue empty
        for url in urls_to_process:
            work_queue.put(url)

        # spawn a pool of threads, and pass them queue instance
        for i in range(5):
            t = ThreadedWorker(work_queue, result_queue, work_func=process_url)
            t.daemon = True
            t.start()

        # wait on the queue until everything has been processed
        work_queue.join()

        # print results
        while not result_queue.empty():
            print(result_queue.get())

    main()
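For comparison, a similar "apply a function to every item with a pool of threads" pattern is available in the standard library through concurrent.futures.ThreadPoolExecutor. The sketch below is not the library described above, only a stdlib alternative. The process_url body is a placeholder, and the URLs are the sample values from the docstring.

```python
from concurrent.futures import ThreadPoolExecutor

def process_url(url):
    # Placeholder work: a real handler might download or parse the URL.
    return url.upper()

urls_to_process = ["http://url1.com", "http://url2.com", "http://site1.com"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in input order, whatever order the calls finish in.
    result_items = list(executor.map(process_url, urls_to_process))

print(result_items)
```

Leaving the with block waits for all submitted work to finish, which plays the same role as work_queue.join() in the example above.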

