Concurrent Programming Examples in Python


Introduction

We call a running program a process. Each process has its own system state, which includes the memory state, the list of open files, the program counter that tracks instruction execution, and a call stack that holds local variables. A process normally executes along a single sequence of control flow, called the main thread of the process. At any given moment, the program is doing only one thing.

A program can create new processes using library functions such as os.fork() in the os module or subprocess.Popen() in the subprocess module. These new processes, known as child processes, run independently, with their own system states and main threads. Because the processes are independent of each other, they execute concurrently with the original process, which means the parent process can carry on with other work after the child process is created.
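As a minimal sketch of child-process creation (the one-line script and variable names are invented for illustration), subprocess.Popen() can launch a new interpreter and read its output:

```python
import subprocess
import sys

# Launch a child process running a one-line Python script.
# Popen() returns immediately; the parent is free to do other work
# until communicate() waits for the child to finish and collects output.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
)
out, _ = proc.communicate()
print(out.decode().strip())
```

Using sys.executable keeps the example portable: the child runs under the same Python interpreter as the parent.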

Although processes are independent of each other, they can communicate through a mechanism called interprocess communication (IPC). A typical pattern is based on message passing, which can be understood simply as exchanging buffers of bytes: primitives such as send() and recv() transmit and receive messages over I/O channels such as pipes or network sockets. Other IPC modes work through memory mapping (for example, the mmap module), where processes create shared regions in memory, and modifications to those regions are visible to all participating processes.

Multiple processes suit scenarios where several tasks need to run simultaneously, with different processes responsible for different parts of the work. Another way to subdivide work into tasks, however, is to use threads. Like a process, a thread has its own control flow and execution stack, but a thread runs inside the process that created it, sharing all of its parent process's data and system resources. Threads are useful when an application needs to perform concurrent tasks, with the caveat that a large amount of system state is then shared between tasks.

Whether you use multiple processes or multiple threads, the operating system is responsible for scheduling. It does this by giving each process (or thread) a small time slice and rapidly switching between all active tasks, dividing CPU time into small fragments shared among the tasks. For example, if 10 active processes are running on your system, the operating system will allocate roughly one-tenth of the CPU time to each and cycle through all 10. When the system has more than one CPU core, the operating system can dispatch processes to different cores, keeping the load balanced and achieving parallel execution.

Programs written with concurrent execution mechanisms need to deal with some complex issues. The primary source of complexity is synchronizing access to shared data. In general, multiple tasks attempting to update the same data structure at the same time can produce dirty data and inconsistent program state (formally, a race condition). To solve this, you need mutexes or similar synchronization primitives to identify and protect the critical sections of your program. For example, if several threads try to write to the same file at the same time, you need a mutex to serialize the writes: while one thread is writing, the other threads must wait until the current thread releases the resource.
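To make the mutex idea concrete, here is a small sketch using threading.Lock (the shared counter workload is invented for illustration). Each increment is a read-modify-write on shared state; the lock serializes those updates so none are lost:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # The lock protects the read-modify-write of the shared counter;
        # any other thread must wait until it is released.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```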

Concurrent programming in Python

Python has long supported concurrent programming in several ways, including threads, subprocesses, and other concurrency mechanisms built on generators (generator functions).

On most systems, Python supports both message passing and thread-based concurrent programming. While most programmers are more familiar with the threading interface, Python's threading mechanism has a significant limitation: Python uses an internal global interpreter lock (GIL) to keep threads safe, and the GIL allows only one thread to execute at a time. As a result, a Python program effectively runs on a single processor even on a multi-core system. Despite much debate in the Python community about the GIL, it is unlikely to be removed in the foreseeable future.

Python provides some nifty tools for managing concurrent operations based on threads and processes. Even simple programs can use these tools to run tasks concurrently and speed up execution.

The subprocess module provides an API for creating and communicating with child processes. It is especially good for running text-oriented programs, because its API supports passing data through the standard input and output channels of the new process.

The signal module exposes the UNIX signal mechanism, which can be used to pass event information between processes. Signals are handled asynchronously; usually, when a signal arrives, it interrupts the program's current work. The signal mechanism can implement a coarse-grained messaging system, but other, more reliable IPC techniques exist for delivering more complex messages.

The threading module provides a series of high-level, object-oriented APIs for concurrent operations. Thread objects run concurrently within a single process and share memory resources. Using threads scales well for I/O-intensive tasks.

The multiprocessing module is similar to the threading module, but it operates on processes instead. Each Process object corresponds to a real operating system process with no shared memory, but the multiprocessing module provides mechanisms for sharing data and passing messages between processes. In general, converting a thread-based program into a process-based one is simple: you just need to modify some import declarations.

Threading Module Example

Taking the threading module as an example, consider a simple problem: how to split a large summation into segments that are computed in parallel.

import threading
 
class SummingThread(threading.Thread):
    def __init__(self, low, high):
        super(SummingThread, self).__init__()
        self.low = low
        self.high = high
        self.total = 0
 
    def run(self):
        for i in range(self.low, self.high):
            self.total += i
 
thread1 = SummingThread(0, 500000)
thread2 = SummingThread(500000, 1000000)
thread1.start()  # This actually causes the thread to run
thread2.start()
thread1.join()   # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print(result)

Customizing the Threading Class Library

I've written a small Python class library that makes threads easy to use, containing some useful classes and functions.

Key components:

* do_threaded_work – this function assigns a series of work items to the corresponding handler function (the assignment order is indeterminate)

* ThreadedWorker – this class creates a thread that pulls work items from a synchronized work queue and writes the results to a synchronized result queue

* start_logging_with_thread_info – writes the thread ID into all log messages (depends on the logging environment)

* stop_logging_with_thread_info – removes the thread ID from all log messages (depends on the logging environment)

import threading
import logging
import queue
 
def do_threaded_work(work_items, work_func, num_threads=None, per_sync_timeout=1, preserve_result_ordering=True):
    """Executes work_func on each work item.
 
    Note: execution order is not preserved, but output ordering is (optionally).
 
    Parameters:
    - num_threads               Default: len(work_items) --- number of threads used to process the work_items.
    - per_sync_timeout          Default: 1 --- each synchronized operation can optionally time out.
    - preserve_result_ordering  Default: True --- reorders the result items to match the original work_items.
 
    Return:
    --- list of results from applying work_func to each work item; the order is optionally preserved.
 
    Example:
 
        def process_url(url):
            # TODO: do some work with the url
            return url
 
        urls_to_process = ["http://url1.com", "http://url2.com", "http://site1.com", "http://site2.com"]
 
        # process urls in parallel
        result_items = do_threaded_work(urls_to_process, process_url)
 
        # print results
        print(repr(result_items))
    """
    if not num_threads:
        num_threads = len(work_items)
 
    work_queue = queue.Queue()
    result_queue = queue.Queue()
 
    index = 0
    for work_item in work_items:
        if preserve_result_ordering:
            work_queue.put((index, work_item))
        else:
            work_queue.put(work_item)
        index += 1
 
    if preserve_result_ordering:
        wrapped_work_func = lambda work_item: (work_item[0], work_func(work_item[1]))
 
    start_logging_with_thread_info()
 
    # spawn a pool of threads and pass them the queue instances
    for _ in range(num_threads):
        if preserve_result_ordering:
            t = ThreadedWorker(work_queue, result_queue, work_func=wrapped_work_func, queue_timeout=per_sync_timeout)
        else:
            t = ThreadedWorker(work_queue, result_queue, work_func=work_func, queue_timeout=per_sync_timeout)
        t.daemon = True
        t.start()
 
    work_queue.join()
    stop_logging_with_thread_info()
    logging.info('work_queue joined')
 
    result_items = []
    while not result_queue.empty():
        result = result_queue.get(timeout=per_sync_timeout)
        logging.info('found result[:500]: ' + repr(result)[:500])
        if result:
            result_items.append(result)
 
    if preserve_result_ordering:
        # sort by the original index, then strip the index
        result_items.sort()
        result_items = [work_item for _, work_item in result_items]
 
    return result_items
 
class ThreadedWorker(threading.Thread):
    """Generic threaded worker.
 
    Input to work_func: an item from work_queue.
 
    Example:
 
        urls_to_process = ["http://url1.com", "http://url2.com", "http://site1.com", "http://site2.com"]
 
        work_queue = queue.Queue()
        result_queue = queue.Queue()
 
        def process_url(url):
            # TODO: do some work with the url
            return url
 
        def main():
            # spawn a pool of threads and pass them the queue instances
            for i in range(3):
                t = ThreadedWorker(work_queue, result_queue, work_func=process_url)
                t.daemon = True
                t.start()
 
            # populate the queue with data
            for url in urls_to_process:
                work_queue.put(url)
 
            # wait on the queue until everything has been processed
            work_queue.join()
 
            # print results
            print(repr(result_queue))
 
        main()
    """
 
    def __init__(self, work_queue, result_queue, work_func, stop_when_work_queue_empty=True, queue_timeout=1):
        threading.Thread.__init__(self)
        self.work_queue = work_queue
        self.result_queue = result_queue
        self.work_func = work_func
        self.stop_when_work_queue_empty = stop_when_work_queue_empty
        self.queue_timeout = queue_timeout
 
    def should_continue_running(self):
        if self.stop_when_work_queue_empty:
            return not self.work_queue.empty()
        else:
            return True
 
    def run(self):
        while self.should_continue_running():
            try:
                # grab an item from the work queue
                work_item = self.work_queue.get(timeout=self.queue_timeout)
            except queue.Empty:
                logging.warning('ThreadedWorker queue is empty or queue.get() timed out')
                continue
            try:
                # work on the item
                work_result = self.work_func(work_item)
 
                # place the result into the result queue
                self.result_queue.put(work_result, timeout=self.queue_timeout)
            except queue.Full:
                logging.warning('ThreadedWorker queue is full or queue.put() timed out')
            except Exception:
                logging.exception('Error in ThreadedWorker')
            finally:
                # signal to work_queue that the item is done
                self.work_queue.task_done()
 
def start_logging_with_thread_info():
    try:
        formatter = logging.Formatter('[thread %(thread)-3s] %(message)s')
        logging.getLogger().handlers[0].setFormatter(formatter)
    except Exception:
        logging.exception('Failed to start logging with thread info')
 
def stop_logging_with_thread_info():
    try:
        formatter = logging.Formatter('%(message)s')
        logging.getLogger().handlers[0].setFormatter(formatter)
    except Exception:
        logging.exception('Failed to stop logging with thread info')

Using the Library

from test import ThreadedWorker
from queue import Queue
 
urls_to_process = ["http://facebook.com", "http://pypix.com"]
 
work_queue = Queue()
result_queue = Queue()
 
def process_url(url):
    # TODO: do some work with the url
    return url
 
def main():
    # spawn a pool of threads and pass them the queue instances
    for i in range(5):
        t = ThreadedWorker(work_queue, result_queue, work_func=process_url)
        t.daemon = True
        t.start()
 
    # populate the queue with data
    for url in urls_to_process:
        work_queue.put(url)
 
    # wait on the queue until everything has been processed
    work_queue.join()
 
    # print results
    print(repr(result_queue))
 
main()
