Tutorial on micro-threading programming with Python generators


Micro-threading (at least in Python) has always been a special enhancement of Stackless Python. The state of Stackless, and the changes it has recently undergone, could well fill a column of their own; but the short version is that under the "new Stackless", continuations are clearly on their way out, while micro-threads remain the project's reason for being. It's complicated...

First, let's review a little background. What is a micro-thread? A micro-thread is, basically, a process that can run with very few internal resources, and one that runs within a single instance of the Python interpreter (in shared memory space, and so on). With micro-threads, we can run tens of thousands of parallel "processes" on a current mid-range PC, and switch between contexts hundreds of thousands of times per second. Calls to fork() or to standard OS threading calls simply cannot reach this scale! Even the threads in so-called "lightweight" thread libraries are several orders of magnitude heavier than the micro-threads proposed here.

The lightweight threads I describe in this column mean something a little different from OS threads, and for that matter they are not the same as those provided by Stackless. In many ways, lightweight threads are far simpler than most variants; most questions about signals, locks, and the like simply do not arise. The price of that simplicity is that I am proposing a form of "cooperative multithreading"; I do not think it is feasible to add preemption within a standard Python framework (at least not in Python 2.2; nobody knows what the __future__ will bring).

In one sense, lightweight threading recalls the cooperative multitasking of older versions of Windows and MacOS (but within a single application). In another sense, however, lightweight threads are simply another way of expressing flow within a program; everything that lightweight threads do can (at least in principle) be done with a "really big if/elif block" technique (the reckless programmer's road to ruin), as the sketch below illustrates.
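For concreteness, here is a minimal sketch of that "really big if/elif block" style; the state names and actions are invented for illustration, and the lightweight threads developed below express the same kind of flow far more cleanly:

# A hand-rolled state machine: all control flow lives in one loop and
# one if/elif chain.  States and actions here are purely illustrative.
def big_if_elif_block():
    state = "A"
    while state != "done":
        if state == "A":
            print "doing task A"
            state = "B"
        elif state == "B":
            print "doing task B"
            state = "C"
        elif state == "C":
            print "doing task C"
            state = "done"

big_if_elif_block()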

Let's begin with a mechanism for simulating coroutines with simple generators. The core of the mechanism is very simple: a scheduler() function wraps a set of generator objects and handles the process of delegating control flow to the appropriate branch. These are not true coroutines, because they pass control only to the scheduler() and branch only from that function; but for practical purposes, you can accomplish the same thing with very little extra code. scheduler() looks something like this:
Listing 1. A scheduler() for simulated coroutines

def scheduler(gendct, start):
    global cargo
    coroutine = start
    while 1:
        (coroutine, cargo) = gendct[coroutine].next()

One thing to note about this wrapper is that each generator/coroutine yields a tuple containing its intended branch target; the generator/coroutine basically exits with a GOTO target. For convenience, I also have the generators yield a standard cargo container, as a way of formalizing the data passed between coroutines, although you could equally pass data using only global variables or agreed-upon callback setter/getter functions. Raymond Hettinger has written a Python Enhancement Proposal (PEP) aimed at better encapsulating the data passed this way; perhaps Python will incorporate it in the future.
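As a minimal sketch of the convention (these two generators and their names are invented for illustration, not taken from the listings), a pair of coroutines running under the scheduler() above might pass control and cargo back and forth like this:

def hello():
    global cargo
    while 1:
        print "hello received:", cargo
        yield ('world', 'data from hello')   # (branch target, cargo)

def world():
    global cargo
    while 1:
        print "world received:", cargo
        yield ('hello', 'data from world')

cargo = None
gendct = {'hello': hello(), 'world': world()}
# scheduler(gendct, 'hello')   # ping-pongs between the two forever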

The New Scheduler

The requirements for lightweight threads are slightly different from those of coroutines, but we can still use a scheduler() function at the core. The difference is that the scheduler itself should decide the branch target, rather than receive it from the generator/coroutine. Here is a complete test script and sample:
Listing 2. microthreads.py Sample Script

from __future__ import generators
import sys, time

threads = []
totalswitches = 10**6
numthreads    = 10**5

def null_factory():
    def empty():
        while 1:
            yield None
    return empty()

def quitter():
    for n in xrange(totalswitches / numthreads):
        yield None

def scheduler():
    global threads
    try:
        while 1:
            for thread in threads:
                thread.next()
    except StopIteration:
        pass

if __name__ == "__main__":
    for i in range(numthreads):
        threads.append(null_factory())
    threads.append(quitter())
    starttime = time.clock()
    scheduler()
    print "Total time:", time.clock() - starttime
    print "Total switches:", totalswitches
    print "Total threads:", numthreads

This is just about the simplest lightweight thread scheduler possible. Every thread enters in a fixed order, and every thread has the same priority. Next, let's look at how to handle the details. As with the coroutines described in the previous section, you should follow a few conventions when writing lightweight threads.

Handling details

In most cases, a lightweight thread generator should be wrapped in a while 1: loop. The way this scheduler is arranged, the whole scheduler stops when any one of the threads stops. In a sense that is less "robust" than OS threads; but catching exceptions inside the scheduler() loop costs no more machine resources than catching them outside the loop. Moreover, a thread can be removed from the threads list without being terminated (either by itself or by another thread). We have not actually provided a detailed facility to make removal easy, but the common extension would be to store threads in a dictionary or some other structure rather than in a list.
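As a sketch of that extension (this variant is my own, not one of the article's listings), a scheduler can keep threads in a dictionary, catch StopIteration per thread, and drop only the finished thread while the rest keep running:

def robust_scheduler(threaddct):
    # threaddct maps thread ids to generator objects
    while threaddct:
        # .items() snapshots the dict, so deletion during the loop is safe
        for tid, thread in threaddct.items():
            try:
                thread.next()
            except StopIteration:
                del threaddct[tid]   # remove only the finished thread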

This example does, however, demonstrate a reasonable way to terminate the scheduler loop when the run is over. quitter() is a special generator/thread that monitors a condition (here, a count of context switches) and raises StopIteration when the condition is met (this example does not catch any other exceptions). Note that after termination, all the other generators are still intact and can be resumed later if desired (in a micro-threading scheduler or elsewhere). You can, of course, delete those generators/threads if you wish.

The example uses deliberately trivial threads: they do nothing, and do it in about the cheapest form possible. The example was built to make a point: the intrinsic overhead of lightweight threads is very low. Creating 100,000 lightweight threads is easy on an aging Pentium II laptop with a modest amount of memory (pushing on to 1 million threads brings prolonged disk thrashing). Try that with OS threads! Moreover, on this slow 366 MHz chip, 1 million context switches complete in roughly 10 seconds (and the number of threads involved has no significant effect on the timing). Obviously, a real lightweight thread should do something, and that will use more resources depending on the task; but the threads themselves earn their "lightweight" name.

Switching overhead

The cost of switching between lightweight threads is small, but it is not zero. To test this, I built an example that performs some real work, but close to the least work you could sensibly put in a thread. Because the thread scheduler really amounts to the instruction "do a, then do b, then do c, and so on", it is not hard to create an exactly parallel case as straight-line code in a main function.
Listing 3. overhead.py Sample Script

from __future__ import generators
import time

times = 100000
threads = []

def stringops():
    for n in xrange(times):
        s = "Mary had a little lamb"
        s = s.upper()
        s = "Mary had a little lamb"
        s = s.lower()
        s = "Mary had a little lamb"
        s = s.replace('a', 'A')

def scheduler():
    for n in xrange(times):
        for thread in threads:
            thread.next()

def upper():
    while 1:
        s = "Mary had a little lamb"
        s = s.upper()
        yield None

def lower():
    while 1:
        s = "Mary had a little lamb"
        s = s.lower()
        yield None

def replace():
    while 1:
        s = "Mary had a little lamb"
        s = s.replace('a', 'A')
        yield None

if __name__ == '__main__':
    start = time.clock()
    stringops()
    looptime = time.clock() - start
    print "Loop time:", looptime
    threads.append(upper())
    threads.append(lower())
    threads.append(replace())
    start = time.clock()
    scheduler()
    threadtime = time.clock() - start
    print "Thread time:", threadtime

The results: the lightweight thread version takes a bit more than twice as long as the straight-loop version to do the same work. On the machine mentioned above, the plain loop runs in under 3 seconds while the threaded version takes a bit over 6. Obviously, if each unit of work is 2, 10, or 100 times the cost of a single string method call, the proportional overhead of the threads shrinks accordingly.

Designing threads

Lightweight threads can (and usually should) be larger than single conceptual operations. Each thread represents the amount of flow context needed to describe some task or activity. But a task may take more (or less) time or space than we would like to give it within the context of a single time slice. Preemption handles this problem automatically, without requiring any specific intervention from the application developer. Lightweight threading, unfortunately, requires its users to make the threads "play nice" with the other threads.

At a minimum, a well-designed lightweight thread should yield whenever it completes a conceptual operation; the scheduler will come back to it for the next step. For example:
Listing 4. Pseudocode for a "friendly" lightweight thread

def nicethread():
    while 1:
        ...operation A...
        yield None
        ...operation B...
        yield None
        ...operation C...
        yield None

In most cases, a good design will yield more often than just at the boundaries between basic operations. Even an operation that is conceptually "basic" frequently involves looping over a large collection. When it does (and depending on how time-consuming the loop body is), it can help to include a yield or two inside the loop body as well (perhaps after every N iterations). Unlike a preemptive thread, a badly behaved lightweight thread can hog an unlimited amount of exclusive processor time.
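For instance, here is a sketch of a thread that yields every hundred iterations; the names bigdata, process(), and well_behaved() are hypothetical stand-ins, not from the listings:

def process(item):
    # hypothetical per-item work; stands in for something expensive
    return str(item).upper()

def well_behaved(bigdata):
    while 1:
        for n in xrange(len(bigdata)):
            process(bigdata[n])
            if n % 100 == 99:
                yield None        # let other threads run mid-loop
        yield None                # and again at the operation boundary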

Other scheduler possibilities

So far, the examples have shown only the most basic form of thread scheduler. Much more is possible (and this question is independent of designing good generators/threads). Let me sketch a few possible enhancements.
Better thread management

A simple threads list makes it easy to add generators/threads for the scheduler to handle. But that data structure offers no easy way to delete or suspend threads that are no longer relevant. A dictionary or a class might be a better data structure for thread management. Here is a quick sketch of a class that could (almost) drop in for the threads list in the examples:
Listing 5. A Python class for thread management

class ThreadPool:
    """Enhanced threads list as class

    threads = ThreadPool()
    threads.append(threadfunc)   # takes a factory function, not a generator object
    if threads.query(num) < ...: threads.remove(num)
    """

You could do considerably more, but this is a good starting point.
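A minimal implementation consistent with that interface might look like the following. The method bodies are my own sketch, not the article's code, and the semantics of query() (a count of the switches a thread has received) are an invented example:

class ThreadPool:
    """Enhanced threads list as class (sketch; bodies are assumptions)"""
    def __init__(self):
        self.pool = {}          # thread id -> [generator, switch count]
        self.nextid = 0
    def append(self, threadfunc):
        # pass a generator factory function, not a generator object
        self.nextid += 1
        self.pool[self.nextid] = [threadfunc(), 0]
        return self.nextid
    def remove(self, num):
        del self.pool[num]
    def query(self, num):
        # invented semantic: how many switches has this thread received?
        return self.pool[num][1]
    def run_all(self):
        # one full round over every live thread
        for num, entry in self.pool.items():
            try:
                entry[0].next()
                entry[1] += 1
            except StopIteration:
                del self.pool[num]

A scheduler can then simply call threads.run_all() in a loop, and any thread can call threads.remove(num) or consult threads.query(num) between yields.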
Thread Priority

In the simple examples, all threads get the same attention from the scheduler. There are at least two general approaches to achieving a better-tuned thread priority system, in which "high priority" threads get more attention than lower-priority ones. The straightforward approach is a new class, PriorityThreadPool(ThreadPool), that returns more important threads more often during thread iteration. The simplest version might return some threads several times in a row from its .__getitem__() method; a high-priority thread would then receive two, or a hundred, consecutive "time slices" rather than just one. A (very weak) "realtime" variant might instead return multiple copies of important threads scattered throughout the thread list. This increases the actual frequency with which high-priority threads are serviced, not just their total share of attention.
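Here is a sketch of the consecutive-slices idea, building on the ThreadPool sketch above; the priority bookkeeping is my assumption:

class PriorityThreadPool(ThreadPool):
    """ThreadPool whose iteration repeats high-priority threads (sketch)"""
    def __init__(self):
        ThreadPool.__init__(self)
        self.priority = {}       # thread id -> consecutive time slices
    def append(self, threadfunc, priority=1):
        num = ThreadPool.append(self, threadfunc)
        self.priority[num] = priority
        return num
    def __getitem__(self, n):
        # flatten the pool so a priority-3 thread appears 3 times in a row;
        # the IndexError from a too-large n ends old-style for-loop iteration
        flat = []
        for num in self.pool.keys():
            flat.extend([self.pool[num][0]] * self.priority[num])
        return flat[n]

With this in place, "for thread in threads: thread.next()" works through .__getitem__() and hands a priority-3 thread three back-to-back time slices.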

More finely tuned thread priorities are probably difficult to achieve in pure Python (though it could be done with some third-party OS- or processor-specific library). Rather than giving a high-priority thread some integer number of time slices, the scheduler could measure the actual time spent in each lightweight thread, then dynamically adjust the schedule to be more "fair" to underserviced threads (perhaps with fairness tied to thread priority). Unfortunately, Python's time.clock() and friends are not precise enough timers to make this approach effective. On the other hand, nothing prevents a thread in the "multiple time slices" approach from dynamically raising its own priority.

Combining micro-threads and coroutines

To create the lightweight thread (micro-thread) scheduler, I removed the coroutine logic of "please branch to here". That removal is not really necessary, though. The lightweight threads in the examples always yield None rather than a jump target. We could combine the two concepts: if a coroutine/thread yields a jump target, the scheduler can jump to the requested place (perhaps unless overruled by thread priorities); if it yields plain None, the scheduler can decide for itself which thread deserves attention next. Deciding (and writing) exactly how arbitrary jumps would interact with a linear thread queue takes a fair amount of work, but there is nothing particularly mysterious about that work.
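Here is a minimal sketch of such a combined scheduler (my own construction, not from the listings): follow a yielded jump target when one is given, and otherwise fall back to round-robin order:

def combined_scheduler(gendct):
    # gendct maps names to generator objects; each yields either a
    # target name (coroutine-style jump) or None (plain thread)
    names = gendct.keys()
    n = 0
    while 1:
        name = names[n % len(names)]
        # keep honoring jumps until some thread yields None
        while 1:
            target = gendct[name].next()
            if target is None:
                break          # round-robin resumes at the next thread
            name = target      # jump where the coroutine requested
        n += 1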

Fast and cheap: what's not to like?

Micro-threads (or "lightweight threads") amount, in essence, to another peculiar style of flow control within a Python program. Several other styles have been discussed in the previous installments of this column. The attraction of all these control mechanisms is that they let developers isolate code functionality within logical components and maximize the contextual relevance of the code.

In truth, the mere possibility of doing any of this is nothing complicated (everything here could be done with just a "loop" and an "if"). But for problems that decompose naturally into many small "agents", "servers", or "processes", lightweight threads may be the clearest model for expressing an application's underlying "business logic". And, of course, the fact that lightweight threads can be very fast compared with some better-known flow mechanisms doesn't hurt either.
