Tutorial on implementing micro-threading in Python with generators

Micro-threading, at least in Python, has historically been the special province of Stackless Python. The topic of Stackless, and the recent changes it has undergone, could well fill a column of its own. The short version, though, is that under the "new Stackless," first-class continuations look obsolete, while micro-threads remain the reason the project exists. This is complicated ...

First, let's review some basics. What is a micro-thread? A micro-thread is, roughly, a process that needs very few internal resources to run, and one that runs within a single instance of the Python interpreter (in shared memory space, and so on). With micro-threads, we can run tens of thousands of concurrent processes on a currently mid-range PC, and switch between contexts hundreds of thousands of times per second. A call to fork() or to a standard OS threading API simply cannot reach this scale! Even the threads in so-called "lightweight" thread libraries "weigh" several orders of magnitude more than the micro-threads presented here.

The lightweight threads I cover in this column mean something a little different from OS threads, and in this respect they also differ from what Stackless offers. In many ways, these lightweight threads are much simpler than most variants, and most of the problems with signals, locks, and the like are absent. The price of the simplicity is that I am proposing a form of "cooperative multithreading"; I do not think it is feasible to add preemption within the standard Python framework (at least in non-Stackless Python 2.2; no one knows what __future__ will bring).

Lightweight threading, in one sense, recalls the cooperative multitasking of earlier versions of Windows and MacOS (but within a single application). In another sense, though, a lightweight thread is just another way of expressing flow in a program, and everything a lightweight thread does could (at least in principle) be done with "really big if/elif blocks" (the brute-force trick of the reckless programmer).

The mechanism for simulating coroutines with simple generators has a very simple core. A scheduler() function wraps a collection of generator objects and controls the delegation of flow to whichever branch is appropriate. These are not true coroutines, because control only ever passes to the scheduler() function and branches out from there again. But for practical purposes, you can accomplish the same thing with very little extra code. scheduler() looks something like this:
Listing 1. A scheduler() for simulated coroutines

def scheduler(gendct, start):
    global cargo
    coroutine = start
    while 1:
        (coroutine, cargo) = gendct[coroutine].next()

One thing to note about this wrapper is that each generator/coroutine yields a tuple containing its intended branch target. The generator/coroutine basically exits with a GOTO to that target. As a convenience, I also have the generators yield a standard cargo container, as a way of formalizing the data passed between coroutines, but you could just as well pass data using only global variables or agreed-upon callback setter/getter functions. Raymond Hettinger has written a Python Enhancement Proposal (PEP) aimed at better encapsulating the transferred data; perhaps Python will incorporate it someday.
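To make this concrete, here is a minimal sketch of my own (not from the original column) of two coroutines driven by the scheduler() of Listing 1; the names ping and pong and their payload strings are invented for illustration:

from __future__ import generators

def ping():
    while 1:
        yield ('pong', 'message from ping')   # branch to pong, with cargo

def pong():
    while 1:
        print "pong received:", cargo         # cargo was set by scheduler()
        yield ('ping', 'message from pong')

cargo = None
gendct = {'ping': ping(), 'pong': pong()}
# scheduler(gendct, 'ping')   # runs forever, bouncing between the two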

The New Scheduler

The requirements for lightweight threads are slightly different from those for coroutines. However, we can still use the scheduler() function at the core. The difference is that the scheduler itself should decide the branch target, rather than receiving it from the generator/coroutine. Let me show you a complete test harness and sample:
Listing 2. microthreads.py Sample Script

from __future__ import generators
import sys, time

threads = []
TOTALSWITCHES = 10**6
NUMTHREADS    = 10**5

def null_factory():
    def empty():
        while 1:
            yield None
    return empty()

def quitter():
    for n in xrange(TOTALSWITCHES/NUMTHREADS):
        yield None

def scheduler():
    global threads
    try:
        while 1:
            for thread in threads:
                thread.next()
    except StopIteration:
        pass

if __name__ == "__main__":
    for i in range(NUMTHREADS):
        threads.append(null_factory())
    threads.append(quitter())
    starttime = time.clock()
    scheduler()
    print "Total time:    ", time.clock() - starttime
    print "Total switches:", TOTALSWITCHES
    print "Total threads: ", NUMTHREADS

This is about the simplest lightweight-thread scheduler you could choose. Every thread gets its turn in fixed order, and every thread has the same priority. Next, let's look at how to handle the details. As in the previous section, you should follow a few conventions when writing the lightweight threads themselves.

Handling details

For the most part, the generators that make up lightweight threads should be wrapped in a while 1: loop. The way the scheduler is set up here, the whole scheduler stops when any one thread stops. In a sense this is less "robust" than OS threads, but catching exceptions inside the scheduler() loop costs little, whether in machine resources or in extra code. Moreover, we could remove a thread from the threads list without terminating it (whether removed by itself or by another thread). We have not actually provided any detailed facility to make removal easy, but the obvious extension is to store the threads in a dictionary or some other structure instead of a list.
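As a sketch of that extension (my assumption, not code from the article), a scheduler that survives individual thread termination could catch StopIteration per thread and drop only the finished thread:

from __future__ import generators

def robust_scheduler(threads):
    while threads:
        for thread in threads[:]:       # iterate over a copy; list mutates
            try:
                thread.next()
            except StopIteration:
                threads.remove(thread)  # drop only the finished thread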

The example does illustrate one reasonable way to terminate the scheduler loop eventually. quitter() is a special generator/thread that watches for a condition (here, just a count of context switches) and raises StopIteration when the condition is met (no other exceptions are caught in this example). Note that after termination, all of the other generators are still intact and can be resumed later (in a micro-thread scheduler or elsewhere) if needed. Obviously, you can delete these generators/threads if you want.

The example discussed here uses deliberately meaningless threads. They do nothing, and they do it in nearly the least active form possible. We set the example up this way to illustrate the point that the intrinsic overhead of lightweight threads is very low. It is easy to create 100,000 lightweight threads on an aging Windows 98 Pentium II laptop in only about ten megabytes of memory (push it to a million threads and you will spend a long time thrashing the disk). Try that with OS threads! Moreover, the same slow 366 MHz chip can perform a million context switches in roughly 10 seconds (and the number of threads involved has no significant effect on the timing). Obviously, real lightweight threads should do something, and that will consume resources depending on the task. But the threads themselves have earned their "light" reputation.

Switching overhead

The switching overhead between lightweight threads is small, but it is not zero. To test this, I built an example that performs some actual work, though close to the least work you could meaningfully do in a thread. Because the thread scheduler really amounts to "do A, then do B, then do C, and so on," it is easy to construct the exactly equivalent sequential case in a plain function.
Listing 3. overhead.py Sample Script

from __future__ import generators
import time

times = 100000
threads = []

def stringops():
    for n in xrange(times):
        s = "Mary had a little lamb"
        s = s.upper()
        s = "Mary had a little lamb"
        s = s.lower()
        s = "Mary had a little lamb"
        s = s.replace('a', 'A')

def scheduler():
    for n in xrange(times):
        for thread in threads:
            thread.next()

def upper():
    while 1:
        s = "Mary had a little lamb"
        s = s.upper()
        yield None

def lower():
    while 1:
        s = "Mary had a little lamb"
        s = s.lower()
        yield None

def replace():
    while 1:
        s = "Mary had a little lamb"
        s = s.replace('a', 'A')
        yield None

if __name__ == '__main__':
    start = time.clock()
    stringops()
    looptime = time.clock() - start
    print "LOOP time:  ", looptime
    threads.append(upper())
    threads.append(lower())
    threads.append(replace())
    start = time.clock()
    scheduler()
    threadtime = time.clock() - start
    print "THREAD time:", threadtime

The results show that the lightweight-thread version takes about twice as long to run as the straight-loop version: on the machine mentioned above, the plain loop finishes in under 3 seconds, while the threaded run takes a bit over 6. Obviously, if each unit of work were 2, 10, or 100 times the size of a single string-method call, the proportion of time spent on thread overhead would shrink correspondingly.

Designing threads

Lightweight threads can (and usually should) be bigger than a single conceptual operation. Any thread represents as much flow context as is needed to describe some particular task or activity. However, a task may take more time or space than we want to spend in one thread context before switching. Preemption handles this problem automatically, without any specific intervention by the application programmer. With lightweight threads, unfortunately, the programmer has to take care that every thread "plays nice" with the others.

At a minimum, a lightweight thread should be well-designed enough to yield whenever it completes a conceptual operation. The scheduler will come back around so it can do the next thing. For example:
Listing 4. Pseudocode for a well-behaved lightweight thread

def nicethread():
    while 1:
        ...operation A...
        yield None
        ...operation B...
        yield None
        ...operation C...
        yield None

In most cases, a good design will yield more often than just at the boundaries between basic operations. Even something that is conceptually "basic" often involves looping over a large collection. If that is the case (depending on how much time the loop body takes), it can help to add a yield or two inside the loop body (perhaps recurring after every N iterations). Unlike with preemptive threads, a badly behaved lightweight thread can hog unlimited amounts of processor time.
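A minimal sketch of that pattern (mine, not the article's; biglist, process(), and CHUNK are invented stand-ins):

from __future__ import generators

biglist = range(100000)       # stand-in for a large collection
def process(item): pass       # stand-in for real per-item work

def well_behaved():
    CHUNK = 1000              # invented tuning parameter
    while 1:
        count = 0
        for item in biglist:
            process(item)
            count += 1
            if count % CHUNK == 0:
                yield None    # yield periodically inside the long loop
        yield None            # ...and at the conceptual operation boundary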

More scheduling possibilities

So far, the examples have shown only thread schedulers in their most basic form. There is a lot of room for different implementations (a question independent of designing good generators/threads). Let me point out a few enhancements that might show up in future installments.
Better thread management

A simple threads list makes it easy to add generators/threads for the scheduler to handle, but that data structure offers no easy way to remove or suspend threads that are no longer relevant. A dictionary or a class would be a better data structure for thread management. Here is a quick example of a class that could (almost) drop in for the threads list in the samples above:
Listing 5. Example of a Python class for thread management

class ThreadPool:
    """Enhanced threads list as class

    threads = ThreadPool()
    threads.append(threadfunc)   # not a generator object
    if threads.query(num) <meets some condition>:
        threads.remove(num)
    """
    def __init__(self):
        self.threadlist = []
        self.threaddict = {}
        self.avail = 1
    def __getitem__(self, n):
        return self.threadlist[n]
    def append(self, threadfunc, docstring=None):
        # Argument is the generator function, not the generator object.
        # Every threadfunc should contain a docstring.
        docstring = docstring or threadfunc.__doc__
        self.threaddict[self.avail] = (docstring, threadfunc())
        self.avail += 1
        self.threadlist = [p[1] for p in self.threaddict.values()]
        return self.avail - 1    # return the threadID
    def remove(self, threadID):
        del self.threaddict[threadID]
        self.threadlist = [p[1] for p in self.threaddict.values()]
    def query(self, threadID):
        "Information on thread, if it exists (otherwise None)"
        return self.threaddict.get(threadID, [None])[0]

You could do considerably more than this, but the class is a good start.
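For instance, here is a usage sketch of my own (worker and pool_scheduler are invented names; note that iterating over the pool works through its .__getitem__() method):

from __future__ import generators

def worker():
    "Demo thread that does nothing, forever"
    while 1:
        yield None

threads = ThreadPool()
tid = threads.append(worker)       # pass the function, not worker()
print threads.query(tid)           # prints the docstring above

def pool_scheduler(pool):
    try:
        while 1:
            for thread in pool:    # iteration via ThreadPool.__getitem__
                thread.next()
    except StopIteration:
        pass

# pool_scheduler(threads)  # would loop forever with this demo thread
threads.remove(tid)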
Thread priorities

In the simple examples, all threads get equal attention from the scheduler. There are at least two general approaches to achieving a better-tuned thread priority system, in which "high-priority" threads receive more attention than low-priority ones. The straightforward way is to create a new class, PriorityThreadPool(ThreadPool), that returns more important threads more often during iteration over the threads. The simplest version might just return some threads several times in a row from its .__getitem__() method. A high-priority thread would then receive two, or several, or 100 consecutive "time slices" instead of just one. A (very weakly) "real-time" variant could instead scatter multiple copies of important threads throughout the thread list. That would increase the actual frequency with which high-priority threads are serviced, not just the total attention they receive.
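A minimal sketch of the first, consecutive-slice approach (my assumption; the article gives no code for this, and the priorities mapping and rebuild() helper are invented):

class PriorityThreadPool(ThreadPool):
    def __init__(self):
        ThreadPool.__init__(self)
        self.priorities = {}           # threadID -> integer priority
    def append(self, threadfunc, docstring=None, priority=1):
        threadID = ThreadPool.append(self, threadfunc, docstring)
        self.priorities[threadID] = priority
        self.rebuild()
        return threadID
    def remove(self, threadID):
        del self.priorities[threadID]
        ThreadPool.remove(self, threadID)
        self.rebuild()
    def rebuild(self):
        # a priority-N thread occurs N times in the flat thread list,
        # so __getitem__ hands it N consecutive time slices per pass
        self.threadlist = []
        for threadID, (doc, gen) in self.threaddict.items():
            self.threadlist.extend([gen] * self.priorities[threadID])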

Fancier thread priorities are probably not feasible in pure Python (though third-party, OS-specific libraries might help). Instead of just giving high-priority threads some integer number of time slices, a scheduler could measure the actual time spent in each lightweight thread, then dynamically adjust the schedule to be "fairer" to threads awaiting service (perhaps with fairness weighted by thread priority). Unfortunately, Python's time.clock() and friends are not precise enough timers for this to work well. On the other hand, nothing prevents a thread in the "multiple time slice" approach from dynamically raising its own priority if it finds itself short of service.
Combining micro-threads and coroutines

In creating a lightweight-thread (micro-thread) scheduler, I removed the "branch to here, please" logic of the coroutines. That was not really necessary. The lightweight threads in the examples always yield None rather than a jump target. We could combine the two concepts: if a coroutine/thread yields a jump target, the scheduler can jump where requested (perhaps unless overridden by thread priorities); if the coroutine/thread merely yields None, the scheduler can decide on its own which thread to give attention to next. Deciding (and writing) exactly how arbitrary jumps should interact with a linear queue of threads would take real work, but there is nothing particularly mysterious about it.
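One way such a hybrid might look (a sketch under my own assumptions; combined_scheduler and its round-robin fallback policy are mine, not the article's):

from __future__ import generators

def combined_scheduler(gendct, order):
    # gendct maps thread names to generator objects; order lists the names.
    # A thread yields either the name of the thread to run next (coroutine
    # style) or None, letting the scheduler pick round-robin.
    n = 0
    while 1:
        target = gendct[order[n]].next()
        if target is not None and target in gendct:
            n = order.index(target)      # honor the explicit jump
        else:
            n = (n + 1) % len(order)     # default: next thread in line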

Fast and cheap: what's not to like?

Micro-threads (or "lightweight threads") can basically be filed under "yet another strange style of Python flow control." Several other styles have come up in earlier installments of this column. What is attractive about all of these control mechanisms is that they let developers isolate code functionality within logical components and maximize the contextual relevance of the code.

In truth, nothing here makes anything possible that you could not do without it (plain "loops" and "ifs" suffice, in principle). But for a class of problems that decompose naturally into small "agents," "servers," or "processes," lightweight threads may be the clearest model for expressing the underlying "business logic" of an application. And, of course, lightweight threads can be very fast compared with some better-known flow mechanisms, which never hurts.
