Cooperative multitasking using coroutines in PHP (page 1 of 2)

One of the major new features in PHP 5.5 is support for generators and coroutines. Generators are already covered in detail by the PHP documentation and by various blog posts. Coroutines, on the other hand, have received relatively little attention: they are a lot more powerful than plain generators, but also a lot harder to understand and explain.

This article walks you through the implementation of a task scheduler built on coroutines, so that you get a feeling for the kind of problem they solve. The first three sections give a brief background introduction. If you already have a good grasp of generators and coroutines, you can jump straight to the section on cooperative multitasking.

Generator

The basic idea behind a generator is a function that does not return a single value, but instead produces a sequence of values, one at a time. In other words, generators let you implement iterators more easily. A simple example is the following xrange() function:


The code is as follows:


function xrange($start, $end, $step = 1) {
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

foreach (xrange(1, 1000000) as $num) {
    echo $num, "\n";
}

The xrange() function above provides the same functionality as the built-in range() function. The difference is that range() returns an array containing all the values from 1 to 1,000,000, whereas xrange() returns an iterator that emits these values one at a time, so the full array is never built in memory.

The advantages of this approach are obvious. It allows you to work with large data sets without loading them into memory all at once. You can even work with infinite data streams.
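As a small illustration of the "infinite stream" point, here is a minimal sketch. The naturals() function is a hypothetical helper that is not part of the original example; the consumer decides when to stop.

```php
<?php
// Hypothetical helper: yields 1, 2, 3, ... without ever terminating on its own.
function naturals() {
    $n = 1;
    while (true) {
        yield $n++;
    }
}

$firstFive = [];
foreach (naturals() as $n) {
    if ($n > 5) {
        break; // the consumer, not the generator, ends the iteration
    }
    $firstFive[] = $n;
}
```

Only the values that are actually requested are ever computed, which is why the endless loop inside the generator is harmless.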

Of course, you could also implement xrange() without a generator, by writing a class that implements the Iterator interface. Generators simply make this more convenient, because you do not have to implement the five methods of the Iterator interface for every iterator.
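For comparison, a hand-written iterator could look roughly like the sketch below. The class name XRangeIterator is an assumption made for illustration, and the return type declarations assume PHP 7.1 or later (the rest of this article targets PHP 5.5).

```php
<?php
// XRangeIterator is a hypothetical name; it mirrors xrange() without a generator.
class XRangeIterator implements Iterator
{
    private $start;
    private $end;
    private $step;
    private $value;
    private $position = 0;

    public function __construct($start, $end, $step = 1)
    {
        $this->start = $start;
        $this->end = $end;
        $this->step = $step;
        $this->value = $start;
    }

    // Reset to the first value.
    public function rewind(): void
    {
        $this->value = $this->start;
        $this->position = 0;
    }

    // Iteration continues while the current value is within range.
    public function valid(): bool
    {
        return $this->value <= $this->end;
    }

    public function current(): int
    {
        return $this->value;
    }

    public function key(): int
    {
        return $this->position;
    }

    // Advance to the next value.
    public function next(): void
    {
        $this->value += $this->step;
        ++$this->position;
    }
}

$values = [];
foreach (new XRangeIterator(1, 10, 3) as $num) {
    $values[] = $num;
}
```

The generator version expresses the same behaviour in five lines, because PHP keeps the loop state for you between yields.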

Generators as interruptible functions
To understand coroutines, it is very important to understand how generators work internally: a generator is an interruptible function, in which the yield statements are the interruption points.

In the preceding example, calling xrange() does not run the code inside the xrange() function right away. Instead, PHP just returns an instance of the Generator class, which implements the Iterator interface:

The code is as follows:


$range = xrange(1, 1000000);
var_dump($range); // object(Generator)#1
var_dump($range instanceof Iterator); // bool(true)

The code only runs once you call one of the iterator methods on the object. For example, if you call $range->rewind(), the code in xrange() runs until the control flow hits the first yield. In this case that means $i = $start is executed, followed by yield $i. The value passed to the yield statement can then be fetched with $range->current().

To continue executing the code in the generator, you call the $range->next() method. This resumes the generator until the next yield statement is hit. By calling next() and current() repeatedly you can therefore obtain all values from the generator, until at some point no further yield is reached. For xrange(), this happens once $i exceeds $end. Control flow then reaches the end of the function, so there is no code left to run. Once this happens, the valid() method returns false and the iteration ends.
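The sequence of calls described above can be written out by hand. This is a minimal sketch of what a foreach loop does for you, reusing the xrange() function from earlier:

```php
<?php
function xrange($start, $end, $step = 1) {
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

$range = xrange(1, 3);
$seen = [];

$range->rewind();                 // run up to the first yield
while ($range->valid()) {         // false once the function body has finished
    $seen[] = $range->current();  // the value passed to the current yield
    $range->next();               // resume until the next yield (or the end)
}
```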

Coroutine

The main thing that coroutines add to the above is the ability to send values back into the generator. This turns the one-way communication from the generator to the caller into a two-way channel between the two.
Values are passed into the coroutine by calling the generator's send() method instead of next(). The following logger() coroutine shows how this works:


The code is as follows:


function logger($fileName) {
    $fileHandle = fopen($fileName, 'a');
    while (true) {
        fwrite($fileHandle, yield . "\n");
    }
}

$logger = logger(__DIR__ . '/log');
$logger->send('Foo');
$logger->send('Bar');

As you can see, yield is used here as an expression rather than a statement. That is, it has a return value. The return value of yield is whatever was passed to the send() method. In this example, yield first returns "Foo" and then "Bar".
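If you want to try this without touching the file system, the same receive-only pattern can be sketched with an in-memory collector. The collector() function and its by-reference $lines parameter are assumptions made for this illustration, not part of the original example.

```php
<?php
// Hypothetical in-memory counterpart of logger(): every value passed to
// send() becomes the result of the yield expression and is stored in $lines.
function collector(array &$lines) {
    while (true) {
        $line = (yield);
        $lines[] = $line;
    }
}

$lines = [];
$collector = collector($lines);
$collector->send('Foo');
$collector->send('Bar');
```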

In the above example, yield acts only as a receiver. It is possible to combine both usages, that is, to both send and receive. Here is an example of how this works:


The code is as follows:


function gen() {
    $ret = (yield 'yield1');
    var_dump($ret);
    $ret = (yield 'yield2');
    var_dump($ret);
}

$gen = gen();
var_dump($gen->current());    // string(6) "yield1"
var_dump($gen->send('ret1')); // string(4) "ret1"   (the first var_dump in gen)
                              // string(6) "yield2" (the var_dump of the ->send() return value)
var_dump($gen->send('ret2')); // string(4) "ret2"   (again from within gen)
                              // NULL               (the return value of ->send())

The exact order of the output can be hard to follow at first, so make sure you understand why it comes out this way. I would like to point out two things. First, the parentheses around the yield expression: for technical reasons they are required (although I have been considering adding an exception for assignments, as Python has). Second, you may have noticed that current() is called without calling rewind() first. In that case the rewind operation is performed implicitly.
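One way to convince yourself of the ordering is to record every step in a single list. The sketch below is a variation on the gen() example; the $log parameter and the message texts are assumptions added for illustration.

```php
<?php
function gen(array &$log) {
    $ret = (yield 'yield1');
    $log[] = "gen received $ret";
    $ret = (yield 'yield2');
    $log[] = "gen received $ret";
}

$log = [];
$gen = gen($log);

$value = $gen->current();     // runs gen() up to the first yield
$log[] = "caller got $value";

$value = $gen->send('ret1');  // resumes gen(), which logs and then yields again
$log[] = "caller got $value";

$value = $gen->send('ret2');  // resumes gen(), which logs and then finishes
$log[] = 'caller got ' . var_export($value, true);
```

Each send() call first lets the generator run (and log) and only then returns to the caller, which is why the generator's entries appear before the matching caller entries.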

Cooperative multitasking

If, while reading the logger() example above, you thought "Why would I use a coroutine for this? Why can't I just use a normal class?", then you were absolutely right. The example demonstrates the basic usage, but in that context there is no real advantage to using a coroutine. The same is true of many coroutine examples. As mentioned above, coroutines are a very powerful concept, but their applications are rare and often rather complicated, which makes it hard to come up with simple examples that are not contrived.

For this article I decided to implement cooperative multitasking using coroutines. The problem we are trying to solve is that you want to run multiple tasks (or "programs") concurrently. However, a processor can only run one task at a time (multi-core is not considered in this article). The processor therefore needs to switch between the different tasks and always let each one run "for a little while".

The "cooperative" part of the term describes how this switching is done: it requires the currently running task to voluntarily pass control back to the scheduler, so that the scheduler can run another task. This is in contrast to "preemptive" multitasking, where the scheduler can interrupt a task after some time whether the task likes it or not. Cooperative multitasking was used in early versions of Windows (before Windows 95) and Mac OS, but both later switched to preemptive multitasking. The reason is fairly obvious: if you rely on programs to hand back control voluntarily, badly behaved software can easily occupy the whole CPU and not share it with other tasks.

At this point you should see the connection between coroutines and task scheduling: the yield instruction gives a task a way to interrupt itself and pass control back to the scheduler, so that the scheduler can run some other task. Furthermore, yield can be used for communication between the task and the scheduler.

For our purposes, a "task" is a thin wrapper around a coroutine function:

The code is as follows:


class Task {
    protected $taskId;
    protected $coroutine;
    protected $sendValue = null;
    protected $beforeFirstYield = true;

    public function __construct($taskId, Generator $coroutine) {
        $this->taskId = $taskId;
        $this->coroutine = $coroutine;
    }

    public function getTaskId() {
        return $this->taskId;
    }

    public function setSendValue($sendValue) {
        $this->sendValue = $sendValue;
    }

    public function run() {
        if ($this->beforeFirstYield) {
            $this->beforeFirstYield = false;
            return $this->coroutine->current();
        } else {
            $retval = $this->coroutine->send($this->sendValue);
            $this->sendValue = null;
            return $retval;
        }
    }

    public function isFinished() {
        return !$this->coroutine->valid();
    }
}

A task is a coroutine tagged with a task ID. With the setSendValue() method you can specify which value will be sent into the coroutine the next time it is resumed (you will see later what we need this for). The run() function really does nothing more than call the send() method of the coroutine. To understand why the additional beforeFirstYield flag is needed, consider the following snippet:


The code is as follows:


function gen() {
    yield 'foo';
    yield 'bar';
}

$gen = gen();
var_dump($gen->send('something'));

// As the send() happens before the first yield there is an implicit rewind() call,
// so what really happens is this:
$gen->rewind();
var_dump($gen->send('something'));

// The rewind() will advance to the first yield (and ignore its value), the send() will
// advance to the second yield (and dump its value). Thus we lose the first yielded value!

By adding the beforeFirstYield condition, we can ensure that the value of the first yield is returned as well.
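The difference between the two call orders can be checked directly. This is a minimal sketch; the variable names are chosen for illustration only.

```php
<?php
function gen() {
    yield 'foo';
    yield 'bar';
}

// Calling send() first: the generator is advanced to the first yield
// implicitly, then resumed, so the caller only ever sees 'bar'.
$lossy = gen();
$lossyFirst = $lossy->send('something');

// Calling current() first, as Task::run() does on its first invocation,
// keeps the first yielded value.
$safe = gen();
$safeFirst = $safe->current();
$safeSecond = $safe->send('something');
```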

The scheduler now has to do little more than cycle through the tasks and run them:

The code is as follows:


class Scheduler {
    protected $maxTaskId = 0;
    protected $taskMap = []; // taskId => task
    protected $taskQueue;

    public function __construct() {
        $this->taskQueue = new SplQueue();
    }

    public function newTask(Generator $coroutine) {
        $tid = ++$this->maxTaskId;
        $task = new Task($tid, $coroutine);
        $this->taskMap[$tid] = $task;
        $this->schedule($task);
        return $tid;
    }

    public function schedule(Task $task) {
        $this->taskQueue->enqueue($task);
    }

    public function run() {
        while (!$this->taskQueue->isEmpty()) {
            $task = $this->taskQueue->dequeue();
            $task->run();

            if ($task->isFinished()) {
                unset($this->taskMap[$task->getTaskId()]);
            } else {
                $this->schedule($task);
            }
        }
    }
}

The newTask() method creates a new task (using the next free task ID) and puts it into the task map. It then schedules the task by putting it into the task queue. The run() method walks this task queue and runs the tasks. If a task has finished, it is dropped; otherwise it is rescheduled at the end of the queue.
Let's try out the scheduler with two simple (and not very meaningful) tasks:


The code is as follows:


function task1() {
    for ($i = 1; $i <= 10; ++$i) {
        echo "This is task 1 iteration $i.\n";
        yield;
    }
}

function task2() {
    for ($i = 1; $i <= 5; ++$i) {
        echo "This is task 2 iteration $i.\n";
        yield;
    }
}

$scheduler = new Scheduler;

$scheduler->newTask(task1());
$scheduler->newTask(task2());

$scheduler->run();

Both tasks just echo a message and then pass control back to the scheduler with yield. This is the resulting output:

The code is as follows:


This is task 1 iteration 1.
This is task 2 iteration 1.
This is task 1 iteration 2.
This is task 2 iteration 2.
This is task 1 iteration 3.
This is task 2 iteration 3.
This is task 1 iteration 4.
This is task 2 iteration 4.
This is task 1 iteration 5.
This is task 2 iteration 5.
This is task 1 iteration 6.
This is task 1 iteration 7.
This is task 1 iteration 8.
This is task 1 iteration 9.
This is task 1 iteration 10.

The output is exactly as expected: for the first five iterations the two tasks alternate. After the second task has finished, only the first task continues to run.
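The round-robin behaviour can also be reproduced in a few lines without the Task and Scheduler classes. The following is a stripped-down sketch under stated assumptions: countingTask() is a hypothetical task that records its steps in an array instead of echoing them, and each queue entry carries a "started" flag that plays the role of beforeFirstYield.

```php
<?php
// Hypothetical task: records "name:iteration" and then yields control.
function countingTask($name, $max, array &$log) {
    for ($i = 1; $i <= $max; ++$i) {
        $log[] = "$name:$i";
        yield;
    }
}

$log = [];
$queue = new SplQueue();
// Each entry pairs a generator with a flag saying whether it has been started.
$queue->enqueue([countingTask('A', 3, $log), false]);
$queue->enqueue([countingTask('B', 2, $log), false]);

while (!$queue->isEmpty()) {
    list($task, $started) = $queue->dequeue();
    if ($started) {
        $task->next();     // resume after the last yield
    } else {
        $task->current();  // first turn: run up to the first yield
    }
    if ($task->valid()) {
        $queue->enqueue([$task, true]); // not finished: back to the end of the queue
    }
}
```

The tasks alternate until the shorter one finishes, which is the same pattern as in the output above.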

Communication with the scheduler

Now that the scheduler works, we can turn to the next item on the agenda: communication between the tasks and the scheduler. We will use the same method that processes use to talk to the operating system: system calls. System calls are needed because the operating system runs at a different privilege level than the processes. To perform privileged actions (such as killing another process), there has to be some way to pass control back to the kernel, so that the kernel can perform those actions. Internally this is once again implemented with interrupt instructions. Historically the generic int instruction was used; nowadays there are more specialized and faster syscall/sysenter instructions.

Our task scheduling system will reflect this design: instead of simply passing the scheduler into the task (and thereby allowing the task to do whatever it wants with it), we will communicate through system calls that are passed via the yield expression. Here yield acts both as an interrupt and as a way of passing information to (and from) the scheduler.

To represent a system call, I will use a small wrapper around a callable:

The code is as follows:


class SystemCall {
    protected $callback;

    public function __construct(callable $callback) {
        $this->callback = $callback;
    }

    public function __invoke(Task $task, Scheduler $scheduler) {
        $callback = $this->callback; // Can't call it directly in PHP :/
        return $callback($task, $scheduler);
    }
}

It behaves just like any other callable (using __invoke), but it expects the scheduler to pass in the calling task and the scheduler itself as arguments. To handle it, we have to modify the scheduler's run method slightly:


The code is as follows:


public function run() {
    while (!$this->taskQueue->isEmpty()) {
        $task = $this->taskQueue->dequeue();
        $retval = $task->run();

        if ($retval instanceof SystemCall) {
            $retval($task, $this);
            continue;
        }

        if ($task->isFinished()) {
            unset($this->taskMap[$task->getTaskId()]);
        } else {
            $this->schedule($task);
        }
    }
}

The first system call does nothing more than return the task ID:


The code is as follows:


function getTaskId() {
    return new SystemCall(function(Task $task, Scheduler $scheduler) {
        $task->setSendValue($task->getTaskId());
        $scheduler->schedule($task);
    });
}

This function sets the task ID as the value to be sent on the next resume and then reschedules the task. For system calls the scheduler does not reschedule the task automatically, so we have to do it manually (you will see why a bit later). To use this new system call, we rewrite the previous example:


The code is as follows:


function task($max) {
    $tid = (yield getTaskId()); // <-- here's the syscall!
    for ($i = 1; $i <= $max; ++$i) {
        echo "This is task $tid iteration $i.\n";
        yield;
    }
}

$scheduler = new Scheduler;

$scheduler->newTask(task(10));
$scheduler->newTask(task(5));

$scheduler->run();

This code gives the same output as the previous example. Notice that the system call is made like any other call, but with a yield in front of it. Two more system calls are needed to create new tasks and to kill them again:

The code is as follows:


function newTask(Generator $coroutine) {
    return new SystemCall(
        function(Task $task, Scheduler $scheduler) use ($coroutine) {
            $task->setSendValue($scheduler->newTask($coroutine));
            $scheduler->schedule($task);
        }
    );
}

function killTask($tid) {
    return new SystemCall(
        function(Task $task, Scheduler $scheduler) use ($tid) {
            $task->setSendValue($scheduler->killTask($tid));
            $scheduler->schedule($task);
        }
    );
}

The killTask function needs an additional method on the scheduler:


The code is as follows:


public function killTask($tid) {
    if (!isset($this->taskMap[$tid])) {
        return false;
    }

    unset($this->taskMap[$tid]);

    // This is a bit ugly and could be optimized so it does not have to walk the queue,
    // but assuming that killing tasks is rather rare I won't bother with it now
    foreach ($this->taskQueue as $i => $task) {
        if ($task->getTaskId() === $tid) {
            unset($this->taskQueue[$i]);
            break;
        }
    }

    return true;
}
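To see the pieces working together, here is a self-contained sketch that condenses the Task, SystemCall and Scheduler classes from this page and runs the killTask system call end to end. The two task functions, child() and parentTask(), and the $log array are assumptions made for this illustration; they are not part of the code above.

```php
<?php
class Task {
    protected $taskId;
    protected $coroutine;
    protected $sendValue = null;
    protected $beforeFirstYield = true;
    public function __construct($taskId, Generator $coroutine) {
        $this->taskId = $taskId;
        $this->coroutine = $coroutine;
    }
    public function getTaskId() { return $this->taskId; }
    public function setSendValue($value) { $this->sendValue = $value; }
    public function isFinished() { return !$this->coroutine->valid(); }
    public function run() {
        if ($this->beforeFirstYield) {
            $this->beforeFirstYield = false;
            return $this->coroutine->current();
        }
        $retval = $this->coroutine->send($this->sendValue);
        $this->sendValue = null;
        return $retval;
    }
}

class SystemCall {
    protected $callback;
    public function __construct(callable $callback) { $this->callback = $callback; }
    public function __invoke(Task $task, Scheduler $scheduler) {
        $callback = $this->callback;
        return $callback($task, $scheduler);
    }
}

class Scheduler {
    protected $maxTaskId = 0;
    protected $taskMap = []; // taskId => task
    protected $taskQueue;
    public function __construct() { $this->taskQueue = new SplQueue(); }
    public function schedule(Task $task) { $this->taskQueue->enqueue($task); }
    public function newTask(Generator $coroutine) {
        $tid = ++$this->maxTaskId;
        $this->taskMap[$tid] = new Task($tid, $coroutine);
        $this->schedule($this->taskMap[$tid]);
        return $tid;
    }
    public function killTask($tid) {
        if (!isset($this->taskMap[$tid])) { return false; }
        unset($this->taskMap[$tid]);
        foreach ($this->taskQueue as $i => $task) {
            if ($task->getTaskId() === $tid) {
                unset($this->taskQueue[$i]);
                break;
            }
        }
        return true;
    }
    public function run() {
        while (!$this->taskQueue->isEmpty()) {
            $task = $this->taskQueue->dequeue();
            $retval = $task->run();
            if ($retval instanceof SystemCall) {
                $retval($task, $this); // the system call reschedules the task itself
            } elseif ($task->isFinished()) {
                unset($this->taskMap[$task->getTaskId()]);
            } else {
                $this->schedule($task);
            }
        }
    }
}

function killTask($tid) {
    return new SystemCall(function (Task $task, Scheduler $scheduler) use ($tid) {
        $task->setSendValue($scheduler->killTask($tid));
        $scheduler->schedule($task);
    });
}

// Hypothetical task that would run forever if nobody killed it.
function child(array &$log) {
    while (true) {
        $log[] = 'child';
        yield;
    }
}

// Hypothetical task that waits one turn and then kills the child.
function parentTask($childTid, array &$log) {
    yield;
    $killed = (yield killTask($childTid));
    $log[] = $killed ? 'child killed' : 'kill failed';
}

$log = [];
$scheduler = new Scheduler;
$childTid = $scheduler->newTask(child($log));
$scheduler->newTask(parentTask($childTid, $log));
$scheduler->run();
```

The child gets two turns before the parent's system call removes it from both the task map and the queue; the scheduler loop then ends because the queue is empty once the parent has finished.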
