Using QPM to implement a multi-process parallel task handler in PHP

Source: Internet
Author: User

Consider implementing the following scenario in PHP. A queue holds a list of URLs to crawl. A daemon reads the queue and hands each URL to a child process, which downloads the HTML and saves it to a file. To improve throughput, several tasks may run in parallel, but to avoid overloading the machine the number of concurrent tasks is capped (we set the cap to 3 for easy testing). The program ends when it reads the end marker from the queue.

With QPM's Supervisor::taskFactoryMode(), this scenario is very simple to implement.
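To make the scheduling rule concrete, here is a single-process model of it in plain PHP (7.1 or later assumed). It is not QPM's implementation and it forks nothing: StopSignalModel and simulateTaskFactory() are hypothetical names, and each "task" is just a name plus a duration in ticks. The loop only shows the rule itself: keep at most quantity tasks in flight, ask the factory for a new task whenever a slot frees up, and, in this model, exit once the factory has signalled the end and every running task has finished.

```php
<?php
// Single-process MODEL of a bounded "task factory" loop. Not QPM code:
// StopSignalModel and simulateTaskFactory() are hypothetical names.

final class StopSignalModel extends Exception
{
}

/**
 * Keeps at most $quantity tasks "running". Each task is a [name, ticks] pair;
 * one pass of the outer loop is one tick of simulated time.
 *
 * @return array{order: list<string>, peak: int}
 */
function simulateTaskFactory(callable $fetchTask, int $quantity): array
{
    $running = [];    // remaining ticks of each in-flight task
    $order = [];      // task names in the order they were started
    $peak = 0;        // highest number of tasks in flight at once
    $stopped = false; // true once the factory has signalled the end

    while (true) {
        // Fill every free slot, unless the factory already said "stop".
        while (!$stopped && count($running) < $quantity) {
            try {
                [$name, $ticks] = $fetchTask();
                $running[] = $ticks;
                $order[] = $name;
            } catch (StopSignalModel $e) {
                $stopped = true;
            }
        }
        $peak = max($peak, count($running));

        // No more work and nothing in flight: the model exits.
        if ($running === []) {
            break;
        }

        // Advance one tick; finished tasks free their slot.
        foreach ($running as $slot => $left) {
            if ($left <= 1) {
                unset($running[$slot]);
            } else {
                $running[$slot] = $left - 1;
            }
        }
    }

    return ['order' => $order, 'peak' => $peak];
}

// Usage: five tasks with different durations and a cap of 3.
$pending = [['a', 2], ['b', 1], ['c', 3], ['d', 1], ['e', 2]];
$fetch = function () use (&$pending) {
    if ($pending === []) {
        throw new StopSignalModel();
    }
    return array_shift($pending);
};
$result = simulateTaskFactory($fetch, 3);
echo implode(',', $result['order']), ' peak=', $result['peak'], PHP_EOL;
// prints: a,b,c,d,e peak=3
```

Throwing an exception from the factory, rather than returning a sentinel value, mirrors how the article's factory uses StopSignal to end the run.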

QPM stands for Quick Process Management Module for PHP. PHP is so strong as a web development language that many people forget it can also be used to write robust command-line (CLI) programs, including daemons. Writing a daemon inevitably involves a lot of process management, and QPM is a library developed specifically to simplify it. The project is hosted at https://github.com/comos/qpm

To keep the test environment simple, we use a text file to simulate the data in the queue: one URL per line, terminated by an END line. The complete example file is spider_task_factory_data.txt:

```text
http://news.sina.com.cn/
http://news.ifeng.com/
http://news.163.com/
http://news.sohu.com/
http://ent.sina.com.cn/
http://ent.com/
...
END
```

Before using QPM's taskFactoryMode, we need to prepare a task factory class, which we call SpiderTaskFactory. Its factory method, fetchTask, normally returns an instance of a class that implements Runnable (here, SpiderTask). When it reads the END marker or reaches the end of the file, it throws StopSignal, and the program terminates.

The following snippet assembles the Supervisor and starts it. For the complete example, see spider_task_factory.php:

```php
// If no input file is given as an argument, use spider_task_factory_data.txt as the data source
$input = isset($argv[1]) ? $argv[1] : __DIR__ . '/spider_task_factory_data.txt';
$spiderTaskFactory = new SpiderTaskFactory($input);
$config = [
    // Specify the task factory object and its factory method
    'factoryMethod' => [$spiderTaskFactory, 'fetchTask'],
    // Allow at most 3 tasks to run concurrently
    'quantity' => 3,
];
// Start the Supervisor
Qpm\Supervisor\Supervisor::taskFactoryMode($config)->start();
```

SpiderTaskFactory is implemented as follows:

```php
/**
 * Task factory. It must implement fetchTask(), which normally returns a
 * Runnable task and throws StopSignal when there is no more work.
 */
class SpiderTaskFactory
{
    private $_input;
    private $_fh;

    public function __construct($input)
    {
        $this->_input = $input;
        $this->_fh = fopen($input, 'r');
        if ($this->_fh === false) {
            throw new Exception('fopen failed: ' . $input);
        }
    }

    public function fetchTask()
    {
        while (true) {
            // End of file: tell the Supervisor to stop
            if (feof($this->_fh)) {
                throw new Qpm\Supervisor\StopSignal();
            }
            $line = trim(fgets($this->_fh));
            // Explicit end marker: stop as well
            if ($line === 'END') {
                throw new Qpm\Supervisor\StopSignal();
            }
            // Skip blank lines
            if ($line === '') {
                continue;
            }
            break;
        }
        return new SpiderTask($line);
    }
}
```
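Before starting the Supervisor, it can help to dry-run a data file. The sketch below is a hypothetical helper, readTargets(), not part of QPM or the original example. It applies the same reading rules as fetchTask() (skip blank lines, stop at END or at end of file) but returns plain strings instead of SpiderTask objects, so it runs without QPM installed.

```php
<?php
// Hypothetical dry-run helper: same reading rules as fetchTask(), but it
// collects the URLs into an array instead of creating SpiderTask objects.

function readTargets(string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException('fopen failed: ' . $path);
    }
    $targets = [];
    try {
        while (($raw = fgets($fh)) !== false) {
            $line = trim($raw);
            if ($line === 'END') {
                break; // explicit end marker: ignore anything after it
            }
            if ($line === '') {
                continue; // blank line
            }
            $targets[] = $line;
        }
    } finally {
        fclose($fh);
    }
    return $targets;
}

// Usage with a temporary file; the last URL comes after END and is skipped.
$tmp = tempnam(sys_get_temp_dir(), 'spider');
file_put_contents($tmp, "http://news.sina.com.cn/\n\nhttp://news.163.com/\nEND\nhttp://after-end.example/\n");
$targets = readTargets($tmp);
unlink($tmp);
print_r($targets);
```

Running it once against a new data file is a cheap way to confirm that the END marker and blank lines are handled as expected before any child process is forked.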

SpiderTask is implemented as follows:

```php
/**
 * Class that performs a task in a child process.
 * It must implement the Qpm\Process\Runnable interface.
 */
class SpiderTask implements Qpm\Process\Runnable
{
    private $_target;

    public function __construct($target)
    {
        $this->_target = $target;
    }

    // This method is executed in the child process
    public function run()
    {
        $r = @file_get_contents($this->_target);
        if ($r === false) {
            throw new Exception('fail to crawl URL: ' . $this->_target);
        }
        $file = $this->getLocalFilename();
        // The _spider directory must already exist next to this script
        if (file_put_contents($file, $r) === false) {
            throw new Exception('fail to save file: ' . $file);
        }
    }

    // Build a file-system-safe file name from the URL plus a timestamp
    private function getLocalFilename()
    {
        $filename = str_replace('/', '~', $this->_target);
        $filename = str_replace(':', '_', $filename);
        $filename = $filename . '-' . date('YmdHis');
        return __DIR__ . '/_spider/' . $filename . '.html';
    }
}
```
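To see exactly what getLocalFilename() produces, here is the same naming rule as a standalone function. localNameFor() is a hypothetical helper, and the timestamp is passed in rather than read from date() so the result is deterministic.

```php
<?php
// Standalone copy of the naming rule in getLocalFilename(), with the
// timestamp passed in so the output is predictable. Hypothetical helper.

function localNameFor(string $url, string $timestamp): string
{
    $name = str_replace('/', '~', $url);  // '/' cannot appear in a file name
    $name = str_replace(':', '_', $name); // ':' is not allowed on some file systems
    return $name . '-' . $timestamp . '.html';
}

echo localNameFor('http://news.sina.com.cn/', '20240101120000'), PHP_EOL;
// prints: http_~~news.sina.com.cn~-20240101120000.html
```

The timestamp suffix means crawling the same URL again produces a new file instead of overwriting the old one. Two crawls of the same URL within the same second would still map to the same name.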

In a real production environment, replacing the file input with a queue turns this into a long-running producer/consumer process.
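A minimal sketch of such a queue-backed factory follows, assuming PHP 8.0 or later. MessageQueue, InMemoryQueue, QueueTaskFactory and StopSignalStandIn are all hypothetical: a real deployment would put an actual queue client behind MessageQueue and, with QPM, return new SpiderTask($url) and throw Qpm\Supervisor\StopSignal instead of the stand-ins. The key difference from the file version is that an empty queue is not the end of the input; only the END message stops the consumer.

```php
<?php
// Hypothetical queue-backed factory. The interface and classes below are
// illustrative stand-ins, not QPM or any real queue client's API.

interface MessageQueue
{
    /** Returns the next message, or null if the queue is currently empty. */
    public function pop(): ?string;
}

final class InMemoryQueue implements MessageQueue
{
    private SplQueue $items;

    public function __construct(array $messages)
    {
        $this->items = new SplQueue();
        foreach ($messages as $m) {
            $this->items->enqueue($m);
        }
    }

    public function pop(): ?string
    {
        return $this->items->isEmpty() ? null : $this->items->dequeue();
    }
}

final class StopSignalStandIn extends RuntimeException
{
}

final class QueueTaskFactory
{
    public function __construct(
        private MessageQueue $queue,
        private int $pollMicros = 200000
    ) {
    }

    public function fetchTask(): string
    {
        while (true) {
            $message = $this->queue->pop();
            if ($message === null) {
                // Empty queue is not the end: wait briefly, then poll again.
                usleep($this->pollMicros);
                continue;
            }
            $message = trim($message);
            if ($message === 'END') {
                throw new StopSignalStandIn();
            }
            if ($message !== '') {
                return $message;
            }
        }
    }
}

$factory = new QueueTaskFactory(new InMemoryQueue(['http://news.sohu.com/', '', 'END']));
echo $factory->fetchTask(), PHP_EOL; // prints: http://news.sohu.com/
```

Polling with usleep() keeps the sketch dependency-free; a real broker client would usually offer a blocking read, which avoids the busy wait.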
