PHP is such a capable web development language that it is easy to forget it can also be used to write robust command-line (CLI) programs, and even daemons. Writing a daemon inevitably involves process management, and QPM makes writing multi-process programs straightforward. This article uses QPM's Supervisor::taskFactoryMode() to implement a program that processes tasks in parallel across multiple processes.
PHP daemon QPM process management
Consider implementing the following scenario in PHP: a queue holds a list of URLs to crawl. A background program reads the queue and hands each URL to a sub-process, which fetches the HTML and saves it to a file. Tasks run in parallel to improve efficiency, but the number of concurrent tasks is capped to avoid overloading the server (we set the cap to 3 to keep testing simple). When the END marker is read from the queue, the program stops.
With QPM's Supervisor::taskFactoryMode(), this scenario is simple to implement.
QPM stands for Quick Process Management Module for PHP. It is a class library developed to simplify the process management that writing a daemon inevitably involves. The project is hosted at: https://github.com/Comos/qpm
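To make concrete what such a library takes care of, the sketch below caps the number of child processes by hand using PHP's pcntl extension. It is not QPM's implementation. The names runBounded and $nextTask are hypothetical and introduced only for illustration, and the code assumes a Unix-like system running PHP from the CLI with pcntl enabled.

```php
<?php
/**
 * Hand-rolled bounded parallelism (illustrative only, not QPM code).
 *
 * $nextTask is a hypothetical callable: it returns a callable to run in a
 * child process, or null when there is no more work.
 * Returns the number of children that exited with status 0.
 */
function runBounded(callable $nextTask, int $maxParallel): int
{
    if ($maxParallel < 1) {
        throw new InvalidArgumentException('maxParallel must be at least 1');
    }
    $running = 0;    // children currently alive
    $succeeded = 0;  // children that exited with status 0
    $done = false;   // set once $nextTask reports no more work

    while (!$done || $running > 0) {
        // Start new children while there is work and a free slot.
        while (!$done && $running < $maxParallel) {
            $task = $nextTask();
            if ($task === null) {
                $done = true;
                break;
            }
            $pid = pcntl_fork();
            if ($pid === -1) {
                throw new RuntimeException('fork failed');
            }
            if ($pid === 0) {
                // Child: run the task, then exit so it never re-enters the loop.
                try {
                    $task();
                    exit(0);
                } catch (Throwable $e) {
                    exit(1);
                }
            }
            $running++;
        }
        if ($running > 0) {
            // Block until one child exits, which frees a slot.
            $pid = pcntl_wait($status);
            if ($pid <= 0) {
                break; // no children left to wait for
            }
            $running--;
            if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
                $succeeded++;
            }
        }
    }
    return $succeeded;
}
```

Calling runBounded($nextTask, 3) would mirror the limit of 3 used in this article. The bookkeeping around forking, waiting and exit codes is the kind of work a process management library is meant to hide.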
To simplify the test environment, we can use a text file to simulate data in the queue. For the complete example file, see spider_task_factory_data.txt.
http://news.sina.com.cn/
http://news.ifeng.com/
http://news.163.com/
http://news.sohu.com/
http://ent.sina.com.cn/
http://ent.ifeng.com/
...
END
Before using QPM's taskFactoryMode, we need to prepare a task factory class, which we name SpiderTaskFactory. Its factory method, fetchTask, normally returns an instance of a class that implements Runnable. When it reads END or reaches the end of the file, it throws StopSignal, which terminates the program.
The following is a code snippet that assembles and executes the Supervisor. For a complete example, see spider_task_factory.php.
// If no input file is given on the command line, use spider_task_factory_data.txt as the data source
$input = isset($argv[1]) ? $argv[1] : __DIR__.'/spider_task_factory_data.txt';
$spiderTaskFactory = new SpiderTaskFactory($input);
$config = [
    // specify the task factory object and its factory method
    'factoryMethod' => [$spiderTaskFactory, 'fetchTask'],
    // limit the number of concurrent tasks to 3
    'quantity' => 3,
];
// start the Supervisor
qpm\supervisor\Supervisor::taskFactoryMode($config)->start();
The implementation of SpiderTaskFactory is as follows:
/**
 * The task factory must implement the fetchTask method.
 * Normally the method returns a Runnable instance;
 * it throws StopSignal on END or at the end of the file.
 */
class SpiderTaskFactory
{
    private $_input;
    private $_fh;

    public function __construct($input)
    {
        $this->_input = $input;
        $this->_fh = fopen($input, 'r');
        if ($this->_fh === false) {
            throw new Exception('fopen failed: '.$input);
        }
    }

    public function fetchTask()
    {
        while (true) {
            if (feof($this->_fh)) {
                throw new qpm\supervisor\StopSignal();
            }
            $line = trim(fgets($this->_fh));
            if ($line === 'END') {
                throw new qpm\supervisor\StopSignal();
            }
            // skip blank lines
            if (empty($line)) {
                continue;
            }
            break;
        }
        return new SpiderTask($line);
    }
}
The implementation of SpiderTask is as follows:
/**
 * The class that executes a task in a sub-process.
 * It must implement the qpm\process\Runnable interface.
 */
class SpiderTask implements qpm\process\Runnable
{
    private $_target;

    public function __construct($target)
    {
        $this->_target = $target;
    }

    // this part is executed in the sub-process
    public function run()
    {
        $r = @file_get_contents($this->_target);
        if ($r === false) {
            throw new Exception('fail to crawl url:'.$this->_target);
        }
        file_put_contents($this->getLocalFilename(), $r);
    }

    private function getLocalFilename()
    {
        // build a file name from the URL plus a timestamp
        $filename = str_replace('/', '~', $this->_target);
        $filename = str_replace(':', '_', $filename);
        $filename = $filename.'-'.date('YmdHis');
        return __DIR__.'/_spider/'.$filename.'.html';
    }
}
In a real production environment, you can replace the input file with a queue to implement a producer/consumer model program that runs permanently.
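As a rough sketch of that idea, the helper below polls a queue instead of reading a file. It makes several assumptions: nextUrlFromQueue and $pop are hypothetical names, $pop stands in for whatever queue client is actually used and is assumed to return the next message as a string or null when the queue is currently empty, and the END marker keeps the meaning it has in this article.

```php
<?php
/**
 * Poll a queue until it yields a usable URL (illustrative sketch).
 *
 * $pop is a hypothetical callable wrapping the real queue client. It is
 * assumed to return the next message as a string, or null when the queue
 * is empty at the moment.
 *
 * Returns the next URL, or null once the END marker is received.
 */
function nextUrlFromQueue(callable $pop, int $idleMicroseconds = 200000): ?string
{
    while (true) {
        $message = $pop();
        if ($message === null) {
            // Queue is empty: wait for a producer instead of stopping.
            usleep($idleMicroseconds);
            continue;
        }
        $line = trim($message);
        if ($line === 'END') {
            return null;
        }
        if ($line === '') {
            continue; // skip blank messages
        }
        return $line;
    }
}
```

Inside fetchTask, a null result would be turned into throw new qpm\supervisor\StopSignal(), and any other result into new SpiderTask($url). Unlike the file-based version, an empty queue does not end the program; only END does.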