Using QPM in PHP to implement a multi-process parallel task processing program

Source: Internet
Author: User

Consider implementing the following scenario in PHP: a list of URLs to crawl is stored in a queue. A background program reads the queue and hands each URL to a child process, which fetches the HTML and saves it to a file. Tasks may run in parallel to improve throughput, but the number of concurrent tasks is capped to avoid overloading the server (we set the cap to 3 to make testing easier). When the program reads the END marker from the queue, it stops.

With QPM's Supervisor::taskFactoryMode(), this scenario is very simple to implement.

The full name of QPM is Quick Process Management Module for PHP. PHP is such a strong web development language that we often forget it can also be used to write robust command-line (CLI) programs, including daemons. Writing a daemon inevitably involves a good deal of process management, and QPM is a class library designed to simplify that work. The QPM project is hosted at https://github.com/Comos/qpm

To keep the test environment simple, we use a text file, with one URL per line, to simulate the data in the queue. The complete example file is spider_task_factory_data.txt; its contents look like this:

```text
http://news.sina.com.cn/
http://news.ifeng.com/
http://news.163.com/
http://news.sohu.com/
http://ent.sina.com.cn/
http://ent.ifeng.com/
...
END
```

Before using QPM's taskFactoryMode, we need to prepare a TaskFactory class, which we name SpiderTaskFactory. Its factory method, fetchTask, normally returns an instance of a class that implements the Runnable interface. When it reads the END line or reaches the end of the file, it throws StopSignal, which stops the program.

The following snippet assembles and starts the Supervisor. For the complete example, see spider_task_factory.php.

```php
// Assumes the QPM classes and SpiderTaskFactory are already loaded.
// If no input file is given on the command line, use spider_task_factory_data.txt as the data source
$input = isset($argv[1]) ? $argv[1] : __DIR__ . '/spider_task_factory_data.txt';
$spiderTaskFactory = new SpiderTaskFactory($input);
$config = [
    // Specify the task factory object and its factory method
    'factoryMethod' => [$spiderTaskFactory, 'fetchTask'],
    // Allow at most 3 tasks to run concurrently
    'quantity' => 3,
];
// Start the Supervisor
qpm\supervisor\Supervisor::taskFactoryMode($config)->start();
```
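The 'quantity' setting caps how many tasks run at the same time. To illustrate the kind of bookkeeping taskFactoryMode spares you, here is a minimal sketch of the same bounded-parallelism pattern written directly against PHP's pcntl extension. It is not QPM's implementation; the function name runBounded is hypothetical, and the sketch assumes the CLI SAPI on a Unix-like system with ext-pcntl enabled.

```php
<?php
// Minimal sketch (not QPM code): run each task in a forked child process,
// keeping at most $limit children alive at once. Requires ext-pcntl (CLI, Unix).
// Returns [number of tasks started, peak number of children tracked by the parent].
function runBounded(iterable $tasks, int $limit, callable $work): array
{
    $running = []; // pid => true for children that have not been reaped yet
    $started = 0;
    $peak = 0;

    // Block until one child exits, then free its slot.
    $reapOne = function () use (&$running): void {
        $pid = pcntl_wait($status);
        if ($pid === -1) {
            throw new RuntimeException('pcntl_wait failed');
        }
        unset($running[$pid]);
    };

    foreach ($tasks as $task) {
        while (count($running) >= $limit) {
            $reapOne(); // wait for a free slot before forking again
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork failed');
        }
        if ($pid === 0) {
            $work($task); // child process: do the work...
            exit(0);      // ...then exit so the child never re-enters the loop
        }
        $running[$pid] = true;
        $started++;
        $peak = max($peak, count($running));
    }

    while (count($running) > 0) {
        $reapOne(); // wait for the remaining children before returning
    }
    return [$started, $peak];
}
```

For example, runBounded(range(1, 7), 3, function (int $n): void { usleep(100000); }) starts seven children but never has more than three alive at once. Error reporting, signal handling and stop conditions are left out of this sketch; keeping that kind of process-management code out of application code is what QPM is for.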

The implementation of SpiderTaskFactory is as follows:

```php
/**
 * A task factory must implement the fetchTask method.
 * Normally it returns a Runnable task; to stop the Supervisor it throws StopSignal.
 */
class SpiderTaskFactory
{
    private $_input;
    private $_fh;

    public function __construct($input)
    {
        $this->_input = $input;
        $this->_fh = fopen($input, 'r');
        if ($this->_fh === false) {
            throw new Exception('fopen failed: ' . $input);
        }
    }

    public function fetchTask()
    {
        while (true) {
            // End of file: no more tasks
            if (feof($this->_fh)) {
                throw new qpm\supervisor\StopSignal();
            }
            $line = trim(fgets($this->_fh));
            // END marker: no more tasks
            if ($line === 'END') {
                throw new qpm\supervisor\StopSignal();
            }
            // Skip blank lines
            if ($line === '') {
                continue;
            }
            break;
        }
        return new SpiderTask($line);
    }
}
```
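The stop rules in fetchTask (skip blank lines, stop at END or at end of file) can be checked without QPM or any child processes. The following is a minimal sketch, not part of QPM, that applies the same rules as a PHP generator; the function name readSpiderUrls is hypothetical.

```php
<?php
// Hypothetical helper, not part of QPM: yields the URLs that
// SpiderTaskFactory::fetchTask() would turn into SpiderTask objects, in order.
function readSpiderUrls(string $path): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException('fopen failed: ' . $path);
    }
    try {
        while (($raw = fgets($fh)) !== false) {
            $line = trim($raw);
            if ($line === 'END') {
                return; // plays the role of throwing StopSignal
            }
            if ($line === '') {
                continue; // blank lines are skipped
            }
            yield $line;
        }
        // Leaving the loop (end of file) also stops, like the feof() check in fetchTask()
    } finally {
        fclose($fh);
    }
}
```

Iterating over readSpiderUrls('spider_task_factory_data.txt') lists exactly the URLs that would become tasks, which is a quick way to validate a data file before handing it to the Supervisor.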

The implementation of SpiderTask is as follows:

```php
/**
 * Executes a task in a child process.
 * Must implement the qpm\process\Runnable interface.
 */
class SpiderTask implements qpm\process\Runnable
{
    private $_target;

    public function __construct($target)
    {
        $this->_target = $target;
    }

    // Executed in the child process
    public function run()
    {
        $r = @file_get_contents($this->_target);
        if ($r === false) {
            throw new Exception('fail to crawl url: ' . $this->_target);
        }
        file_put_contents($this->getLocalFilename(), $r);
    }

    // Builds the output path; the _spider directory must already exist
    private function getLocalFilename()
    {
        $filename = str_replace('/', '~', $this->_target);
        $filename = str_replace(':', '_', $filename);
        $filename = $filename . '-' . date('YmdHis');
        return __DIR__ . '/_spider/' . $filename . '.html';
    }
}
```
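To see what file names SpiderTask produces, here is a hypothetical standalone version of getLocalFilename() with the timestamp and directory passed in, so the result is predictable. It assumes the timestamp format is YmdHis; spiderLocalFilename is an illustrative name, not part of the example project.

```php
<?php
// Hypothetical standalone version of SpiderTask::getLocalFilename().
// The timestamp and directory are parameters so the result is deterministic.
function spiderLocalFilename(string $url, int $timestamp, string $dir): string
{
    $name = str_replace('/', '~', $url);  // '/' would otherwise act as a directory separator
    $name = str_replace(':', '_', $name); // ':' is not allowed in file names on some systems
    return $dir . '/' . $name . '-' . date('YmdHis', $timestamp) . '.html';
}

date_default_timezone_set('UTC');
echo spiderLocalFilename('http://news.sina.com.cn/', 0, '/tmp/_spider'), "\n";
// prints /tmp/_spider/http_~~news.sina.com.cn~-19700101000000.html
```

Because the suffix only has one-second resolution, two crawls of the same URL within the same second would write to the same file.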

In a real production environment, you can replace the input file with an actual queue to build a producer/consumer program that runs continuously.
