PHP multi-process programming (3): multi-process web page crawling demonstration

To understand the code in this part, please read first:

PHP multi-process programming (1)

PHP multi-process programming (2) pipeline communication

We know that passing data from the parent process to a child process is easy, but passing data from a child process back to the parent is harder.

There are many ways to implement inter-process communication. In PHP, pipes (FIFOs) are convenient; you can also use a socket pair (socket_create_pair) for communication.
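As a side note, here is a minimal sketch of the socket_create_pair() alternative (this example is mine, not from the original code; it requires the pcntl and sockets extensions and the CLI SAPI):

```php
<?php
// Child-to-parent communication over a Unix socket pair.
$pair = [];
if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
    die("socket_create_pair() failed: " . socket_strerror(socket_last_error()) . "\n");
}

$pid = pcntl_fork();
if ($pid == -1) {
    die("fork failed\n");
} elseif ($pid == 0) {
    // Child: close the unused end, write a result to the parent, exit.
    socket_close($pair[0]);
    socket_write($pair[1], "result from child\n");
    socket_close($pair[1]);
    exit(0);
}

// Parent: close the unused end and read what the child sent.
socket_close($pair[1]);
$msg = socket_read($pair[0], 2048, PHP_NORMAL_READ);
socket_close($pair[0]);
pcntl_waitpid($pid, $status);
echo trim($msg), "\n";
```

Unlike a FIFO on disk, the socket pair needs no filesystem path and is cleaned up automatically when both ends close.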

First, here is what the server does for each request. The client sends a sequence of URLs separated by \t; the end of the request is marked by \n.

function clientHandle($msgsock, $obj)
{
    $nbuf = '';
    socket_set_block($msgsock);
    do {
        if (false === ($buf = @socket_read($msgsock, 2048, PHP_NORMAL_READ))) {
            $obj->error("socket_read() failed: reason: " . socket_strerror(socket_last_error($msgsock)));
            break;
        }
        $nbuf .= $buf;
        if (substr($nbuf, -1) != "\n") {
            continue;
        }
        $nbuf = trim($nbuf);
        if ($nbuf == 'quit') {
            break;
        }
        if ($nbuf == 'shutdown') {
            break;
        }
        $url = explode("\t", $nbuf);
        $nbuf = '';
        $talkback = serialize(read_ntitle($url));
        socket_write($msgsock, $talkback, strlen($talkback));
        debug("write to the client\n");
        break;
    } while (true);
}
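A client of this server would frame its request as described above: URLs joined by "\t", terminated by "\n", with the response being a serialize()d array. The following sketch is mine; the host and port are assumptions, not from the article:

```php
<?php
// Build the request in the server's wire format.
$urls = ["http://example.com/", "http://example.org/"];
$request = implode("\t", $urls) . "\n";

// Send it and decode the serialized reply (address/port are assumed).
$fp = @fsockopen("127.0.0.1", 8000, $errno, $errstr, 5);
if ($fp) {
    fwrite($fp, $request);
    $titles = unserialize(stream_get_contents($fp));
    fclose($fp);
    print_r($titles);
}
```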

The key part of the above code is read_ntitle. This function reads the titles of several pages in parallel, one child process per URL.

The code is as follows. For each URL we fork a child that opens the pipe for writing and writes the title it reads into it; the parent keeps reading from the pipe until all the data has arrived, then deletes the FIFO.

function read_ntitle($arr)
{
    $pipe = new Pipe("multi-read");
    foreach ($arr as $k => $item) {
        $pids[$k] = pcntl_fork();
        if (!$pids[$k]) {
            $pipe->open_write();
            $pid = posix_getpid();
            $content = base64_encode(read_title($item));
            $pipe->write("$k,$content\n");
            $pipe->close_write();
            debug("$k: write success!\n");
            exit;
        }
    }
    debug("read begin!\n");
    $data = $pipe->read_all();
    debug("read end!\n");
    $pipe->rm_pipe();
    return parse_data($data);
}

The code for parse_data is as follows; it is very simple:

function parse_data($data)
{
    $data = explode("\n", $data);
    $new = array();
    foreach ($data as $value) {
        $value = explode(",", $value);
        if (count($value) == 2) {
            $value[1] = base64_decode($value[1]);
            $new[intval($value[0])] = $value[1];
        }
    }
    ksort($new, SORT_NUMERIC);
    return $new;
}
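The Pipe class itself comes from part (2) of this series. For readers without that code at hand, here is a minimal FIFO-based sketch that matches how read_ntitle() uses it; the method names are taken from the call sites above, but the internals are my assumption:

```php
<?php
// Minimal named-pipe wrapper (sketch; the real class is in part 2).
class Pipe
{
    private $path;
    private $w;

    public function __construct($name = "pipe")
    {
        // One FIFO per parent process under /tmp.
        $this->path = "/tmp/$name." . posix_getpid();
        if (!file_exists($this->path)) {
            posix_mkfifo($this->path, 0666);
        }
    }

    public function open_write()
    {
        // Blocks until a reader opens the other end.
        $this->w = fopen($this->path, "w");
    }

    public function write($data)
    {
        fwrite($this->w, $data);
    }

    public function close_write()
    {
        fclose($this->w);
    }

    // Read until every writer has closed its end (EOF on the FIFO).
    public function read_all()
    {
        $r = fopen($this->path, "r");
        $data = "";
        while (!feof($r)) {
            $data .= fread($r, 1024);
        }
        fclose($r);
        return $data;
    }

    public function rm_pipe()
    {
        unlink($this->path);
    }
}
```

Because children inherit the Pipe object after fork, they all write into the same FIFO, and read_all() in the parent sees EOF only once every child has closed its write end.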

The above code relies on one more function, read_title, which takes a little care. For compatibility I did not use curl; it talks to the web server directly over a socket.

As soon as the title tag has been downloaded, it stops reading the rest of the page to save time. The code is as follows:

function read_title($url)
{
    $url_info = parse_url($url);
    if (!isset($url_info['host']) || !isset($url_info['scheme'])) {
        return false;
    }
    $host = $url_info['host'];
    $port = isset($url_info['port']) ? $url_info['port'] : null;
    $path = isset($url_info['path']) ? $url_info['path'] : "/";
    if (isset($url_info['query'])) {
        $path .= "?" . $url_info['query'];
    }
    if (empty($port)) {
        $port = 80;
    }
    if ($url_info['scheme'] == 'https') {
        $port = 443;
    }
    if ($url_info['scheme'] == 'http') {
        $port = 80;
    }
    $out = "GET $path HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.7)\r\n";
    $out .= "Connection: Close\r\n\r\n";
    // https needs the ssl:// transport with fsockopen()
    $transport = ($url_info['scheme'] == 'https') ? "ssl://" : "";
    $fp = fsockopen($transport . $host, $port, $errno, $errstr, 5);
    if (!$fp) {
        error("get title from $url, error. $errno: $errstr\n");
        return false;
    }
    fwrite($fp, $out);
    $content = '';
    while (!feof($fp)) {
        $content .= fgets($fp, 1024);
        if (preg_match("/<title>(.*?)<\/title>/is", $content, $matches)) {
            fclose($fp);
            return encode_to_utf8($matches[1]);
        }
    }
    fclose($fp);
    return false;
}

function encode_to_utf8($string)
{
    return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, GB2312, ISO-8859-1", true));
}

Here I only detect the three most common encodings; the rest of the code is straightforward. These snippets are meant for testing. If you want to run such a server for real, you must optimize it, and in particular you need to prevent too many processes from being forked at once.
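On limiting the number of processes, one common approach (a sketch of mine, not from the article) is to block on pcntl_wait() before each new fork once a cap is reached:

```php
<?php
// Cap concurrent children: wait for one to exit before forking the next.
$max_children = 3;   // assumed limit
$total = 8;          // number of tasks to run
$running = 0;
$reaped = 0;

for ($task = 0; $task < $total; $task++) {
    if ($running >= $max_children) {
        pcntl_wait($status); // block until one child exits
        $running--;
        $reaped++;
    }
    $pid = pcntl_fork();
    if ($pid == 0) {
        // Child: do the work (here just a short sleep) and exit.
        usleep(10000);
        exit(0);
    }
    $running++;
}

// Reap the remaining children.
while ($running > 0) {
    pcntl_wait($status);
    $running--;
    $reaped++;
}
echo "reaped $reaped children\n";
```

Applied to read_ntitle() above, the same check would go just before the pcntl_fork() call in its foreach loop.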

We often complain that PHP does not support multiple processes. In fact it does; there are just not many options for inter-process communication, and the core of multi-process programming lies in communication and synchronization between processes. In web development this kind of multi-processing is rarely used because of its performance cost; to implement simple multi-process handling under high load, you must use the pcntl extension.
