PHP Multi-process programming (3): Multi-Process crawl Web page Demo

Source: Internet
Author: User

We know that data passing from parent to child is relatively easy, but it is more difficult to pass from a child process to a parent process.

There are many ways to achieve process interaction, and in PHP it is more convenient to communicate in the pipeline. Of course, communication can also be done through Socket_pair.

The first is the server in order to respond to every request to do things (send a URL sequence, url sequence with t split. And the end tag is N)

function Clienthandle ($msgsock, $obj) {$nbuf="';    Socket_set_block ($msgsock);  Do {        if(false= = = ($buf = @socket_read ($msgsock,2048, Php_normal_read))) {$obj->error ("Socket_read () Failed:reason:". Socket_strerror (Socket_last_error ($msgsock)));  Break; } $nbuf.=$buf; if(Substr ($nbuf,-1) !="\ n") {            Continue; } $nbuf=trim ($NBUF); if($nbuf = ='quit') {             Break; }        if($nbuf = ='shutdown') {             Break; } $url= Explode ("\ t", $NBUF); $nbuf="'; $talkback=Serialize (Read_ntitle ($url));        Socket_write ($msgsock, $talkback, strlen ($talkback)); Debug ("write to the client\n");  Break; }  while(true);}

one of the key parts of the above code is Read_ntitle, a function that implements multi-threaded read headers.

The code is as follows: (fork a thread for each URL, then open the pipe, read the caption to the pipeline, and the main thread reads the pipe data until all the data has been read, and finally the pipeline is deleted)

function Read_ntitle ($arr) {$pipe=NewPipe ("Multi-read"); foreach($arr as$k =$item) {$pids [$k]=pcntl_fork (); if(!$pids [$k]) {$pipe-Open_write (); $pid=Posix_getpid (); $content=Base64_encode (Read_title ($item)); $pipe->write ("$k, $content \ n"); $pipe-Close_write (); Debug ("$k: Write success!\n");        Exit }} debug ("Read begin!\n"); $data= $pipeRead_all (); Debug ("Read end!\n"); $pipe-rm_pipe ();returnParse_data ($data);} Parse_data code below, very simple, do not say. Parse_data code below, very simple, do not say. function Parse_data ($data) {$data= Explode ("\ n", $data); $New=Array (); foreach($data as$value) {$value= Explode (",", $value); if(count ($value) = =2) {$value [1] = Base64_decode ($value [1]); $New[Intval ($value [0])] = $value [1]; }} ksort ($New, Sort_numeric); return$New;}

in the above code, there is also a function read_title more skillful. For compatibility, I didn't use curl, but I used socket communication directly.

After downloading to the title tag, stop reading the content to save time. The code is as follows:

function Read_title ($url) {$url _info=Parse_url ($url); if(!isset ($url _info['Host']) || !isset ($url _info['Scheme'])) {     return false; } $host= $url _info['Host']; $port= Isset ($url _info['Port']) ? $url _info['Port'] :NULL; $path= Isset ($url _info['Path']) ? $url _info['Path']  :"/";if(Isset ($url _info['Query'])) $path. ="?". $url _info['Query'];if(Empty ($port)) {$port= the;}if($url _info['Scheme'] =='HTTPS') {$port=443;}if($url _info['Scheme'] =='http') {$port= the;} $ out="GET $path http/1.1\r\n"; $ out.="Host: $host \ r \ n"; $ out.="user-agent:mozilla/5.0 (Windows; U Windows NT 5.1; ZH-CN; rv:1.9.1.7) \ r \ n"; $ out.="connection:close\r\n\r\n"; $FP= Fsockopen ($host, $port, $errno, $errstr,5); if($fp = =NULL) {Error ("get title from $url, error. $errno: $ERRSTR \ n"); return false; } fwrite ($FP, $ out); $content="';  while(!feof ($FP)) {$content.= Fgets ($FP,1024x768); if(Preg_match ("/<title> (. *?) <\/title>/is", $content, $matches))            {fclose ($FP); returnEncode_to_utf8 ($matches [1]);    }} fclose ($FP); return false;} function Encode_to_utf8 ($string){     returnMb_convert_encoding ($string,"UTF-8", Mb_detect_encoding ($string,"UTF-8, GB2312, Iso-8859-1",true));}

Here, I'm just checking out three of the most common encodings. The other code is very simple, the code is testing, if you want to do such a server, it must be optimized processing. In particular, to prevent too many processes from opening at once, you need to do more processing.

Many times, we complain that PHP does not support multiple processes, and in fact, PHP supports multiple processes. Of course, there are not so many options for process communication, and the core of many processes is the process of communication and synchronization. In web development, such multithreading is basically not used because there are serious performance issues. To achieve a relatively simple multi-process, high load, it must be extended.

PHP Multi-process programming (3): Multi-Process crawl Web page Demo

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.