Most web page collection today is done with ready-made tools; the most popular among webmasters is Locomotive. Some webmasters, however, prefer a custom collection program, so let's look at the code for one written in PHP.
This is a summary of a PHP web page collection program I recently wrote for a friend.
The target URLs look like www.xxxx.com/shop_list.php?page=1&province=%B1%B1%BE%A9
Here %B1%B1%BE%A9 is the URL-encoded GB2312 form of 北京 (Beijing). It can be produced like this:
$aa = "北京";
// Convert the UTF-8 string to GB2312 bytes, then percent-encode them.
$aa = @iconv("UTF-8", "gb2312", $aa);
echo $bb = urlencode($aa); // prints %B1%B1%BE%A9
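The same conversion can be wrapped in a small function that builds the whole list URL. This is only a sketch: buildListUrl is a hypothetical name, and the page and province parameter names are taken from the example URL above.

```php
<?php
// Hypothetical helper: builds the list URL for one province and one page.
function buildListUrl(string $base, string $province, int $page): string
{
    // Convert the UTF-8 name to the GB2312 bytes the target site expects.
    $gb = iconv('UTF-8', 'GB2312', $province);
    if ($gb === false) {
        throw new InvalidArgumentException("Cannot convert to GB2312: $province");
    }
    // urlencode() percent-encodes every non-ASCII byte.
    return $base . '?page=' . $page . '&province=' . urlencode($gb);
}

echo buildListUrl('http://www.xxxx.com/shop_list.php', '北京', 1), "\n";
```

Checking the return value of iconv, rather than silencing it with @, makes a failed conversion visible instead of quietly producing a URL with an empty province.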
We can use file_get_contents($url) to fetch a web page. curl works as well:
function getHtml($url) {
    $ch2 = curl_init($url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch2);
    curl_close($ch2);
    return $html;
}
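For comparison, here is a rough sketch of the file_get_contents route mentioned above, with a timeout and a failure check added. The function name fetchPage, the 10 second default and the User-Agent value are assumptions for illustration, not part of the original program.

```php
<?php
// Hypothetical alternative to getHtml() built on file_get_contents().
// Returns the page body, or null when the request fails.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,               // give up on slow servers
            'header'  => "User-Agent: Mozilla/5.0\r\n", // placeholder value
        ],
    ]);
    // @ hides the warning; failure is reported through the null return instead.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

A call such as fetchPage('http://www.xxxx.com/shop_list.php?page=1') then returns either the HTML or null, so the caller can skip or retry a page that failed to load.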
To capture just the data we want from the page, pick a start marker and an end marker and take out the text between them. The following function does this:
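function findneed($wholestr, $strkey1, $strkey2)
{
    $pos1 = strpos($wholestr, $strkey1);
    if ($pos1 === false) {
        return ''; // start marker not found
    }
    $num1 = $pos1 + strlen($strkey1);
    $num2 = strpos($wholestr, $strkey2, $num1); // search after the start marker
    if ($num2 === false) {
        return ''; // end marker not found
    }
    return substr($wholestr, $num1, $num2 - $num1);
}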
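findneed returns only the first region between the two markers. When a page repeats the same pattern many times, a variant that collects every region can be handy. The sketch below is an illustration under that assumption; findAllBetween is a hypothetical name and the sample HTML is made up.

```php
<?php
// Hypothetical helper: like findneed(), but returns every match, not just the first.
function findAllBetween(string $haystack, string $start, string $end): array
{
    if ($start === '' || $end === '') {
        return []; // empty markers would never advance the search
    }
    $results = [];
    $offset = 0;
    while (($from = strpos($haystack, $start, $offset)) !== false) {
        $from += strlen($start);
        $to = strpos($haystack, $end, $from);
        if ($to === false) {
            break; // unterminated region: stop
        }
        $results[] = substr($haystack, $from, $to - $from);
        $offset = $to + strlen($end); // continue after this region
    }
    return $results;
}

$html = '<li>Shop A</li><li>Shop B</li>';
print_r(findAllBetween($html, '<li>', '</li>'));
```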
That is one approach: write a single PHP file and capture the data page by page. However, if everything runs inside one loop in a single request, it becomes very slow.
Here is another approach.
<script>
location.href = "index.php?page=&provinceIndex=&totalPage=";
</script>
Each request captures the data of the current URL and saves it to the database, then redirects the browser to the next page. The program keeps repeating this cycle of capturing, saving and redirecting until every page has been collected.
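The redirect step can be sketched as follows. The parameter names page, provinceIndex and totalPage come from the URL in the snippet; everything else (the function names, the rule for moving to the next province, the 'done' message) is an assumption made for illustration. The sketch also assumes that $totalPage has already been read from the page that was just captured.

```php
<?php
// Hypothetical helper: decides which URL the browser should load next.
// Returns null when every province has been fully collected.
function nextStepUrl(int $page, int $provinceIndex, int $totalPage, int $provinceCount): ?string
{
    if ($page < $totalPage) {
        $page++;                  // more pages left in this province
    } elseif ($provinceIndex + 1 < $provinceCount) {
        $provinceIndex++;         // move on to the next province
        $page = 1;
        $totalPage = -1;          // unknown until its first page is read
    } else {
        return null;
    }
    $query = http_build_query([
        'page' => $page,
        'provinceIndex' => $provinceIndex,
        'totalPage' => $totalPage,
    ], '', '&');
    return 'index.php?' . $query;
}

// Builds the redirect snippet, or a completion message on the last step.
function renderRedirect(?string $url): string
{
    if ($url === null) {
        return 'done';
    }
    // json_encode() yields a safely quoted JavaScript string literal.
    return '<script>location.href = ' . json_encode($url) . ';</script>';
}

echo renderRedirect(nextStepUrl(3, 0, 5, 2)), "\n";
```

Because every page is handled in its own short request, no single request has to run long enough to hit the script time limit.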
The rest is nothing more than regular expressions.
For example, to retrieve all the cities on the page:
preg_match_all('/(.*?)<\/select>/s', $html, $selects);
In (.*?), "." matches any character (including newlines, because of the /s modifier), "*" repeats it zero or more times, and the trailing "?" makes the match lazy, so it stops at the first </select> it finds.
Another approach is recursion, which works much like the loop:
function collectionProvinceData($url, $province, $page = 1, $totalPage = -1) {
    // Stop once the last known page has been passed.
    if ($page > $totalPage && $totalPage > -1) {
        return false;
    }
    $collectionUrl = $url . "?page=" . $page . "&province=" . urlencode(iconv('utf-8', 'gb2312', $province));
    echo "current province: {$province}, url of page {$page}: {$collectionUrl}";
    $html = getHtml($collectionUrl);
    $html = mb_convert_encoding($html, 'utf-8', 'utf-8, GBK, GB2312, big5');
    // Read the total page count from the first page fetched.
    if ($totalPage == -1) {
        $latestPageNum = getLatestPageNum($html);
        if ($latestPageNum > 0) {
            $totalPage = $latestPageNum;
        }
    }
    $dataRows = getDataRows($html);
    saveDataRowsOrNot($dataRows);
    ob_flush();
    flush();
    if (empty($dataRows)) {
        // Stop on an empty page; otherwise the same page would be requested forever.
        return false;
    }
    $page++;
    return collectionProvinceData($url, $province, $page, $totalPage);
}
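The recursive function relies on two helpers, getLatestPageNum and getDataRows, whose bodies are not shown. What they look like depends entirely on the markup of the target site. The sketch below only illustrates the regular-expression style described above; it assumes, hypothetically, that pagination links contain page=N and that each record is a table row of td cells.

```php
<?php
// Hypothetical: the highest page number found in links such as "?page=12".
function getLatestPageNum(string $html): int
{
    if (!preg_match_all('/\bpage=(\d+)/i', $html, $m)) {
        return 0; // no pagination links found
    }
    return max(array_map('intval', $m[1]));
}

// Hypothetical: one array of cell texts per <tr> row.
function getDataRows(string $html): array
{
    $rows = [];
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/is', $html, $trs);
    foreach ($trs[1] as $tr) {
        preg_match_all('/<td\b[^>]*>(.*?)<\/td>/is', $tr, $tds);
        // Drop inner tags and surrounding whitespace from each cell.
        $cells = array_map(function ($cell) {
            return trim(strip_tags($cell));
        }, $tds[1]);
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$sample = '<table><tr><td>Shop A</td><td>Beijing</td></tr>'
        . '<tr><td> <b>Shop B</b> </td><td>Beijing</td></tr></table>'
        . '<a href="?page=2">2</a> <a href="?page=12">12</a>';
print_r(getDataRows($sample));
echo getLatestPageNum($sample), "\n";
```

For anything beyond simple, regular markup, an HTML parser is usually more robust than regular expressions, but the patterns here stay in the spirit of the original program.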