PHP web page collection program code - PHP source code

Source: Internet
Author: User
Most web page collection today is done with ready-made tools, and the one most popular among webmasters is Locomotive. Some webmasters, however, prefer to write their own custom collection scripts. Let's take a look at PHP code for collecting web pages.

This is a summary of PHP web page collection techniques, based on a collection program I recently wrote for a friend.

Take the URL www.xxxx.com/shop_list.php?page=1&province=%B1%B1%BE%A9 as an example.

%B1%B1%BE%A9 is the province name (Beijing, written in Chinese characters) converted to GB2312 and then URL-encoded. It can be produced like this:

$aa = "北京"; // "Beijing"; the Chinese characters are what produce %B1%B1%BE%A9
$aa = @iconv("UTF-8", "gb2312", $aa);
echo $bb = urlencode($aa);
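The same conversion can be wrapped in a small function that builds the complete list URL. This is only a sketch: buildListUrl is a hypothetical name, and the query layout (page, province) is taken from the example URL above.

```php
<?php
// Hypothetical helper: builds the list URL for one province and one page.
function buildListUrl(string $baseUrl, string $province, int $page): string
{
    // Convert the UTF-8 province name to GB2312 bytes before URL-encoding.
    $gb = iconv('UTF-8', 'GB2312', $province);
    if ($gb === false) {
        throw new InvalidArgumentException("Cannot convert to GB2312: $province");
    }
    return $baseUrl . '?page=' . $page . '&province=' . urlencode($gb);
}

echo buildListUrl('http://www.xxxx.com/shop_list.php', '北京', 1), "\n";
```

The source file has to be saved as UTF-8 for the conversion to start from the right bytes.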

We can use file_get_contents($url) to capture web pages. Of course, curl can be used as well.

function getHtml($url) {
    $ch2 = curl_init($url);
    // Return the response as a string instead of printing it.
    curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch2);
    curl_close($ch2);
    return $html;
}
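For the file_get_contents route mentioned above, a stream context can add a timeout and a User-Agent header. The sketch below is one possible wrapper; fetchPage, the default timeout and the User-Agent string are assumptions, not part of the original program.

```php
<?php
// Hypothetical wrapper around file_get_contents() with a timeout and a User-Agent.
// Returns the page body, or null when the request fails.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
            'header'  => "User-Agent: Mozilla/5.0 (compatible; ExampleCollector/1.0)\r\n",
        ],
    ]);
    // @ suppresses the warning on failure; the false return value is checked instead.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

A call such as fetchPage('http://www.xxxx.com/shop_list.php?page=1') then returns either the HTML or null, so the caller can skip or retry a failed page.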

To capture only the data we want, specify a start marker and an end marker and take the text between them. The following function does this:

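function findneed($wholestr, $strkey1, $strkey2)
{
    $pos1 = strpos($wholestr, $strkey1);
    if ($pos1 === false) {
        return ''; // start marker not found
    }
    $num1 = $pos1 + strlen($strkey1);
    // Look for the end marker only after the start marker.
    $num2 = strpos($wholestr, $strkey2, $num1);
    if ($num2 === false) {
        return ''; // end marker not found
    }
    return substr($wholestr, $num1, $num2 - $num1);
}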
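List pages usually repeat the same pair of markers once per row. As an illustration, the marker technique can be extended to return every match; extractAllBetween is a hypothetical helper and the sample HTML is invented for the example.

```php
<?php
// Hypothetical helper: returns every substring found between $startKey and $endKey.
function extractAllBetween(string $haystack, string $startKey, string $endKey): array
{
    $results = [];
    if ($startKey === '' || $endKey === '') {
        return $results; // empty markers would never advance the scan
    }
    $offset = 0;
    while (($start = strpos($haystack, $startKey, $offset)) !== false) {
        $start += strlen($startKey);
        $end = strpos($haystack, $endKey, $start);
        if ($end === false) {
            break; // unterminated segment: stop scanning
        }
        $results[] = substr($haystack, $start, $end - $start);
        $offset = $end + strlen($endKey);
    }
    return $results;
}

$html = '<ul><li>Shop A</li><li>Shop B</li><li>Shop C</li></ul>';
print_r(extractAllBetween($html, '<li>', '</li>'));
```

For the sample string this yields the three shop names in page order.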
This is only one way to do it. We just need to write one PHP file and capture the list page by page. However, if every page is fetched inside a single loop, the script becomes very slow.

Here is another approach.

<script>
location.href = "index.php?page=&provinceIndex=&totalPage=";
</script>

Each request captures one page, saves the data for the current URL to the database, and then redirects to the same program for the next page. The redirects continue until every page has been collected.
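The server side of this redirect loop might look like the sketch below. The function names, the $provinceCount parameter and the rule for moving on to the next province are assumptions; only index.php and the three query parameters come from the snippet above. It also assumes $totalPage is already known when the function is called.

```php
<?php
// Hypothetical helper: computes the URL of the next request, or null when done.
// Assumption: provinces are addressed by index and $totalPage is already known.
function nextStepUrl(int $page, int $provinceIndex, int $totalPage, int $provinceCount): ?string
{
    if ($page < $totalPage) {
        $page++;                 // next page of the same province
    } elseif ($provinceIndex + 1 < $provinceCount) {
        $provinceIndex++;        // first page of the next province
        $page = 1;
        $totalPage = -1;         // unknown until that first page is parsed
    } else {
        return null;             // every province has been collected
    }
    $query = ['page' => $page, 'provinceIndex' => $provinceIndex, 'totalPage' => $totalPage];
    return 'index.php?' . http_build_query($query, '', '&');
}

// Builds the client-side redirect that triggers the next request.
function redirectScript(string $url): string
{
    return '<script>location.href = ' . json_encode($url) . ';</script>';
}

$next = nextStepUrl(3, 0, 5, 2);
if ($next !== null) {
    echo redirectScript($next), "\n";
}
```

Because every request handles a single page, no single run has to stay within the script time limit for the whole job.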

The rest is little more than regular expressions.

For example, to retrieve all the cities on the page:

preg_match_all('/(.*?)<\/select>/s', $html, $selects);

In (.*?), the "." matches any character, "*" repeats it zero or more times, and the "?" after "*" makes the match lazy, so it stops at the first </select> it reaches. The s modifier lets "." match line breaks as well.

Another approach is recursion, which works much like a loop.

function collectionProvinceData($url, $province, $page = 1, $totalPage = -1)
{
    // Stop once the last known page has been processed.
    if ($page > $totalPage && $totalPage > -1) {
        return false;
    }
    $collectionUrl = $url . "?page=" . $page . "&province=" . urlencode(iconv('UTF-8', 'GB2312', $province));
    echo "current province: " . $province . ", page {$page}, url: " . $collectionUrl . "\n";
    $html = getHtml($collectionUrl);
    // Normalize the page to UTF-8, whatever encoding it was served in.
    $html = mb_convert_encoding($html, 'UTF-8', 'UTF-8,GBK,GB2312,BIG5');
    // The total page count is read once, from the first page fetched.
    if ($totalPage == -1) {
        $latestPageNum = getLatestPageNum($html);
        if ($latestPageNum > 0) {
            $totalPage = $latestPageNum;
        }
    }
    $dataRows = getDataRows($html);
    saveDataRowsOrNot($dataRows);
    if (empty($dataRows)) {
        // No rows on this page: stop rather than request the same page forever.
        return false;
    }
    ob_flush();
    flush();
    return collectionProvinceData($url, $province, $page + 1, $totalPage);
}
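For comparison, the same paging can be written as a plain loop, which avoids deep recursion on long lists. This is a sketch under stated assumptions, not the original program: collectPages, the injected $fetch and $parse callables, the <option> pattern and the sample pages are all invented for illustration.

```php
<?php
// Hypothetical loop-based collector. $fetch and $parse are injected callables,
// so the paging logic can be tried without network access.
function collectPages(callable $fetch, callable $parse, int $maxPages = 1000): array
{
    $rows = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch($page);
        if ($html === null) {
            break;               // fetch failed or no such page
        }
        $pageRows = $parse($html);
        if (empty($pageRows)) {
            break;               // an empty page marks the end of the list
        }
        $rows = array_merge($rows, $pageRows);
    }
    return $rows;
}

// Stand-in for real pages: two pages of <option> elements, then nothing.
$fakePages = [
    1 => '<select><option>Beijing</option><option>Tianjin</option></select>',
    2 => '<select><option>Shanghai</option></select>',
];
$fetch = function (int $page) use ($fakePages): ?string {
    return $fakePages[$page] ?? null;
};
$parse = function (string $html): array {
    // Lazy match of the text inside each <option> element.
    preg_match_all('/<option[^>]*>(.*?)<\/option>/s', $html, $m);
    return $m[1];
};

print_r(collectPages($fetch, $parse));
```

In real use, $fetch would call a function such as getHtml on the page URL, and $parse would hold the row extraction for the target site.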
