Several ways to crawl pages with PHP

Source: Internet
Author: User
When developing network programs, we often need to fetch non-local files. Typically PHP simulates a browser: it sends an HTTP request to the URL and receives the HTML source or XML data in return. We usually cannot output that data directly; we need to extract the relevant content, format it, and present it in a friendlier way.
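The "extract, then format" step can be illustrated with a minimal sketch. The helper name extract_title() and the sample HTML are assumptions made for illustration, and a regular expression is only a rough tool here; a real project would more likely use a proper HTML parser.

```php
<?php
// Hypothetical helper: pulls the text of the first <title> element out of
// an HTML string, or returns null when no title is present.
function extract_title(string $html): ?string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches) !== 1) {
        return null;
    }
    // Decode entities such as &amp; and collapse runs of whitespace.
    $title = html_entity_decode($matches[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $title));
}

// Sample input standing in for a fetched page (no network access needed).
$html = '<html><head><title> Example &amp; Demo </title></head><body></body></html>';
$title = extract_title($html);

// Escape the extracted text before showing it in a page.
echo htmlspecialchars($title ?? '(no title)', ENT_QUOTES, 'UTF-8'), "\n";
```

Escaping with htmlspecialchars() before output keeps fetched markup from being interpreted by the browser that displays the result.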

Here are some simple ways to crawl a page with PHP, and the principles behind them:


First, the main ways to crawl a page in PHP:

1. file() function

2. file_get_contents() function

3. fopen() -> fread() -> fclose() mode

4. cURL mode

5. fsockopen() function (socket mode)

6. Using a plug-in (e.g. Snoopy: http://sourceforge.net/projects/snoopy/)


Second, example code for fetching HTML or XML with each method:

1. file() function

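    <?php
    // Define the URL
    $url = 'http://t.qq.com';
    // file() reads the content into an array of lines
    $lines_array = file($url);
    // Join the array into a single string
    $lines_string = implode('', $lines_array);
    // Output the content; you could also save it on your own server
    echo $lines_string;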

2. file_get_contents() function
Opening a remote URL with file_get_contents() or fopen() requires allow_url_fopen to be enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, fopen() and file_get_contents() cannot open remote files.

    <?php
    // Define the URL
    $url = 'http://t.qq.com';
    // file_get_contents() reads the remote data into a string
    $lines_string = file_get_contents($url);
    // Output the content (escaped); you could also save it on your own server
    echo htmlspecialchars($lines_string);
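The allow_url_fopen requirement can also be checked at runtime before attempting a fetch. The following is a minimal sketch under stated assumptions: the helper names remote_fopen_enabled() and fetch_contents(), and the 5-second default timeout, are illustrative choices and not part of the original article.

```php
<?php
// Returns true when the allow_url_fopen ini setting is enabled.
function remote_fopen_enabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: returns the contents of $source, or null on failure.
// $source may be a local path or an http(s) URL.
function fetch_contents(string $source, int $timeoutSeconds = 5): ?string
{
    $isRemote = preg_match('#^https?://#i', $source) === 1;
    if ($isRemote && !remote_fopen_enabled()) {
        return null; // remote URLs cannot be opened while the setting is off
    }
    // The 'http' options only apply to HTTP(S) sources.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // @ suppresses the warning; failure is reported through the return value.
    $data = @file_get_contents($source, false, $context);
    return $data === false ? null : $data;
}

// Possible usage (commented out so the sketch makes no network request):
// $html = fetch_contents('http://t.qq.com');
// echo $html === null ? "fetch failed\n" : htmlspecialchars($html);
```

Returning null rather than letting a warning surface makes it easier for the caller to decide how to handle an unreachable page.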

3. fopen() -> fread() -> fclose() mode

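    <?php
    // Define the URL
    $url = 'http://t.qq.com';
    // Open the URL in binary mode
    $handle = fopen($url, "rb");
    if ($handle === false) {
        exit("Could not open $url\n");
    }
    // Initialize the buffer
    $lines_string = "";
    // Read the data in a loop, 1024 bytes at a time
    do {
        $data = fread($handle, 1024);
        if ($data === false || strlen($data) == 0) {
            break;
        }
        $lines_string .= $data;
    } while (true);
    // Close the handle and free the resource
    fclose($handle);
    // Output the content; you could also save it on your own server
    echo $lines_string;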
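The read loop can be wrapped in a reusable function. This is a sketch, not the article's own code: read_stream() and its $maxBytes cap (a guard against unexpectedly large responses) are assumptions, and the demonstration uses an in-memory stream in place of a remote handle.

```php
<?php
// Hypothetical helper: reads from an open stream in fixed-size chunks
// until end of file or until $maxBytes have been collected.
function read_stream($handle, int $chunkSize = 1024, int $maxBytes = 1048576): string
{
    $buffer = '';
    while (!feof($handle) && strlen($buffer) < $maxBytes) {
        $remaining = $maxBytes - strlen($buffer);
        $data = fread($handle, min($chunkSize, $remaining));
        if ($data === false || $data === '') {
            break; // read error or nothing more to read
        }
        $buffer .= $data;
    }
    return $buffer;
}

// Demonstration on an in-memory stream, so no network access is needed.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat('a', 3000));
rewind($handle);
echo strlen(read_stream($handle)), "\n"; // prints 3000
fclose($handle);
```

The same function would accept a handle returned by fopen($url, "rb"), provided allow_url_fopen is enabled.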

4. cURL mode
Using cURL requires that the hosting environment has the cURL extension enabled. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll; you also need to copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the cURL extension.

    <?php
    // Create a new cURL resource
    $url = 'http://t.qq.com';
    $ch = curl_init();
    $timeout = 5;
    // Set the URL and the related options
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    // Fetch the URL
    $lines_string = curl_exec($ch);
    // Close the cURL resource and free system resources
    curl_close($ch);
    // Output the content; you could also save it on your own server
    echo $lines_string;
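A slightly more defensive variant checks that the extension is loaded and reports failures through the return value. This is only a sketch: curl_fetch() is a hypothetical helper name, and the timeout values and the choice to follow redirects are assumptions rather than recommendations from the article.

```php
<?php
// Hypothetical helper: fetches $url with cURL and returns the response
// body, or null when the extension is missing or the request fails.
function curl_fetch(string $url, int $timeoutSeconds = 5): ?string
{
    if (!extension_loaded('curl')) {
        return null; // the cURL extension is not enabled
    }
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_FAILONERROR    => true,   // treat HTTP status >= 400 as a failure
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds * 2,
    ]);
    $body = curl_exec($ch);
    $failed = $body === false || curl_errno($ch) !== 0;
    // No explicit curl_close(): the handle is released when $ch goes out of scope.
    return $failed ? null : $body;
}

// Possible usage (commented out so the sketch makes no network request):
// $html = curl_fetch('http://t.qq.com');
// echo $html === null ? "fetch failed\n" : htmlspecialchars($html);
```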

5. fsockopen() function (socket mode)
Whether socket mode works depends on the server configuration; use phpinfo() to see which communication protocols the server has enabled.

    <?php
    // Open a TCP connection to port 80 with a 30-second timeout
    $fp = fsockopen("t.qq.com", 80, $errno, $errstr, 30);
    if (!$fp) {
        echo "$errstr ($errno)\n";
    } else {
        // Build a minimal HTTP GET request
        $out = "GET / HTTP/1.1\r\n";
        $out .= "Host: t.qq.com\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        // Print the raw response (headers and body) as it arrives
        while (!feof($fp)) {
            echo fgets($fp, 128);
        }
        fclose($fp);
    }
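Unlike the other methods, a raw socket returns the HTTP status line and headers together with the body, so they usually need to be separated before the content can be used. The sketch below shows one way to do that; split_http_response() is a hypothetical helper, the sample response is invented for illustration, and a body sent with chunked transfer encoding (which an HTTP/1.1 server may use) is not decoded here.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into its status line,
// a map of lower-cased header names to values, and the body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first blank line.
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed header lines
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    return [
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => $parts[1] ?? '',
    ];
}

// Invented sample response standing in for data read from the socket.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n<html>hi</html>";
$response = split_http_response($raw);
echo $response['status'], "\n";
echo $response['headers']['content-type'], "\n";
echo $response['body'], "\n";
```

To use it with the socket loop, the lines returned by fgets() would be appended to a string instead of echoed, and that string passed to the helper once the connection closes.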

6. Snoopy plug-in. The latest version is Snoopy-1.2.4.zip (last updated 2013-05-30); it is recommended.

Snoopy is a popular and very capable collection (scraping) plug-in that is convenient to use. It also lets you set a user agent to simulate browser information.

    <?php
    // Include the Snoopy class file
    require('Snoopy.class.php');
    // Create a Snoopy instance
    $snoopy = new Snoopy;
    $url = "http://t.qq.com";
    // Fetch the page
    $snoopy->fetch($url);
    // Store the fetched content in $lines_string
    $lines_string = $snoopy->results;
    // Output the content; you could also save it on your own server
    echo $lines_string;

Note: the user agent is set on line 45 of the Snoopy.class.php file; search for "var $agent" (the text inside the quotation marks) to find it. You can obtain a browser string with PHP: echo $_SERVER['HTTP_USER_AGENT']; prints the visiting browser's information, which you can copy into the agent setting.
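Reading the browser string can be sketched as follows. $_SERVER['HTTP_USER_AGENT'] is only populated when a browser requests the script, so the hypothetical helper current_user_agent() falls back to a default; both the helper name and the fallback string are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the visiting browser's User-Agent string,
// or $fallback when none is available (for example on the command line).
function current_user_agent(string $fallback = 'Mozilla/5.0 (compatible; ExampleCrawler/1.0)'): string
{
    $agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    return $agent !== '' ? $agent : $fallback;
}

echo current_user_agent(), "\n";

// Assumption: because Snoopy declares the setting as "var $agent", it is a
// public property, so it could also be assigned at runtime before fetching:
// $snoopy->agent = current_user_agent();
```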
