Here are several different ways to read remote content in PHP, shared in the hope that they are useful to others.
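Example code 1: Get content with file_get_contents (GET)
<?php
$url = 'http://www.jbxue.com/';
$html = file_get_contents($url);
// print_r($http_response_header);
echo $html;                        // the original called an ec() helper that the article does not define
echo "<hr>";                       // stands in for printhr(), which the article does not define either
printarr($http_response_header);   // printarr() is defined later in this article
echo "<hr>";
?>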
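For HTTP URLs, PHP fills $http_response_header with the raw response header lines, so the status code can be read from it. The following is only a minimal sketch: the helper name http_status_from_headers and the sample header lines are made up for illustration, and it assumes that after a redirect the array may hold more than one status line.

```php
<?php
// Hypothetical helper: return the last HTTP status code found in an
// array of raw header lines such as $http_response_header.
function http_status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK". Keep the last one,
        // since earlier ones may belong to redirects.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Sample header lines, invented for this example.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
echo http_status_from_headers($sample), "\n"; // 200
```

In real use you would pass $http_response_header right after calling file_get_contents, and treat a null result as "no HTTP response received".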
Example code 2: Open the URL with fopen and read the content (GET)
<?php
$url = 'http://www.jbxue.com/';    // same URL as in example 1
$fp = fopen($url, 'r');
printarr(stream_get_meta_data($fp));
echo "<hr>";
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 1024);
}
echo "URL body: $result";
echo "<hr>";
fclose($fp);
?>
Example code 3: Send a POST request with file_get_contents
<?php
$data = array('foo' => 'bar');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        // Double quotes are required so that \r\n becomes a real line break.
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    ),
);
$context = stream_context_create($opts);
$html = file_get_contents('http://localhost/e/admin/test.html', false, $context);
echo $html;
?>
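To make the role of http_build_query clearer, here is a small sketch of what it produces for a form with special characters. The field names and values are invented for illustration; the separator is passed explicitly so the result does not depend on the arg_separator.output setting.

```php
<?php
// Made-up form fields; 'q' contains a space and an ampersand.
$fields = array('foo' => 'bar', 'q' => 'a b&c');

// Encode as application/x-www-form-urlencoded with "&" as separator.
$body = http_build_query($fields, '', '&');

echo $body, "\n";          // foo=bar&q=a+b%26c
echo strlen($body), "\n";  // 17, the value to send as Content-Length
```

The Content-Length header must be computed from the encoded string, not from the original array values.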
Example code 4: Open the URL with fsockopen and get the full response (header and body) via GET
<?php
function get_url($url, $cookie = false) {
    $url = parse_url($url);
    // Fall back to "/" when the URL has no path, and add the query only if present.
    $query = isset($url['path']) ? $url['path'] : '/';
    if (isset($url['query'])) {
        $query .= "?" . $url['query'];
    }
    echo "Query:" . $query;
    $port = isset($url['port']) ? $url['port'] : 80;
    $fp = fsockopen($url['host'], $port, $errno, $errstr, 30);
    if (!$fp) {
        return false;
    } else {
        $request  = "GET $query HTTP/1.1\r\n";
        $request .= "Host: {$url['host']}\r\n";
        $request .= "Connection: Close\r\n";
        if ($cookie) $request .= "Cookie: $cookie\r\n";
        $request .= "\r\n";
        fwrite($fp, $request);
        $result = '';
        while (!@feof($fp)) {
            $result .= @fgets($fp, 1024);
        }
        fclose($fp);
        return $result;
    }
}
// Get the HTML part of the response, with the header removed
function getUrlHtml($url, $cookie = false) {
    $rowdata = get_url($url, $cookie);
    if ($rowdata) {
        $body = stristr($rowdata, "\r\n\r\n");
        $body = substr($body, 4, strlen($body));
        return $body;
    }
    return false;
}
?>
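The header/body split used above relies on the blank line (CRLF CRLF) that ends the HTTP header. A minimal sketch of the same idea with explode follows; the helper name split_http_response and the sample response are hypothetical.

```php
<?php
// Hypothetical helper: split a raw HTTP response at the first blank line.
function split_http_response(string $raw): array
{
    // Limit 2 keeps any later "\r\n\r\n" sequences inside the body.
    $parts = explode("\r\n\r\n", $raw, 2);
    return array(
        'header' => $parts[0],
        'body'   => isset($parts[1]) ? $parts[1] : '',
    );
}

// Invented sample response.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>";
$res = split_http_response($raw);
echo $res['body'], "\n"; // <html>hi</html>
```

Returning both parts keeps the header available, for example to check the status line before using the body.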
Example code 5: Open the URL with fsockopen and get the full response (header and body) via POST
<?php
function HTTP_Post($URL, $data, $cookie, $referrer = "") {
    // Parse the given URL
    $URL_Info = parse_url($URL);
    // Build the referrer
    if ($referrer == "") // if not given, use a placeholder as referrer
        $referrer = "111";
    // Build the form string from $data
    $values = array();
    foreach ($data as $key => $value)
        $values[] = "$key=" . urlencode($value);
    $data_string = implode("&", $values);
    // Find out which port is needed; if not given, use the standard one (80)
    if (!isset($URL_Info["port"]))
        $URL_Info["port"] = 80;
    // Build the POST request (HTTP header lines end with \r\n)
    $request  = "POST " . $URL_Info["path"] . " HTTP/1.1\r\n";
    $request .= "Host: " . $URL_Info["host"] . "\r\n";
    $request .= "Referer: $referrer\r\n";
    $request .= "Content-type: application/x-www-form-urlencoded\r\n";
    $request .= "Content-Length: " . strlen($data_string) . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "Cookie: $cookie\r\n";
    $request .= "\r\n";
    $request .= $data_string; // no trailing newline, so the body matches Content-Length
    $fp = fsockopen($URL_Info["host"], $URL_Info["port"]);
    fputs($fp, $request);
    $result = '';
    while (!feof($fp)) {
        $result .= fgets($fp, 1024);
    }
    fclose($fp);
    return $result;
}
echo "<hr>"; // stands in for printhr(), which the article does not define
?>
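The request built in example 5 can be checked without opening a socket by separating the string construction from the network call. This is a sketch only; build_post_request, the host, and the path are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: build a raw HTTP/1.1 POST request string.
function build_post_request(string $host, string $path, array $fields): string
{
    $body = http_build_query($fields, '', '&');
    $request  = "POST $path HTTP/1.1\r\n";
    $request .= "Host: $host\r\n";
    $request .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $request .= "Content-Length: " . strlen($body) . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "\r\n";   // blank line ends the header
    $request .= $body;    // body length must equal Content-Length
    return $request;
}

// Invented host, path, and form data.
$req = build_post_request('localhost', '/test.php', array('foo' => 'bar'));
echo $req, "\n";
```

The resulting string is what would be written to the socket with fputs or fwrite.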
Example code 6: Use the curl library. Before using it, you may need to check php.ini to confirm that the curl extension is enabled.
<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, 'http://www.jbxue.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
?>
About the curl library:
curl official website: http://curl.haxx.se/
curl is a file transfer tool that uses URL syntax and supports FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, FILE, and LDAP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploads, Kerberos, HTTP form-based uploads, proxies, cookies, user and password authentication, file transfer resume, HTTP proxy tunneling, and a number of other useful tricks.
<?php
// Helper used in the examples above: print the entries of an array.
function printarr(array $arr)
{
    echo "<br> Row field count: " . count($arr) . "<br>";
    foreach ($arr as $key => $value)
    {
        echo "$key = $value <br>";
    }
}
?>
PHP code for crawling remote website data
Many programming enthusiasts run into the same question: how to fetch the HTML of someone else's website, as a search engine does, and then extract useful data from it. Here are some simple examples.
Ⅰ. Example of crawling a remote page title:
The following is a code snippet:
<?php
/*
+-------------------------------------------------------------
+ Crawl the page title. Copy this snippet, save it as a .php file, and run it.
+-------------------------------------------------------------
*/
error_reporting(7);
$file = fopen("http://www.jbxue.com/", "r");
if (!$file) {
    echo "<font color=red>Unable to open remote file.</font>\n";
    exit;
}
while (!feof($file)) {
    $line = fgets($file, 1024);
    // eregi() was removed in PHP 7; preg_match() with the i flag does the same job.
    if (preg_match("#<title>(.*)</title>#i", $line, $out)) {
        $title = $out[1];
        echo $title;
        break;
    }
}
fclose($file);
// End
?>
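Reading line by line only finds a title whose opening and closing tags sit on the same line. As a hedged alternative, the sketch below matches against the whole HTML string; extract_title and the sample markup are hypothetical, and a regular expression is only an approximation of real HTML parsing.

```php
<?php
// Hypothetical helper: pull the <title> text out of an HTML string.
function extract_title(string $html): ?string
{
    // "s" lets . match newlines, "i" ignores tag case.
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
        // Collapse runs of whitespace and trim the ends.
        return trim(preg_replace('/\s+/', ' ', $m[1]));
    }
    return null;
}

// Invented sample page with a title split across lines.
$html = "<html><head><title>\n  Example\n  page </title></head></html>";
echo extract_title($html), "\n"; // Example page
```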
Ⅱ. Example of crawling the HTML code of a remote web page:
The following is a code snippet:
<?php
/*
+----------------
+ DNSing Spider
+----------------
*/
$fp = fsockopen("www.dnsing.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br/>\n";
} else {
    $out  = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.dnsing.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fputs($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
// End
?>
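One caveat with raw HTTP/1.1 requests over a socket: a server may answer with "Transfer-Encoding: chunked", in which case the body read from the socket contains chunk-size lines mixed into the content. The following is a minimal sketch of decoding such a body, assuming well-formed input and ignoring trailers; decode_chunked and the sample data are hypothetical.

```php
<?php
// Hypothetical helper: decode an HTTP body sent with chunked transfer coding.
function decode_chunked(string $body): string
{
    $out = '';
    $pos = 0;
    $len = strlen($body);
    while ($pos < $len && ($eol = strpos($body, "\r\n", $pos)) !== false) {
        // The size line is hexadecimal and may carry ";extension" suffixes.
        $sizeLine = substr($body, $pos, $eol - $pos);
        $size = (int) hexdec(trim(explode(';', $sizeLine)[0]));
        if ($size === 0) {
            break; // a zero-size chunk marks the end of the body
        }
        $out .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip the chunk data and its trailing CRLF
    }
    return $out;
}

// Invented sample: two chunks followed by the terminating zero chunk.
$chunked = "5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n";
echo decode_chunked($chunked), "\n"; // Hello World
```

Sending the request as HTTP/1.0 is a simpler way to avoid chunked responses, at the cost of HTTP/1.1 features.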
Both snippets above can be copied and run directly to see the result. They are only a rudimentary form of web scraping; adapting them to your own needs depends on your situation, so that part is left for readers to explore.
======
The following functions, get_content_by_socket(), get_url(), get_content_url(), and get_content_object(), may also give you some ideas.
<?php
// Get all content URLs and save them to a file
function get_index($save_file, $prefix = "index_") {
    $count = 68;
    $i = 1;
    if (file_exists($save_file)) @unlink($save_file);
    $fp = fopen($save_file, "a+") or die("Open " . $save_file . " failed");
    while ($i < $count) {
        $url = $prefix . $i . ".htm";
        echo "Get " . $url . "...";
        // The original call passed only the page content, although get_content_url()
        // takes ($host_url, $file_contents). An empty host prefix is an assumption here.
        $url_str = get_content_url('', get_url($url));
        echo " OK\n";
        fwrite($fp, $url_str);
        ++$i;
    }
    fclose($fp);
}
// Get the target multimedia objects
function get_object($url_file, $save_file, $split = "|--:* *:--|") {
    if (!file_exists($url_file)) die($url_file . " does not exist");
    $file_arr = file($url_file);
    if (!is_array($file_arr) || empty($file_arr)) die($url_file . " has no content");
    $url_arr = array_unique($file_arr);
    if (file_exists($save_file)) @unlink($save_file);
    $fp = fopen($save_file, "a+") or die("Open save file " . $save_file . " failed");
    foreach ($url_arr as $url) {
        $url = trim($url); // file() keeps the trailing newline of each line
        if (empty($url)) continue;
        echo "Get " . $url . "...";
        $html_str = get_url($url);
        // Debug output, presumably commented out in the original; left active,
        // the exit would stop the script before anything is saved.
        // echo $html_str;
        // echo $url;
        // exit;
        $obj_str = get_content_object($html_str);
        echo " OK\n";
        fwrite($fp, $obj_str);
    }
    fclose($fp);
}
// Traverse a directory and get the contents of its files
function get_dir($save_file, $dir) {
    $dp = opendir($dir);
    if (file_exists($save_file)) @unlink($save_file);
    $fp = fopen($save_file, "a+") or die("Open save file " . $save_file . " failed");
    while (($file = readdir($dp)) !== false) {
        if ($file != "." && $file != "..") {
            echo "Read file " . $file . "...";
            $file_content = file_get_contents($dir . $file); // $dir must end with a slash
            $obj_str = get_content_object($file_content);
            echo " OK\n";
            fwrite($fp, $obj_str);
        }
    }
    closedir($dp);
    fclose($fp);
}
// Get the content of the specified URL
function get_url($url) {
    $reg = '/^http:\/\/[^\/].+$/';
    if (!preg_match($reg, $url)) die($url . " invalid");
    $fp = fopen($url, "r") or die("Open url: " . $url . " failed.");
    $content = '';
    while ($fc = fread($fp, 8192)) {
        $content .= $fc;
    }
    fclose($fp);
    if (empty($content)) {
        die("Get url: " . $url . " content failed.");
    }
    return $content;
}
// Get the specified page using a socket
function get_content_by_socket($url, $host) {
    $fp = fsockopen($host, 80) or die("Open " . $url . " failed");
    $header  = "GET /" . $url . " HTTP/1.1\r\n";
    $header .= "Accept: */*\r\n";
    $header .= "Accept-Language: zh-cn\r\n";
    // Disabled here: this function does not decompress gzip or deflate responses.
    // $header .= "Accept-Encoding: gzip, deflate\r\n";
    $header .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Maxthon; InfoPath.1; .NET CLR 2.0.50727)\r\n";
    $header .= "Host: " . $host . "\r\n";
    // The next two lines conflict with "Connection: Close" below and end the header
    // too early, so they were presumably commented out in the original.
    // $header .= "Connection: Keep-Alive\r\n";
    // $header .= "Cookie: cnzz02=2; rtime=1; ltime=1148456424859; cnzz_eid=56601755-\r\n\r\n";
    $header .= "Connection: Close\r\n\r\n";
    fwrite($fp, $header);
    $contents = '';
    while (!feof($fp)) {
        $contents .= fgets($fp, 8192);
    }
    fclose($fp);
    return $contents;
}
// Get the URLs in the specified content
function get_content_url($host_url, $file_contents) {
    // Earlier filter patterns, kept for reference:
    // $reg = '/^(#|javascript.*?|ftp:\/\/.+|http:\/\/.+|.*?href.*?|play.*?|index.*?|.*?asp)+$/i';
    // $reg = '/^(down.*?\.html|\d+_\d+\.htm.*?)$/i';
    $rex = "/([hH][rR][eE][fF])\s*=\s*['\"]*([^>'\"\s]+)[\"'>]*\s*/i";
    $reg = '/^(down.*?\.html)$/i';
    preg_match_all($rex, $file_contents, $r);
    $result = ""; // array();
    foreach ($r as $c) {
        if (is_array($c)) {
            foreach ($c as $d) {
                if (preg_match($reg, $d)) { $result .= $host_url . $d . "\n"; }
            }
        }
    }
    return $result;
}
// Get the multimedia file in the specified content
function get_content_object($str, $split = "|--:* *:--|") {
    $regx = "/href\s*=\s*['\"]*([^>'\"\s]+)[\"'>]*\s*(<b>.*?<\/b>)/i";
    preg_match_all($regx, $str, $result);
    // Only build the output line when at least one link was matched.
    if (count($result) == 3 && !empty($result[1])) {
        // The label text depends on the markup of the target page.
        $result[2] = str_replace("<b>Multimedia:", "", $result[2]);
        $result[2] = str_replace("</b>", "", $result[2]);
        return $result[1][0] . $split . $result[2][0] . "\n";
    }
    return "";
}
?>
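The link-filtering step in get_content_url can be tried on a fixed string without any network access. The sketch below is a simplified variant, not the author's exact code: it reads only the capture group that holds the link target and filters with preg_grep. The helper name extract_hrefs and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: collect the href targets found in an HTML string.
function extract_hrefs(string $html): array
{
    // Group 1 holds the link target, with or without surrounding quotes.
    preg_match_all('/href\s*=\s*["\']?([^"\'\s>]+)/i', $html, $m);
    return $m[1];
}

// Invented sample markup with three links.
$html = '<a href="down1.html">A</a> <a HREF=\'index_2.htm\'>B</a> <a href=about.asp>C</a>';
$links = extract_hrefs($html);

// Keep only the download pages, using the same pattern as get_content_url().
$downloads = array_values(preg_grep('/^down.*?\.html$/i', $links));
print_r($downloads);
```

Iterating over one capture group avoids testing the full-match and "href" groups, which can never pass the filter.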