For fetching remote content I had always used the file_get_contents function. I knew that curl existed, but at a glance it looked much more complicated than the simple file_get_contents, and since my needs were modest, I never learned to use it.
Recently, while writing a web scraper, I found that file_get_contents could no longer meet my needs at all. In my view, when reading remote content, file_get_contents is more convenient to use than curl, but curl is better in every other respect.
A comparison of curl and file_get_contents in PHP
Main differences:
After learning it, I found that curl supports many protocols, including FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE, and LDAP, which means it can do many things file_get_contents cannot. In PHP, curl can fetch and scrape remote content, implement web-based FTP upload and download, simulate logins, integrate with APIs, transfer data, simulate cookies, resume interrupted downloads, and more. It is very powerful.
Once I understood the basic usage of curl, I found it is not difficult. You just need to remember some option settings; they are a little tedious, but remembering the few commonly used ones is enough.
Enabling curl:
PHP does not enable the curl functions by default, so before using curl you first need to enable the extension in php.ini: remove the semicolon in front of ;extension=php_curl.dll, save the file, and restart Apache/IIS.
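A quick runtime check can confirm that the extension is really loaded before any curl_init() call. The snippet below is a minimal sketch using the built-in extension_loaded() and curl_version() functions; it also prints the protocols supported by your libcurl build (the exact list depends on how libcurl was compiled). On PHP 7.2 and later, the php.ini line can also be written simply as extension=curl.

```php
<?php
// Check whether the curl extension is available before calling curl_init().
if (extension_loaded('curl')) {
    $info = curl_version(); // array describing the linked libcurl
    echo "curl " . $info['version'] . " is available\n";
    echo "Supported protocols: " . implode(', ', $info['protocols']) . "\n";
} else {
    echo "curl is not enabled: enable the extension in php.ini and restart the web server\n";
}
```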
Basic syntax:
<?php
$my_curl = curl_init(); // Initialize a curl handle
curl_setopt($my_curl, CURLOPT_URL, "http://www.jb51.net"); // Set the URL to fetch
curl_setopt($my_curl, CURLOPT_RETURNTRANSFER, 1); // 1 = return the result as a string instead of printing it
$str = curl_exec($my_curl); // Execute the request
echo $str; // Output the fetched content
curl_close($my_curl); // Close the handle
?>
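The basic pattern above does not check for failure: curl_exec() returns false when the request fails, and the reason is available from curl_errno() and curl_error(). The sketch below wraps the same calls in a hypothetical helper, fetch_with_curl(), that throws on failure. For an offline demo it reads a local temp file through the file:// protocol, which curl supports; it assumes a Unix-style temp path.

```php
<?php
// Hypothetical helper: fetch a URL with curl and report errors explicitly.
function fetch_with_curl($url, $timeout = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit on connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit on the whole transfer
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_errno($ch) . ': ' . curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl request failed ($error)");
    }
    curl_close($ch);
    return $body;
}

// Offline demo: read a local temp file via file:// (assumes a Unix-style path).
$path = tempnam(sys_get_temp_dir(), 'curl');
file_put_contents($path, 'hello from curl');
echo fetch_with_curl('file://' . $path), "\n";
unlink($path);
```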
Recently I needed to fetch music data from someone else's website. I used the file_get_contents function, but fetches kept failing. Even though I set a timeout following the example in the manual, most of the time it had no effect:
$config['context'] = stream_context_create(array('http' => array(
    'method' => 'GET',
    'timeout' => 5 // This timeout is unreliable and often has no effect
)));
Looking at the server's connection pool at that point, you will find a pile of errors like the following, which gave me a headache:
file_get_contents(http://***): failed to open stream ...
So I switched to the curl library and wrote a replacement function:
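If you stay with file_get_contents, it at least helps to detect these failures instead of letting a false return value slip through. This is a minimal sketch with a hypothetical helper name, fetch_or_fail(); it relies on file_get_contents() returning false on failure and on error_get_last() still recording the warning suppressed by @.

```php
<?php
// Hypothetical helper: wrap file_get_contents so a failed fetch is reported explicitly.
function fetch_or_fail($url, $timeout = 5) {
    $context = stream_context_create(array('http' => array(
        'method'  => 'GET',
        'timeout' => $timeout, // read timeout in seconds for the http:// wrapper
    )));
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        $err = error_get_last(); // the suppressed "failed to open stream" warning
        throw new RuntimeException('fetch failed: ' . ($err ? $err['message'] : 'unknown error'));
    }
    return $body;
}
```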
function curl_file_get_contents($durl) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $durl);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5); // give up after 5 seconds
    // _USERAGENT_ and _REFERER_ are constants you must define() yourself beforehand
    curl_setopt($ch, CURLOPT_USERAGENT, _USERAGENT_);
    curl_setopt($ch, CURLOPT_REFERER, _REFERER_);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    $r = curl_exec($ch);
    curl_close($ch);
    return $r;
}
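The function above depends on two constants, _USERAGENT_ and _REFERER_, whose definitions are not shown. Below is a sketch of one way to provide them, with placeholder values that are purely assumptions, plus a hypothetical variant, curl_file_get_contents_v2(), that sets the same options through curl_setopt_array() and also follows redirects.

```php
<?php
// Placeholder values (assumptions): substitute the browser string and referer you need.
if (!defined('_USERAGENT_')) define('_USERAGENT_', 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
if (!defined('_REFERER_'))   define('_REFERER_', 'http://www.example.com/');

// Hypothetical variant of curl_file_get_contents using curl_setopt_array().
function curl_file_get_contents_v2($durl, $timeout = 5) {
    $ch = curl_init($durl);
    curl_setopt_array($ch, array(
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_USERAGENT      => _USERAGENT_, // pose as a browser
        CURLOPT_REFERER        => _REFERER_,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,        // follow 3xx redirects
    ));
    $r = curl_exec($ch);
    curl_close($ch);
    return $r; // false on failure, like the original
}
```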
With this function, apart from genuine network problems, the failures stopped.
Here is a test someone else ran comparing curl and file_get_contents:
Time taken by file_get_contents to fetch google.com (seconds):
2.31319094
2.30374217
2.21512604
3.30553889
2.30124092
Time taken by curl (seconds):
0.68719101
0.64675593
0.64326
0.81983113
0.63956594
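Averaging the five runs puts a number on the gap: roughly 2.49 s for file_get_contents versus 0.69 s for curl, about 3.6 times faster in this one small test. The single 3.3 s run pulls the file_get_contents average up; its median is about 2.30 s. The snippet only redoes that arithmetic on the figures listed above.

```php
<?php
// Timings copied from the test above (seconds per fetch of google.com).
$fgc  = array(2.31319094, 2.30374217, 2.21512604, 3.30553889, 2.30124092);
$curl = array(0.68719101, 0.64675593, 0.64326, 0.81983113, 0.63956594);

// Arithmetic mean of a list of numbers.
function mean(array $xs) {
    return array_sum($xs) / count($xs);
}

printf("file_get_contents mean: %.3f s\n", mean($fgc));  // 2.488 s
printf("curl mean:              %.3f s\n", mean($curl)); // 0.687 s
printf("ratio:                  %.1fx\n", mean($fgc) / mean($curl)); // 3.6x
```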
A big gap, isn't it? In my experience, the two tools differ not only in speed but also greatly in stability.
If you need reliable network scraping, I recommend the curl_file_get_contents function above: it is stable and fast, and it can also pose as a browser (through the user agent and referer) when talking to the target site.
Method 1: Get content with file_get_contents
<?php
$url = 'http://www.domain.com/';
$html = file_get_contents($url);
echo $html;
?>
Method 2: Open the URL with fopen and read the content
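<?php
$url = 'http://www.domain.com/';
$fp = fopen($url, 'r');
$meta = stream_get_meta_data($fp); // stream metadata; for http:// the response headers are in $meta['wrapper_data']
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 1024);
}
echo "URL body: $result";
fclose($fp);
?>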
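stream_get_meta_data() is most useful when combined with stream_set_timeout(): after each read, its timed_out flag tells you whether the read stalled. The sketch below uses a hypothetical helper name, read_url_with_timeout(); note that stream_set_timeout() only affects socket-based streams such as http://, so on a local file it simply has no effect.

```php
<?php
// Hypothetical helper: read a URL with fopen(), giving up if a read times out.
function read_url_with_timeout($url, $seconds = 5) {
    $fp = @fopen($url, 'r');
    if ($fp === false) {
        return false;
    }
    stream_set_timeout($fp, $seconds); // only affects socket-based streams
    $result = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        $meta = stream_get_meta_data($fp);
        if ($meta['timed_out']) { // the last read hit the timeout
            fclose($fp);
            return false;
        }
        if ($chunk === false) {
            break;
        }
        $result .= $chunk;
    }
    fclose($fp);
    return $result;
}
```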
Method 3: Use file_get_contents to request a URL via POST
<?php
$data = array('foo' => 'bar');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        // Header lines must be separated by "\r\n" (double quotes, so the escapes are interpreted)
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$context = stream_context_create($opts);
$html = file_get_contents('http://localhost/e/admin/test.html', false, $context);
echo $html;
?>
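The context options from Method 3 can be packaged into a reusable builder. This is a sketch with a hypothetical name, build_post_options(); besides the original headers it adds a read timeout and the http context option ignore_errors, which makes file_get_contents return the body even for 4xx/5xx responses.

```php
<?php
// Hypothetical helper: build stream-context options for a form POST.
function build_post_options(array $fields, $timeout = 5) {
    $body = http_build_query($fields); // e.g. "foo=bar&q=a+b"
    return array('http' => array(
        'method'        => 'POST',
        'header'        => "Content-Type: application/x-www-form-urlencoded\r\n" .
                           "Content-Length: " . strlen($body) . "\r\n",
        'content'       => $body,
        'timeout'       => $timeout, // read timeout in seconds
        'ignore_errors' => true,     // still return the body on error status codes
    ));
}

$opts = build_post_options(array('foo' => 'bar', 'q' => 'a b'));
// With network access: $html = file_get_contents($url, false, stream_context_create($opts));
```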
Method 4: Open the URL with fsockopen to get the complete response, including headers and body
<?php
function get_url($url, $cookie = false)
{
    $url = parse_url($url);
    $query = (isset($url['path']) ? $url['path'] : '/') . (isset($url['query']) ? '?' . $url['query'] : '');
    echo "Query: " . $query;
    $fp = fsockopen($url['host'], isset($url['port']) ? $url['port'] : 80, $errno, $errstr, 30);
    if (!$fp) {
        return false;
    } else {
        $request  = "GET $query HTTP/1.1\r\n";
        $request .= "Host: {$url['host']}\r\n";
        $request .= "Connection: Close\r\n";
        if ($cookie) $request .= "Cookie: $cookie\r\n";
        $request .= "\r\n"; // a blank line ends the request headers
        fwrite($fp, $request);
        $result = '';
        while (!feof($fp)) {
            $result .= @fgets($fp, 1024);
        }
        fclose($fp);
        return $result;
    }
}
// Get the HTML part of the response, stripping the headers
function geturlhtml($url, $cookie = false)
{
    $rowdata = get_url($url, $cookie);
    if ($rowdata)
    {
        $body = stristr($rowdata, "\r\n\r\n"); // headers and body are separated by an empty line
        $body = substr($body, 4, strlen($body));
        return $body;
    }
    return false;
}
?>
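What Method 4 writes to and reads from the socket is plain HTTP text: lines ending in \r\n, with an empty line separating the headers from the body. To make that structure easy to inspect without a network, the sketch below splits the work into two hypothetical helpers, build_get_request() and split_response(), that operate purely on strings.

```php
<?php
// Hypothetical helper: build the raw GET request Method 4 sends over the socket.
function build_get_request($url, $cookie = false) {
    $u = parse_url($url);
    $path = (isset($u['path']) ? $u['path'] : '/') . (isset($u['query']) ? '?' . $u['query'] : '');
    $request  = "GET $path HTTP/1.1\r\n";
    $request .= "Host: {$u['host']}\r\n";
    $request .= "Connection: close\r\n";
    if ($cookie) {
        $request .= "Cookie: $cookie\r\n";
    }
    return $request . "\r\n"; // a blank line ends the header section
}

// Hypothetical helper: split a raw response into status line, headers, and body.
function split_response($raw) {
    list($head, $body) = array_pad(explode("\r\n\r\n", $raw, 2), 2, '');
    $lines = explode("\r\n", $head);
    $status = array_shift($lines); // e.g. "HTTP/1.1 200 OK"
    $headers = array();
    foreach ($lines as $line) {
        list($name, $value) = array_pad(explode(':', $line, 2), 2, '');
        $headers[strtolower(trim($name))] = trim($value); // header names are case-insensitive
    }
    return array($status, $headers, $body);
}
```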
Method 5: Use fsockopen to POST to a URL and get the complete response, including headers and body
<?php
function http_post($URL, $data, $cookie, $referrer = "")
{
    // Parse the given URL
    $URL_info = parse_url($URL);
    // Build the referrer: if none is given, fall back to a placeholder value
    if ($referrer == "")
        $referrer = "111";
    // Build the request body from $data
    $values = array();
    foreach ($data as $key => $value)
        $values[] = "$key=" . urlencode($value);
    $data_string = implode("&", $values);
    // Find out which port is needed; if none is given, use the standard port 80
    if (!isset($URL_info["port"]))
        $URL_info["port"] = 80;
    $path = isset($URL_info["path"]) ? $URL_info["path"] : "/";
    // Build the POST request
    $request  = "POST " . $path . " HTTP/1.1\r\n";
    $request .= "Host: " . $URL_info["host"] . "\r\n";
    $request .= "Referer: $referrer\r\n";
    $request .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $request .= "Content-Length: " . strlen($data_string) . "\r\n";
    $request .= "Connection: close\r\n";
    $request .= "Cookie: $cookie\r\n";
    $request .= "\r\n";
    $request .= $data_string;
    $fp = fsockopen($URL_info["host"], $URL_info["port"]);
    fputs($fp, $request);
    $result = '';
    while (!feof($fp)) {
        $result .= fgets($fp, 1024);
    }
    fclose($fp);
    return $result;
}
?>
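Methods 4 and 5 send HTTP/1.1 requests, and an HTTP/1.1 server may answer with "Transfer-Encoding: chunked". In that case the body read from the socket still contains the hexadecimal chunk-size lines, and geturlhtml() returns them mixed into the HTML. The sketch below is a hypothetical decoder, decode_chunked(), for such a body; it ignores chunk extensions and trailer headers.

```php
<?php
// Hypothetical helper: decode a body sent with "Transfer-Encoding: chunked".
function decode_chunked($body) {
    $out = '';
    $pos = 0;
    while (true) {
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            break; // truncated input
        }
        // The chunk-size line is hex, optionally followed by ";extensions"
        $sizeLine = substr($body, $pos, $eol - $pos);
        $size = hexdec(trim(explode(';', $sizeLine)[0]));
        if ($size === 0) {
            break; // a zero-size chunk marks the end
        }
        $out .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}

echo decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n"; // Wikipedia
```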
Method 6: Use the curl library. Before using it, check whether the curl extension is enabled in php.ini.
<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, 'http://www.domain.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
?>
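For scraping many pages, curl can also run several transfers concurrently through its multi interface, which file_get_contents cannot do. This is a minimal sketch with a hypothetical helper name, fetch_many(), built on curl_multi_init(), curl_multi_exec(), curl_multi_select() and curl_multi_getcontent(). The test reads local files via file:// so it needs no network.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with the curl_multi_* API.
function fetch_many(array $urls, $timeout = 10) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    // Drive all transfers until none is still running
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    // Collect the bodies, keyed like the input array
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```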
PHP's curl, fsockopen, and file_get_contents can all be used to scrape pages and simulate form submissions. What are the differences between the three, and what should you watch out for?
Zhao Yongbin:
When file_get_contents() is used to read external files, it easily runs into timeout errors. Replacing it with curl helps, though the exact reason is unclear.
curl is more efficient than file_get_contents() and fsockopen() because curl automatically caches DNS information (this is the highlight; I have yet to test it myself).
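libcurl keeps its DNS cache (and its pool of open connections) on the handle, so the benefit shows up mainly when one handle is reused for several requests. The sketch below, with a hypothetical helper name fetch_all_with_one_handle(), changes only CURLOPT_URL between requests; CURLOPT_DNS_CACHE_TIMEOUT sets how long resolved names are kept (120 s here is an arbitrary choice). The test uses file:// URLs, where DNS plays no role, so it only checks that the reuse pattern works.

```php
<?php
// Sketch: reuse one curl handle so libcurl can keep its DNS cache
// and reuse open connections (keep-alive) between requests.
function fetch_all_with_one_handle(array $urls, $timeout = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_DNS_CACHE_TIMEOUT, 120); // keep resolved host names for 120 s
    $results = array();
    foreach ($urls as $key => $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes between requests
        $results[$key] = curl_exec($ch);     // false for a failed request
    }
    curl_close($ch);
    return $results;
}
```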
Fan Jiapeng:
file_get_contents, curl, fsockopen
Choose among them based on the environment of the current request; there is no one-size-fits-all answer.
Take the KBI application developed at our company as an example:
At first we used file_get_contents.
Later we switched to fsockopen.
Today we use curl.
(Remote requests) My personal understanding is as follows (please point out anything wrong, and add anything I missed):
file_get_contents requires allow_url_fopen to be enabled in php.ini. For HTTP requests it uses http_fopen_wrapper and does not support keep-alive; curl does.
A single call to file_get_contents() is efficient, but it returns the content without the headers.
This is fine when reading ordinary local files, but it causes problems when reading remote resources.
If you want to keep a persistent connection and request several pages in a row, file_get_contents and fopen will run into trouble.
The retrieved content may also be wrong, so for any kind of scraping work you will definitely hit problems.
Sockets are lower level, tedious to configure, and harder to work with, but they return the complete response.
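Since file_get_contents only works on URLs when allow_url_fopen is enabled, code that must run on differently configured servers can check the setting at runtime and fall back to curl. This is a sketch with a hypothetical helper name, fetch_url(); it reads the setting with ini_get() and checks for curl with function_exists().

```php
<?php
// Hypothetical helper: use file_get_contents when allow_url_fopen is enabled,
// otherwise fall back to curl if the extension is loaded.
function fetch_url($url, $timeout = 5) {
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $context = stream_context_create(array('http' => array('timeout' => $timeout)));
        return @file_get_contents($url, false, $context);
    }
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        $body = curl_exec($ch);
        curl_close($ch);
        return $body;
    }
    return false; // neither mechanism is available
}
```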
Pan Shaoning (Tencent):
file_get_contents can fetch the content of a URL, but it is limited when it comes to sending POST and GET requests.
curl can send both POST and GET requests, and it can also return the response headers.
Sockets are lower level still, and can be set to communicate over either UDP or TCP.
Whatever file_get_contents and curl can do, sockets can also do.
But what sockets can do, curl cannot necessarily do.
Most of the time file_get_contents is used simply to pull data; it is efficient and simple.
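The UDP point can be seen directly: fsockopen() switches to UDP when the host is prefixed with udp://. This is a minimal local sketch; the loopback server created with stream_socket_server() exists only for the demo, and the port is chosen by the operating system.

```php
<?php
// Demo-only UDP server on loopback; port 0 lets the OS pick a free port.
$server = stream_socket_server('udp://127.0.0.1:0', $errno, $errstr, STREAM_SERVER_BIND);
$name = stream_socket_get_name($server, false);   // e.g. "127.0.0.1:54321"
$port = (int) substr(strrchr($name, ':'), 1);

// fsockopen speaks UDP when the host is prefixed with "udp://".
$client = fsockopen('udp://127.0.0.1', $port, $errno, $errstr, 5);
fwrite($client, 'ping');                          // sends one datagram
$msg = stream_socket_recvfrom($server, 1024);     // receive it on the server side
fclose($client);
fclose($server);
echo $msg, "\n"; // ping
```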
I have run into the same problem as Zhao; I solved it by setting the Host with curl. It is related to the network environment.
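The comment does not say exactly how the host was set. One common reading is to send the request to a fixed IP address while presenting the real site name in the Host header, which bypasses DNS lookup. The sketch below builds such options with a hypothetical helper, host_override_options(); the IP and host name in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: curl options that connect to a fixed IP while sending
// the real site name in the Host header (one reading of "setting host" with curl).
function host_override_options($ip, $host, $path = '/', $timeout = 5) {
    return array(
        CURLOPT_URL            => 'http://' . $ip . $path,
        CURLOPT_HTTPHEADER     => array('Host: ' . $host),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => $timeout,
    );
}

// Usage (requires network access; placeholder IP and host):
// $ch = curl_init();
// curl_setopt_array($ch, host_override_options('203.0.113.10', 'www.example.com'));
// $html = curl_exec($ch);
// curl_close($ch);
```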
<?php
/**
 * Socket version
 * Usage:
 * $post_string = "app=socket&version=beta";
 * request_by_socket('jb51.net', '/restServer.php', $post_string);
 */
function request_by_socket($remote_server, $remote_path, $post_string, $port = 80, $timeout = 30) {
    $socket = fsockopen($remote_server, $port, $errno, $errstr, $timeout);
    if (!$socket) die("$errstr ($errno)");
    $body = "mypost=" . $post_string;
    fwrite($socket, "POST $remote_path HTTP/1.0\r\n");
    fwrite($socket, "User-Agent: Socket Example\r\n");
    fwrite($socket, "Host: $remote_server\r\n");
    fwrite($socket, "Content-Type: application/x-www-form-urlencoded\r\n");
    fwrite($socket, "Content-Length: " . strlen($body) . "\r\n");
    fwrite($socket, "Accept: */*\r\n");
    fwrite($socket, "\r\n"); // blank line ends the headers
    fwrite($socket, $body);
    // Read the response headers up to the blank line
    $header = "";
    while ($str = trim(fgets($socket, 4096))) {
        $header .= $str;
    }
    // Read the response body
    $data = "";
    while (!feof($socket)) {
        $data .= fgets($socket, 4096);
    }
    fclose($socket);
    return $data;
}
/**
 * Curl version
 * Usage:
 * $post_string = "app=request&version=beta";
 * request_by_curl('http://jb51.net/restServer.php', $post_string);
 */
function request_by_curl($remote_server, $post_string) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $remote_server);
    curl_setopt($ch, CURLOPT_POSTFIELDS, 'mypost=' . $post_string); // setting POST fields makes this a POST request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, "Jimmy's Curl Example beta");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
/**
 * Other version (file_get_contents with a stream context)
 * Usage:
 * $post_string = "app=request&version=beta";
 * request_by_other('http://jb51.net/restServer.php', $post_string);
 */
function request_by_other($remote_server, $post_string) {
    $body = 'mypost=' . $post_string;
    $context = array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n" .
                         "User-Agent: Jimmy's POST Example beta\r\n" .
                         "Content-Length: " . strlen($body) . "\r\n",
            'content' => $body
        )
    );
    $stream_context = stream_context_create($context);
    $data = file_get_contents($remote_server, false, $stream_context);
    return $data;
}
?>