I recently had to develop a small script that fetches an XML file from the Web. All I had to do was download a given URL and read its contents. To my great surprise, I found that downloading a file using my JX Ajax Library was much easier than doing it with PHP.
PHP makes this very easy by including functions like file_get_contents() that have URL support. This code will get you the contents of a URL.
$contents = file_get_contents('http://example.com/rss.xml');
Unfortunately, this is a huge security threat, and many servers have disabled this feature in PHP. Also, this is not the most optimized method to fetch a URL. Finally, the plain call shown above cannot submit data using the POST method (that requires passing a stream context as well).
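Whether URL access is switched off on a given server can be checked at run time. The following is a minimal sketch, assuming that the `allow_url_fopen` setting is what the host has disabled; the function names `url_fopen_enabled()` and `fetch_if_allowed()` are hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): reports whether
// PHP's URL-aware file functions are enabled on this server.
function url_fopen_enabled() {
    // ini_get() returns the setting as a string ("1", "", "On", ...) or false.
    $value = ini_get('allow_url_fopen');
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical wrapper: only hand a URL to file_get_contents() when allowed.
function fetch_if_allowed($url) {
    if (!url_fopen_enabled()) {
        return false; // The host has disabled URL access for file functions.
    }
    return @file_get_contents($url);
}
```

A caller could fall back to another transport when `fetch_if_allowed()` returns false.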
Other options: curl and fsockopen
PHP provides two other methods to fetch a URL: curl and fsockopen. But to use these I have to write a lot more code.
load()
So I decided to create my own function that makes it much easier.
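To give a feel for the extra work, here is a rough sketch of a bare GET request over fsockopen. It assumes plain HTTP on port 80 and does no header parsing or error reporting; `build_get_request()` and `fetch_with_socket()` are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: builds the raw HTTP/1.0 GET request that has to be
// written to the socket by hand when fsockopen() is used.
function build_get_request($host, $path) {
    $out  = "GET $path HTTP/1.0\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n";
    $out .= "\r\n"; // A blank line ends the request headers.
    return $out;
}

// Hypothetical fetch over a raw socket. Returns the response headers and
// body as one string, or false if the connection could not be opened.
function fetch_with_socket($host, $path, $port = 80, $timeout = 30) {
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return false;
    }
    fwrite($fp, build_get_request($host, $path));
    $response = '';
    while (!feof($fp)) {
        $response .= fgets($fp, 128);
    }
    fclose($fp);
    return $response;
}
```

Even this stripped-down version leaves the caller to split the headers from the body, which is the kind of detail a wrapper function can hide.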
Features
- Easy to use.
- Supports GET and POST methods.
- Supports HTTP Basic Authentication. This will work: http://binary:password@example.com/
- Supports both curl and fsockopen. Tries to use curl; if it is not available, uses fsockopen.
- Secure URL (https) supported with curl
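The Basic Authentication feature relies on the user name and password being embedded in the URL. As a small sketch of that idea, the hypothetical helper `basic_auth_header()` below (not part of the original script) uses parse_url() to pull the credentials out and encode them.

```php
<?php
// Hypothetical helper: turns the user:password part of a URL into an
// HTTP Basic "Authorization" header line. Returns null if the URL
// carries no credentials or cannot be parsed.
function basic_auth_header($url) {
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['user']) || !isset($parts['pass'])) {
        return null;
    }
    // Basic authentication sends base64("user:password").
    return 'Authorization: Basic ' . base64_encode($parts['user'] . ':' . $parts['pass']);
}
```

Note that Basic credentials are only encoded, not encrypted, so they are best sent over https.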
Options
The first argument of this function is the URL to be fetched. The second argument is an optional associative array. The following values are supported in this array.
- return_info
  Possible values: true/false
  If this is true, the function will return an associative array rather than just a string. The array will contain 3 elements:
  - headers: An associative array containing all the headers returned by the server.
  - body: A string, the contents of the URL.
  - info: Some information about the fetch. This is the result returned by the curl_getinfo() function. Supported only with curl.
- method
  Possible values: post/get
  Specifies the method to be used.
- modified_since
  If this option is set, the If-Modified-Since header will be used. This will make sure that the URL is fetched only if it was modified.
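The If-Modified-Since header expects a date in the HTTP date format, always expressed in GMT. The sketch below shows one way to build it; `if_modified_since_header()` is a hypothetical helper, and it assumes the input is either a Unix timestamp or a string that strtotime() can parse.

```php
<?php
// Hypothetical helper: formats a Unix timestamp or a date string as an
// If-Modified-Since header line in the HTTP date format (GMT).
function if_modified_since_header($when) {
    $timestamp = is_int($when) ? $when : strtotime($when);
    if ($timestamp === false) {
        return null; // The date string could not be parsed.
    }
    // \G\M\T is escaped so that date formatting prints the literal text "GMT".
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s \G\M\T', $timestamp);
}
```

A server that honours the header replies with status 304 and an empty body when the resource has not changed since that date.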
Examples
The code to fetch the contents of a URL will look like this...
$contents = load('http://example.com/rss.xml');
Simple, no? This will just return the contents of the URL. If you need to do more complex stuff, just use the second argument to pass more options...
$options = array('return_info' => true, 'method' => 'post');
$result = load('http://www.bin-co.com/rss.xml.php?section=2', $options);
print_r($result);
The output will be like this...
Array
(
    [headers] => Array
        (
            [Date] => Mon, 18 Jun 2007 13:56:22 GMT
            [Server] => Apache/2.0.54 (Unix) PHP/4.4.7 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.4.2
            [X-Powered-By] => PHP/5.2.2
            [Expires] => Thu, 19 Nov 1981 08:52:00 GMT
            [Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0
            [Pragma] => no-cache
            [Set-Cookie] => PHPSESSID=85g9n1i320ao08kp5tmmneohm1; path=/
            [Last-Modified] => Tue, 30 Nov 1999 00:00:00 GMT
            [Vary] => Accept-Encoding
            [Transfer-Encoding] => chunked
            [Content-Type] => text/xml
        )
    [body] => ... Contents of the Page ...
    [info] => Array
        (
            [url] => http://www.bin-co.com/rss.xml.php?section=2
            [content_type] => text/xml
            [http_code] => 200
            [header_size] => 501
            [request_size] => 146
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 1.113792
            [namelookup_time] => 0.180019
            [connect_time] => 0.467973
            [pretransfer_time] => 0.468035
            [size_upload] => 0
            [size_download] => 2274
            [speed_download] => 2041
            [speed_upload] => 0
            [download_content_length] => 0
            [upload_content_length] => 0
            [starttransfer_time] => 0.826031
            [redirect_time] => 0
        )
)
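An array like the one above can be inspected before the body is used. The following sketch assumes the three keys described under Options (headers, body, info); `describe_result()` is a hypothetical helper and is not part of the load() script.

```php
<?php
// Hypothetical helper: summarises the array returned when 'return_info' is true.
function describe_result(array $result) {
    // 'http_code' comes from curl_getinfo(), so it may be absent without curl.
    $code = isset($result['info']['http_code']) ? $result['info']['http_code'] : null;
    $type = isset($result['headers']['Content-Type']) ? $result['headers']['Content-Type'] : 'unknown';
    $body = isset($result['body']) ? $result['body'] : '';
    return array(
        'ok'           => ($code === 200),
        'content_type' => $type,
        'length'       => strlen($body),
    );
}
```

For example, a caller might parse the body as XML only when 'ok' is true and 'content_type' is text/xml.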
Code
<?php
/**
 * See http://www.bin-co.com/php/scripts/load/
 * Version : 2.00.A
 */
function load($url,$options=array()) {
    $default_options = array(
        'method'        => 'get',
        'return_info'   => false,
        'return_body'   => true,
        'cache'         => false,
        'referer'       => '',
        'headers'       => array(),
        'session'       => false,
        'session_close' => false,
    );
    // Sets the default options.
    foreach($default_options as $opt=>$value) {
        if(!isset($options[$opt])) $options[$opt] = $value;
    }

    $url_parts = parse_url($url);
    $ch = false;
    $info = array(//Currently only supported by curl.
        'http_code' => 200
    );
    $response = '';

    $send_header = array(
        'Accept' => 'text/*',
        'User-Agent' => 'BinGet/1.00.A (http://www.bin-co.com/php/scripts/load/)'
    ) + $options['headers']; // Add custom headers provided by the user.

    if($options['cache']) {
        $cache_folder = '/tmp/php-load-function/';
        if(isset($options['cache_folder'])) $cache_folder = $options['cache_folder'];
        if(!file_exists($cache_folder)) {
            $old_umask = umask(0); // Or the folder will not get write permission for everybody.
            mkdir($cache_folder, 0777);
            umask($old_umask);
        }

        $cache_file_name = md5($url) . '.cache';
        //The original called a joinPath() helper that is not defined in this listing - plain concatenation is used instead.
        $cache_file = rtrim($cache_folder, '/') . '/' . $cache_file_name; //Don't change the variable name - used at the end of the function.

        if(file_exists($cache_file)) { // Cached file exists - return that.
            $response = file_get_contents($cache_file);

            //Separate header and content
            $separator_position = strpos($response,"\r\n\r\n");
            $header_text = substr($response,0,$separator_position);
            $body = substr($response,$separator_position+4);

            $headers = array();
            foreach(explode("\n",$header_text) as $line) {
                $parts = explode(": ",$line);
                if(count($parts) == 2) $headers[$parts[0]] = chop($parts[1]);
            }
            $headers['cached'] = true;

            if(!$options['return_info']) return $body;
            else return array('headers' => $headers, 'body' => $body, 'info' => array('cached'=>true));
        }
    }

    ///////////////////////////// Curl /////////////////////////////////////
    //If curl is available, use curl to get the data.
    if(function_exists("curl_init")
            and (!(isset($options['use']) and $options['use'] == 'fsocketopen'))) { //Don't use curl if it is specifically stated to use fsocketopen in the options

        if(isset($options['post_data'])) { //There is an option to specify some data to be posted.
            $page = $url;
            $options['method'] = 'post';

            if(is_array($options['post_data'])) { //The data is in array format.
                $post_data = array();
                foreach($options['post_data'] as $key=>$value) {
                    $post_data[] = "$key=" . urlencode($value);
                }
                $url_parts['query'] = implode('&', $post_data);

            } else { //It's a string
                $url_parts['query'] = $options['post_data'];
            }
        } else {
            if(isset($options['method']) and $options['method'] == 'post') {
                $page = $url_parts['scheme'] . '://' . $url_parts['host'] . $url_parts['path'];
            } else {
                $page = $url;
            }
        }

        if($options['session'] and isset($GLOBALS['_binget_curl_session'])) $ch = $GLOBALS['_binget_curl_session']; //Session is stored in a global variable
        else $ch = curl_init($url_parts['host']);

        curl_setopt($ch, CURLOPT_URL, $page) or die("Invalid cURL Handle Resource");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //Just return the data - not print the whole thing.
        curl_setopt($ch, CURLOPT_HEADER, true); //We need the headers
        curl_setopt($ch, CURLOPT_NOBODY, !($options['return_body'])); //The content - if true, will not download the contents. There is a ! operation - don't remove it.
        if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $url_parts['query']);
        }

        //Set the headers our spider sends
        curl_setopt($ch, CURLOPT_USERAGENT, $send_header['User-Agent']); //The Name of the UserAgent we will be using ;)
        $custom_headers = array("Accept: " . $send_header['Accept'] );
        if(isset($options['modified_since']))
            array_push($custom_headers,"If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])));
        curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
        if($options['referer']) curl_setopt($ch, CURLOPT_REFERER, $options['referer']);

        curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/binget-cookie.txt"); //If ever needed...
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); //Note: this turns off certificate verification for https URLs.

        if(isset($url_parts['user']) and isset($url_parts['pass'])) {
            //Append to the existing headers so that Accept and If-Modified-Since are not lost.
            $custom_headers[] = "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']);
            curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
        }

        $response = curl_exec($ch);
        $info = curl_getinfo($ch); //Some information on the fetch

        if($options['session'] and !$options['session_close']) $GLOBALS['_binget_curl_session'] = $ch; //Don't close the curl session. We may need it later - save it to a global variable
        else curl_close($ch); //If the session option is not set, close the session.

    //////////////////////////////////////////// FSockOpen //////////////////////////////
    } else { //If there is no curl, use fsocketopen - but keep in mind that most advanced features will be lost with this approach.
        if(isset($url_parts['query'])) {
            if(isset($options['method']) and $options['method'] == 'post')
                $page = $url_parts['path'];
            else
                $page = $url_parts['path'] . '?' . $url_parts['query'];
        } else {
            $page = $url_parts['path'];
        }

        if(!isset($url_parts['port'])) $url_parts['port'] = 80;
        $fp = fsockopen($url_parts['host'], $url_parts['port'], $errno, $errstr, 30);
        if ($fp) {
            $out = '';
            if(isset($options['method']) and $options['method'] == 'post' and isset($url_parts['query'])) {
                $out .= "POST $page HTTP/1.1\r\n";
            } else {
                $out .= "GET $page HTTP/1.0\r\n"; //HTTP/1.0 is much easier to handle than HTTP/1.1
            }
            $out .= "Host: $url_parts[host]\r\n";
            $out .= "Accept: $send_header[Accept]\r\n";
            $out .= "User-Agent: {$send_header['User-Agent']}\r\n";
            if(isset($options['modified_since']))
                $out .= "If-Modified-Since: ".gmdate('D, d M Y H:i:s \G\M\T',strtotime($options['modified_since'])) ."\r\n";

            $out .= "Connection: Close\r\n";

            //HTTP Basic Authorization support
            if(isset($url_parts['user']) and isset($url_parts['pass'])) {
                $out .= "Authorization: Basic ".base64_encode($url_parts['user'].':'.$url_parts['pass']) . "\r\n";
            }

            //If the request is post - pass the data in a special way.
            if(isset($options['method']) and $options['method'] == 'post' and $url_parts['query']) {
                $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
                $out .= 'Content-Length: ' . strlen($url_parts['query']) . "\r\n";
                $out .= "\r\n" . $url_parts['query'];
            }
            $out .= "\r\n";

            fwrite($fp, $out);
            while (!feof($fp)) {
                $response .= fgets($fp, 128);
            }
            fclose($fp);
        }
    }

    //Get the headers in an associative array
    $headers = array();

    if($info['http_code'] == 404) {
        $body = "";
        $headers['Status'] = 404;
    } else {
        //Separate header and content
        if(!isset($info['header_size'])) { //fsockopen gives no header size - find the blank line that ends the headers.
            $separator_position = strpos($response, "\r\n\r\n");
            $info['header_size'] = ($separator_position === false) ? strlen($response) : $separator_position + 4;
        }
        $header_text = substr($response, 0, $info['header_size']);
        $body = substr($response, $info['header_size']);

        foreach(explode("\n",$header_text) as $line) {
            $parts = explode(": ",$line);
            if(count($parts) == 2) $headers[$parts[0]] = chop($parts[1]);
        }
    }

    if(isset($cache_file)) { //Should we cache the URL?
        file_put_contents($cache_file, $response);
    }

    if($options['return_info']) return array('headers' => $headers, 'body' => $body, 'info' => $info, 'curl_handle'=>$ch);
    return $body;
}
?>