PHP get_meta_tags(), cURL and User-Agent Usage Analysis

Source: Internet
Author: User

This article analyzes the usage of get_meta_tags(), cURL and User-Agent in PHP and is shared here for your reference. The details are as follows:

The get_meta_tags() function extracts tags such as <meta name="a" content="1"><meta name="b" content="2"> from a web page and loads them into a one-dimensional array, using the name attribute as the key and the content attribute as the value. For the tags above it returns array('a' => '1', 'b' => '2'). Other <meta> tags, such as those without a name attribute, are ignored, and the function only parses the document up to the </head> tag.
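A minimal sketch of the behavior described above, parsing a local sample file instead of a live page. The markup and temporary file are made up for illustration; note that the name attribute "B" comes back as the lower-case key 'b'.

```php
<?php
// Sketch: save a small sample page to a temporary file, then let
// get_meta_tags() parse it. The markup and file name are illustrative.
$html = <<<HTML
<html>
<head>
<meta name="a" content="1">
<meta name="B" content="2">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<meta name="c" content="3">
</body>
</html>
HTML;

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

// Keys come from the name attribute (lower-cased), values from content.
// The http-equiv tag has no name attribute, and parsing stops at </head>,
// so neither it nor the "c" tag inside <body> is returned.
$tags = get_meta_tags($path);
unlink($path);

var_dump($tags); // array('a' => '1', 'b' => '2')
```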

User-Agent is one of the request headers that the browser sends, invisibly to the user, when it requests a web page from a server. The request headers are a set of name/value fields carrying information such as cookies and caching instructions; User-Agent is the browser's declaration of its type, for example IE, Chrome or Firefox.
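On the receiving side, PHP exposes these request headers through $_SERVER, each under an "HTTP_" prefix. The sketch below (the helper name is hypothetical) maps those keys back to header-style names and reads the UA with a fallback, since a client is not required to send one.

```php
<?php
// Sketch: PHP exposes each incoming request header in $_SERVER with an
// "HTTP_" prefix, so "User-Agent" arrives as $_SERVER['HTTP_USER_AGENT'].
// The helper name is hypothetical; it maps those keys back to
// header-style names such as "User-Agent".
function request_headers_from_server(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strncmp((string) $key, 'HTTP_', 5) === 0) {
            // HTTP_USER_AGENT -> "user agent" -> "User Agent" -> "User-Agent"
            $words = strtolower(str_replace('_', ' ', substr($key, 5)));
            $headers[str_replace(' ', '-', ucwords($words))] = $value;
        }
    }
    return $headers;
}

// A client is not required to send a User-Agent, so read it with a fallback.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '(no User-Agent sent)';
```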

Today, while fetching the <meta> tags of a web page, I kept getting an empty result, even though the page source looked normal when viewed directly. I suspected that the server decides what to output based on the request headers. To check, I first used get_meta_tags() to fetch a local page that writes the request information it receives to a file. The result is shown below, with backslashes replaced by / for readability:

array (
  'HTTP_HOST' => '192.168.30.205',
  'PATH' => 'C:/Program Files/common Files/netsarang;C:/Program Files/nvidia Corporation/physx/common;C:/Program Files/common files/microsoft shared/windows Live;C:/Program Files/intel/icls client/;C:/windows/system32;C:/windows;C:/windows/system32/wbem;c:/windows/system32/windowspowershell/v1.0/;C:/Program Files/intel/intel (R) Management Engine components/dal;C:/Program Files/intel/intel (R) Management Engine components/ipt;C:/Program Files/intel/opencl sdk/2.0/bin/x86;C:/Program Files/common Files/thunder network/kankan/codecs;C:/Program Files/quicktime Alternative/qtsystem;C:/Program Files/windows live/shared;C:/Program Files/quicktime alternative/qtsystem/;%java_home%/bin;%java_home%/jre/bin;',
  'SystemRoot' => 'c:/windows',
  'COMSPEC' => 'C:/windows/system32/cmd.exe',
  'PATHEXT' => '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC',
  'WINDIR' => 'c:/windows',
  'SERVER_SIGNATURE' => '',
  'SERVER_SOFTWARE' => 'Apache/2.2.11 (Win32) PHP/5.2.8',
  'SERVER_NAME' => '192.168.30.205',
  'SERVER_ADDR' => '192.168.30.205',
  'SERVER_PORT' => '80',
  'REMOTE_ADDR' => '192.168.30.205',
  'DOCUMENT_ROOT' => 'e:/wamp/www',
  'SERVER_ADMIN' => 'admin@admin.com',
  'SCRIPT_FILENAME' => 'e:/wamp/www/user-agent.php',
  'REMOTE_PORT' => '59479',
  'GATEWAY_INTERFACE' => 'CGI/1.1',
  'SERVER_PROTOCOL' => 'HTTP/1.0',
  'REQUEST_METHOD' => 'GET',
  'QUERY_STRING' => '',
  'REQUEST_URI' => '/user-agent.php',
  'SCRIPT_NAME' => '/user-agent.php',
  'PHP_SELF' => '/user-agent.php',
  'REQUEST_TIME' => 1400747529,
)
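The source of the logging page that produced this dump is not shown. A hypothetical version of such a diagnostic script might look like the sketch below; the function name and log path are assumptions for illustration.

```php
<?php
// Hypothetical reconstruction of a diagnostic page such as user-agent.php:
// it writes everything the server received for the current request to a file,
// so you can see which headers the client actually sent.
// The function name and the log path are assumptions for illustration.
function dump_request(array $server, string $logFile): void
{
    // Replace backslashes in Windows paths with "/" for readability,
    // as was done for the dump above. array_map() keeps the string keys.
    $clean = array_map(
        fn($v) => is_string($v) ? str_replace('\\', '/', $v) : $v,
        $server
    );
    // var_export(..., true) returns the array as PHP source text.
    file_put_contents($logFile, var_export($clean, true) . PHP_EOL);
}

dump_request($_SERVER, sys_get_temp_dir() . '/request-dump.txt');
```

If the request carried a User-Agent header, the file would contain an 'HTTP_USER_AGENT' entry; its absence is what the dump above reveals.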

Sure enough, there is no HTTP_USER_AGENT element in the array: when PHP (running under Apache) requests a page from another server through get_meta_tags(), it sends no UA by default. After looking into it, I found that get_meta_tags() offers no parameter for setting a UA, so another approach was needed.
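One alternative, not the approach used in this article: get_meta_tags() opens URLs through PHP's http:// stream wrapper (which requires allow_url_fopen), and that wrapper sends the php.ini "user_agent" directive as the User-Agent header. Setting the directive at runtime therefore gives get_meta_tags() a UA. A sketch, with a hypothetical function name and placeholder URL and UA string:

```php
<?php
// Sketch of an alternative to cURL: PHP's http:// stream wrapper sends the
// php.ini "user_agent" setting as the User-Agent request header, and
// get_meta_tags() fetches URLs through that wrapper.
// The function name is hypothetical; the URL and UA below are placeholders.
function meta_tags_with_ua(string $url, string $userAgent)
{
    $previous = ini_get('user_agent');
    ini_set('user_agent', $userAgent);
    try {
        return get_meta_tags($url); // array on success, false on failure
    } finally {
        // Restore the previous value so later requests are unaffected.
        ini_set('user_agent', $previous === false ? '' : (string) $previous);
    }
}

// $tags = meta_tags_with_ua('http://example.com/', 'Mozilla/5.0 (Windows NT 6.1)');
```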

I then switched to cURL, which did retrieve the page. It takes a bit more work: first forge the UA, then parse the <meta> tags out of the returned HTML with a regular expression.

The code for forging the UA is as follows:

<?php
// Initialize a cURL session
$curl = curl_init();

// Set the URL to fetch
curl_setopt($curl, CURLOPT_URL, 'http://localhost/user-agent.php');

// Whether to include the response headers in the output; false = body only
curl_setopt($curl, CURLOPT_HEADER, false);

// Set the UA: forward the visiting browser's UA to the target server,
// or fall back to a manually specified value (e.g. when run from the CLI)
curl_setopt($curl, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0');

// true: curl_exec() returns the response body as a string;
// false: the body is printed directly and curl_exec() returns true/false
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// Run the request; $data is false if the request failed
$data = curl_exec($curl);
if ($data === false) {
    echo 'cURL error: ' . curl_error($curl);
}

// Close the cURL session
curl_close($curl);

// Process the fetched data
var_dump($data);
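The regular-expression step for the fetched HTML is not shown in the article. A minimal sketch, assuming the name attribute comes before content and both values are quoted; that covers common markup but not every valid HTML form, and DOMDocument is the sturdier choice for arbitrary pages. The function name is hypothetical.

```php
<?php
// Minimal sketch of parsing <meta name=... content=...> pairs with a regex.
// Assumptions: name precedes content and both values are quoted.
function parse_meta_tags(string $html): array
{
    // Mirror get_meta_tags(): only look at the part before </head>.
    $head = stristr($html, '</head>', true);
    if ($head !== false) {
        $html = $head;
    }

    $pattern = '/<meta\s[^>]*?name\s*=\s*["\']([^"\']*)["\'][^>]*?content\s*=\s*["\']([^"\']*)["\'][^>]*>/i';
    $tags = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Lower-case keys like get_meta_tags(); a later duplicate overwrites an earlier one.
            $tags[strtolower($m[1])] = $m[2];
        }
    }
    return $tags;
}

// Usage with the cURL result:
// if ($data !== false) { $meta = parse_meta_tags($data); }
```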

I hope this article helps you with your PHP programming.
