Using curl in PHP to simulate a login and fetch data: an example

Source: Internet
Author: User
This article shows how to use PHP's curl extension to simulate a login and fetch data, with examples covering the login itself, verification codes, cookie handling, and data scraping. Readers who need this may find it a useful reference.





The example below uses PHP and curl to simulate a login and fetch data. It is shared here for reference:



With PHP's curl extension you can simulate a login and fetch data that is only visible to a logged-in user. The process is as follows (a personal summary):



1. First, analyze the HTML source of the login page to obtain the necessary information:



(1) The address of the login page;



(2) The address of the verification code (CAPTCHA) image;



(3) The name of each field the login form submits, and the submission method (GET or POST);



(4) The address the login form is submitted to;



(5) The address of the data to be fetched.
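
The information listed in step 1 can often be read from the login page programmatically instead of by eye. The following is a minimal sketch, assuming the page contains one login form with plain input fields. The function name extract_login_form and the sample HTML are hypothetical and are not part of the original example.

```php
<?php
// Hypothetical helper: report the action, method, input names and image
// addresses found on a login page.
function extract_login_form(string $html): ?array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return null; // no form on this page
    }

    // Field names (and any preset values, e.g. hidden tokens).
    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }

    // Image addresses; one of them is usually the verification code.
    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = $img->getAttribute('src');
    }

    return [
        'action' => $form->getAttribute('action'),
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
        'fields' => $fields,
        'images' => $images,
    ];
}

// Sample login page (made up for illustration).
$sample = '<form action="/login.php" method="post">'
    . '<input type="text" name="username">'
    . '<input type="password" name="password">'
    . '<input type="text" name="seccodeverify">'
    . '<img src="/captcha.php">'
    . '</form>';

$form = extract_login_form($sample);
print_r($form);
```

Relative addresses such as /login.php still have to be combined with the site's host name before they are passed to curl.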



2. Fetch the cookies and store them in a file (for websites that use cookies):







$login_url = 'http://www.xxxxx'; // Login page address
$cookie_file = dirname(__FILE__) . "/pic.cookie"; // Cookie file storage location (custom)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // Save received cookies to this file
curl_exec($ch);
curl_close($ch);





3. Fetch the verification code image and store it (for websites that use a CAPTCHA):







$verify_url = "http://www.xxxx"; // CAPTCHA address
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $verify_url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // Send the cookies saved in step 2
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$verify_img = curl_exec($ch);
curl_close($ch);
$fp = fopen("./verify/verifycode.png", 'w'); // Write the fetched image to a local image file
fwrite($fp, $verify_img);
fclose($fp);





Note:

The verification code cannot be recognized automatically, so what I do here is save the verification code image to a local file and display it on an HTML page in my own project for the user to read. Only after the user has entered the account, password, and verification code and clicked the Submit button does the process move on to the next step.



4. Simulate submitting the login form:







$post_url = 'http://www.xxxx'; // Login form submission address
$post = "username=$account&password=$password&seccodeverify=$verifyCode"; // Data submitted by the form (determined by the form field names and the user's input)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $post_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post); // Submit as POST
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_exec($ch);
curl_close($ch);
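
Building the POST string by concatenation only works while the values contain no special characters. If a password contains & or =, the server will read the fields incorrectly. A minimal sketch of a safer way to build the same string, assuming the field names used above; the sample values are made up for illustration:

```php
<?php
// Sample user input (hypothetical values containing special characters).
$account    = 'alice@example.com';
$password   = 'p&ss=word 1';
$verifyCode = 'a1b2';

// http_build_query URL-encodes every value, so "&", "=" and spaces
// inside the password cannot break the field boundaries.
$post = http_build_query([
    'username'      => $account,
    'password'      => $password,
    'seccodeverify' => $verifyCode,
], '', '&');

echo $post, PHP_EOL;
```

The resulting string can be passed to CURLOPT_POSTFIELDS exactly as in step 4.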





5. Crawl data:







$data_url = "http://www.xxxx"; // Data address
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $data_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Must be 1 so that curl_exec returns the page instead of printing it
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$data = curl_exec($ch);
curl_close($ch);





At this point, the page at the data address has been fetched and stored in the string variable $data.



Note that what you have fetched is the HTML source of a web page. The string contains not only the data you want but also a lot of HTML tags and other content you do not want. To extract the data you need, analyze the HTML of the page that holds it, then use string functions, regular expressions, and similar tools to pull it out.
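
Besides string functions and regular expressions, PHP's DOM extension can select the wanted elements directly. The following is a minimal sketch under stated assumptions: the table markup, the id orders, and the helper name extract_cells are all hypothetical, and a real page needs its own XPath expressions.

```php
<?php
// Hypothetical helper: return the trimmed text of every node matched
// by an XPath query.
function extract_cells(string $html, string $query): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $values = [];
    foreach ($xpath->query($query) as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Stand-in for the $data string returned by curl_exec in step 5.
$data = '<html><body><table id="orders">'
      . '<tr><td class="item">Keyboard</td><td class="price"> 49.90 </td></tr>'
      . '<tr><td class="item">Mouse</td><td class="price">19.50</td></tr>'
      . '</table></body></html>';

$items  = extract_cells($data, '//table[@id="orders"]//td[@class="item"]');
$prices = extract_cells($data, '//table[@id="orders"]//td[@class="price"]');

print_r(array_combine($items, $prices));
```

An XPath query tends to survive small layout changes better than a regular expression written against the raw markup, though either approach breaks when the site is redesigned.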



The steps above work for ordinary websites served over HTTP. To simulate a login on a website that uses HTTPS, add the following:



1. Skip HTTPS certificate verification:







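// $curl is the curl handle (named $ch in the examples above)
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // Do not verify the server certificate
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false); // Do not check the host name in the certificate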





2. Set a user agent:







$UserAgent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; .NET CLR 3.5.21022; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
curl_setopt($curl, CURLOPT_USERAGENT, $UserAgent);





Note: without these settings, the simulated login on an HTTPS site will generally fail.
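
Since every request in this article repeats the same options, they can be collected in one place. The following is a minimal sketch, not the author's code: the helper name session_options, its parameters, and the example.com address are hypothetical. It also sets CURLOPT_COOKIEJAR on every request, on the assumption that some sites issue new cookies when the login succeeds.

```php
<?php
// Hypothetical helper: build the option list shared by all requests.
function session_options(string $url, string $cookieFile, bool $skipTlsChecks = false): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,        // return the body from curl_exec
        CURLOPT_COOKIEFILE     => $cookieFile, // send stored cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // store new cookies when the handle is released
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
    ];

    if ($skipTlsChecks) {
        // Same effect as the two HTTPS lines above: certificate checks are
        // switched off, so the connection is no longer protected against
        // impersonation of the server.
        $options[CURLOPT_SSL_VERIFYPEER] = false;
        $options[CURLOPT_SSL_VERIFYHOST] = 0;
    }

    return $options;
}

$options = session_options('https://www.example.com/data', __DIR__ . '/pic.cookie', true);

$ch = curl_init();
curl_setopt_array($ch, $options); // apply all options in one call
// $data = curl_exec($ch);        // uncomment to perform the request
```

curl_setopt_array applies the whole list at once, which keeps the login, CAPTCHA, and data requests consistent with each other.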



The procedure above generally works, but each target site needs its own consideration.

Some sites use a different character encoding, so the fetched page comes out garbled. In that case convert the encoding, for example $data = iconv("gb2312", "utf-8", $data); which converts GB2312 (GBK) text to UTF-8.

Some sites with high security requirements, such as online banking, place the verification code in an inline frame. You then have to fetch the page inside the inline frame, extract the address of the verification code from it, and fetch the verification code from that address.

Some sites (online banking again) submit the form from JavaScript and process the data before submitting it, for example by encrypting it. Submitting the raw values directly will not log you in; you have to apply the same processing first. If you can find out exactly what the JavaScript does, such as which encryption algorithm it uses, you can reproduce that processing and the login can succeed. If you cannot, for example because you know the data is encrypted but not with which algorithm, you cannot reproduce it and the simulated login will fail.

The typical case is online banking, which uses a browser security control, called from the JavaScript code, to process the user's password and verification code before the form is submitted. We have no idea what the control does, so we cannot simulate it. If you think reading this article will let you simulate an online banking login, that is too naive: a bank's website is not that easy to log in to programmatically. Of course, if you can crack the online banking control, that is another story. I feel strongly about this because I ran into exactly this problem myself, and the less said about that the better.
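
The encoding conversion mentioned above can be driven by the charset the page declares rather than being hard-coded. The following is a minimal sketch, assuming the page declares its charset in a meta tag; the helper name page_to_utf8 and the sample page are hypothetical.

```php
<?php
// Hypothetical helper: convert a fetched page to UTF-8 based on the
// charset named in its <meta> tag.
function page_to_utf8(string $html): string
{
    $charset = 'UTF-8'; // assumed when nothing is declared
    if (preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        $charset = strtoupper($m[1]);
    }
    if ($charset === 'UTF-8') {
        return $html;
    }
    // GBK is a superset of GB2312, so it can decode pages declared as either.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }
    $converted = iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// Sample page: the bytes D6 D0 CE C4 are two Chinese characters in GB2312.
$page = "<meta charset=\"gb2312\"><p>\xD6\xD0\xCE\xC4</p>";
$utf8 = page_to_utf8($page);

echo bin2hex(substr($utf8, strpos($utf8, '<p>') + 3, 6)), PHP_EOL;
```

If iconv cannot convert the input, the sketch returns the original string unchanged, so the caller should still check the result before parsing it.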





