In development we often need to crawl the content of a page, but some pages can only be accessed after logging in; forums are the most common case. In that situation we can use cURL to simulate the login. The general idea: first send the login request and save the cookies it returns, then send a second request with the saved cookies to get the page content. Here is the code:
… 'pythontab', 'password' => 'pythontab', …

// cURL initialization
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Send the request as a POST
curl_setopt($ch, CURLOPT_POST, true);
// Do not include the response headers in the returned data
curl_setopt($ch, CURLOPT_HEADER, 0);
// POST data
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
// Location of the file the cookies are saved to
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the request
$ret = curl_exec($ch);
// Close the connection
curl_close($ch);

// Step two: request the page that requires login, sending the saved cookies
$url = 'http://www.pythontab.com';
// cURL initialization
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Send the request as a POST
curl_setopt($ch, CURLOPT_POST, true);
// Do not include the response headers in the returned data
curl_setopt($ch, CURLOPT_HEADER, 0);
// Location of the cookie file; note that unlike in step one, the file is read here
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the request
$ret = curl_exec($ch);
// Close the connection
curl_close($ch);
// Print the crawled content
var_dump($ret);
This lets us crawl the content of pages that can only be accessed after logging in. Note that the address above is only an example and needs to be replaced with the address of the page you want to crawl. With this technique we can do a lot of things, but please don't use it for anything bad!
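Since the addresses have to be swapped for your own, the two steps could be wrapped in small functions that take the URLs, form fields and cookie file path as parameters. The following is only a minimal sketch under stated assumptions: the function names (login_options, fetch_options, run_request, fetch_after_login) are hypothetical and not part of the code above, the form data is sent URL-encoded via http_build_query, the second request is sent as a plain GET, and the example.com addresses and field names in the usage comment are placeholders.

```php
<?php
// Hypothetical helper names for illustration; only standard PHP cURL calls are used.

// Options for step one: POST the login form and save cookies to $cookieFile.
function login_options(string $loginUrl, array $fields, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        // Assumption: the login form expects URL-encoded fields.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_HEADER         => false,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Options for step two: request the page, reading cookies from $cookieFile.
function fetch_options(string $pageUrl, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $pageUrl,
        CURLOPT_HEADER         => false,
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Run one request and return the response body, or throw on failure.
function run_request(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    // Releasing the handle is what flushes cookies to the CURLOPT_COOKIEJAR file.
    unset($ch);
    if ($body === false) {
        throw new RuntimeException("cURL request failed: $error");
    }
    return $body;
}

// Log in first, then fetch the protected page with the saved cookies.
function fetch_after_login(string $loginUrl, array $fields, string $pageUrl, string $cookieFile): string
{
    run_request(login_options($loginUrl, $fields, $cookieFile));
    return run_request(fetch_options($pageUrl, $cookieFile));
}

// Usage (placeholder addresses and field names, replace with your own):
// $html = fetch_after_login(
//     'http://example.com/login',
//     ['user' => 'name', 'password' => 'secret'],
//     'http://example.com/members',
//     __DIR__ . '/cookie.txt'
// );
```

Checking the return value of curl_exec against false makes a failed login request visible instead of silently passing an empty result on to the second step.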