Today at work I ran into a bug: the MP3 audio files our code had been downloading came out empty. Requesting the same URL in a browser returned a file whose size was not 0 KB, but the copy downloaded with curl was 0 KB, and at first I could not see a solution. After some digging I found the cause. I fetch the recording data from a third-party interface, and as of today the recording address redirects: the first request to that address returns a 302.
This is the previous code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
//curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the body instead of printing it
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$info = curl_exec($ch);
In other words, the server answered curl's first request with a 302, which is a redirect, but curl does not follow redirects by default, so $info was always empty.
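One way to confirm this diagnosis is to ask curl what status code came back and where the server wanted to send the request. The sketch below is only an illustration: probe_url() and is_redirect() are hypothetical helper names that do not appear in the original code, and the list of redirect status codes is my own choice.

```php
<?php
// Hypothetical helper: true for the status codes that normally carry a Location header.
function is_redirect(int $status): bool
{
    return in_array($status, [301, 302, 303, 307, 308], true);
}

// Hypothetical helper: request a URL WITHOUT following redirects and report what came back.
function probe_url(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);

    return [
        'status'       => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // e.g. 302
        'redirect_url' => curl_getinfo($ch, CURLINFO_REDIRECT_URL), // where the server points next
        'bytes'        => is_string($body) ? strlen($body) : 0,     // 0 for an empty redirect body
        'error'        => curl_error($ch),
    ];
}
```

Calling probe_url($url) on the recording address and passing the returned 'status' to is_redirect() should show whether the empty result is really a redirect rather than a failed download.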
After the fix:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
//curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // follow 301/302 redirects
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$info = curl_exec($ch);
The only addition is CURLOPT_FOLLOWLOCATION, which tells curl to follow redirects. Now $info contains data!
PS: Notes on following redirects with curl
curl_setopt($ch, CURLOPT_MAXREDIRS, 20);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
CURLOPT_FOLLOWLOCATION enables automatic redirect following, and CURLOPT_MAXREDIRS sets the maximum number of redirects allowed.
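Putting the two options together, a download routine might look like the sketch below. It is only an illustration: download_following_redirects() and is_usable_download() are hypothetical names, and the rule of accepting only a non-empty 200 response is an assumption on my part, meant to avoid saving a 0 KB file again.

```php
<?php
// Hypothetical helper: only a non-empty 200 response is worth saving as an MP3.
function is_usable_download(int $status, $body): bool
{
    return $status === 200 && is_string($body) && $body !== '';
}

// Hypothetical helper: fetch a URL, following at most $maxRedirects redirects.
// Returns the body, or null if the final response is not usable.
function download_following_redirects(string $url, int $maxRedirects = 20): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);       // follow 301/302 automatically
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects);   // stop after this many redirects

    $body   = curl_exec($ch);                             // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);      // status of the final response

    return is_usable_download($status, $body) ? $body : null;
}
```

With this shape the caller can check for null before writing anything to disk, instead of discovering an empty file later.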
Note, however, that CURLOPT_FOLLOWLOCATION can only be used when safe mode is off and open_basedir is not set. open_basedir is a php.ini setting that restricts the files a script can access to a given directory tree.
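A script can check both settings up front before relying on automatic redirects. The sketch below encodes the rule exactly as stated in this note; follow_location_allowed() is a hypothetical helper, and whether a given PHP version still enforces the restriction is something I have not verified here.

```php
<?php
// Hypothetical helper encoding the rule above:
// automatic redirects are usable only if safe mode is off AND open_basedir is empty.
function follow_location_allowed(string $openBasedir, bool $safeMode): bool
{
    return !$safeMode && $openBasedir === '';
}

// Read the live settings. ini_get() returns false for a directive that does not
// exist (safe_mode was removed in PHP 5.4), which FILTER_VALIDATE_BOOLEAN maps to false.
$openBasedir = (string) ini_get('open_basedir');
$safeMode    = filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN);

$useAutoRedirect = follow_location_allowed($openBasedir, $safeMode);
```

If $useAutoRedirect is false, the code would need to fall back to requesting each hop itself.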
If safe mode is on, or open_basedir is set, automatic redirect following is unavailable; you can instead reach the final page by requesting each hop in turn. To speed this up and avoid unnecessary overhead, set the following when requesting the intermediate (non-target) pages:
curl_setopt($rch, CURLOPT_HEADER, TRUE);
curl_setopt($rch, CURLOPT_NOBODY, TRUE);
This fetches only the headers, not the page content, so you can check the status code (301, 302) in them. If the response is a redirect, take the address from the Location header and request it, repeating until the status code is 200. Then fetch the target page itself.
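The hop-by-hop approach described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a complete implementation: parse_head() and resolve_final_url() are hypothetical names, the Location header is assumed to hold an absolute URL, and because CURLOPT_NOBODY sends a HEAD request, the server is assumed to answer HEAD the same way it answers GET.

```php
<?php
// Hypothetical helper: pull the status code and Location header out of a raw header block.
function parse_head(string $rawHeaders): array
{
    $status = 0;
    $location = null;
    foreach (preg_split('/\r\n|\n/', trim($rawHeaders)) as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        } elseif (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            $location = trim($m[1]);
        }
    }
    return ['status' => $status, 'location' => $location];
}

// Hypothetical helper: request headers only, hop by hop, until a 200 is reached.
// Returns the final URL, or null on error, on a non-redirect failure, or after too many hops.
function resolve_final_url(string $url, int $maxHops = 20): ?string
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $rch = curl_init();
        curl_setopt($rch, CURLOPT_URL, $url);
        curl_setopt($rch, CURLOPT_HEADER, true);         // include headers in the output
        curl_setopt($rch, CURLOPT_NOBODY, true);         // skip the body
        curl_setopt($rch, CURLOPT_RETURNTRANSFER, true);
        $headers = curl_exec($rch);
        if ($headers === false) {
            return null;                                 // transport error
        }
        $head = parse_head($headers);
        if ($head['status'] === 200) {
            return $url;                                 // this is the target page
        }
        if (!in_array($head['status'], [301, 302], true) || $head['location'] === null) {
            return null;                                 // neither a 200 nor a usable redirect
        }
        $url = $head['location'];                        // assumption: absolute URL
    }
    return null;                                         // gave up after $maxHops redirects
}
```

Once resolve_final_url() returns a URL, a normal request without CURLOPT_NOBODY can download the actual content from it.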