Using PHP curl to simulate a POST request and submit data

Recently I was working on a program that collects book information from the campus library. Since it collects library books, there has to be a page for submitting a search, and that is nothing more than a POST submission, which reminded me of simulating the submission with curl. First, use Firebug to capture the request and look at the format of the submitted POST body, which is as follows:
txtWxlx=CN&hidWxlx=spanCNLx&txtPY=HZ&txtTm=%D2%F4%C0%D6&txtLx=%25&txtSearchType=1&nMaxCount=100&nSetPageSize=10&cSortFld=%D5%FD%CC%E2%C3%FB&B1=%BC%EC+%CB%F7
The search keyword is the txtTm field, so I wrote a curl request that posts these fields.
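To make the captured body easier to read, it can be split into its individual fields. The following is only a minimal sketch: it assumes the body is copied exactly as captured, and the variable names are hypothetical.

```php
<?php
// Captured POST body, copied from the Firebug capture above.
$captured = 'txtWxlx=CN&hidWxlx=spanCNLx&txtPY=HZ&txtTm=%D2%F4%C0%D6'
    . '&txtLx=%25&txtSearchType=1&nMaxCount=100&nSetPageSize=10'
    . '&cSortFld=%D5%FD%CC%E2%C3%FB&B1=%BC%EC+%CB%F7';

// parse_str() splits the string on "&" and "=" and URL-decodes each value.
$fields = [];
parse_str($captured, $fields);

foreach ($fields as $name => $value) {
    // Print the raw bytes as hex, because the decoded values are not
    // necessarily valid text in this script's own encoding.
    printf("%-14s %s\n", $name, bin2hex($value));
}
```

The txtTm value decodes to four bytes (d2 f4 c0 d6), while plain ASCII values such as CN pass through unchanged.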
However, the page returned by my curl request always showed no matching results. When I changed the keyword to English letters or a number, the returned page displayed normally, so this had to be an encoding problem. In the captured POST body, txtTm=%D2%F4%C0%D6 is URL-encoded: the Chinese characters are converted to percent escapes, while English text is left unchanged. So I added some headers, as follows:
$header = array();
$header[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:30.0) Gecko/20100101 Firefox/30.0';
$header[] = 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$header[] = 'Connection: keep-alive';
$header[] = 'Content-Type:application/x-www-form-urlencoded';
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
The line that I expected to matter is $header[] = 'Content-Type:application/x-www-form-urlencoded';. But after loading the page again, a Chinese keyword still returned no matching results. Then I thought of a very simple problem: my PHP program is UTF-8, but the library's website is GB2312. So I added one more line, $keyword = iconv('utf-8', 'gb2312', $keyword);, and the reload succeeded. This was the key to the problem. I then deleted the header information, added $keyword = urlencode($keyword);, and loaded the page again.
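The fix described above can be sketched roughly as follows. This is not the original listing, only a minimal illustration under some assumptions: the function names buildSearchBody and searchLibrary are hypothetical, the script file is saved as UTF-8, and the field values for cSortFld and B1 are the captured bytes decoded as GB2312. It uses http_build_query, which URL-encodes each value the same way urlencode does, instead of calling urlencode by hand.

```php
<?php
// Build the POST body for a search. All values are written in UTF-8
// (the encoding of this script) and converted to GB2312 before they
// are URL-encoded, because the library site expects GB2312 bytes.
function buildSearchBody(string $keywordUtf8): string
{
    $fields = [
        'txtWxlx'       => 'CN',
        'hidWxlx'       => 'spanCNLx',
        'txtPY'         => 'HZ',
        'txtTm'         => $keywordUtf8,   // the search keyword
        'txtLx'         => '%',
        'txtSearchType' => '1',
        'nMaxCount'     => '100',
        'nSetPageSize'  => '10',
        'cSortFld'      => '正题名',
        'B1'            => '检 索',
    ];

    $converted = [];
    foreach ($fields as $name => $value) {
        $gb = iconv('UTF-8', 'GB2312', $value);
        if ($gb === false) {
            throw new RuntimeException("Cannot convert field $name to GB2312");
        }
        $converted[$name] = $gb;
    }

    // Percent-encodes the GB2312 bytes and joins the pairs with "&".
    return http_build_query($converted, '', '&');
}

// Send the search request. $url is a placeholder for the library's
// search page, which is not given in this post.
function searchLibrary(string $url, string $keywordUtf8): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildSearchBody($keywordUtf8));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // The returned page is presumably still GB2312-encoded.
    return $html;
}
```

Calling buildSearchBody('音乐') should reproduce the captured body, including txtTm=%D2%F4%C0%D6. Without the iconv step, the same keyword in UTF-8 would be sent as %E9%9F%B3%E4%B9%90, which a GB2312 site has no reason to match. When CURLOPT_POSTFIELDS is given a string, curl sends it as application/x-www-form-urlencoded by default, which fits with the request still working after the extra headers were deleted.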
As expected, the page displayed normally. What remains is collecting and laying out the page content, which is nothing more than regular expressions. (Because the campus network can only be accessed through the intranet)
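The regular-expression step might look something like the sketch below. The real markup of the result page is not shown in this post, so the pattern, the td class name "title", and the function name extractTitles are all hypothetical and would need to be adapted to the actual page.

```php
<?php
// Extract book titles from a GB2312-encoded result page.
// ASSUMPTION: each title is a link inside <td class="title">; this
// markup is invented for illustration only.
function extractTitles(string $htmlGb2312): array
{
    // Convert to UTF-8 first so the pattern can use the /u modifier.
    $html = iconv('GB2312', 'UTF-8', $htmlGb2312);
    if ($html === false) {
        throw new RuntimeException('Page is not valid GB2312');
    }

    // Capture the link text of every title cell, non-greedily.
    preg_match_all('~<td class="title">\s*<a[^>]*>(.*?)</a>~su', $html, $matches);

    return array_map('trim', $matches[1]);
}
```

The function returns the captured titles as UTF-8 strings in page order, or an empty array when nothing matches.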
Success ,...