Crawling educational administration system data with curl

Source: Internet
Author: User
Hello, I have come to ask for your help again.
The code is as follows:


<?php
header("Content-Type: text/html; charset=utf-8");
require_once 'search.php';
// Step 1: submit the login data, generate a cookie, and save the cookie in the temporary directory
$cookiejar = realpath('cookie.txt');
$id = $_GET['id'];
$password = $_GET['password'];
$year = $_GET['Year'];
$term = $_GET['term'];
$ch = curl_init();
$login_url = "http://211.67.32.51/default3.aspx";
$curlPost = "__VIEWSTATE=signature%2BO2w8bzxmPjs%2BPjs7Pjs%signature%2BOz4%2BOzs%2BOz4%signature%2FCbCuTw%3D&tbYHM=k061138526&tbPSW=100311&ddlSF=students&imgDL.x=40&imgDL.y=7";
$curlPost = iconv("UTF-8", "GBK", $curlPost);
curl_setopt($ch, CURLOPT_URL, $login_url);
curl_setopt($ch, CURLOPT_PROXY, 'jackdowosn.gnway.net:81');
// When enabled, the response headers are included in the output.
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/4");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://211.67.32.51/');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
// Sets the file in which cookie information is stored after the connection ends.
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiejar);
$data = curl_exec($ch);
// $data = mb_convert_encoding($data, "UTF-8", "GBK");
// echo '' . $data . '';
$curlPost = "xh=k061110826";
$curlPost = iconv("UTF-8", "GBK", $curlPost);
curl_setopt($ch, CURLOPT_URL, "http://211.67.32.51/xscj.aspx?xh=k061138526");
curl_setopt($ch, CURLOPT_PROXY, 'jackdowosn.gnway.net:81');
// When enabled, the response headers are included in the output.
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/4");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://211.67.32.51/');
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
// Sets the file from which cookie information is read.
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiejar);
$data = curl_exec($ch);
$data = mb_convert_encoding($data, "UTF-8", "GBK");
// Capture the value that is passed to search3() as $value
preg_match_all('/\ /i', $data, $matches);
// The s modifier must not be added to the pattern above
// file_put_contents("d://value.txt", $matches[1][0]);
// echo var_dump($matches[1][0]) . "";
// echo $matches[1][0];
// echo '' . $data . '';
echo search3($id, $year, $term, $ch, $matches[1][0]);
?>
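
The preg_match_all call above supplies $matches[1][0], which search3 sends back as __VIEWSTATE, but its pattern is not legible in the listing. The following is only a sketch of how such a value could be captured, not the pattern the author used. It assumes the page carries the usual ASP.NET hidden input with the name attribute before the value attribute; extractViewState and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: returns the __VIEWSTATE value found in an HTML page,
// or null when no such hidden field is present.
function extractViewState(string $html): ?string
{
    // Assumed markup: <input type="hidden" name="__VIEWSTATE" ... value="..." />
    $pattern = '/<input[^>]*name="__VIEWSTATE"[^>]*value="([^"]*)"/i';
    if (preg_match($pattern, $html, $m) === 1) {
        // Undo HTML entity escaping in the attribute value, if any.
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null; // e.g. the login failed and a different page came back
}

// Made-up sample markup, not taken from the real system.
$sample = '<form><input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTM4+/abc=" /></form>';
var_dump(extractViewState($sample));
```

Checking for null before calling search3 would show whether the second request actually returned the expected page.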


function search3($id, $year, $term, $ch, $value) {
    // $data = file_get_contents("d://value.txt");
    curl_setopt($ch, CURLOPT_PROXY, 'jackdowosn.gnway.net:81');
    $curlPost = "xh=k061138526&__VIEWSTATE=$value&Button2=query by school year term&ddlKCLX=required&xn=2012-2013&xq=1";
    $curlPost = iconv("UTF-8", "GBK", $curlPost);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/4");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_URL, "http://211.67.32.51/xscj.aspx");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_REFERER, "http://211.67.32.51/xscj.aspx?xh=k061138526");
    curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiejar); // send the saved cookie back
    $data = curl_exec($ch);
    curl_close($ch);
    $data = mb_convert_encoding($data, "UTF-8", "GBK");
    /* preg_match_all('/\\s*\(.*?)\<\/td\>\s*\(.*?)\<\/td\>/is', $data, $matches);
    foreach ($matches[1] as $key => $val)
        $nav = $nav . "\n" . $val . "---" . $matches[2][$key]; */
    return $data;
}


The program above runs into trouble when it reaches search3; the other requests return data normally. I asked a senior, and his answer was: "I don't know. As far as I remember, we have run into this problem with the Zhengfang ("square") system before. It may be that the parameter data is being sent incorrectly, that the encoding is wrong, or that the Referer parameter is not set." Please help me find where the problem is. If you are interested, you can debug it for me; the proxy servers really are available. Below are several of the POST parameters and the header information.
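
One of the possibilities the senior mentions, incorrectly sent parameter data, can be checked against how search3 builds its POST string. There $value is interpolated as-is, whereas the login request carries a percent-encoded __VIEWSTATE (%2B, %2F, %3D). In a form-encoded body a raw + is decoded as a space, so an unencoded value may reach the server altered. The sketch below is a guess at a fix, not a confirmed diagnosis: buildGbkPostBody is a hypothetical helper, and the __VIEWSTATE value is a placeholder.

```php
<?php
// Hypothetical helper: builds an application/x-www-form-urlencoded body.
// Each value is converted from UTF-8 to GBK first, then percent-encoded.
function buildGbkPostBody(array $fields): string
{
    $pairs = [];
    foreach ($fields as $name => $value) {
        $gbk = (string) iconv('UTF-8', 'GBK', (string) $value);
        $pairs[] = urlencode((string) $name) . '=' . urlencode($gbk);
    }
    return implode('&', $pairs);
}

$viewState = 'dDwtMTM4+/abc='; // placeholder, not a real __VIEWSTATE
$body = buildGbkPostBody([
    'xh'          => 'k061138526',
    '__VIEWSTATE' => $viewState,
    'xn'          => '2012-2013',
    'xq'          => '1',
]);
echo $body, "\n";
```

The resulting string could be passed to CURLOPT_POSTFIELDS in place of the hand-built $curlPost. Converting each value before encoding it, rather than running iconv over the finished string, keeps the separators and the already-encoded parts untouched.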