How to capture real-time webpage content
# Web: http://data.shishicai.cn/cqssc/haoma/
# Demo:
<?php
/* Created on [2013-5-1] Author [Newton] Filename [action.php] */
# Encoding conversion function
function convToUtf8($str) {
    if (mb_detect_encoding($str, "UTF-8, ISO-8859-1, GBK") != "UTF-8") {
        return iconv("GBK", "UTF-8", $str);
    } else {
        return $str;
    }
}
header("content-type: text/html; charset: UTF-8");
error_reporting(E_ERROR);
$pages = file_get_contents('http://data.shishicai.cn/cqssc/haoma/');
// $pages = htmlspecialchars($pages);
$pages = convToUtf8($pages);
echo "pages -->" . print_r($pages); echo PHP_EOL;
$doc = new DOMDocument();
$new_doc = new DOMDocument('1.0', 'utf-8');
echo "doc -->" . print_r($doc); echo PHP_EOL;
$dom = $doc->getElementsByTagName('table');
$newdoc = $new_doc->loadhtml($dom->item(2)->nodeValue);
$table = $new_doc->saveHTML();
echo "table -->>{$table}" . PHP_EOL;
# Result:
# ...... garbled characters ......
# pages --> 1 DOMDocument Object () doc --> 1 table -->
# The table is empty ......
?>
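A few things in the posted code would produce this output even on a static page. $doc never has loadHTML() called on it, so getElementsByTagName('table') finds nothing. print_r() prints its argument and returns true, which is where the stray "1" comes from. nodeValue also gives only the text of a node, not its markup. Below is a minimal sketch of pulling one table's HTML out of a string, assuming the markup is already in memory. The function name extractTableHtml and the sample HTML are made up for illustration.

```php
<?php
// Sketch: return the outer HTML of the table at position $index
// (0-based) in an HTML string, or null when there is no such table.
function extractTableHtml($html, $index)
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item($index);
    if ($table === null) {
        return null;
    }
    // saveHTML($node) serialises just that node, markup included.
    return $doc->saveHTML($table);
}

// Hypothetical sample input. The second table has an empty tbody, as it
// would when the rows are filled in later by JavaScript.
$sample = '<html><body><table id="a"><tr><td>1</td></tr></table>'
        . '<table id="b"><tbody></tbody></table></body></html>';

echo extractTableHtml($sample, 0), PHP_EOL;
echo extractTableHtml($sample, 1), PHP_EOL;
var_dump(extractTableHtml($sample, 2)); // there is no third table
```

Even with these fixes, a table whose rows are written by a script will still come back without data, because file_get_contents() only returns the HTML as served.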
------ Solution --------------------
The page data is filled in by JS, so it is not in the HTML that file_get_contents returns. You have to crawl the JS script instead.
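One way to find which script to look at is to list the external scripts the page loads. This is only a rough sketch: listScriptSources is a made-up helper name and the sample markup is not the real page.

```php
<?php
// Sketch: collect the src attribute of every <script> tag in an HTML string.
function listScriptSources($html)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = array();
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src');
        // Inline scripts have no src attribute; skip them.
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Hypothetical sample markup with one external and one inline script.
$sample = '<html><head><script src="http://example.com/a.js"></script>'
        . '<script>var inline = 1;</script></head><body></body></html>';
print_r(listScriptSources($sample));
```

Each URL it returns can then be fetched and searched for the request that loads the table data.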
------ Solution --------------------
It looks like a frame is embedded in the tbody, and the HTML is then generated by JS code.
Open http://datacache.shishicai.cn/script/2f67117ba1b58074.js.
Searching it for 'framework' gives 6 results.
From my own analysis, I could not find where the frame is referenced.
The OP seems to be an expert, with a high technical score. Looking forward to a solution.
------ Solution --------------------
http://data.shishicai.cn/handler/kuaikai/data.ashx
POST: lottery=4&date=2013-05-06
Collect the data from here.
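A minimal sketch of sending that POST from PHP with a stream context, assuming the handler accepts an ordinary form-encoded body. The helper names buildPostContext and fetchLotteryData are made up for illustration, and the 10 second timeout is an arbitrary choice.

```php
<?php
// Sketch: build a stream context for a form-encoded POST request.
function buildPostContext(array $fields)
{
    $options = array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields),
            'timeout' => 10, // seconds; arbitrary value for this sketch
        ),
    );
    return stream_context_create($options);
}

// Hypothetical helper: returns the raw response body, or false on failure.
function fetchLotteryData($url, array $fields)
{
    return file_get_contents($url, false, buildPostContext($fields));
}
```

It could be called as fetchLotteryData('http://data.shishicai.cn/handler/kuaikai/data.ashx', array('lottery' => 4, 'date' => '2013-05-06')). The thread does not say what format the response is in or whether the handler expects extra headers, so dump the raw body first before deciding how to parse it.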