I posted a question yesterday and got a lot of help from several experts here. Only a small piece of data is still not being captured.
One of them suggested using PhantomJS to grab the rendered HTML.
My current script is:
var page = require('webpage').create();
var url = 'http://www.cbssports.com/mlb/gametracker/live/MLB_20140528_CLE@CHW';
page.open(url, function (status) {
    var js = page.evaluate(function () { return document; });
    console.log(js.all[0].outerHTML);
    phantom.exit();
});
It fails and does not output the correct HTML.
Also, PhantomJS is an executable. How can I run it automatically every second from PHP? PHP is the only language I can use right now. I tried:
exec("start D:\phantomjs script.js");
The idea is to have it generate the HTML file automatically and then parse that file, but the command will not run.
2014 05 23 Update
I have already managed to capture several pieces of data from the site.
My program is as follows:
$url = "http://www.cbssports.com/mlb/gametracker/live/mlb_20140529_sf@stl";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
$data = curl_exec($ch);
preg_match_all('/(.?) <\/span>/is', $data, $teamCity);
preg_match_all('/(.?) <\/span>/is', $data, $teamName);
// ... the remaining fields follow the same pattern
The data I cannot capture is shown below (the parts in red are the ones I cannot get; this is only an excerpt).
It shows up inside one of the rendered containers, for example <div class="batter-pitcher fleft"> with a <table> inside.
The key point is that this part of the data never appears, no matter which browser I use and whether I choose "Save as" or "View source". As far as I can tell, the <div class="batter-pitcher fleft"> block is produced by the JS function batter_ingame_stats, which runs while the game is in progress.
The other one, function () {CBSi.app.BaseRunners = function (args, supplies the "who is on base" data shown by the baseball field icon in the lower right corner. These few parts are the only ones I still cannot capture.
Many experts told me to "just grab the JS", but nobody explained how to do that.
Please, could someone point me in the right direction?
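One possible reading of "just grab the JS" is to pull the values out of the script text itself instead of the rendered table. The sketch below assumes the values are embedded as a strict JSON object passed to a call such as CBSi.app.BaseRunners(...). That is an assumption: the real page markup is not shown here, the sample string and the extractJsonArg helper are hypothetical, and if the site loads the data through a separate request this approach would have to be pointed at that response instead.

```javascript
// Hypothetical fragment standing in for the page source; the real markup is unknown.
var sampleHtml =
  '<script>new CBSi.app.BaseRunners({"first":"Player A","second":null,"third":null});</script>';

// Find `name(` in the text and return the first JSON object literal after it.
// Assumes the argument is strict JSON (double-quoted keys and strings).
// Returns null when the call is not found or the literal does not parse.
function extractJsonArg(text, name) {
  var start = text.indexOf(name + '(');
  if (start === -1) return null;
  var open = text.indexOf('{', start);
  if (open === -1) return null;

  var depth = 0;
  var inString = false;
  for (var i = open; i < text.length; i++) {
    var c = text.charAt(i);
    if (inString) {
      if (c === '\\') i++;                 // skip the escaped character
      else if (c === '"') inString = false;
    } else if (c === '"') {
      inString = true;
    } else if (c === '{') {
      depth++;
    } else if (c === '}' && --depth === 0) {
      // Matching closing brace found: try to parse the slice as JSON.
      try { return JSON.parse(text.slice(open, i + 1)); }
      catch (e) { return null; }
    }
  }
  return null;
}

var runners = extractJsonArg(sampleHtml, 'CBSi.app.BaseRunners');
console.log(runners.first);
```

Brace counting is used rather than a single regular expression so that nested objects and braces inside strings do not cut the match short.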
This topic is discussed in: http://segmentfault.com/q/1010000000522277
Current live game: http://www.cbssports.com/mlb/gametracker/live/mlb_20140529_sf@stl
Reply content:
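Write it like this:
var page = require('webpage').create();
page.open('http://segmentfault.com/', function (status) {
    // Return a string from the page context; the document object itself cannot be passed back.
    var ua = page.evaluate(function () {
        return document.body.outerHTML;
    });
    console.log(ua);
    phantom.exit();
});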
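Since the gametracker fills in its tables with JavaScript after the page loads, printing the HTML straight away inside page.open may still miss them. Here is a minimal sketch of one way to wait before printing. It assumes the block you want matches the selector div.batter-pitcher (taken from the class named in the question, not checked against the live page) and that a 10 second timeout with a 250 ms poll is acceptable. The nextStep helper and the constants are illustrative names, not part of PhantomJS.

```javascript
// Run with: phantomjs wait.js   (ES5 only, since PhantomJS ships an older JS engine)
var PAGE_URL = 'http://www.cbssports.com/mlb/gametracker/live/mlb_20140529_sf@stl';
var SELECTOR = 'div.batter-pitcher'; // assumption based on the class in the question
var TIMEOUT_MS = 10000;              // assumption: give up after 10 seconds
var POLL_MS = 250;                   // assumption: check four times per second

// Decide what to do on each poll tick.
function nextStep(found, elapsedMs, timeoutMs) {
  if (found) return 'done';
  if (elapsedMs >= timeoutMs) return 'timeout';
  return 'wait';
}

// Only run the browser part when the script is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open(PAGE_URL, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + PAGE_URL);
      phantom.exit(1);
      return;
    }
    var startedAt = Date.now();
    var timer = setInterval(function () {
      // Runs inside the page: return the element's HTML, or null if it is not there yet.
      var html = page.evaluate(function (sel) {
        var el = document.querySelector(sel);
        return el ? el.outerHTML : null;
      }, SELECTOR);

      var step = nextStep(html !== null, Date.now() - startedAt, TIMEOUT_MS);
      if (step === 'wait') return;

      clearInterval(timer);
      console.log(step === 'done' ? html : 'Timed out waiting for ' + SELECTOR);
      phantom.exit(step === 'done' ? 0 : 1);
    }, POLL_MS);
  });
}
```

Polling for a specific element is used here instead of a fixed delay, so the script exits as soon as the block exists and reports a timeout when it never appears.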