Crawling web data with cURL and PhantomJS: asking the experts for help

Source: Internet
Author: User
I posted a question yesterday and several experts gave me a great deal of help. Only a small piece of the data is still not being captured.

One expert suggested using PhantomJS to fetch the HTML.
My current script is:
var page = require('webpage').create();
var url = 'http://www.cbssports.com/mlb/gametracker/live/mlb_20140528_cle@chw';

page.open(url, function (status) {
  var js = page.evaluate(function () {
    return document;
  });
  console.log(js.all[0].outerHTML);
  phantom.exit();
});

It fails: the output is not the correct HTML.
Also, PhantomJS is an executable. How can I have it run automatically every second from PHP? PHP is the only thing I can use right now. I tried:
exec("start D:\phantomjs script.js")
The idea is to have it generate the file automatically and then parse that file, but I cannot get it to run.

2014 05 23 Update

Several pieces of data on the site are already being captured.
My program is as follows:
$url = "http://www.cbssports.com/mlb/gametracker/live/mlb_20140529_sf@stl";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
$data = curl_exec($ch);
preg_match_all('/(.*?)<\/span>/is', $data, $teamCity);
preg_match_all('/(.*?)<\/span>/is', $data, $teamName);
// ... further regular expressions follow

The information that is not captured is as follows (the parts marked in red are what I cannot capture; only part of it is shown here):

The missing data is inside <div class="batter-pitcher fleft"> or <table>.

The point is that no matter which browser you use, neither "Save as" nor "View source" shows this data. As far as I can tell, the <div class="batter-pitcher fleft"> part is produced by the JS function batter_ingame_stats, which runs while the game is in progress.

The other piece is function () {CBSi.app.BaseRunners = function (args, which supplies the "who is on base" data for the ballpark icon in the lower right corner. These are currently the only parts I cannot capture.

Many experts said "just grab the JS", but nobody explains how to do that.

Any pointers would be greatly appreciated.
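One possible reading of "just grab the JS": if the page's inline script passes the data to batter_ingame_stats as an object literal, that literal can be cut out of the raw HTML that cURL already returns. The sketch below is only an illustration under that assumption. The helper names extractCallArgument and parseObjectAt are made up for this example, and it only works if the argument happens to be valid JSON, which has not been checked against the real page.

```javascript
// Hypothetical helper: parse a JSON object literal that starts at or just
// after position `pos` in `text`. Returns the parsed object, or null.
function parseObjectAt(text, pos) {
  var open = pos;
  // Skip whitespace between "(" and the opening brace.
  while (open < text.length && /\s/.test(text.charAt(open))) {
    open++;
  }
  if (text.charAt(open) !== '{') {
    return null;
  }
  var depth = 0;
  var inString = false;
  var escaped = false;
  for (var i = open; i < text.length; i++) {
    var ch = text.charAt(i);
    if (inString) {
      // Braces inside string values must not affect the depth count.
      if (escaped) {
        escaped = false;
      } else if (ch === '\\') {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(open, i + 1));
        } catch (e) {
          return null; // the literal was not valid JSON
        }
      }
    }
  }
  return null; // no matching closing brace
}

// Hypothetical helper: find the first call `functionName({...})` in the raw
// HTML whose argument parses as JSON. Occurrences that are not such a call
// (for example the function's own definition) are skipped.
function extractCallArgument(html, functionName) {
  var marker = functionName + '(';
  var from = 0;
  while (true) {
    var start = html.indexOf(marker, from);
    if (start === -1) {
      return null;
    }
    var parsed = parseObjectAt(html, start + marker.length);
    if (parsed !== null) {
      return parsed;
    }
    from = start + marker.length;
  }
}
```

Calling extractCallArgument(html, 'batter_ingame_stats') on the downloaded page would then return the object, or null when the assumption does not hold. If the data is instead loaded by a separate request after the page opens, this approach finds nothing and a rendering tool such as PhantomJS is the more likely route.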

This topic is discussed in: http://segmentfault.com/q/1010000000522277

Current live game: http://www.cbssports.com/mlb/gametracker/live/mlb_20140529_sf@stl

Reply content:

var page = require('webpage').create();
page.open('http://segmentfault.com/', function (status) {
  var ua = page.evaluate(function () {
    return document.body.outerHTML;
  });
  console.log(ua);
  phantom.exit();
});
  • Write it like this.

    That still doesn't get the data out.
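A note on why the two scripts behave differently, plus a sketch of one thing to try next. PhantomJS only passes JSON-serializable values back from page.evaluate, so returning the document object cannot work, while returning an HTML string can. The markup may still be incomplete if the script reads it before the page's own JavaScript has filled in the live data. The sketch below polls once per second inside PhantomJS itself instead of relaunching it from PHP. It assumes that the finished block can be recognized by its class attribute; hasRenderedBlock and the limit of 10 attempts are illustrative choices, not anything taken from the site.

```javascript
// Hypothetical helper: does the serialized HTML contain an element whose
// class attribute is exactly `className`?
function hasRenderedBlock(html, className) {
  return html.indexOf('class="' + className + '"') !== -1;
}

// The part below runs only under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var url = 'http://www.cbssports.com/mlb/gametracker/live/mlb_20140529_sf@stl';

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }
    var attempts = 0;
    var timer = setInterval(function () {
      // Return a string: evaluate() cannot hand back DOM objects.
      var html = page.evaluate(function () {
        return document.documentElement.outerHTML;
      });
      attempts++;
      // Stop once the block has appeared, or give up after 10 tries.
      if (hasRenderedBlock(html, 'batter-pitcher fleft') || attempts >= 10) {
        clearInterval(timer);
        console.log(html);
        phantom.exit();
      }
    }, 1000);
  });
}
```

Saved as script.js and started with phantomjs script.js, this prints the HTML to standard output, where a PHP script could capture it and apply the existing regular expressions.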