This article is a translation of another article.
Page View
Browsing pages with CasperJS is more convenient and intuitive than with PhantomJS.
For example, suppose you want to open page A and then page B in succession.
In CasperJS, you can write this:
var casper = require('casper').create();
casper.start('URL of website A', function () {
    console.log('started');
});
casper.thenOpen('URL of website B', function () {
    console.log('started');
});
casper.then(function () {
    this.evaluate(function () {
        // your code here
    });
});
casper.run();
So with CasperJS it is very simple to open site A and then site B.
The key point is that, written this way, site B is not opened until site A has fully loaded.
This is a useful feature that PhantomJS does not provide out of the box.
With PhantomJS, you have to handle this yourself by listening for the page-load events.
Here is how the same task can be accomplished with PhantomJS:
var steps = [];
var testindex = 0;
var loadInProgress = false; // this is set to true when a page is still loading
// Drequest is the original author's request wrapper; its definition is not shown in this article
var clientRequests = new Drequest();
console.log("Initialization successful");
steps = [
    function () { // Step 1 - load Code Epicenter
        console.log("Request a website and wait for website to load");
        clientRequests.sendRequest("http://photo-epicenter.com");
    },
    function () { // Step 2 - after page load, parse results. Do not call readResponse() in step one, because the result might be empty
        console.log("Website loaded, read response");
        clientRequests.readResponse();
        var fs = require("fs");
        console.log("Write data to file");
        fs.write("website.html", clientRequests.getResponse(), "w");
    }
];
// Start interval to read website content
var interval = setInterval(executeRequestsStepByStep, 2000);
function executeRequestsStepByStep() {
    if (loadInProgress == false && typeof steps[testindex] == "function") {
        console.log("Step " + (testindex + 1));
        steps[testindex]();
        testindex++;
    }
    if (typeof steps[testindex] != "function") {
        console.log("Test complete!");
        phantom.exit();
    }
}
/**
 * These listeners are very important in order for phantom to work.
 * Using these listeners, we control the loadInProgress marker, which tracks whether a page is fully loaded.
 * Without this, we would get the content of the page even when the page is not fully loaded.
 */
clientRequests.phantomPage.onLoadStarted = function () {
    loadInProgress = true;
    console.log(loadInProgress);
    console.log("Loading started");
};
clientRequests.phantomPage.onLoadFinished = function () {
    loadInProgress = false;
    console.log("Loading finished");
};
clientRequests.phantomPage.onConsoleMessage = function (msg) {
    console.log(msg);
};
The PhantomJS version takes noticeably more code. Note that the code above was written by the original author.
There may be a better approach, but believe me, this code is already about as good as it gets.
If you want to do more than simply go from page A to page B, the code becomes more complex and maintaining it turns into a nightmare.
Cookies
Both libraries attach the cookies they have received to subsequent requests.
This is useful for crawling pages that are only available after logging in.
Code maintenance
CasperJS's more intuitive syntax clearly makes your scripts easier to maintain. CasperJS also has many useful functions, such as thenClick, which can take an XPath expression as its first argument. This is handy when you want to click on a menu item.
In Chrome you can copy the XPath of an element and paste it straight into the CasperJS script. The script then simulates the click event and you are redirected to the desired page. If you need to adapt the script to other sites, you only need to modify the XPaths.
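Since only the XPaths change from site to site, it can help to keep them in one place. The following is a minimal sketch in plain JavaScript, not part of the original article. The names menuItemXPath, xpathLiteral and siteXPaths are hypothetical, and so are the site entries. The commented lines show how such an expression might be passed to thenClick through CasperJS's selectXPath helper.

```javascript
// Quote a string so it can be embedded safely in an XPath expression.
function xpathLiteral(text) {
    if (text.indexOf('"') === -1) { return '"' + text + '"'; }
    if (text.indexOf("'") === -1) { return "'" + text + "'"; }
    // Both quote types are present: split on " and rebuild with concat()
    return 'concat("' + text.split('"').join('", \'"\', "') + '")';
}

// Build an XPath that matches a link by its visible label (hypothetical helper).
function menuItemXPath(label) {
    return '//a[normalize-space(.)=' + xpathLiteral(label) + ']';
}

// One table per script: adapting to another site means editing only these entries.
var siteXPaths = {
    siteA: menuItemXPath('Downloads'),
    siteB: '//*[@id="nav"]/ul/li[2]/a' // e.g. an expression copied from Chrome
};

// Inside a CasperJS script (not executed here) this could be used as:
//   var x = require('casper').selectXPath;
//   casper.thenClick(x(siteXPaths.siteA), function () {
//       this.echo(this.getTitle());
//   });
```

Building the expression from the visible label is only one option. An XPath copied from the browser works the same way once it is stored in the table.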
CasperJS has many very handy functions that are not implemented in PhantomJS. Refer to the CasperJS API documentation for details.
File download
File download is a hot topic whenever PhantomJS is discussed, yet only about 20 articles cover how to download files with PhantomJS. There are two possible approaches:
* Issue an Ajax request for the file inside your evaluate function, encode the file content, and return it to the PhantomJS script.
* Use the PhantomJS source code on GitHub that has not been compiled into a release.
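The encoding step of the first approach can be sketched as follows. This is an illustrative assumption, not the original author's code. It assumes the file was fetched with XMLHttpRequest using the 'text/plain; charset=x-user-defined' MIME override, so each byte arrives in the low 8 bits of a character code. The name binaryStringToBase64 is hypothetical.

```javascript
// Convert a "binary string" (one byte per character) to Base64.
function binaryStringToBase64(raw) {
    var bytes = '';
    for (var i = 0; i < raw.length; i++) {
        // Keep only the low 8 bits; x-user-defined maps high bytes into U+F780..U+F7FF
        bytes += String.fromCharCode(raw.charCodeAt(i) & 0xff);
    }
    if (typeof btoa === 'function') {
        return btoa(bytes); // available in browser pages and recent Node.js
    }
    return Buffer.from(bytes, 'binary').toString('base64'); // older Node.js fallback
}

// Inside the function passed to page.evaluate() (not executed here), this might look like:
//   var xhr = new XMLHttpRequest();
//   xhr.open('GET', url, false); // synchronous, so the result can be returned directly
//   xhr.overrideMimeType('text/plain; charset=x-user-defined');
//   xhr.send(null);
//   return binaryStringToBase64(xhr.responseText);
```

Returning a Base64 string keeps the value serializable, which matters because only simple values can cross from the page context back to the PhantomJS script.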
Neither of these methods can fully guarantee that the file will be downloaded properly. There is a further problem: when you do not want to save the downloaded file to the file system (for example, because you are not allowed to store downloaded data on your computer), this is almost impossible to do with PhantomJS. My suggestion is not to use PhantomJS for downloading files.
With CasperJS, file downloads are very easy. CasperJS provides a download function that saves files to the file system, and a base64encode function that returns the Base64-encoded content of files you want to process but do not want to keep.
The following is an example of Base64 encoding:
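var casper = require('casper').create();
var base64logo = null;
casper.start('http://www.google.fr/', function () {
    // Fetch the image and keep its Base64-encoded content in memory
    base64logo = this.base64encode('http://www.google.fr/images/srpr/logo3w.png');
});
casper.run(function () {
    this.echo(base64logo).exit();
});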
The following example shows how to download a CSV file via CasperJS without saving the file to the file system:
var casper = require('casper').create();
casper.start("http://captaincoffee.com.au/dump/", function () {
    this.echo(this.getTitle());
});
casper.then(function () {
    var url = 'http://captaincoffee.com.au/dump/csv.csv';
    // Print the Base64-encoded CSV content instead of writing it to disk
    require('utils').dump(this.base64encode(url, 'get'));
});
casper.run();
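Once the Base64 string is in memory, it can be decoded and processed without touching the file system. The following is a minimal sketch in plain JavaScript, not taken from the original article. The helper names decodeBase64 and parseSimpleCsv are hypothetical. It assumes ASCII content and a simple CSV with no quoted fields or embedded commas.

```javascript
// Decode a Base64 string into text (ASCII content assumed).
function decodeBase64(b64) {
    if (typeof atob === 'function') {
        return atob(b64); // available in browser pages and recent Node.js
    }
    return Buffer.from(b64, 'base64').toString('binary'); // older Node.js fallback
}

// Split simple CSV text into rows of fields (no quoting support).
function parseSimpleCsv(text) {
    return text.split(/\r?\n/)
        .filter(function (line) { return line.length > 0; })
        .map(function (line) { return line.split(','); });
}

// Example input: the Base64 encoding of "a,b\n1,2\n"
var rows = parseSimpleCsv(decodeBase64('YSxiCjEsMgo='));
```

In a CasperJS script, the string returned by base64encode would take the place of the hard-coded sample. A real CSV with quoted fields would need a proper parser.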
Summary
Both PhantomJS and CasperJS can do the job well, but if you have to choose one of the two, I suggest CasperJS, because you will get results that are at least as good, and often better, with much less effort.