What is PhantomJS
PhantomJS's official website lists its uses as headless website testing, screen capture, page automation, and network monitoring. It is currently a popular way to crawl complex pages that are hard to scrape through an API or with regular expressions, such as pages whose content is loaded asynchronously. PhantomJS is a complete browser without a user interface, so we can use it to simulate a real browser visiting a page and then retrieve the rendered result. The goal here is to call PhantomJS from Node to fetch pages.
How Node communicates with PhantomJS
- Command-line arguments: data can only be passed in when PhantomJS is started; nothing can be passed while it is running.
- Standard output: PhantomJS can send data to Node through stdout, but not the other way round.
- HTTP: PhantomJS sends an HTTP request to Node and Node responds with data, but only PhantomJS can initiate requests.
- WebSocket: allows two-way communication, but is somewhat more troublesome to implement.
- phantomjs-node: uses WebSocket or HTTP communication under the hood; since someone else has already written it, we can use it directly, with the drawback of adding a dependency.
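To make the first two channels concrete, here is a minimal Node-side sketch: the child process's argument list is the only chance to hand it data, and its standard output is the only way back. It uses Node's built-in `child_process.spawn`; `runAndCollect` is a hypothetical helper name, and in real use `command` would be the `phantomjs` binary with a PhantomJS script (assumed to print its result with `console.log`) as the first argument.

```javascript
'use strict';

const { spawn } = require('child_process');

// Hypothetical helper: start `command` with `args` (the only way to pass data in)
// and resolve with everything the process wrote to stdout (the only way back).
function runAndCollect(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let output = '';
    child.stdout.setEncoding('utf8');
    // Accumulate stdout chunks as they arrive
    child.stdout.on('data', (chunk) => { output += chunk; });
    // Fired if the binary cannot be started at all (e.g. not installed)
    child.on('error', reject);
    // Fired once the process has exited and its stdio streams are closed
    child.on('close', (code) => {
      if (code === 0) resolve(output);
      else reject(new Error(`${command} exited with code ${code}`));
    });
  });
}

module.exports = runAndCollect;
```

For example, `runAndCollect('phantomjs', ['fetch.js', url])`, where `fetch.js` is a hypothetical PhantomJS script, would resolve with whatever that script printed. Node cannot send anything further once the process is running, which is the limitation the HTTP and WebSocket options address.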
How to use phantomjs-node
GitHub: https://github.com/amir20/phantomjs-node
Only a few brief instructions are given here; see GitHub for the detailed API.
1. Installation
npm install phantom
2. Module encapsulation (the code below uses async/await from ES2017, so it requires Node 7.6 or later, where async/await is available without flags). For more detailed usage, see the official PhantomJS documentation.
'use strict';

const phantom = require('phantom');

// Wait `time` ms. PhantomJS only waits for the page's resources to load, not for
// the page's own JS to finish running, so we pause before rendering.
const lateTime = (time) => new Promise((resolve) => setTimeout(resolve, time));

const getPic = async (name) => {
  // URL path (fill in the host)
  const url = 'http:///' + name;
  // Create a PhantomJS instance
  const instance = await phantom.create();
  try {
    // Create a page
    const page = await instance.createPage();
    // Set page parameters
    await page.property('viewportSize', { width: 1800, height: 1200 });
    // Open the URL and get the status; encodeURI percent-encodes it so Chinese names work
    const status = await page.open(encodeURI(url));
    console.log(status);
    // Give the page's JS time to finish executing
    await lateTime(500);
    // Render the page to a PNG in the current directory
    await page.render(`${name}--${Date.now()}.png`);
    // Return data
    return 'xxx';
  } finally {
    // Destroy the instance even if one of the steps above throws
    await instance.exit();
  }
};

// Exposed interface
module.exports = getPic;
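The fixed 500 ms delay in the module above is a guess: too short for slow pages and wasted time for fast ones. A common alternative (swapped in here, not the author's method) is to poll a condition until it holds or a timeout expires. Below is a minimal, library-free sketch; `waitFor` and its `timeout`/`interval` options are hypothetical names.

```javascript
'use strict';

// Resolve after `ms` milliseconds
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: repeatedly call `check` (which may return a value or a
// Promise) until it yields a truthy value, or reject once `timeout` ms have passed.
async function waitFor(check, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    if (await check()) return true;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeout} ms`);
    }
    await sleep(interval);
  }
}

module.exports = { sleep, waitFor };
```

Inside the module, the delay call could then become something like `await waitFor(() => page.evaluate(function () { return document.querySelector('#content') !== null; }))`, assuming phantom's `page.evaluate` (which runs the function inside the page and resolves with its return value) and treating `#content` as a placeholder for whichever element the page's JS renders last.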
Calling phantomjs-node from Node to crawl complex pages