This article shows how to download a web page with Node.js and PhantomJS. Readers who need this may find it a useful reference.
The idea is quite simple. PhantomJS (phantomjs.exe) loads the page and collects the URL of every resource it requests, then starts Node.js as a child process to download all of those resources. For CSS files, the Node.js script also scans the stylesheet for url(...) references and downloads the files they point to.
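To illustrate the CSS step, the following sketch isolates it as a plain Node.js function. It scans a stylesheet for url(...) references and resolves each one against the stylesheet's own URL, because relative paths in CSS are relative to the CSS file rather than to the HTML page. The function name `extractCssUrls` and its pattern are assumptions for illustration: the pattern is simpler than the one used in downhtml.js, and inline data: URIs are skipped.

```javascript
// Hypothetical helper: list the absolute URLs referenced by url(...) in a stylesheet.
// cssText is the stylesheet content, cssUrl the address it was downloaded from.
function extractCssUrls(cssText, cssUrl) {
  // Matches url(a.png), url('a.png') and url("a.png"); group 2 is the path.
  var re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  var result = [];
  var m;
  while ((m = re.exec(cssText)) !== null) {
    var ref = m[2];
    if (!ref || /^data:/i.test(ref)) {
      continue; // skip empty references and inline data: URIs
    }
    // Relative paths are resolved against the CSS file, not the HTML page.
    var absolute = new URL(ref, cssUrl).href;
    if (result.indexOf(absolute) === -1) {
      result.push(absolute); // keep each URL only once
    }
  }
  return result;
}
```

For example, with `cssUrl` set to `http://www.example.com/css/site.css`, the reference `url(../img/a.png)` becomes `http://www.example.com/img/a.png`.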
Of course, the approach is very basic. Because of responsive design (resources that are only requested at other screen sizes) and asynchronous loading (resources requested after the page has loaded), many resources will still not be downloaded, so the scripts need to be adapted to the actual site.
First, of course, download and install Node.js and PhantomJS.
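The first script, down.js, is executed by phantomjs.exe. It loads the page, records the URL of every resource the page requests, and then starts downhtml.js through a child process, passing the URLs as a single comma-separated argument.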
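The down.js listing itself is not reproduced on this page, so here is a minimal sketch of what such a script could look like; it is not the author's original. It uses PhantomJS's own modules (`webpage`, `system`, `child_process`) and sticks to ES5, since PhantomJS 2.x only supports ES5. Assumptions: the page URL is the first command-line argument, `node` is on the PATH, downhtml.js is in the working directory, and the helper names `recordUrl` and `run` as well as the 3-second wait after loading are hypothetical choices.

```javascript
// down.js: hypothetical sketch for phantomjs.exe (PhantomJS 2.x, ES5 only).
// Assumed usage: phantomjs.exe down.js http://www.youku.com/

// Remember each http(s) URL once; data: URIs and repeated URLs are ignored.
function recordUrl(urls, seen, url) {
  if (!/^https?:\/\//i.test(url) || seen[url]) {
    return false;
  }
  seen[url] = true;
  urls.push(url);
  return true;
}

// Load the page, collect every requested URL, then hand the list to downhtml.js.
function run(target) {
  var childProcess = require('child_process');
  var page = require('webpage').create();
  var urls = [];
  var seen = {};
  var started = false;

  // Called by PhantomJS for every request the page makes (HTML, CSS, JS, images...).
  page.onResourceRequested = function (requestData) {
    recordUrl(urls, seen, requestData.url);
  };

  page.open(target, function (status) {
    if (started) {
      return; // guard in case the callback fires more than once
    }
    started = true;
    if (status !== 'success') {
      console.log('Failed to load ' + target);
      phantom.exit(1);
      return;
    }
    // Assumed 3-second grace period so some late, asynchronous requests are recorded.
    setTimeout(function () {
      // Assumes `node` is on the PATH and downhtml.js is in the working directory.
      childProcess.execFile('node', ['downhtml.js', urls.join(',')], null,
        function (err, stdout, stderr) {
          if (stdout) { console.log(stdout); }
          if (stderr) { console.log(stderr); }
          phantom.exit(err ? 1 : 0);
        });
    }, 3000);
  });
}

// The PhantomJS-only part runs only when the `phantom` global exists.
if (typeof phantom !== 'undefined') {
  var target = require('system').args[1];
  if (target) {
    run(target);
  } else {
    console.log('Usage: phantomjs down.js <url>');
    phantom.exit(1);
  }
}
```

The `typeof phantom` check only matters outside PhantomJS: it lets `recordUrl` be loaded under plain Node.js without the PhantomJS modules. Because the URLs travel as one comma-separated argument, matching how downhtml.js splits `process.argv[2]`, a resource URL that itself contains a comma would be split incorrectly, and a very long list can run into the operating system's command-line length limit.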
The corresponding downhtml.js, which is run by Node.js, is as follows:
"use strict";
// downhtml.js: download every URL listed in process.argv[2] (comma-separated)
// to ./<hostname><pathname>; for CSS files, also download the url(...) images.
// Requires Node.js 10.12+ (global URL class, fs.mkdir with { recursive: true }).
var fs = require('fs');
var http = require('http');
var https = require('https');
var path = require('path');

var CWD = process.cwd();
var dirCache = {};  // directories already created, to avoid repeated mkdir calls
var isDownMap = {}; // image URLs already requested from CSS files

// Create a directory and any missing parents, then call back.
function makeDir(pathStr, callback) {
  if (dirCache[pathStr] === 1) {
    callback();
    return;
  }
  fs.mkdir(pathStr, { recursive: true }, function (err) {
    if (err) {
      console.error('Cannot create ' + pathStr + ': ' + err.message);
      return;
    }
    dirCache[pathStr] = 1;
    callback();
  });
}

// GET a URL with http or https depending on its scheme; log network errors.
function get(url, onResponse) {
  var client = url.indexOf('https:') === 0 ? https : http;
  client.get(url, onResponse).on('error', function (err) {
    console.error('Request failed for ' + url + ': ' + err.message);
  });
}

// Map a URL to a local path; "directory" URLs such as "/" become index.html.
function localPath(url) {
  var uo = new URL(url);
  var pathname = uo.pathname.endsWith('/') ? uo.pathname + 'index.html' : uo.pathname;
  return path.join(CWD, uo.hostname, pathname);
}

// reg finds declarations such as ":url(a.png)" or ", url('b.png')";
// reg2 extracts the path inside the parentheses (capture group 2).
var reg = /[:,]\s*url\((['"]?).*?\1\)/g;
var reg2 = /\((['"]?)(.*?)\1\)/;

// Fetch a CSS file, find its url(...) references and download each file once.
function downImgFromCss(cssUrl) {
  get(cssUrl, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var matches = body.match(reg) || []; // match() returns null if nothing is found
      matches.forEach(function (decl) {
        var m = decl.match(reg2);
        if (!m || !m[2]) {
          return;
        }
        var imgUrl;
        try {
          // Relative paths are resolved against the CSS file's URL.
          imgUrl = new URL(m[2], cssUrl).href;
        } catch (e) {
          return; // not a valid URL
        }
        // Skip data: URIs and files already requested.
        if (!/^https?:/.test(imgUrl) || isDownMap[imgUrl]) {
          return;
        }
        isDownMap[imgUrl] = 1;
        var filepath = localPath(imgUrl);
        makeDir(path.dirname(filepath), function () {
          get(imgUrl, function (imgRes) {
            imgRes.pipe(fs.createWriteStream(filepath));
          });
        });
      });
    });
  });
}

// Download every resource passed in by down.js (http/https URLs only).
var urls = (process.argv[2] || '').split(',').filter(function (u) {
  return /^https?:\/\/[^\/]/.test(u);
});

urls.forEach(function (url) {
  var filepath = localPath(url);
  makeDir(path.dirname(filepath), function () {
    get(url, function (res) {
      var type = res.headers['content-type'] || '';
      if (url.indexOf('.css') !== -1 || type.indexOf('text/css') !== -1) {
        console.log('Downloading images from CSS file: ' + url);
        downImgFromCss(url);
      }
      res.pipe(fs.createWriteStream(filepath));
    });
  });
});
Place down.js and downhtml.js in the same folder and run the following command in CMD:
D:\phantomjs-2.0.0-windows\bin\phantomjs.exe down.js http://www.youku.com/
That is all for this article. I hope you find it useful.