Alternatively, refer to this article:
http://cnodejs.org/topic/54bdaac4514ea9146862abee
That article also shares some experience with scraping NetEase Open Courses using Node.js.
The code is as follows. It uses http to fetch the web page, request to make the HTTP requests for the images, cheerio to parse the HTML, mkdirp to create the directory, fs to create the files, and iconv-lite for encoding conversion (not strictly required in this example).
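curl.js:
var http = require("http");

// Fetch a URL and pass the raw response chunks (Buffers) to the callback.
function download(url, callback) {
    var chunks = [];
    http.get(url, function (res) {
        res.on('data', function (chunk) {
            chunks.push(chunk);
        });
        res.on('end', function () {
            callback(chunks);
        });
    }).on('error', function () {
        // Pass null so the caller's if (chunks) check can detect the failure.
        callback(null);
    });
}

exports.download = download;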
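curl.js collects the response chunks in an array instead of appending them to a string. The sketch below illustrates why that can matter: a multi-byte character may be split across two chunks, so the chunks should be concatenated before decoding. It assumes UTF-8 text for simplicity (the post decodes the page as GBK through iconv-lite), and the variable names are illustrative.

```javascript
// The UTF-8 bytes of the tag used in the page URL (%E4%BA%BA%E5%83%8F),
// deliberately split in the middle of the first character.
const whole = Buffer.from('人像', 'utf8');
const chunks = [whole.subarray(0, 2), whole.subarray(2)];

// Decoding each chunk on its own corrupts the character that straddles the split.
const decodedPiecewise = chunks.map((c) => c.toString('utf8')).join('');

// Concatenating the Buffers first keeps every character intact.
const decodedWhole = Buffer.concat(chunks).toString('utf8');

console.log(decodedPiecewise === '人像'); // false
console.log(decodedWhole === '人像');     // true
```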
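saveimage.js:
var fs = require('fs');
var request = require('request');

// Download the image at url and stream it into the file named filename.
var saveimage = function (url, filename) {
    console.log('image=> ' + url);
    request(url).pipe(fs.createWriteStream(filename));
    console.log('save=> ' + filename);
};

exports.saveimage = saveimage;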
HelloWorld.js:
/**
 * Created by Baidu on 16/10/17.
 */
console.log("Hello World");

var cheerio = require('cheerio');
var curl = require('./curl');
var iconv = require('iconv-lite');
var mkdirp = require('mkdirp');
var saveimage = require('./saveimage');

//var url = 'http://open.163.com/special/opencourse/englishs1.html';
var url = 'http://loftermeirenzhi.lofter.com/tag/%E4%BA%BA%E5%83%8F?page=';
var dir = './images';

// Create the output directory if it does not exist yet.
mkdirp(dir, function (err) {
    if (err) {
        console.log(err);
    }
});

curl.download(url, function (chunks) {
    if (chunks) {
        var data = iconv.decode(Buffer.concat(chunks), 'GBK');
        var $ = cheerio.load(data);
        // Each a.img link wraps the image; take the src of its last img child.
        $('a.img').each(function (i, e) {
            var item = $(e).children('img').last().attr('src');
            // File name: the 10 characters before '.jpg' plus the extension.
            saveimage.saveimage(item, dir + '/' + item.substr(item.indexOf('.jpg') - 10, 14));
        });
        console.log('done');
    } else {
        console.log('error');
    }
});
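The file name in HelloWorld.js comes from item.substr(item.indexOf('.jpg') - 10, 14), that is, the 10 characters before the first '.jpg' plus the 4-character extension. A minimal sketch of that rule as a standalone function follows; fileNameFromUrl is a hypothetical helper, not part of the original code, and the guard for URLs without a usable '.jpg' is an added assumption. The sample URLs are the ones printed in the run output of this post.

```javascript
// Hypothetical helper mirroring item.substr(item.indexOf('.jpg') - 10, 14).
function fileNameFromUrl(url) {
    var dot = url.indexOf('.jpg');
    // Added guard (assumption): no '.jpg', or fewer than 10 characters before it.
    if (dot < 10) {
        return null;
    }
    // 10 characters before '.jpg' plus the 4 characters of '.jpg' itself.
    return url.slice(dot - 10, dot + 4);
}

var first = 'http://imgsize.ph.126.net/?imgurl=http://img2.ph.126.net/Cil5iulfm0ttzbjxnhcfqq==/52072870709354180.jpg_110x110x0x90.jpg';
var second = 'http://imglf1.nosdn.127.net/img/szzqcdg4rk01vgo5cw81teortu5zl2dcbjblbktbodlcskfgsxlidew5defvsdlgatnjzmj3pt0.jpg?imageview&thumbnail=500x0&quality=96&stripmeta=0&type=jpg';

console.log(fileNameFromUrl(first));  // 0709354180.jpg
console.log(fileNameFromUrl(second)); // tnjzmj3pt0.jpg
```

Note that only the first '.jpg' in the URL is used, so a URL such as the first one, which contains '.jpg' twice, is still handled.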
After running it, almost all of the downloaded image files turned out to be empty.
Following the example in the article, I changed the request part of saveimage.js as follows:
var fs = require('fs');
var request = require('request');

var saveimage = function (url, filename) {
    console.log('image=> ' + url);
    // Issue a HEAD request first and start the download in its callback.
    request.head(url, function (err, res, body) {
        request(url).pipe(fs.createWriteStream(filename));
    });
    console.log('save=> ' + filename);
};

exports.saveimage = saveimage;
Running it again succeeds and prints:
/usr/local/bin/node /users/baidu/documents/data/work/code/self/nodejs/helloworld/HelloWorld.js
Hello World
image=> http://imgsize.ph.126.net/?imgurl=http://img2.ph.126.net/Cil5iulfm0ttzbjxnhcfqq==/52072870709354180.jpg_110x110x0x90.jpg
save=> ./images/0709354180.jpg
image=> http://imglf1.nosdn.127.net/img/szzqcdg4rk01vgo5cw81teortu5zl2dcbjblbktbodlcskfgsxlidew5defvsdlgatnjzmj3pt0.jpg?imageview&thumbnail=500x0&quality=96&stripmeta=0&type=jpg
save=> ./images/tnjzmj3pt0.jpg
...
done
An images directory is then generated in the project directory, containing the downloaded pictures.
Why this change works is not entirely clear to me. (HEAD is generally used to check whether a URL is valid.)
Adding the HEAD request may also have appeared to work for another reason: although the first run did not download the pictures successfully, it had already started the downloads, so they may have been cached. As an experiment, after one successful run I removed the head call:
//request.head(url, function (err, res, body) {
    request(url).pipe(fs.createWriteStream(filename));
//});
It still succeeded. So the empty files were most likely caused by delays in loading the pictures.
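One thing worth noting when investigating this: in saveimage.js the 'save=>' line is logged immediately after pipe() is called, which is before any data has been written, so the log alone does not show whether a file was completed. The sketch below reports only after the write has finished. It is not the original code: saveStream is a hypothetical helper, and an in-memory stream stands in for request(url) so the example runs without the request package or network access.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable, pipeline } = require('stream');

// Hypothetical helper: pipe any readable stream into a file and call back
// once the file has been fully written, or with an error if the copy failed.
function saveStream(source, filename, callback) {
    pipeline(source, fs.createWriteStream(filename), callback);
}

// Stand-in for request(url): a readable stream that yields two chunks.
const fakeImage = Readable.from([Buffer.from('fake '), Buffer.from('image bytes')]);
const target = path.join(os.tmpdir(), 'saveimage-sketch.jpg');

saveStream(fakeImage, target, function (err) {
    if (err) {
        console.log('failed=> ' + err.message);
        return;
    }
    // At this point the data is on disk, so the size is meaningful.
    console.log('save=> ' + target + ' (' + fs.statSync(target).size + ' bytes)');
});
```

Applied to saveimage.js, the same idea would presumably mean logging from a 'finish' handler on the write stream, which makes empty or failed files visible in the output.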
When I have time, I will look into how to avoid failed downloads caused by timeouts, and whether there is somewhere to set a timeout.
It seems that a timeout can be set in the options when the request is created:
request({ url: jurl, true, timeout: xxx })
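As a rough illustration of a request timeout, the sketch below uses Node's built-in http module rather than the request package, so that it runs without dependencies; with request, the timeout option (in milliseconds) plays a similar role. getWithTimeout is a hypothetical helper, and the local server that never answers exists only to trigger the timeout.

```javascript
const http = require('http');

// Hypothetical helper: GET a URL and fail if the socket stays idle for timeoutMs.
function getWithTimeout(url, timeoutMs, callback) {
    let done = false;
    const finish = function (err, body) {
        if (!done) {
            done = true;
            callback(err, body);
        }
    };
    const req = http.get(url, { timeout: timeoutMs }, function (res) {
        const chunks = [];
        res.on('data', function (chunk) { chunks.push(chunk); });
        res.on('end', function () { finish(null, Buffer.concat(chunks)); });
        res.on('error', finish);
    });
    // 'timeout' only signals idleness; the request has to be destroyed explicitly.
    req.on('timeout', function () {
        req.destroy(new Error('timed out after ' + timeoutMs + ' ms'));
    });
    req.on('error', finish);
}

// Demo: a local server that accepts the request but never responds.
const server = http.createServer(function () { /* intentionally no response */ });
server.listen(0, function () {
    const url = 'http://127.0.0.1:' + server.address().port + '/';
    getWithTimeout(url, 200, function (err) {
        console.log(err ? 'error=> ' + err.message : 'done');
        server.close();
    });
});
```

The callback is guarded so that it fires only once, whichever of the end, error, or timeout paths is reached first.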
Next, I plan to learn more about JavaScript requests and rendering, especially how PhantomJS renders dynamic web pages.
Scraping Lofter pictures with Node.js, cheerio, and request