Scraping pictures from LOFTER with Node.js, Cheerio, and Request


For background, refer to this article:

http://cnodejs.org/topic/54bdaac4514ea9146862abee

That article describes some experience scraping NetEase Open Courses with Node.js.

The code is as follows. It uses http to fetch the web page, request to make the HTTP requests for the pictures, cheerio to parse the HTML, mkdirp to create the directory, fs to create the files, and iconv-lite for encoding conversion (not strictly required in this example).

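curl.js:

var http = require("http");

// Fetch a URL and pass the raw response chunks (Buffers) to the callback.
function download(url, callback) {
    var chunks = [];
    http.get(url, function (res) {
        res.on('data', function (chunk) {
            chunks.push(chunk);
        });
        res.on('end', function () {
            callback(chunks);
        });
    }).on('error', function () {
        // Pass null on failure so that the caller's `if (chunks)` check fails.
        callback(null);
    });
}

exports.download = download;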
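curl.js hands back the raw chunks instead of a string, and the main script joins them with Buffer.concat before decoding. A minimal sketch of why that order matters is shown below. It uses UTF-8 (built into Node.js) rather than GBK, and it splits the bytes by hand to simulate a chunk boundary; both choices are assumptions made only for illustration. The sample text is the tag that appears URL-encoded in the LOFTER address.

```javascript
// '人像' (the tag %E4%BA%BA%E5%83%8F in the LOFTER URL) is 6 bytes in UTF-8.
const whole = Buffer.from('人像', 'utf8');

// Simulate the network splitting the body in the middle of the first character.
const chunks = [whole.subarray(0, 2), whole.subarray(2)];

// Decoding each chunk separately corrupts the character that was split.
const perChunk = chunks.map(function (c) {
    return c.toString('utf8');
}).join('');

// Concatenating first and decoding once restores the original text.
const joined = Buffer.concat(chunks).toString('utf8');

console.log(perChunk === '人像'); // false
console.log(joined === '人像');   // true
```

The same reasoning applies to iconv.decode: it should receive the whole concatenated Buffer, not individual chunks.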

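saveimage.js:

var fs = require('fs');
var request = require('request');

// Stream the picture at `url` into the file `filename`.
var saveImage = function (url, filename) {
    console.log('image=>' + url);
    request(url).pipe(fs.createWriteStream(filename));
    console.log('save=>' + filename);
};

exports.saveImage = saveImage;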

HelloWorld.js:

/**
 * Created by Baidu on 16/10/17.
 */
console.log("Hello World");

var cheerio = require('cheerio');
var curl = require('./curl');
var iconv = require('iconv-lite');
var mkdirp = require('mkdirp');
var saveImage = require('./saveimage');

//var url = 'http://open.163.com/special/opencourse/englishs1.html';
var url = 'http://loftermeirenzhi.lofter.com/tag/%E4%BA%BA%E5%83%8F?page=';
var dir = './images';

// Create the output directory.
mkdirp(dir, function (err) {
    if (err) {
        console.log(err);
    }
});

curl.download(url, function (chunks) {
    if (chunks) {
        // Join the chunks first, then decode the whole page.
        var data = iconv.decode(Buffer.concat(chunks), 'GBK');
        var $ = cheerio.load(data);
        $('a.img').each(function (i, e) {
            var item = $(e).children('img').last().attr('src');
            // File name: the 10 characters before the first '.jpg', plus '.jpg'.
            saveImage.saveImage(item, dir + '/' + item.substr(item.indexOf('.jpg') - 10, 14));
        });
        console.log('done');
    } else {
        console.log('error');
    }
});
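The file name is derived from the picture URL by taking the 10 characters before the first '.jpg' plus the extension itself (14 characters in total). The sketch below isolates that step. The helper name fileNameFromUrl and the null return for unusable URLs are assumptions added for illustration; the original code does not guard against a missing '.jpg'.

```javascript
// Hypothetical helper mirroring item.substr(item.indexOf('.jpg') - 10, 14).
function fileNameFromUrl(url) {
    const extIndex = url.indexOf('.jpg'); // position of the first '.jpg'
    if (extIndex < 10) {
        // No '.jpg' at all, or fewer than 10 characters in front of it.
        return null;
    }
    // 10 characters before the extension + 4 characters of '.jpg' = 14.
    return url.slice(extIndex - 10, extIndex + 4);
}

const sample = 'http://imgsize.ph.126.net/?imgurl=http://img2.ph.126.net/' +
    'Cil5iulfm0ttzbjxnhcfqq==/52072870709354180.jpg_110x110x0x90.jpg';

console.log(fileNameFromUrl(sample));                   // 0709354180.jpg
console.log(fileNameFromUrl('http://example.com/a.png')); // null
```

Because only 10 characters are kept, two different pictures could in principle map to the same file name and overwrite each other.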

After running it, almost all of the downloaded picture files turn out to be empty.

Following the example in the referenced article, I changed the request part of saveimage.js as follows:

var fs = require('fs');
var request = require('request');

var saveImage = function (url, filename) {
    console.log('image=>' + url);
    // Issue a HEAD request first, then start the actual download.
    request.head(url, function (err, res, body) {
        request(url).pipe(fs.createWriteStream(filename));
    });
    console.log('save=>' + filename);
};

exports.saveImage = saveImage;

Running it again succeeds and prints:

/usr/local/bin/node /users/baidu/documents/data/work/code/self/nodejs/helloworld/HelloWorld.js
Hello World
image=>http://imgsize.ph.126.net/?imgurl=http://img2.ph.126.net/Cil5iulfm0ttzbjxnhcfqq==/52072870709354180.jpg_110x110x0x90.jpg
save=>./images/0709354180.jpg
image=>http://imglf1.nosdn.127.net/img/szzqcdg4rk01vgo5cw81teortu5zl2dcbjblbktbodlcskfgsxlidew5defvsdlgatnjzmj3pt0.jpg?imageview&thumbnail=500x0&quality=96&stripmeta=0&type=jpg
save=>./images/tnjzmj3pt0.jpg
...
done

The images directory is then created in the project directory, and it contains the downloaded pictures.

It is not entirely clear why the change above works. (A HEAD request is generally used to check whether a URL is valid.)
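To make the role of HEAD concrete, here is a minimal sketch that checks a URL with a HEAD request using only the built-in http module instead of request. The helper name headRequest is hypothetical, and the sketch handles plain http URLs only.

```javascript
const http = require('http');

// Hypothetical helper: send a HEAD request and resolve with the status code
// and headers. A HEAD response carries no body, so nothing is downloaded.
function headRequest(url) {
    return new Promise(function (resolve, reject) {
        const req = http.request(url, { method: 'HEAD' }, function (res) {
            res.resume(); // no body expected, but let the stream finish
            resolve({ statusCode: res.statusCode, headers: res.headers });
        });
        req.on('error', reject);
        req.end();
    });
}
```

A caller could start the real download only when statusCode is 200, and could inspect headers such as content-type or content-length before deciding to save the file.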

Another possibility is that adding the HEAD request only appeared to help: the first run did not download the pictures successfully, but it had already started the downloads, so the pictures may have been cached. As an experiment, after one successful run, I commented out the head call:

//request.head(url, function (err, res, body) {
    request(url).pipe(fs.createWriteStream(filename));
//});

It still succeeds. So the empty files were most likely caused by delays in loading the pictures.

When there is time, I want to look into how to avoid failed downloads caused by timeouts, and whether there is a place to set a timeout.

It seems that a timeout can be set in the options when the request is created:

request({
    url: jurl,
    true,
    timeout: xxx
})
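As one way to make timeouts and incomplete downloads visible, the sketch below downloads a file with the built-in http and fs modules and reports the result through a Promise. It is not the request-based code from this article: the helper name downloadToFile and the timeoutMs parameter are hypothetical, and only plain http URLs are handled.

```javascript
const fs = require('fs');
const http = require('http');

// Hypothetical helper: download `url` into `filename`. The Promise resolves
// only after the file has been fully written, and rejects on HTTP errors,
// network errors, or when the socket stays idle longer than `timeoutMs`.
function downloadToFile(url, filename, timeoutMs) {
    return new Promise(function (resolve, reject) {
        const req = http.get(url, function (res) {
            if (res.statusCode !== 200) {
                res.resume(); // discard the body so the socket is released
                reject(new Error('unexpected status ' + res.statusCode));
                return;
            }
            const file = fs.createWriteStream(filename);
            res.pipe(file);
            // 'finish' fires once all data has been flushed to the file.
            file.on('finish', function () {
                resolve(filename);
            });
            file.on('error', reject);
            res.on('error', reject);
        });
        // Abort the request if the socket is idle for too long.
        req.setTimeout(timeoutMs, function () {
            req.destroy(new Error('timed out after ' + timeoutMs + ' ms'));
        });
        req.on('error', reject);
    });
}
```

With a helper like this, the script could collect the Promises for all pictures and print "done" only after every one has settled, instead of printing it while the downloads are still in flight.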

Next, I plan to learn more about JavaScript requests and page rendering, especially how PhantomJS renders dynamic web pages.
