HTTP crawler, Nodejs Learning (ii)

Source: Internet
Author: User

This post uses Node.js to crawl data from a web page. It relies on cheerio, a library that is very handy for parsing HTML and whose API closely mirrors jQuery's.

First install cheerio by running npm install cheerio on the command line (run the command in the root directory of your Node.js project).

Once the installation is complete, we will fetch the page http://www.imooc.com/learn/348 and extract the course information from it.

The code is as follows:

    var http = require('http');
    var cheerio = require('cheerio');
    var url = 'http://www.imooc.com/learn/348';

    // Extract the required course information from the page HTML.
    function filter(html) {
        var $ = cheerio.load(html);
        var chapters = $('.chapter');
        // Format of the result:
        // [{ chapterTitle: '', videos: [{ title: '', id: '' }] }]
        var result = [];
        chapters.each(function () {
            var item = $(this);
            var chapterTitle = item.find('strong').text();
            var videos = item.find('.video').find('li');
            var chapterData = {
                chapterTitle: chapterTitle,
                videos: []
            };
            videos.each(function () {
                var video = $(this).find('.studyvideo');
                // Drop the whitespace that follows the useful part of the title.
                var title = video.text().split(')')[0] + ')';
                // console.log(title);
                // Keep only the numeric ID of the course video.
                var id = video.attr('href').split('video/')[1];
                chapterData.videos.push({ title: title, id: id });
            });
            result.push(chapterData);
        });
        return result;
    }

    // Print the crawl results.
    function printResult(result) {
        var str = '';
        result.forEach(function (item) {
            str += item.chapterTitle + '\n';
            item.videos.forEach(function (video) {
                str += '"' + video.id + '" ' + video.title + '\n';
            });
        });
        console.log(str);
    }

    http.get(url, function (res) {
        var html = '';
        // Accumulate the page content chunk by chunk.
        res.on('data', function (data) {
            html += data;
        });
        res.on('end', function () {
            var result = filter(html); // extract the required course information
            printResult(result);       // print the crawl results
        });
    }).on('error', function () {
        // An error occurred while fetching the page.
        console.log('error!');
    });
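
The two string operations inside the filter function can be tried on their own, without touching the network. The sketch below is only an illustration: the helper names cleanTitle and extractVideoId, and the sample text and link, are made up here (modeled on the crawl output) and are not taken from the live page. It also trims leading whitespace, which the code above does not do.

```javascript
// Hypothetical sample values, modeled on the crawl output; not copied from the live page.
var sampleText = '\n   1-1 Preface (01:20)\n      ';
var sampleHref = '/video/6687';

// Keep everything up to and including the first ')', then trim surrounding whitespace.
function cleanTitle(text) {
    return (text.split(')')[0] + ')').trim();
}

// Keep only what follows 'video/' in the link.
// Returns undefined when the link is missing or has a different shape.
function extractVideoId(href) {
    if (typeof href !== 'string') {
        return undefined;
    }
    return href.split('video/')[1];
}

console.log(cleanTitle(sampleText));
console.log(extractVideoId(sampleHref));
```

Note that splitting on ')' cuts the title at the first closing parenthesis, so a title that contains an earlier ')' would lose its duration. Checking that href is a string also avoids an exception when an element has no link.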

Results:

Chapter 1 Preface
"6687" 1-1 Preface (01:20)
"6688" 1-2 Why study Nodejs (05:43)
Chapter 2 Installing Nodejs
"6689" 2-1 Course brief (01:19)
"6690" 2-2 Nodejs version basics (01:02)
"6691" 2-3 Installing Nodejs on Windows (04:43)
"6692" 2-4 Installing Nodejs on Linux (06:24)
"6693" 2-5 Installing Nodejs on Mac (03:55)
Chapter 3 Can't wait to try it out
"6694" 3-1 Starting a web server (05:14)
"6695" 3-2 Command-line experience (02:47)
Chapter 4 Modules and package management tools
"6697" 4-1 Node.js modules and the CommonJS specification (03:44)
"6700" 4-2 Classification of modules (00:45)
"6701" 4-3 A simple Nodejs module (09:23)
Chapter 5 Sweeping through the Nodejs API
"6705" 5-1 Don't fall into the abyss of version selection (02:32)
"6710" 5-2 A good helper for URL parsing (10:30)
"6711" 5-3 querystring, a handy tool for parameter processing (06:40)
"6712" 5-4 HTTP knowledge: filling the gaps first (09:43)
"6713" 5-5 HTTP knowledge: filling the gaps, using imooc as an example (10:13)
"7557" 5-6 HTTP event callbacks, advanced (17:51)
"7558" 5-7 HTTP source code reading: first understand scope and context (20:50)
"7963" 5-8 HTTP source code reading (22:08)
"7964" 5-9 HTTP performance testing (09:15)
"7965" 5-10 HTTP crawler (17:33)
"8525" 5-11 Event module interlude (15:15)
"8837" 5-12 Request method (17:56)

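In the listing, each chapter title is followed by its videos, with the ID printed first. Here is a minimal sketch of that layout, assuming the result structure [{ chapterTitle, videos: [{ title, id }] }] described in the code. The names formatResult and sample are made up for this illustration, and the sample data is adapted from the first chapter of the listing. The function returns the text instead of logging it, so the formatting can be checked separately from the crawl.

```javascript
// Sample data in the shape produced by the filter step (adapted from the first chapter).
var sample = [
    {
        chapterTitle: 'Chapter 1 Preface',
        videos: [
            { title: '1-1 Preface (01:20)', id: '6687' },
            { title: '1-2 Why study Nodejs (05:43)', id: '6688' }
        ]
    }
];

// Build one line per chapter title, followed by one line per video.
function formatResult(result) {
    return result.map(function (chapter) {
        var lines = chapter.videos.map(function (video) {
            return '"' + video.id + '" ' + video.title;
        });
        return [chapter.chapterTitle].concat(lines).join('\n');
    }).join('\n');
}

console.log(formatResult(sample));
```
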
This article was written after taking the online course at http://www.imooc.com/learn/348
