Nodejs Crawler System

Source: Internet
Author: User

    • Express is the server-side web framework
    • request sends HTTP requests from Node.js, much like AJAX requests on the front end
    • cheerio is the server-side equivalent of jQuery (JQ)

To start, create a new directory named crawler. Install Express and the Express generator globally:

    npm install express -g
    npm install express-generator -g

Then go into the crawler directory and install request and cheerio:

    cd crawler
    npm install request --save-dev
    npm install cheerio --save-dev

Next, generate the Express project in this directory by running the generator directly on the command line:

    express

OK, the project directory now holds the generated project. Install the project's dependencies:

    npm install

That completes the preparation. Now open app.js and change it as follows:

    var express = require('express');
    var app = express();

    app.get('/', function (req, res) {
      res.send('Hello Express');
    });

    app.listen(3000, function () {
      console.log('Listening on 3000');
    });

In the terminal, run supervisor app.js. (Note: supervisor monitors a Node.js process. For example, when we modify app.js, supervisor restarts the app automatically, so we don't have to run node app.js by hand each time. It can be installed with npm install supervisor -g, and it is one of the tools we commonly use in Node.js development.)

Open 127.0.0.1:3000 and the page shows Hello Express. Everything's fine.

Now let's look at request. Its page on npm, https://www.npmjs.com/package/request, describes how to use it. Following that, we modify our app.js:

    var express = require('express');
    var app = express();
    var request = require('request');

    app.get('/', function (req, res) {
      request('http://www.cnblogs.com/galenyip', function (error, response, body) {
        if (!error && response.statusCode === 200) {
          console.log(body); // Print the HTML of the fetched page.
          res.send('Hello Express');
        } else {
          // Reply on failure too, so the browser request does not hang.
          res.status(500).send('Failed to fetch the page');
        }
      });
    });

    app.listen(3000, function () {
      console.log('Listening on 3000');
    });

Here the address has been changed to my blog's address, so the crawler fetches this blog.

OK, refresh the page. After a moment, you will see the page's HTML printed in the terminal.

Next, we'll use cheerio.

In app.js we add var cheerio = require('cheerio');

    var express = require('express');
    var app = express();
    var request = require('request');
    var cheerio = require('cheerio');

    app.get('/', function (req, res) {
      request('http://www.cnblogs.com/galenyip', function (error, response, body) {
        if (!error && response.statusCode === 200) {
          var $ = cheerio.load(body); // Parse the body; $ is the selector for the page.
          res.send('Hello Express'); // Reply so the browser request does not hang.
        } else {
          res.status(500).send('Failed to fetch the page');
        }
      });
    });

    app.listen(3000, function () {
      console.log('Listening on 3000');
    });

Here cheerio.load(body) parses the page we fetched and returns $, the root selector for the whole document.

After that, we can work with this page just as we would with jQuery.

The full API is documented on its npm page:

https://www.npmjs.com/package/cheerio

The API is similar to jQuery's, so it isn't covered in detail here.

In fact, our crawler is now almost complete.

The rest is up to you: crawl the page's DOM, filter the results, and so on, according to your own needs.
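
One filtering step that often comes up is turning the href values pulled from a page into absolute URLs and keeping only those on the site being crawled. The sketch below uses Node's built-in URL class for this. The function name filterLinks and the sample hrefs are assumptions for illustration, and the rules (same host, http or https only, fragments ignored) are just one possible choice.

```javascript
// Resolve hrefs against the page URL, keep same-host http(s) links, drop duplicates.
function filterLinks(hrefs, pageUrl) {
  var base = new URL(pageUrl);
  var seen = new Set();
  var result = [];

  hrefs.forEach(function (href) {
    var url;
    try {
      url = new URL(href, base); // handles both relative and absolute hrefs
    } catch (e) {
      return; // skip hrefs that cannot be parsed
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return;
    if (url.hostname !== base.hostname) return;

    url.hash = ''; // a fragment such as "#comments" points at the same page
    if (!seen.has(url.href)) {
      seen.add(url.href);
      result.push(url.href);
    }
  });

  return result;
}

// Made-up hrefs, as they might be collected from href attributes with cheerio.
var links = filterLinks(
  [
    '/galenyip/p/1.html',
    '/galenyip/p/1.html#comments',
    'javascript:void(0)',
    'http://example.com/other',
    'p/2.html'
  ],
  'http://www.cnblogs.com/galenyip/'
);
console.log(links);
```

Keeping this step as a plain function, separate from the route handler, makes it easy to adjust the rules without touching the fetching code.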

All right, that's all there is to it.

If anything is unclear, or you spot something wrong, feel free to discuss or criticize it in the comments.
