- Express is the server-side framework.
- request plays the role that AJAX requests play on the front end: it makes the HTTP requests.
- cheerio is the server-side equivalent of jQuery.
To start, create a new directory named crawler, then install Express and its project generator globally:

    npm install express -g
    npm install express-generator -g

Then go into the crawler directory and install request and cheerio:

    cd crawler
    npm install request --save-dev
    npm install cheerio --save-dev

Next, build the Express project in this directory by running the generator directly from the command line:

    express

OK, the project skeleton has now been generated in our directory. Then install the project's dependencies:

    npm install

OK, that completes the preparation.
Then open app.js and change it as follows:

    var express = require('express');
    var app = express();

    app.get('/', function (req, res) {
      res.send('Hello Express');
    });

    app.listen(3000, function () {
      console.log('Listening on 3000');
    });
In a terminal, run:

    supervisor app.js

(Note: supervisor watches a Node.js process. When we modify app.js, supervisor restarts it automatically, so we don't have to run node app.js by hand each time. It can be installed with npm install supervisor -g, and it is one of the tools we commonly use in Node.js development.)

OK, open 127.0.0.1:3000 and you will see Hello Express on the page. Everything is working.

Now let's look at request. Its page on npm, https://www.npmjs.com/package/request, describes how to use it. Let's apply that by modifying app.js:
    var express = require('express');
    var app = express();
    var request = require('request');

    app.get('/', function (req, res) {
      request('http://www.cnblogs.com/galenyip', function (error, response, body) {
        if (!error && response.statusCode == 200) {
          console.log(body); // Print the HTML of the blog's home page.
          res.send('Hello Express');
        } else {
          // Reply on failure too, so the browser is not left waiting.
          res.status(500).send('Request failed');
        }
      });
    });

    app.listen(3000, function () {
      console.log('Listening on 3000');
    });
I changed the address to my own blog's address, so this is the blog we crawl.
OK, refresh the page. After a moment, the terminal prints the blog's HTML.
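The request package has since been marked as deprecated on npm, so on a recent Node.js it may be worth doing the same download with the built-in fetch function instead. The sketch below is one assumed way to do that and is not part of the original project: fetchHtml is a hypothetical helper name, and its second parameter exists only so that a stand-in fetch can be passed in when testing without network access.

```javascript
// Hypothetical helper (not from the original post): download a page's HTML
// with the fetch function built into Node.js 18+, instead of the request package.
// fetchImpl is injectable so the helper can be exercised without a network.
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  // Treat any non-2xx status as a failure (the original code accepts 200 only).
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.text();
}

// Possible usage inside the route handler, with the same blog URL as above:
// fetchHtml('http://www.cnblogs.com/galenyip')
//   .then(function (body) { console.log(body); })
//   .catch(function (err) { console.error(err.message); });
```

The helper returns a promise, so the route handler would call res.send inside the .then callback rather than inside a request callback.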
Next, we'll use cheerio.
In app.js, add var cheerio = require('cheerio');
    var express = require('express');
    var app = express();
    var request = require('request');
    var cheerio = require('cheerio');

    app.get('/', function (req, res) {
      request('http://www.cnblogs.com/galenyip', function (error, response, body) {
        if (!error && response.statusCode == 200) {
          var $ = cheerio.load(body); // Load the body; $ is now the selector for the page.
          res.send('Hello Express');
        } else {
          // Reply on failure too, so the browser is not left waiting.
          res.status(500).send('Request failed');
        }
      });
    });

    app.listen(3000, function () {
      console.log('Listening on 3000');
    });
As you can see, cheerio.load(body) takes the page we fetched and returns $, the root selector for the whole document.
After that, we can work with the page just as we would with jQuery.
The full API is documented on the package's npm page:
https://www.npmjs.com/package/cheerio
Its API is similar to jQuery's, so I won't go through it here.
At this point, our crawler is basically complete.
The rest is up to you: crawl the page's DOM, filter out what you need, and so on, according to your own requirements.
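For the filtering step, one common need is to turn the href values collected from a page into absolute URLs and keep only those on the site being crawled. The sketch below shows one assumed approach using the URL class built into Node.js. sameSiteLinks and the sample paths are hypothetical names and data chosen for illustration.

```javascript
// Hypothetical helper: resolve hrefs against the page URL, keep links on the
// same host, and drop duplicates. Uses only the URL class built into Node.js.
function sameSiteLinks(hrefs, pageUrl) {
  var base = new URL(pageUrl);
  var seen = new Set();
  hrefs.forEach(function (href) {
    var resolved;
    try {
      resolved = new URL(href, base);
    } catch (err) {
      return; // skip values that cannot be parsed as a URL
    }
    resolved.hash = ''; // ignore in-page anchors when comparing
    if (resolved.host === base.host) {
      seen.add(resolved.href);
    }
  });
  return Array.from(seen);
}

// Made-up sample hrefs, as they might come out of $(el).attr('href').
var links = sameSiteLinks(
  ['/galenyip/p/1.html', 'p/2.html', 'http://example.com/other', '/galenyip/p/1.html#top'],
  'http://www.cnblogs.com/galenyip/'
);
console.log(links);
```

Resolving against the page URL first means relative and absolute hrefs are compared in the same form, which is what makes the duplicate check reliable.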
All right, that's about it.
If anything is unclear, feel free to discuss it or point out mistakes in the comments.