This article is reproduced from:
How to use Node.js to analyze a simple page, by Han Zichi
Goal: entering localhost:3000 in the browser's address bar should display the titles of the 20 articles on the cnblogs (Blog Park) home page.
Process Analysis
The first step is to listen on a port, which calls for express, one of the most important modules in the Node ecosystem. Next, we need to send an HTTP request to http://www.cnblogs.com/ to fetch the page data for analysis, which calls for the SuperAgent module. Finally, to work with the fetched HTML source in a DOM-like way, we bring in the cheerio module.
Express Module
First, we need to listen on a port so that output can be served to the page.
We can use the built-in http module:
    var http = require("http");

    http.createServer(function (request, response) {
      response.writeHead(200, {"Content-Type": "text/html"});
      response.write("Hello World!");
      response.end();
    }).listen(3000);
Of course, we can also use the express module, which builds on http and is more powerful:
    // Import the 'express' module and assign it to the `express` variable.
    var express = require('express');
    // Call express(). It is a function that, called without arguments,
    // returns an Express instance, which we assign to `app`.
    var app = express();

    // app has many methods; the most commonly used are get, post, put/patch and delete.
    // Here we call get and register a handler function for the '/' path.
    // The handler receives two objects, req and res: the request and the response.
    // req carries the information sent by the browser (query, body, headers and so on).
    // We normally do not read from res; we use it to shape what goes back to the browser,
    // such as headers and the response body. Here we call its send method to output a string.
    app.get('/', function (req, res) {
      res.send('Hello world');
    });

    // With the app's behavior defined, have it listen on local port 3000.
    // The second argument is a callback that runs once listening has started;
    // here it prints a line to the console to tell us so.
    app.listen(3000, function () {
      console.log('app is listening at port 3000');
    });
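To make the roles of req and res concrete, here is a minimal sketch of a handler that reads a value from req.query and answers through res.send. The greetHandler name, the /greet path, the name parameter and the makeFakeRes stand-in are illustrative assumptions, not part of the original article. The handler is kept as a plain function so it can be tried with stand-in objects before it is registered on the app.

```javascript
// Hypothetical handler: reads an optional ?name=... query parameter
// from req and replies through res.
function greetHandler(req, res) {
  // Express exposes the parsed query string as req.query.
  var name = (req.query && req.query.name) || 'world';
  res.send('Hello ' + name);
}

// Stand-in response object (an assumption for this sketch) that records
// what would have been sent to the browser.
function makeFakeRes() {
  return {
    body: null,
    send: function (text) { this.body = text; }
  };
}

var fakeRes = makeFakeRes();
greetHandler({ query: { name: 'Node' } }, fakeRes);
console.log(fakeRes.body); // Hello Node

// In a real app, assuming `app` is the Express instance created earlier:
// app.get('/greet', greetHandler);
```

Keeping the handler separate from the app.get call is only one possible layout; the inline function used in the article works equally well for a page this small.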
SuperAgent Module
Next, we need to fetch the HTML of the cnblogs home page so that we can analyze it. SuperAgent is a module that lets server-side code send HTTP requests such as GET and POST. Let's go straight to the code; the rest of the API is covered in the SuperAgent documentation.
    var express = require('express');
    var superagent = require('superagent');

    var app = express();

    app.get('/', function (req, res, next) {
      superagent.get('http://www.cnblogs.com/')
        .end(function (err, ans) {
          // General error handling
          if (err) {
            return next(err);
          }
          res.send(ans.text);
        });
    });

    app.listen(3000, function () {
      console.log('app is listening at port 3000');
    });
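The callback passed to end follows Node's error-first convention: err is set when the request fails, otherwise the response object is available. The sketch below isolates that branch so it can be exercised without a network connection. The makeFetchCallback helper and the stand-in res and next objects are assumptions made for illustration.

```javascript
// Hypothetical helper: builds the (err, sres) callback handed to
// superagent's .end(), forwarding errors to Express's next().
function makeFetchCallback(res, next) {
  return function (err, sres) {
    if (err) {
      // Hand the error to Express's error handling and stop here.
      return next(err);
    }
    // On success, relay the fetched HTML to the browser.
    res.send(sres.text);
  };
}

// Stand-ins for Express's res and next that record what they receive.
var log = [];
var fakeRes = { send: function (body) { log.push('sent: ' + body); } };
var fakeNext = function (err) { log.push('error: ' + err.message); };

var callback = makeFetchCallback(fakeRes, fakeNext);
callback(null, { text: '<html></html>' }); // success path
callback(new Error('timeout'));            // failure path

console.log(log);
```

Inside the route handler this could be wired up as superagent.get('http://www.cnblogs.com/').end(makeFetchCallback(res, next)).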
Cheerio Module
SuperAgent gives us the HTML of the cnblogs page, and cheerio lets us query it with jQuery-style CSS selectors. The detailed API is covered in the cheerio documentation.
Full code
    var express = require('express');
    var cheerio = require('cheerio');
    var superagent = require('superagent');

    var app = express();

    app.get('/', function (req, res, next) {
      superagent.get('http://www.cnblogs.com/')
        .end(function (err, sres) { // callback
          // General error handling
          if (err) {
            return next(err);
          }
          // sres.text holds the page's HTML. Passing it to cheerio.load
          // returns an object that implements the jQuery interface,
          // which we conventionally name `$`.
          // The rest is ordinary jQuery.
          var $ = cheerio.load(sres.text);
          var ans = '';
          $('.titlelnk').each(function (index, item) {
            var $item = $(item);
            ans += $item.html() + '<br/><br/>';
          });
          // Render the content to the page
          res.send(ans);
        });
    });

    app.listen(3000, function () {
      console.log('app is listening at port 3000');
    });
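The each loop above builds the response by string concatenation. One alternative, sketched here as an assumption rather than as the article's method, is to collect the titles into an array first and join them afterwards, which keeps extraction and rendering apart. The renderTitles helper and the sample titles are made up for illustration.

```javascript
// Hypothetical helper: turns an array of title strings (for example,
// gathered inside the .each() callback with titles.push($item.html()))
// into the same markup the concatenating loop produces.
function renderTitles(titles) {
  return titles.map(function (title) {
    return title + '<br/><br/>';
  }).join('');
}

// Sample data standing in for titles scraped from the page.
var sample = ['First post', 'Second post'];
console.log(renderTitles(sample));
// First post<br/><br/>Second post<br/><br/>
```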
Now open localhost:3000 and the page shows the titles of the 20 articles on the cnblogs home page (unstyled, so not very pretty). Start the server first, of course, with node filename.js.
We can also print the content straight to the console:
    var cheerio = require('cheerio');
    var superagent = require('superagent');

    superagent.get('http://www.cnblogs.com/')
      .end(function (err, sres) { // callback
        // General error handling: report the error and stop,
        // because sres is not usable when the request failed.
        if (err) {
          return console.error(err);
        }
        // sres.text holds the page's HTML. Passing it to cheerio.load
        // returns an object that implements the jQuery interface,
        // which we conventionally name `$`.
        // The rest is ordinary jQuery.
        var $ = cheerio.load(sres.text);
        $('.titlelnk').each(function (index, item) {
          var $item = $(item);
          console.log($item.text());
        });
      });
Reference: "Node.js 包教不包会" (Node.js Lessons)
Node.js Crawler Basics (1)