When Node.js fetches a web page and you bind to the 'data' event, the body arrives in several chunks. If you want to match against the whole page, you have to wait for the request to finish and operate on the accumulated data in the 'end' event handler.
For example, suppose you want to check whether www.baidu.com appears anywhere in a page. Without further ado, here is the code:
// Introduce modules
var http = require('http'),
    fs = require('fs'),
    url = require('url');

// Write a result line, appending it to the given file
var writeRes = function (p, r) {
        fs.appendFile(p, r, function (err) {
            if (err) console.log(err);
            else console.log(r);
        });
    },

    // Send the request, check the content, and write the result to a file
    postHttp = function (arr, num) {
        console.log('Entry ' + num + '!');
        var a = arr[num].split('-');
        if (!a[0] || !a[1]) { return; }
        var address = url.parse(a[1]),
            options = {
                host: address.host,
                path: address.path,
                hostname: address.hostname,
                method: 'GET',
                headers: {
                    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36'
                }
            };
        var req = http.request(options, function (res) {
            if (res.statusCode === 200) {
                res.setEncoding('utf-8');
                var data = '';
                res.on('data', function (rd) {
                    data += rd;
                });
                res.on('end', function () {
                    if (!~data.indexOf('www.baidu.com')) {
                        return writeRes('./no2.txt', a[0] + '--' + a[1] + '\n');
                    } else {
                        return writeRes('./has2.txt', a[0] + '--' + a[1] + '\n');
                    }
                });
            } else {
                writeRes('./error2.txt', a[0] + '--' + a[1] + '--' + res.statusCode + '\n');
            }
        });
        req.on('error', function (e) {
            writeRes('./error2.txt', a[0] + '--' + a[1] + '--' + e + '\n');
        });
        req.end();
    },

    // Read the file that lists the pages to be crawled
    openFile = function (path, coding) {
        fs.readFile(path, coding, function (err, data) {
            var res = data.split('\n');
            for (var i = 0, rl = res.length; i < rl; i++) {
                if (!res[i]) continue;
                postHttp(res, i);
            }
        });
    };

openFile('./sites.log', 'utf-8');
The code above should be easy to follow; if anything is unclear, feel free to leave me a message. It is up to you to put it to practical use.
Below is a brief look at how Node.js compares with other languages for crawling.
First, PHP. Advantages: there are plenty of ready-made frameworks and tools for fetching and parsing HTML, which saves effort. Disadvantages: speed and efficiency are a problem. Once, while downloading movie posters from a crontab job with no optimization, so many PHP processes were spawned that memory was exhausted. The syntax also feels long-winded: too many keywords and symbols, not concise, giving the impression of a language that was not carefully designed. It is tedious to write.
Next, Node.js. Its advantage is efficiency, efficiency, efficiency: because network I/O is asynchronous, a single process is roughly as powerful as hundreds of concurrent processes, while memory and CPU usage stay very small. If there is no complex computation on the fetched data, the system's bottleneck is basically bandwidth and the I/O speed of writing to the MySQL database. The flip side of this advantage is a disadvantage: asynchronous networking means callbacks. If the business logic is linear, for example you must wait for the previous page to finish crawling and yield its data before you can crawl the next page, or there are even several layers of dependency, you end up with dreadful multi-layer callbacks, and the code structure and logic become a mess. Of course, you can use flow-control tools such as Step to solve these problems.
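The multi-layer callback problem mentioned above can be sketched as follows. fetchPage is a hypothetical stand-in for an http.request wrapper; because each request depends on the previous one finishing, the callbacks nest one level per page.

```javascript
// Hypothetical sketch of linear, dependent requests written with plain
// callbacks: page 2 cannot start until page 1 has finished, and so on,
// so the nesting grows one level per page.
var order = [];

function fetchPage(url, callback) {
    // stand-in for http.request plus 'data'/'end' accumulation
    setImmediate(function () {
        order.push(url);
        callback(null, 'body of ' + url);
    });
}

fetchPage('page1', function (err, body1) {
    fetchPage('page2', function (err, body2) {
        fetchPage('page3', function (err, body3) {
            console.log(order.join(' -> ')); // prints page1 -> page2 -> page3
        });
    });
});
```

With three pages this is already awkward; with multi-layer dependencies it becomes the "terrible multi-layer callback" described above, which is exactly what flow-control libraries flatten out.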
Finally, Python. If you have no extreme efficiency requirements, I recommend Python! First, Python's syntax is very concise: the same statement can save you many times the keystrokes. Second, Python is well suited to data processing; for example, packing and unpacking function arguments, list comprehensions, and matrix handling are all very convenient.