Node.js in Practice: Controlling Concurrency with the eventproxy Module


Goal

Create a lesson4 project and write the code in it.

The entry point is app.js. When node app.js is run, it outputs the title, link, and first comment of every topic on the home page of the CNode community (https://cnodejs.org/), in JSON format.

Output example:

[
  {
    "title": "[Bulletin] Students posting recruitment threads, please pay attention here",
    "href": "http://cnodejs.org/topic/541ed2d05e28155f24676a12",
    "comment1": "hehe"
  },
  {
    "title": "Publish a Sublime Text JavaScript syntax highlighting plugin",
    "href": "http://cnodejs.org/topic/54207e2efffeb6de3d61f68f",
    "comment1": "Sofa!"
  }
]

Challenge

Building on the goal above, also output the author of comment1 and that author's score in the CNode community.

Example:

[
  {
    "title": "[Bulletin] Students posting recruitment threads, please pay attention here",
    "href": "http://cnodejs.org/topic/541ed2d05e28155f24676a12",
    "comment1": "hehe hehe",
    "author1": "auser",
    "score1":

Knowledge points

Experience the beauty of Node.js callback hell

Learn to use eventproxy to control concurrency

Course Content

In this lesson we come to the most dazzling part of Node.js: asynchronous concurrency.

In the previous lesson we covered how to use superagent and cheerio to fetch the content of the home page, which takes only a single HTTP GET request. This time we need the first comment of every topic, so we have to request each topic's link and use cheerio to extract the first comment.

CNode currently shows 40 topics per page, so we need to make 1 + 40 requests to reach this lesson's goal.

We fire off those last 40 requests concurrently :), and we will not run into multithreading or locks. Node.js's concurrency model is different from multithreading, so set those ideas aside. As for the deeper theory, such as why asynchrony is asynchronous and how a single-threaded Node.js can still be concurrent, I do not intend to cover it. If you are interested in that area, I strongly recommend @Pauling's "Nine Shallow One Deep node.js": http://book.douban.com/subject/25768396/.

Some of the more sophisticated readers may have heard of concepts such as promises and generators. I will only talk about callbacks, mainly because callbacks are what I personally like.

We need three libraries for this lesson: superagent, cheerio, and eventproxy (https://github.com/JacksonTian/eventproxy).
Set up the project scaffolding yourself. Now let's write this program together, step by step.

First, app.js should look like this.

var eventproxy = require('eventproxy');
var superagent = require('superagent');
var cheerio = require('cheerio');
// The url module is part of the Node.js standard library
// http://nodejs.org/api/url.html
var url = require('url');

var cnodeUrl = 'https://cnodejs.org/';

superagent.get(cnodeUrl)
  .end(function (err, res) {
    if (err) {
      return console.error(err);
    }
    var topicUrls = [];
    var $ = cheerio.load(res.text);
    // Get all the topic links on the home page
    $('#topic_list .topic_title').each(function (idx, element) {
      var $element = $(element);
      // $element.attr('href') looks like /topic/542acd7d5d28233425538b04
      // We use url.resolve to work out the full URL, which becomes
      // https://cnodejs.org/topic/542acd7d5d28233425538b04
      // For details, see the example at http://nodejs.org/api/url.html#url_url_resolve_from_to
      var href = url.resolve(cnodeUrl, $element.attr('href'));
      topicUrls.push(href);
    });

    console.log(topicUrls);
  });
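As a quick aside on the url.resolve call in the listing above, here is a small standalone sketch of what it does with the base URL and a relative href. The variable names are made up for the example. The Node.js documentation now marks url.resolve as a legacy API, so the sketch also shows the same resolution with the WHATWG URL class; both lines print the same full topic URL.

```javascript
const url = require('url');

const base = 'https://cnodejs.org/';
const relativeHref = '/topic/542acd7d5d28233425538b04';

// Legacy API, as used in the lesson's code.
const viaResolve = url.resolve(base, relativeHref);

// WHATWG URL API: the relative part comes first, the base second.
const viaUrlClass = new URL(relativeHref, base).href;

console.log(viaResolve);
console.log(viaUrlClass);
```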

Run node app.js

The output is the topicUrls array: the full URL of every topic on the home page.

OK, now we have the addresses of all the topics. Next we crawl each of those addresses and we are done. Node.js really is that simple.
Before crawling, though, we need to introduce the eventproxy library.

Anyone who has written asynchronous JavaScript knows that if you want to fetch data from two or three addresses asynchronously and then use all of that data together once it has arrived, the conventional approach is to maintain a counter.

First define var count = 0, then do count++ each time a fetch succeeds. If you are fetching from three sources, you do not know which asynchronous operation will finish first, so after every successful fetch you check whether count === 3. When that is true, you call another function to carry on with the rest of the work.
eventproxy plays the role of this counter. It keeps track of whether the asynchronous operations have finished and, once they have, automatically calls the handler you supplied, passing in the fetched data as arguments.
Suppose we use neither eventproxy nor a counter. Fetching from three sources then looks like this:

// Modeled on jQuery's $.get method

$.get("http://data1_source", function (data1) {
  // something
  $.get("http://data2_source", function (data2) {
    // something
    $.get("http://data3_source", function (data3) {
      // something
      var html = fuck(data1, data2, data3);
      render(html);
    });
  });
});


Everyone has written code like the above. Fetch data1 first, fetch data2 once that completes, then fetch data3, and finally pass all three to fuck() to produce the output.

But we should also realize that these three sources can in fact be fetched in parallel: fetching data2 does not depend on data1 having completed, and data3 does not depend on data2 either.

So with a counter, the code is written like this:

(function () {
  var count = 0;
  var result = {};

  $.get('http://data1_source', function (data) {
    result.data1 = data;
    count++;
    handle();
  });
  $.get('http://data2_source', function (data) {
    result.data2 = data;
    count++;
    handle();
  });
  $.get('http://data3_source', function (data) {
    result.data3 = data;
    count++;
    handle();
  });

  // Runs after every fetch, but only acts once all three have arrived
  function handle() {
    if (count === 3) {
      var html = fuck(result.data1, result.data2, result.data3);
      render(html);
    }
  }
})();

Ugly as hell. Well, it is not that ugly, but only because I write good-looking code.

If we use eventproxy, it is written like this:

var ep = new eventproxy();
ep.all('data1_event', 'data2_event', 'data3_event', function (data1, data2, data3) {
  var html = fuck(data1, data2, data3);
  render(html);
});

$.get('http://data1_source', function (data) {
  ep.emit('data1_event', data);
});
$.get('http://data2_source', function (data) {
  ep.emit('data2_event', data);
});
$.get('http://data3_source', function (data) {
  ep.emit('data3_event', data);
});

Much better, isn't it? It is really just a higher-level counter.

ep.all('data1_event', 'data2_event', 'data3_event', function (data1, data2, data3) {});

This line listens for three events: data1_event, data2_event, and data3_event. Each time one source finishes fetching, it calls ep.emit() to tell ep that such-and-such event has completed.

As long as the three events have not all completed, calling ep.emit() does nothing further. Once all three have completed, the callback at the end is invoked to process the results together.
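To make the mechanism concrete, here is a minimal sketch of the same "wait for every event, then call back" idea built on Node.js's built-in EventEmitter. The whenAll helper is a hypothetical name introduced for illustration. It is not eventproxy's API or implementation, only an approximation of the behaviour described above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (NOT eventproxy itself): call `done` once every
// event in `eventNames` has fired, passing the payloads in declared order.
function whenAll(emitter, eventNames, done) {
  const results = {};
  let remaining = eventNames.length;

  eventNames.forEach(function (name) {
    emitter.once(name, function (data) {
      results[name] = data;
      remaining -= 1;
      if (remaining === 0) {
        done.apply(null, eventNames.map(function (n) { return results[n]; }));
      }
    });
  });
}

const emitter = new EventEmitter();
const calls = [];

whenAll(emitter, ['data1_event', 'data2_event', 'data3_event'], function (data1, data2, data3) {
  calls.push([data1, data2, data3]);
});

emitter.emit('data3_event', 'c'); // nothing happens yet
emitter.emit('data1_event', 'a'); // still waiting for data2_event
emitter.emit('data2_event', 'b'); // all three have fired, so the callback runs
console.log(calls);
```

The events are emitted out of order on purpose: the callback still receives the payloads in the order the event names were declared, which is the property that makes this pattern more convenient than a bare counter.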

eventproxy provides quite a few other APIs for other scenarios, but the most common usage is the one above:

First, var ep = new eventproxy(); gets an eventproxy instance.
Tell it which events you want to listen for and give it a callback: ep.all('event1', 'event2', function (result1, result2) {}).
At the appropriate moment, call ep.emit('event_name', eventData).

I have always felt that eventproxy's way of handling asynchronous concurrency is like the goto statement in assembly: program logic jumps all over the code. Execution has already reached line 100 when suddenly the callback on line 80 starts working. If your asynchronous logic is complex, that line-80 function may finish and then activate yet another function on line 60. The concurrency and nesting problems are solved, but the goto statement that our predecessors spent decades eradicating has come back.
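The jump in control flow is easy to observe with a plain timer. In this small illustration (the variable and function names are made up for the example), the callback appears first in the source but runs last:

```javascript
const order = [];

// Written first, but it only runs once the timer fires.
setTimeout(function earlyCallback() {
  order.push('callback written first');
  console.log(order);
}, 0);

// Written second, yet it runs first because it is synchronous.
order.push('synchronous code written second');
```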

As for whether this approach is bad, I personally do not think so. Once you are familiar with it, it reads quite clearly. JS, this crummy language, was a mess to begin with anyway: variable hoisting (http://www.cnblogs.com/damonlan/archive/2012/07/01/2553425.html), no main function, variable scoping, data types that often amount to nothing more than numbers, strings, hashes, and arrays. None of these issues is a big deal.
As for whether a programming language is beautiful or ugly, it is enough to have a Buddha in your heart.

Back to the point. We now have a topicUrls array of length 40 that contains the link to every topic. That means we are about to issue 40 concurrent requests, and for that we need eventproxy's #after API.

Please learn this API on your own: https://github.com/JacksonTian/eventproxy#%E9%87%8D%E5%A4%8D%E5%BC%82%E6%AD%A5%E5%8D%8F%E4%BD%9C
I will just paste the code here.

// After getting topicUrls

// Get an eventproxy instance
var ep = new eventproxy();

// Tell ep to wait for topicUrls.length (here, 40) 'topic_html' events before acting
ep.after('topic_html', topicUrls.length, function (topics) {
  // topics is an array holding the 40 pairs passed to ep.emit('topic_html', pair)

  // Start working
  topics = topics.map(function (topicPair) {
    // From here on it is all jQuery-style usage
    var topicUrl = topicPair[0];
    var topicHtml = topicPair[1];
    var $ = cheerio.load(topicHtml);
    return ({
      title: $('.topic_full_title').text().trim(),
      href: topicUrl,
      comment1: $('.reply_content').eq(0).text().trim(),
    });
  });

  console.log('final:');
  console.log(topics);
});

topicUrls.forEach(function (topicUrl) {
  superagent.get(topicUrl)
    .end(function (err, res) {
      console.log('fetch ' + topicUrl + ' successful');
      ep.emit('topic_html', [topicUrl, res.text]);
    });
});
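The counting behaviour behind the after call can be sketched without any third-party library. The example below is an approximation built on Node.js's EventEmitter, under stated assumptions: afterTimes and fakeFetch are hypothetical helpers, the example.invalid URLs are placeholders, and timers stand in for real HTTP requests. It is not how eventproxy is implemented.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (NOT eventproxy itself): collect the payloads of
// `times` emissions of `eventName`, then hand the whole list to `done`.
function afterTimes(emitter, eventName, times, done) {
  const collected = [];
  emitter.on(eventName, function onEvent(data) {
    collected.push(data);
    if (collected.length === times) {
      emitter.removeListener(eventName, onEvent);
      done(collected);
    }
  });
}

// Stand-in for superagent.get(url).end(callback): "responds" after a delay.
function fakeFetch(topicUrl, delayMs, callback) {
  setTimeout(function () {
    callback(null, { text: '<html>' + topicUrl + '</html>' });
  }, delayMs);
}

const emitter = new EventEmitter();
const fakeTopics = [
  { url: 'https://example.invalid/topic/a', delay: 30 },
  { url: 'https://example.invalid/topic/b', delay: 10 },
  { url: 'https://example.invalid/topic/c', delay: 20 },
];
let finalPairs = null;

afterTimes(emitter, 'topic_html', fakeTopics.length, function (pairs) {
  finalPairs = pairs;
  console.log('final:', pairs.map(function (pair) { return pair[0]; }));
});

// All fake requests start immediately; none waits for another to finish.
fakeTopics.forEach(function (topic) {
  fakeFetch(topic.url, topic.delay, function (err, res) {
    if (err) {
      return console.error(err);
    }
    emitter.emit('topic_html', [topic.url, res.text]);
  });
});
```

In this sketch the pairs are collected in the order the responses arrive, which is why each pair carries its own URL alongside the HTML, just as the lesson's code does.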

The output is one 'fetch ... successful' line per topic, followed by 'final:' and the topics array.

For the complete code, see the app.js file in the lesson4 directory.

Summary

The eventproxy module introduced today is for controlling concurrency. Sometimes we need to send N HTTP requests at the same time and then use the resulting data for further processing. To determine conveniently that all of the concurrently fetched data has arrived, you can use this module. It can be used not only on the server side but also in the client.
