Storm series (II): Use C# to create your first Storm topology (WordCount)


WordCount is to big data what "hello world" is to learning a new programming language. Thanks to Storm being open source and to Storm.Net.Adapter, we can now write Storm topologies in C# with native support, just as in Java or Python. In this article I use WordCount to demonstrate how to develop a Storm topology in C#.

The previous article covered how to deploy a Storm development environment. The demo described here is included in Storm.Net.Adapter; if you find it helpful, please Star and Fork the project so that more people see it and help improve it.

First, create a console application named StormSample (a console application is easy to invoke from the command line). Use NuGet to add Storm.Net.Adapter (the namespace of this class library is Storm).

Step 1: Create a spout named Generator by implementing the four methods of the ISpout interface:

```csharp
void Open(Config stormConf, TopologyContext context);
void NextTuple();
void Ack(long seqId);
void Fail(long seqId);
```

 

Before implementing these four methods, we need a field and a constructor to initialize the class:

```csharp
private Context ctx;

public Generator(Context ctx)
{
    Context.Logger.Info("Generator constructor called");
    this.ctx = ctx;

    // Declare Output schema: the "default" stream carries a single string field.
    // A spout has no input stream, so the input schema passed below is null.
    Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
    outputSchema.Add("default", new List<Type>() { typeof(string) });
    this.ctx.DeclareComponentSchema(new ComponentStreamSchema(null, outputSchema));
}
```

 

  

I use a private field, ctx, to store the Context object passed in at instantiation. Context has a static Logger for writing logs, which can be used without creating an instance; it supports five log levels: Trace, Debug, Info, Warn, and Error. The constructor must also declare the number and types of the component's input and output fields. In this example, the input schema is null and the output is a single string. We also add a static factory method that returns a new instance of the class:

```csharp
/// <summary>
/// Implements the delegate "newPlugin", which is used to create an instance of this spout/bolt.
/// </summary>
/// <param name="ctx">Context instance</param>
/// <returns>A new Generator instance.</returns>
public static Generator Get(Context ctx)
{
    return new Generator(ctx);
}
```
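To make the schema declaration more concrete: a schema is a dictionary from a stream id (here "default") to the ordered list of field types carried on that stream. The sketch below uses a hypothetical helper, SchemaCheck.Matches, which is not part of Storm.Net.Adapter, to show how a list of emitted values lines up with the declared types. Whether and how the adapter itself validates emitted tuples is not covered here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Storm.Net.Adapter.
public static class SchemaCheck
{
    // Returns true when `values` has exactly one entry per declared field on
    // `streamId` and each value is an instance of the corresponding type.
    public static bool Matches(Dictionary<string, List<Type>> schema, string streamId, List<object> values)
    {
        List<Type> types;
        if (!schema.TryGetValue(streamId, out types) || types.Count != values.Count)
        {
            return false;
        }
        for (int i = 0; i < types.Count; i++)
        {
            if (values[i] == null || !types[i].IsInstanceOfType(values[i]))
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, a Generator-style schema of { "default": [string] } accepts a one-element list holding a sentence, while a Splitter-style schema of { "default": [string, char] } expects a word followed by its first character.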

 

Open is called once before the task starts emitting tuples. It is mainly used for preprocessing and for reading configuration; in most cases it does not need to do anything. NextTuple is used to emit tuples and is called continuously, so when there is nothing to emit you can call Thread.Sleep(50); to reduce CPU usage (the exact sleep time depends on the topology settings; it only needs to stay well below the timeout).
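The idle back-off described above can be illustrated without Storm. The sketch below is a hypothetical stand-in for the adapter's single-threaded dispatch loop (not its actual implementation): it calls an emit function repeatedly and sleeps briefly whenever nothing was emitted, which is the behavior NextTuple should produce when it has no work.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the adapter's dispatch loop, for illustration only.
public static class SpoutLoopSketch
{
    // Calls tryEmit `iterations` times. Whenever tryEmit reports that nothing
    // was emitted, sleeps for idleSleepMs to yield the CPU instead of spinning.
    // Returns how many iterations were idle.
    public static int Run(Func<bool> tryEmit, int iterations, int idleSleepMs)
    {
        int idle = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (!tryEmit())
            {
                idle++;
                Thread.Sleep(idleSleepMs);
            }
        }
        return idle;
    }
}
```

Sleeping only on idle iterations keeps throughput unaffected while there is work, and avoids a busy loop once the spout has nothing to send.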

In this example, NextTuple randomly picks a sentence from an array of English sentences and emits it downstream. To make sure every tuple is eventually processed successfully, we cache each emitted tuple until it is acknowledged and cap the number of pending tuples at 20.

```csharp
// Requires: using System; using System.Collections.Generic; using System.Threading;
private const int MAX_PENDING_TUPLE_NUM = 20;
private long lastSeqId = 0;
private Dictionary<long, string> cachedTuples = new Dictionary<long, string>();
private Random rand = new Random();
private string[] sentences = new string[] {
    "the cow jumped over the moon",
    "an apple a day keeps the doctor away",
    "four score and seven years ago",
    "snow white and the seven dwarfs",
    "i am at two with nature"};

/// <summary>
/// This method is used to emit one or more tuples. If there is nothing to emit, this method should return without emitting anything.
/// Note that NextTuple(), Ack(), and Fail() are all called in a tight loop in a single thread in the C# process.
/// When there are no tuples to emit, it is courteous to have NextTuple sleep for a short amount of time (such as 10 milliseconds), so as not to waste too much CPU.
/// </summary>
public void NextTuple()
{
    Context.Logger.Info("NextTuple enter");
    string sentence;

    // Emit only while fewer than MAX_PENDING_TUPLE_NUM tuples await Ack/Fail
    if (cachedTuples.Count < MAX_PENDING_TUPLE_NUM)
    {
        lastSeqId++;
        // Random.Next(maxValue) excludes maxValue, so every sentence can be picked
        sentence = sentences[rand.Next(sentences.Length)];
        Context.Logger.Info("Generator Emit: {0}, seqId: {1}", sentence, lastSeqId);
        this.ctx.Emit("default", new List<object>() { sentence }, lastSeqId);
        // Cache the tuple so that Fail() can re-emit it
        cachedTuples[lastSeqId] = sentence;
    }
    else
    {
        // if have nothing to emit, then sleep for a little while to release CPU
        Thread.Sleep(50);
    }
    Context.Logger.Info("cached tuple num: {0}", cachedTuples.Count);
    Context.Logger.Info("Generator NextTuple exit");
}
```
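A note on picking the random index: .NET's Random.Next treats its upper bound as exclusive, so a call such as rand.Next(0, sentences.Length - 1) can never select the last sentence, whereas rand.Next(sentences.Length) covers every index. The small check below demonstrates this; SentencePicker is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class SentencePicker
{
    // Draws `draws` random indices for an array of `length` elements and
    // returns the set of distinct indices seen. Random.Next's upper bound is
    // exclusive in both overloads used here.
    public static HashSet<int> DrawIndices(Random rand, int length, int draws, bool useFullRange)
    {
        var seen = new HashSet<int>();
        for (int i = 0; i < draws; i++)
        {
            int index = useFullRange ? rand.Next(length) : rand.Next(0, length - 1);
            seen.Add(index);
        }
        return seen;
    }
}
```

With five sentences, the first form only ever yields indices 0 to 3, so "i am at two with nature" would never be emitted.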

 

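this.ctx.Emit sends a tuple to the next bolt in the topology. Its arguments are the stream id ("default"), the list of field values, and a sequence id that is later passed back to Ack() or Fail().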

Ack() is called when a tuple has been successfully processed by the whole topology, and Fail() is called when its processing fails. In this example, Ack removes the tuple from the cache, and Fail looks up the cached sentence and re-emits it.

```csharp
/// <summary>
/// Ack() will be called only when ack mechanism is enabled in spec file.
/// If ack is not supported in non-transactional topology, the Ack() can be left as empty function.
/// </summary>
/// <param name="seqId">Sequence Id of the tuple which is acked.</param>
public void Ack(long seqId)
{
    Context.Logger.Info("Ack, seqId: {0}", seqId);
    bool result = cachedTuples.Remove(seqId);
    if (!result)
    {
        Context.Logger.Warn("Ack(), remove cached tuple for seqId {0} fail!", seqId);
    }
}

/// <summary>
/// Fail() will be called only when ack mechanism is enabled in spec file.
/// If ack is not supported in non-transactional topology, the Fail() can be left as empty function.
/// </summary>
/// <param name="seqId">Sequence Id of the tuple which is failed.</param>
public void Fail(long seqId)
{
    Context.Logger.Info("Fail, seqId: {0}", seqId);
    if (cachedTuples.ContainsKey(seqId))
    {
        string sentence = cachedTuples[seqId];
        Context.Logger.Info("Re-Emit: {0}, seqId: {1}", sentence, seqId);
        this.ctx.Emit("default", new List<object>() { sentence }, seqId);
    }
    else
    {
        Context.Logger.Warn("Fail(), can't find cached tuple for seqId {0}!", seqId);
    }
}
```
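The caching logic shared by NextTuple, Ack, and Fail can be pulled out and tested on its own. The class below is a hypothetical, Storm-independent sketch of that bookkeeping (PendingTupleCache and its members are illustrative names, not adapter APIs): tracking is refused once the cap is reached, Ack drops a tuple, and Fail-style replay looks the tuple up under its original sequence id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, Storm-independent sketch of the spout's pending-tuple bookkeeping.
public class PendingTupleCache
{
    private readonly int capacity;
    private readonly Dictionary<long, string> pending = new Dictionary<long, string>();
    private long lastSeqId = 0;

    public PendingTupleCache(int capacity)
    {
        this.capacity = capacity;
    }

    public int Count { get { return pending.Count; } }

    // Assigns the next sequence id and caches the tuple, unless the cap is reached.
    public bool TryTrack(string sentence, out long seqId)
    {
        seqId = 0;
        if (pending.Count >= capacity)
        {
            return false;
        }
        seqId = ++lastSeqId;
        pending[seqId] = sentence;
        return true;
    }

    // Ack: the tuple was fully processed, so it no longer needs to be kept.
    public bool Ack(long seqId)
    {
        return pending.Remove(seqId);
    }

    // Fail: fetch the cached tuple so it can be re-emitted under the same id.
    // The entry stays cached until a later Ack removes it.
    public bool TryGetForReplay(long seqId, out string sentence)
    {
        return pending.TryGetValue(seqId, out sentence);
    }
}
```

Keeping failed tuples in the cache until they are acked means a tuple that fails repeatedly keeps occupying one of the pending slots, which in turn throttles NextTuple.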

 

That completes the spout; next, let's look at the bolts.

Step 2: Create two bolts, Splitter and Counter, by implementing IBasicBolt.

Splitter splits English sentences into individual words on spaces, and Counter counts how many times each word appears. We walk through Splitter in detail and simply list the full source of Counter.
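Before wiring the two bolts into Storm, their core logic can be checked locally. The sketch below is a Storm-free illustration (LocalWordCount is a hypothetical name): Split mirrors what Splitter does with each sentence, and Count mirrors Counter's per-word running total.

```csharp
using System;
using System.Collections.Generic;

// Storm-free sketch of the Splitter -> Counter logic, for illustration only.
public static class LocalWordCount
{
    // Splitter: break a sentence into words on spaces, skipping empty entries
    // produced by consecutive spaces.
    public static string[] Split(string sentence)
    {
        return sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Counter: increment the running count for a word and return the new count.
    public static int Count(Dictionary<string, int> counts, string word)
    {
        int count;
        counts.TryGetValue(word, out count); // count is 0 when the word is new
        count++;
        counts[word] = count;
        return count;
    }
}
```

Feeding "the cow jumped over the moon" and "snow white and the seven dwarfs" through Split and then Count leaves "the" with a count of 3 and "cow" with a count of 1.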

As with Generator, we first write a constructor and a static factory method to handle parameter passing and invocation:

```csharp
private Context ctx;
private int msgTimeoutSecs;

public Splitter(Context ctx)
{
    Context.Logger.Info("Splitter constructor called");
    this.ctx = ctx;

    // Declare Input and Output schemas
    Dictionary<string, List<Type>> inputSchema = new Dictionary<string, List<Type>>();
    inputSchema.Add("default", new List<Type>() { typeof(string) });

    // Output: the word and its first character
    Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
    outputSchema.Add("default", new List<Type>() { typeof(string), typeof(char) });
    this.ctx.DeclareComponentSchema(new ComponentStreamSchema(inputSchema, outputSchema));

    // Demo how to get stormConf info
    if (Context.Config.StormConf.ContainsKey("topology.message.timeout.secs"))
    {
        msgTimeoutSecs = Convert.ToInt32(Context.Config.StormConf["topology.message.timeout.secs"]);
    }
    Context.Logger.Info("msgTimeoutSecs: {0}", msgTimeoutSecs);
}

/// <summary>
/// Implements the delegate "newPlugin", which is used to create an instance of this spout/bolt.
/// </summary>
/// <param name="ctx">Context instance</param>
/// <returns>A new Splitter instance.</returns>
public static Splitter Get(Context ctx)
{
    return new Splitter(ctx);
}
```

 

In this constructor, we added a field, msgTimeoutSecs, that is not used anywhere else; it only demonstrates how to read the topology configuration.
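Reading a setting this way leaves msgTimeoutSecs at 0 when the key is missing. A small helper with an explicit default makes that case visible. The sketch assumes the configuration can be treated as a string-keyed dictionary of boxed values; ConfigSketch.GetInt is a hypothetical name, not an adapter API. (Storm's own default for topology.message.timeout.secs is 30 seconds.)

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; assumes the config behaves like IDictionary<string, object>.
public static class ConfigSketch
{
    // Reads an integer setting such as "topology.message.timeout.secs",
    // falling back to defaultValue when the key is absent or null.
    // Convert.ToInt32 accepts boxed int/long values as well as numeric strings.
    public static int GetInt(IDictionary<string, object> conf, string key, int defaultValue)
    {
        object raw;
        if (conf.TryGetValue(key, out raw) && raw != null)
        {
            return Convert.ToInt32(raw);
        }
        return defaultValue;
    }
}
```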

Because Splitter implements IBasicBolt, it must implement the following two methods:

```csharp
void Prepare(Config stormConf, TopologyContext context);
void Execute(StormTuple tuple);
```

 

These are the same two methods as in IBolt. The difference is that with IBolt you must decide yourself when to send Ack or Fail to Storm, whereas IBasicBolt handles this for you: if Execute returns without throwing an exception, an Ack is sent to Storm automatically at the end; otherwise a Fail is sent. Prepare performs any setup needed before execution; in this example, there is nothing to do.

```csharp
/// <summary>
/// The Execute() function will be called, when a new tuple is available.
/// </summary>
/// <param name="tuple"></param>
public void Execute(StormTuple tuple)
{
    Context.Logger.Info("Execute enter");

    string sentence = tuple.GetString(0);
    // RemoveEmptyEntries skips empty strings from repeated spaces, so word[0] is always valid
    foreach (string word in sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        Context.Logger.Info("Splitter Emit: {0}", word);
        // Anchor the new tuple to the input tuple and emit (word, first character)
        this.ctx.Emit("default", new List<StormTuple> { tuple }, new List<object> { word, word[0] });
    }

    Context.Logger.Info("Splitter Execute exit");
}

public void Prepare(Config stormConf, TopologyContext context)
{
    return;
}
```
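The IBasicBolt contract (ack on normal return, fail on exception) can be modeled in a few lines. The sketch below is hypothetical and is not the adapter's real implementation; it only shows how the outcome follows from whether Execute throws. In the topology above, a Fail produced this way would eventually reach the spout's Fail(), which re-emits the cached sentence.

```csharp
using System;

public enum TupleOutcome { Ack, Fail }

// Hypothetical model of the IBasicBolt contract; not the adapter's real code.
public static class BasicBoltSketch
{
    // Runs the bolt's Execute logic and derives the outcome from whether it threw.
    public static TupleOutcome Run<T>(Action<T> execute, T tuple)
    {
        try
        {
            execute(tuple);
            return TupleOutcome.Ack;
        }
        catch (Exception)
        {
            return TupleOutcome.Fail;
        }
    }
}
```

For instance, indexing the first character of an empty word throws IndexOutOfRangeException, which under this contract turns into a Fail rather than a crash of the bolt process.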

 

The Counter code is similar:

```csharp
using Storm;
using System;
using System.Collections.Generic;

namespace StormSample
{
    /// <summary>
    /// The bolt "counter" uses a dictionary to record the occurrence number of each word.
    /// </summary>
    public class Counter : IBasicBolt
    {
        private Context ctx;
        private Dictionary<string, int> counts = new Dictionary<string, int>();

        public Counter(Context ctx)
        {
            Context.Logger.Info("Counter constructor called");
            this.ctx = ctx;

            // Declare Input and Output schemas
            Dictionary<string, List<Type>> inputSchema = new Dictionary<string, List<Type>>();
            inputSchema.Add("default", new List<Type>() { typeof(string), typeof(char) });

            Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
            outputSchema.Add("default", new List<Type>() { typeof(string), typeof(int) });
            this.ctx.DeclareComponentSchema(new ComponentStreamSchema(inputSchema, outputSchema));
        }

        /// <summary>
        /// The Execute() function will be called, when a new tuple is available.
        /// </summary>
        /// <param name="tuple"></param>
        public void Execute(StormTuple tuple)
        {
            Context.Logger.Info("Execute enter");

            string word = tuple.GetString(0);
            // Look up the running count for this word (0 if unseen) and increment it
            int count = counts.ContainsKey(word) ? counts[word] : 0;
            count++;
            counts[word] = count;

            Context.Logger.Info("Counter Emit: {0}, count: {1}", word, count);
            this.ctx.Emit("default", new List<StormTuple> { tuple }, new List<object> { word, count });

            Context.Logger.Info("Counter Execute exit");
        }

        /// <summary>
        /// Implements the delegate "newPlugin", which is used to create an instance of this spout/bolt.
        /// </summary>
        /// <param name="ctx">Context instance</param>
        /// <returns>A new Counter instance.</returns>
        public static Counter Get(Context ctx)
        {
            return new Counter(ctx);
        }

        public void Prepare(Config stormConf, TopologyContext context)
        {
            return;
        }
    }
}
```

 

Step 3: Modify Program.cs so that the Java side can launch each component.

```csharp
using Storm;
using System;
using System.Linq;

namespace StormSample
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Count() > 0)
            {
                // The first argument names the component to launch
                string compName = args[0];

                try
                {
                    if ("generator".Equals(compName))
                    {
                        ApacheStorm.LaunchPlugin(new newPlugin(Generator.Get));
                    }
                    else if ("splitter".Equals(compName))
                    {
                        ApacheStorm.LaunchPlugin(new newPlugin(Splitter.Get));
                    }
                    else if ("counter".Equals(compName))
                    {
                        ApacheStorm.LaunchPlugin(new newPlugin(Counter.Get));
                    }
                    else
                    {
                        throw new Exception(string.Format("unexpected compName: {0}", compName));
                    }
                }
                catch (Exception ex)
                {
                    Context.Logger.Error(ex.ToString());
                }
            }
            else
            {
                Context.Logger.Error("Local mode is not supported.");
            }
        }
    }
}
```

 

The Main method uses its command-line argument to decide which spout or bolt to launch. ApacheStorm is the class that provides the library's main entry methods, such as LaunchPlugin; it is not named Storm because that name is already taken by the namespace. That completes the C# side. The Java-side code and deployment will be covered in detail in the next article, so stay tuned! The overall flow of the topology is: Generator emits random sentences, Splitter splits each sentence into words, and Counter counts the occurrences of each word.
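As the number of components grows, the if/else chain in Main can be replaced by a lookup table from component name to factory. The sketch below shows that pattern in isolation; ComponentRegistry is a hypothetical name, and the string-returning factories merely stand in for the real delegates (presumably new newPlugin(Generator.Get) and so on, passed to ApacheStorm.LaunchPlugin), which require the Storm library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a name -> factory table instead of an if/else chain.
// The string-returning factories are placeholders for the real plugin delegates.
public static class ComponentRegistry
{
    private static readonly Dictionary<string, Func<string>> factories =
        new Dictionary<string, Func<string>>
        {
            { "generator", () => "Generator" },
            { "splitter", () => "Splitter" },
            { "counter", () => "Counter" },
        };

    // Resolves and invokes the factory for compName; unknown names throw,
    // matching the "unexpected compName" branch in Main.
    public static string Launch(string compName)
    {
        Func<string> factory;
        if (!factories.TryGetValue(compName, out factory))
        {
            throw new ArgumentException(string.Format("unexpected compName: {0}", compName));
        }
        return factory();
    }
}
```

Adding a new bolt then means adding one dictionary entry rather than another else-if branch.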
