Due to project requirements, I have recently been doing some preliminary work with Storm. There are plenty of installation and usage examples on the Internet, so I'm just recording mine here as a memo.
I. Introduction to Storm
Storm's terms include stream, spout, bolt, task, worker, stream grouping, and topology. A stream is the data to be processed. A spout is a data source. Bolts process data. A task is a thread running within a spout or bolt. A worker is a process that runs these threads. A stream grouping specifies what a bolt receives as input and how tuples are distributed: data can be randomly distributed (the term is shuffle), partitioned by field values (fields), broadcast to every task (all), always sent to a single task (global), declared as don't-care (none), or routed by custom logic (direct). A topology is the network of spout and bolt nodes connected by stream groupings. These terms are described in more detail on the Storm Concepts page.
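To make these grouping terms concrete, here is a minimal wiring sketch against the 0.9.x Java API (my own illustration, not from the Storm docs; the spout and bolt classes are the storm-starter ones that appear later in this post, and the component ids are arbitrary):

import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout());
// shuffle: tuples are distributed randomly across the bolt's tasks
builder.setBolt("split", new SplitSentence()).shuffleGrouping("spout");
// fields: tuples with the same "word" value always reach the same task
builder.setBolt("count", new WordCount()).fieldsGrouping("split", new Fields("word"));
// all: every task of this bolt gets a copy of every tuple
builder.setBolt("print", new PrinterBolt()).allGrouping("count");
// global: every tuple goes to exactly one task
builder.setBolt("total", new PrinterBolt()).globalGrouping("count");
// noneGrouping(...) and directGrouping(...) cover the none / direct terms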
To run a Storm cluster, you need Apache ZooKeeper, ZeroMQ (ØMQ), jzmq, Java 6, and Python 2.6.6. ZooKeeper coordinates the different components in the cluster, ZeroMQ is the internal messaging system, and jzmq is the Java binding for ZeroMQ.
Installation Details: http://blog.csdn.net/qiyating0808/article/details/36041299
Start a Storm cluster:
storm nimbus >/dev/null 2>&1 &      # master daemon
storm supervisor >/dev/null 2>&1 &  # worker daemon
storm ui >/dev/null 2>&1 &          # web UI (port 8080 by default)
Topology task scheduling:
Under the Storm (0.9.2) directory there is a test jar (apache-storm-0.9.2-incubating/examples/storm-starter) that can be used to verify the cluster environment.
Task Scheduling Method:
# local mode (LocalCluster)
storm jar storm-starter-topologies-0.9.2-incubating.jar storm.starter.WordCountTopology
# cluster mode
storm jar storm-starter-topologies-0.9.2-incubating.jar storm.starter.WordCountTopology args
LocalCluster is a standalone mode: in plain terms, you can test and verify results without depending on a cluster at all. This is very handy during development. You only need to add Storm's jar dependency to the project, run local tests in standalone mode, and then submit the topology to the cluster.
Sample code snippet (taken from WordCountTopology):
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
    Config conf = new Config();
    conf.setDebug(true);
    if (args != null && args.length > 0) {
        conf.setNumWorkers(3);
        StormSubmitter.submitTopologyWithProgressBar("wordCount", conf, builder.createTopology());
    } else {
        conf.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
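For reference, the WordCount bolt used above looks roughly like this (a sketch from memory of the storm-starter code, where it is a static inner class of WordCountTopology; imports from java.util, backtype.storm.topology, and backtype.storm.tuple are assumed):

public static class WordCount extends BaseBasicBolt {
    // in-memory counts, per task; fieldsGrouping on "word" keeps each word on one task
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}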
In our project, the spout's source data comes from Kafka (originally open-sourced by LinkedIn): the spout fetches the messages produced under a given topic, and they are consumed in the Storm cluster.
II. Introduction to Kafka
There is almost no installation process. Unzip the package and use it directly.
Usage:
Start Kafka:
./kafka-server-start.sh ../config/server.properties
Create a topic:
./kafka-topics.sh --create --topic kafkatoptic --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1
Consume messages:
./kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic kafkatoptic --from-beginning
List topics:
./kafka-topics.sh --list --zookeeper localhost:2181
Produce messages:
./kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic kafkatoptic
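Messages can also be produced from code instead of the console script. Here is a minimal sketch against the Kafka 0.8-era Java producer API (the broker address and topic are the ones used above; the class name is my own):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class TestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "127.0.0.1:9092");              // broker started above
        props.put("serializer.class", "kafka.serializer.StringEncoder");  // plain string messages

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("kafkatoptic", "hello storm"));
        producer.close();
    }
}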
There are also many open-source projects that integrate Kafka with Storm, such as storm-kafka-0.8-plus, so no secondary development is needed.
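For example, wiring a KafkaSpout from storm-kafka-0.8-plus into a topology looks roughly like this (a sketch under my assumptions: the ZooKeeper address and topic match the examples above, and the zkRoot and consumer id are arbitrary names):

import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

// the ZooKeeper ensemble that Kafka registers its brokers with
BrokerHosts hosts = new ZkHosts("127.0.0.1:2181");
// topic, zkRoot (where consumed offsets are stored), and a consumer id
SpoutConfig spoutConfig = new SpoutConfig(hosts, "kafkatoptic", "/kafka-storm", "wordcount");
// decode each Kafka message as a single string field named "str"
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
// downstream bolts then read tuple.getString(0) as the message body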
On the producer side, our big data stack also uses Flume to feed data into Kafka.