Kafka + Storm

Due to project requirements, I have recently been doing some preliminary development work with Storm. There are plenty of installation and usage examples on the Internet, so I am simply recording my notes here for future reference.

I. Introduction to Storm

Storm's terms include stream, spout, bolt, task, worker, stream grouping, and topology. A stream is the data to be processed. A spout is a data source, and bolts process the data. A task is a thread running inside a spout or bolt, and a worker is the process that runs those threads. A stream grouping specifies how a bolt receives its input data: tuples can be distributed randomly (shuffle), partitioned by field values (fields), broadcast to all tasks (all), always sent to a single task (global), ignored (none), or routed by custom logic (direct). A topology is the network of spout and bolt nodes connected by stream groupings. These terms are described in more detail on the Storm concepts page.
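As a rough illustration of how these groupings are declared, here is a minimal sketch against the Storm 0.9.x API (backtype.storm); SentenceSpout, SplitBolt, WordCountBolt, PrintBolt, and ReportBolt are hypothetical spout/bolt implementations used only to show the wiring.

import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class GroupingSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spout emitting sentences.
        builder.setSpout("sentences", new SentenceSpout(), 2);

        // shuffle: tuples are distributed randomly among the bolt's tasks.
        builder.setBolt("split", new SplitBolt(), 4).shuffleGrouping("sentences");

        // fields: tuples with the same "word" value always go to the same task.
        builder.setBolt("count", new WordCountBolt(), 4)
               .fieldsGrouping("split", new Fields("word"));

        // all: every task of the bolt receives a copy of every tuple.
        builder.setBolt("print", new PrintBolt(), 2).allGrouping("count");

        // global: every tuple goes to a single task of the bolt.
        builder.setBolt("report", new ReportBolt(), 1).globalGrouping("count");

        // noneGrouping(...) and directGrouping(...) cover the none and direct cases.
    }
}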

To run a Storm cluster, you need Apache ZooKeeper, ZeroMQ (ØMQ), jzmq, Java 6, and Python 2.6.6. ZooKeeper manages the different components in the cluster, ZeroMQ is the internal messaging system, and jzmq is the Java binding for ZeroMQ.

Installation Details: http://blog.csdn.net/qiyating0808/article/details/36041299

Start a Storm cluster:

storm nimbus >/dev/null 2>&1 &
storm supervisor >/dev/null 2>&1 &
storm ui >/dev/null 2>&1 &

Topology task scheduling:

Under the Storm 0.9.2 directory, a test jar (apache-storm-0.9.2-incubating/examples/storm-starter) can be used to verify the cluster environment.

Task Scheduling Method:

# local cluster
storm jar storm-starter-topologies-0.9.2-incubating.jar storm.starter.WordCountTopology

# cluster mode
storm jar storm-starter-topologies-0.9.2-incubating.jar storm.starter.WordCountTopology args

LocalCluster is a standalone mode: in plain terms, you can test and verify results without relying on a cluster at all, which is very handy during development. You only need to add the Storm jar dependency to your project, run local tests in standalone mode, and then submit the topology to the cluster.

Sample code snippet (taken from WordCountTopology):


public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    // A spout emitting random sentences, a bolt splitting them into words, and a bolt counting words.
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
        // Cluster mode: submit the topology to the Storm cluster.
        conf.setNumWorkers(3);
        StormSubmitter.submitTopologyWithProgressBar("wordCount", conf, builder.createTopology());
    }
    else {
        // Local mode: run in an in-process LocalCluster for ten seconds, then shut down.
        conf.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
In our project, the spout's source data comes from Kafka (originally developed at LinkedIn): the spout reads the messages produced under the configured topic and consumes them in the Storm cluster.


II. Introduction to Kafka

Kafka needs almost no installation: just unzip the package and use it directly.

Usage:


# Start Kafka
./kafka-server-start.sh ../config/server.properties

# Create a topic
./kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1 --topic kafkatoptic

# Start a console consumer
./kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic kafkatoptic --from-beginning

# List topics
./kafka-topics.sh --list --zookeeper localhost:2181

# Produce messages from a console producer
./kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic kafkatoptic
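To produce messages from code rather than from the console, a minimal sketch using the Kafka 0.8 Java producer API could look like the following; the broker address and topic name are taken from the commands above, and the message payload is made up.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SimpleKafkaProducer {
    public static void main(String[] args) {
        // Broker list and serializer settings for the Kafka 0.8 producer API.
        Properties props = new Properties();
        props.put("metadata.broker.list", "127.0.0.1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));

        // Send a few test messages to the topic used in the console examples.
        for (int i = 0; i < 10; i++) {
            producer.send(new KeyedMessage<String, String>("kafkatoptic", "test message " + i));
        }
        producer.close();
    }
}
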
There are also many open-source projects that integrate Kafka with Storm, such as storm-kafka-0.8-plus, so no secondary development is needed.
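
As a rough sketch of what such an integration looks like with the storm-kafka-0.8-plus API (the ZooKeeper address, topic name, ZooKeeper root, and consumer id below are placeholders, not values from the original setup):

import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaSpoutSketch {
    public static void main(String[] args) {
        // ZooKeeper ensemble where the Kafka brokers register themselves.
        ZkHosts hosts = new ZkHosts("127.0.0.1:2181");

        // Topic to consume, ZooKeeper root used for offset storage, and a consumer id.
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "kafkatoptic", "/kafka-spout", "word-count-reader");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        // Downstream bolts (split, count, ...) would be attached here, as in WordCountTopology.
    }
}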


In the big data stack, Flume is also commonly used on the producer side to feed data into Kafka.
