Due to project requirements, I have recently been doing some preliminary work with Storm. There are plenty of installation and usage examples on the Internet, so I'm just recording mine here as a memo.
I. Introduction to Storm
Storm's terms include stream, spout, bolt, task, worker, stream grouping, and topology. A stream is the data to be processed. A spout is a data source. Bolts process data. A task is a thread running within a spout or bolt. A worker is a process that runs these threads. A stream grouping specifies what a bolt receives as input and how tuples are distributed: data can be randomly distributed (the term is shuffle), partitioned by field values (fields), broadcast to every task (all), always sent to a single task (global), declared as don't-care (none), or routed by custom logic (direct). A topology is the network of spout and bolt nodes connected by stream groupings. These terms are described in more detail on the Storm Concepts page.
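To make these grouping terms concrete, here is a minimal wiring sketch against the 0.9.x Java API (my own illustration, not from the Storm docs; the spout and bolt classes are the storm-starter ones that appear later in this post, and the component ids are arbitrary):

import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout());
// shuffle: tuples are distributed randomly across the bolt's tasks
builder.setBolt("split", new SplitSentence()).shuffleGrouping("spout");
// fields: tuples with the same "word" value always reach the same task
builder.setBolt("count", new WordCount()).fieldsGrouping("split", new Fields("word"));
// all: every task of this bolt gets a copy of every tuple
builder.setBolt("print", new PrinterBolt()).allGrouping("count");
// global: every tuple goes to exactly one task
builder.setBolt("total", new PrinterBolt()).globalGrouping("count");
// noneGrouping(...) and directGrouping(...) cover the none / direct terms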
To run a Storm cluster, you need Apache ZooKeeper, ZeroMQ (ØMQ), jzmq, Java 6, and Python 2.6.6. ZooKeeper coordinates the different components in the cluster, ZeroMQ is the internal messaging system, and jzmq is the Java binding for ZeroMQ.
Installation Details: http://blog.csdn.net/qiyating0808/article/details/36041299
Start a Storm cluster:
storm nimbus >/dev/null 2>&1 &      # master daemon
storm supervisor >/dev/null 2>&1 &  # worker daemon
storm ui >/dev/null 2>&1 &          # web UI (port 8080 by default)
Topology task scheduling:
Under the Storm (0.9.2) directory there is a test jar (apache-storm-0.9.2-incubating/examples/storm-starter) that can be used to verify the cluster environment.
Task Scheduling Method:
# local mode (LocalCluster)
storm jar storm-starter-topologies-0.9.2-incubating.jar storm.starter.WordCountTopology
# cluster mode
storm jar storm-starter-topologies-0.9.2-incubating.jar storm.starter.WordCountTopology args
LocalCluster is a standalone mode: in plain terms, you can test and verify results without depending on a cluster at all. This is very handy during development. You only need to add Storm's jar dependency to the project, run local tests in standalone mode, and then submit the topology to the cluster.
Sample code snippet (taken from WordCountTopology):
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
    Config conf = new Config();
    conf.setDebug(true);
    if (args != null && args.length > 0) {
        conf.setNumWorkers(3);
        StormSubmitter.submitTopologyWithProgressBar("wordCount", conf, builder.createTopology());
    } else {
        conf.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
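For reference, the WordCount bolt used above looks roughly like this (a sketch from memory of the storm-starter code, where it is a static inner class of WordCountTopology; imports from java.util, backtype.storm.topology, and backtype.storm.tuple are assumed):

public static class WordCount extends BaseBasicBolt {
    // in-memory counts, per task; fieldsGrouping on "word" keeps each word on one task
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}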
In our project, the spout's source data comes from Kafka (originally open-sourced by LinkedIn): the spout fetches the messages produced under a given topic, and they are consumed in the Storm cluster.
II. Introduction to Kafka
There is almost no installation process. Unzip the package and use it directly.
Usage:
Start Kafka:
./kafka-server-start.sh ../config/server.properties
Create a topic:
./kafka-topics.sh --create --topic kafkatoptic --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 1
Consume messages:
./kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic kafkatoptic --from-beginning
List topics:
./kafka-topics.sh --list --zookeeper localhost:2181
Produce messages:
./kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic kafkatoptic
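Messages can also be produced from code instead of the console script. Here is a minimal sketch against the Kafka 0.8-era Java producer API (the broker address and topic are the ones used above; the class name is my own):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class TestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "127.0.0.1:9092");              // broker started above
        props.put("serializer.class", "kafka.serializer.StringEncoder");  // plain string messages

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("kafkatoptic", "hello storm"));
        producer.close();
    }
}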
There are also many open-source projects that integrate Kafka with Storm, such as storm-kafka-0.8-plus, so no secondary development is needed.
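For example, wiring a KafkaSpout from storm-kafka-0.8-plus into a topology looks roughly like this (a sketch under my assumptions: the ZooKeeper address and topic match the examples above, and the zkRoot and consumer id are arbitrary names):

import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

// the ZooKeeper ensemble that Kafka registers its brokers with
BrokerHosts hosts = new ZkHosts("127.0.0.1:2181");
// topic, zkRoot (where consumed offsets are stored), and a consumer id
SpoutConfig spoutConfig = new SpoutConfig(hosts, "kafkatoptic", "/kafka-storm", "wordcount");
// decode each Kafka message as a single string field named "str"
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
// downstream bolts then read tuple.getString(0) as the message body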
On the producer side, our big data stack also uses Flume to feed data into Kafka.