Kafka-Storm integrated deployment
Preface
Apache Storm, a stream-processing framework, is the main component of distributed real-time computation. The data source for real-time computation is typically Kafka, the basic data-ingestion component. How to pass Kafka message data to Storm is the subject of this article.
0. Prepare materials
- A running, stable Kafka cluster (version: Kafka 0.8.2)
- A running, stable Storm cluster (version: Storm 0.9.8)
- Maven 3.x
1. Storm Topology Project
Storm jobs are called topologies. To run a real-time computation task, you need to create a Storm topology project. Because of the way Kafka delivers messages, the so-called Kafka-Storm integration really amounts to providing a Spout implementation that receives Kafka messages. Fortunately, a reliable KafkaSpout is built into recent official Storm releases, so you do not need to write one yourself; you only need to configure KafkaSpout as the topology's input data source.
2. Maven Configuration
This project is built on Maven.
- Main dependencies to be configured
```xml
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-kafka</artifactId>
  <version>0.9.3</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>0.9.3</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.8.2.1</version>
  <scope>provided</scope>
</dependency>
```
Note: the scope of these dependencies is "provided", so the jars must be available on the Storm cluster at runtime (see section 4).
- Maven compilation Configuration
```xml
<build>
  <finalName>storm-kafka-topology</finalName>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
    </resource>
  </resources>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <finalName>${project.artifactId}-${project.version}-shade</finalName>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <artifactSet>
          <excludes>
            <exclude>log4j:log4j:jar:</exclude>
          </excludes>
        </artifactSet>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>storm.kafka.example.StormTopology</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </plugin>
  </plugins>
</build>
```
3. Implement Topology
The following is a simple topology example (Java).
```java
public class StormTopology {

    // Topology shutdown flag (controlled externally via messages)
    public static boolean shutdown = false;

    public static void main(String[] args) {
        // Register the ZooKeeper hosts
        BrokerHosts brokerHosts = new ZkHosts("hd182:2181,hd185:2181,hd128:2181");
        // Name of the Kafka topic to consume
        String topic = "flumeTopic";
        // ZooKeeper node under which offsets are stored
        // (note: the leading "/" is required, otherwise ZooKeeper will not recognize it)
        String zkRoot = "/kafkastorm";
        // Configure the Spout
        String spoutId = "MyKafka";
        SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, zkRoot, spoutId);
        // Configure the Scheme (optional)
        spoutConfig.scheme = new SchemeAsMultiScheme(new SimpleMessageScheme());
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout);
        builder.setBolt("operator", new OperatorBolt()).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setDebug(true);
        conf.setNumWorkers(3);

        // The test environment uses local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("test", conf, builder.createTopology());
        while (!shutdown) {
            Utils.sleep(100);
        }
        cluster.killTopology("test");
        cluster.shutdown();
    }
}
```
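The example above references a SimpleMessageScheme, which is not shown in the original. A minimal sketch of such a scheme, assuming the Kafka messages are UTF-8 strings and the Storm 0.9.x `backtype.storm.spout.Scheme` interface, might look like this:

```java
import java.io.UnsupportedEncodingException;
import java.util.List;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Hypothetical Scheme: decodes each raw Kafka message as a UTF-8 string
// and emits it as a single-field tuple named "message".
public class SimpleMessageScheme implements Scheme {

    @Override
    public List<Object> deserialize(byte[] ser) {
        try {
            return new Values(new String(ser, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("message");
    }
}
```

Downstream bolts then read the payload with `tuple.getStringByField("message")`.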
Because one KafkaSpout can only receive messages from a single topic, in a production topology you must configure as many spouts as your business requires, one per topic.
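For instance, consuming two topics means wiring two KafkaSpout instances into the same topology. A sketch (the topic names and component ids here are illustrative, not from the original):

```java
// Illustrative: one KafkaSpout per topic, both feeding the same bolt.
// "orderTopic" / "clickTopic" and the component ids are made-up examples.
SpoutConfig orderConfig = new SpoutConfig(brokerHosts, "orderTopic", zkRoot, "orderSpoutId");
SpoutConfig clickConfig = new SpoutConfig(brokerHosts, "clickTopic", zkRoot, "clickSpoutId");

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("order-spout", new KafkaSpout(orderConfig));
builder.setSpout("click-spout", new KafkaSpout(clickConfig));
builder.setBolt("operator", new OperatorBolt())
       .shuffleGrouping("order-spout")
       .shuffleGrouping("click-spout");
```

Note that each spout needs its own spout id, so that the two consumers store their offsets under separate ZooKeeper paths.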
4. Necessary dependent packages
Because the topology project's dependencies use the "provided" scope, you need to copy the required jars into the lib folder of the Storm installation directory, including:
- kafka_2.10-0.8.2.1.jar
- storm-kafka-0.9.3.jar
- scala-library-2.10.4.jar
- zookeeper-3.4.6.jar
- curator-client-2.6.0.jar
- curator-framework-2.6.0.jar
- curator-recipes-2.6.0.jar
- guava-16.0.1.jar
- metrics-core-2.2.0.jar
5. Launch and run
Submit the task to the Storm cluster and observe the data output. You can also view the running status of the topology's internal components in the Storm UI (cluster mode required).
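For cluster mode, the LocalCluster block in the section 3 example is replaced with StormSubmitter. A sketch, assuming Storm 0.9.x's `backtype.storm.StormSubmitter` API (the class and topology names here are illustrative):

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.topology.TopologyBuilder;

public class ClusterSubmitExample {
    public static void main(String[] args)
            throws AlreadyAliveException, InvalidTopologyException {
        TopologyBuilder builder = new TopologyBuilder();
        // ... wire up the KafkaSpout and bolts as in section 3 ...

        Config conf = new Config();
        conf.setNumWorkers(3);
        // Submits to the cluster configured in storm.yaml, instead of LocalCluster
        StormSubmitter.submitTopology("kafka-storm-topology", conf, builder.createTopology());
    }
}
```

The shaded jar built in section 2 is then deployed with the storm client, e.g. `storm jar <shaded-jar> storm.kafka.example.StormTopology`.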