Kafka-Storm integrated deployment

Preface

The main component of distributed real-time computing is Apache Storm, which is based on stream computing. The data for real-time computation originates from Kafka, the basic data-input component, so this article discusses how to pass Kafka message data to Storm.

0. Prepare materials
  • A working, stable Kafka cluster (version: Kafka 0.8.2)
  • A working, stable Storm cluster (version: Storm 0.9.3)
  • Maven 3.x
1. Storm Topology Project

Storm jobs are called Topologies. To process real-time computing tasks, you need to create a Storm Topology project. Because of the way Kafka delivers messages, the so-called Kafka-Storm integrated deployment really means providing a Spout that receives Kafka messages. Fortunately, a reliable KafkaSpout is built into recent official Storm releases, so you do not need to write one yourself; you only need to configure KafkaSpout as the input data source of the Topology.

2. Maven Configuration

This project is built on Maven.

  • Main dependencies to be configured
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka</artifactId>
            <version>0.9.3</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>0.9.3</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.10</artifactId>
            <version>0.8.2.1</version>
            <scope>provided</scope>
        </dependency>

Note: the scope of these dependencies is "provided", because the corresponding jars are supplied by the Storm runtime rather than bundled into the topology jar (see section 4).

  • Maven build configuration
    <build>
        <finalName>storm-kafka-topology</finalName>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <finalName>${project.artifactId}-${project.version}-shade</finalName>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                    <artifactSet>
                        <excludes>
                            <exclude>log4j:log4j:jar:</exclude>
                        </excludes>
                    </artifactSet>
                    <transformers>
                        <transformer
                            implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                        <transformer
                            implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>storm.kafka.example.StormTopology</mainClass>
                        </transformer>
                    </transformers>
                </configuration>
            </plugin>
        </plugins>
    </build>
3. Implement Topology

The following is a simple example of a Topology (Java version).

package storm.kafka.example;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class StormTopology {

    // Topology shutdown flag (controlled through an external message)
    public static boolean shutdown = false;

    public static void main(String[] args) {
        // Register the ZooKeeper hosts used by the Kafka cluster
        BrokerHosts brokerHosts = new ZkHosts("hd182:2181,hd185:2181,hd128:2181");
        // Name of the Kafka topic to consume
        String topic = "flumeTopic";
        // ZooKeeper root node for offset storage
        // (note: the leading "/" is required; otherwise ZooKeeper will not recognize it)
        String zkRoot = "/kafkastorm";
        // Configure the Spout
        String spoutId = "MyKafka";
        SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, zkRoot, spoutId);
        // Configure the Scheme (optional)
        spoutConfig.scheme = new SchemeAsMultiScheme(new SimpleMessageScheme());
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout);
        builder.setBolt("operator", new OperatorBolt()).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setDebug(true);
        conf.setNumWorkers(3);

        // The test environment uses local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("test", conf, builder.createTopology());
        while (!shutdown) {
            Utils.sleep(100);
        }
        cluster.killTopology("test");
        cluster.shutdown();
    }
}
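
The SimpleMessageScheme referenced above is not part of storm-kafka; it is a custom Scheme that tells the spout how to turn raw Kafka message bytes into tuples, and the original article does not show it. The following is a minimal sketch, assuming each message is a plain UTF-8 string (the field name "message" is illustrative):

package storm.kafka.example;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.io.UnsupportedEncodingException;
import java.util.List;

// Hypothetical Scheme: decodes each Kafka message as a UTF-8 string and
// emits it as a one-field tuple named "message".
public class SimpleMessageScheme implements Scheme {

    @Override
    public List<Object> deserialize(byte[] bytes) {
        try {
            return new Values(new String(bytes, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Cannot decode Kafka message", e);
        }
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("message");
    }
}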

Because a single KafkaSpout can only receive message data from one specified topic, in a real production Topology you must configure the number of Spouts according to business requirements.
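
Likewise, the OperatorBolt in the example stands in for whatever business logic the Topology performs; the article does not include it. A minimal placeholder, assuming the single "message" field produced by the scheme above, could look like this:

package storm.kafka.example;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

// Hypothetical bolt: simply prints each Kafka message it receives.
// Replace the body of execute() with real business logic.
public class OperatorBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String message = input.getStringByField("message");
        System.out.println("Received from Kafka: " + message);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt acts as a sink and emits no tuples.
    }
}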

4. Necessary dependent packages

Because the Topology project's dependencies use the "provided" scope, you need to copy the jar packages involved into the lib folder of the Storm installation directory, including:

  • kafka_2.10-0.8.2.1.jar
  • storm-kafka-0.9.3.jar
  • scala-library-2.10.4.jar
  • zookeeper-3.4.6.jar
  • curator-client-2.6.0.jar
  • curator-framework-2.6.0.jar
  • curator-recipes-2.6.0.jar
  • guava-16.0.1.jar
  • metrics-core-2.2.0.jar
5. Launch and run

Submit the task to the Storm cluster and observe the data output. In addition, you can view the running status of the Topology's internal components on the Storm UI (cluster mode is required).
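
Note that the example in section 3 runs in local mode (LocalCluster). For cluster mode, you would package the project with mvn clean package, submit the shaded jar with the storm jar command (passing the main class configured in the shade plugin), and have the main method use StormSubmitter instead of LocalCluster. The following is a hedged sketch of the cluster-mode submission only; the topology name is arbitrary and the spout/bolt wiring is the same as in section 3:

package storm.kafka.example;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

// Sketch of cluster-mode submission; build the spout and bolt exactly as in
// the StormTopology example above before submitting.
public class ClusterSubmit {

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... builder.setSpout("kafka-spout", kafkaSpout);
        // ... builder.setBolt("operator", new OperatorBolt()).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(3);

        // "kafka-storm-topology" is the name that will appear in the Storm UI.
        StormSubmitter.submitTopology("kafka-storm-topology", conf, builder.createTopology());
    }
}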
