Kafka + Storm + HBase


This blog is based on the following software:


CentOS 7.3 (1611)
kafka_2.10-0.10.2.1.tgz
zookeeper-3.4.10.tar.gz
hbase-1.3.1-bin.tar.gz
apache-storm-1.1.0.tar.gz
hadoop-2.8.0.tar.gz
jdk-8u131-linux-x64.tar.gz
IntelliJ IDEA 2017.1.3 x64
IP            Role
172.17.11.85  NameNode, SecondaryNameNode, DataNode, HMaster, HRegionServer
172.17.11.86  DataNode, HRegionServer
172.17.11.87  DataNode, HRegionServer


1. First of all, the Kafka -> Storm part

I use a producer to publish data to a fixed topic:

import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Producer {
    private final KafkaProducer<String, String> producer;
    private final String topic;

    public Producer(String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "172.17.11.85:9092,172.17.11.86:9092,172.17.11.87:9092");
        props.put("client.id", "DemoProducer");
        props.put("batch.size", 16384);        // 16 KB batch size
        props.put("linger.ms", 1000);
        props.put("buffer.memory", 33554432);  // 32 MB send buffer
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
        this.topic = topic;
    }

    public void producerMsg() throws InterruptedException {
        String data = "Apache Storm is a free and open source distributed realtime computation system. "
                + "Storm makes it easy to reliably process unbounded streams of data, doing for realtime "
                + "processing what Hadoop does for batch processing. Storm is simple, can be used with any "
                + "programming language, and is a lot of fun to use!\n"
                + "Storm has many use cases: realtime analytics, online machine learning, continuous "
                + "computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at "
                + "over a million tuples processed per second per node. It is scalable, fault-tolerant, "
                + "guarantees your data will be processed, and is easy to set up and operate.\n"
                + "Storm integrates with the queueing and database technologies you already use. A Storm "
                + "topology consumes streams of data and processes those streams in arbitrarily complex "
                + "ways, repartitioning the streams between each stage of the computation however needed. "
                + "Read more in the tutorial.";
        // strip punctuation, then split the text into single words
        data = data.replaceAll("[\\pP]", "");
        String[] words = data.split(" ");

        Random _rand = new Random();
        Random rnd = new Random();
        int events = 10;
        for (long nEvents = 0; nEvents < events; nEvents++) {
            int lastIpNum = rnd.nextInt(255);
            String ip = "192.168.2." + lastIpNum;            // random IP as the record key
            String msg = words[_rand.nextInt(words.length)]; // random word as the record value
            try {
                producer.send(new ProducerRecord<>(topic, ip, msg));
                System.out.println("Sent message: (" + ip + ", " + msg + ")");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        Thread.sleep(10000);
    }

    public static void main(String[] args) throws InterruptedException {
        // Constants.TOPIC holds the topic name (the Constants class is not shown in this post)
        Producer producer = new Producer(Constants.TOPIC);
        producer.producerMsg();
    }
}

The producer strips the punctuation from the two sentences, splits them into single words, and then produces each word to the given topic.
That part is straightforward. Next is the consumer side, which is Storm's spout, the KafkaSpout:


KafkaSpoutConfig<String, String> kafkaSpoutConfig = KafkaSpoutConfig
        .builder(args[0], args[1])  // args[0]: bootstrap servers, args[1]: topic
        .setProp(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
        // the interval value was garbled in the original post; 1000 ms is an assumption
        .setProp(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000")
        .setProp(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
        .setOffsetCommitPeriodMs(10000)
        .setGroupId(args[2])
        // the original value was garbled; 250 is a placeholder
        .setMaxUncommittedOffsets(250)
        .setFirstPollOffsetStrategy(KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST)
        .build();

KafkaSpout<String, String> kafkaSpout = new KafkaSpout<>(kafkaSpoutConfig);

The consumer (the spout) consumes data from the specified topic and emits it to the next bolt. With storm-kafka-client's default record translator, each tuple carries the fields "topic", "partition", "offset", "key", and "value", which is why the bolt below reads the "value" field.


import java.util.HashMap;
import java.util.Map;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt {
    private Map<String, Integer> counts = new HashMap<>();

    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = input.getStringByField("value"); // the Kafka record value
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        System.out.println("WordCountBolt receive: " + word + " " + count);
        collector.emit(new Values(word, count.toString()));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
2. Storm -> HBase

The first thing to do is to copy the hbase-site.xml configuration file from the cluster into the project's classpath (e.g. src/main/resources).

The next step is the API call:


SimpleHBaseMapper mapper = new SimpleHBaseMapper()
        .withRowKeyField("word")
        .withColumnFields(new Fields("count"))
        .withColumnFamily("result");
// args[3] is the HBase table name
HBaseBolt hbaseBolt = new HBaseBolt(args[3], mapper)
        .withConfigKey("hbase");
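
Note that HBaseBolt does not create the table; it must already exist with the column family the mapper writes to. A quick way to create it, assuming the table name passed in args[3] is "wordcount" (my example name, not from the original post), is the HBase shell:

create 'wordcount', 'result'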

3. Construction of the entire topology

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafkaSpout", kafkaSpout, 1);
// left commented out, as in the original post:
// builder.setBolt("wordSplitBolt", new WordSplitBolt(), 2)
//         .shuffleGrouping("kafkaSpout");
builder.setBolt("countBolt", new WordCountBolt(), 2)
        .fieldsGrouping("kafkaSpout", new Fields("value"));
builder.setBolt("hbaseBolt", hbaseBolt, 1)
        .addConfiguration("hbase", new HashMap<String, Object>())
        .shuffleGrouping("countBolt");
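
For completeness, here is a minimal sketch of how the topology might then be submitted; the original post does not show this step, and the use of args[4] as the topology name is my assumption:

Config conf = new Config();
conf.setNumWorkers(1);
if (args.length > 4) {
    // submit to a real cluster; args[4] is a hypothetical topology-name argument
    StormSubmitter.submitTopology(args[4], conf, builder.createTopology());
} else {
    // local mode for quick testing
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("kafka-storm-hbase", conf, builder.createTopology());
}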


Now we come to the real point. Pay attention, these are the key issues!


Key 1: version information in the pom.xml file


<dependencies>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>1.1.0</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-hbase</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka-client</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.3</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>


I import hadoop-client 2.7.3. As for why: if I use 2.8.0, it produces the following exception:


java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosTicket(Ljavax/security/auth/Subject;)Z
    at org.apache.hadoop.security.UserGroupInformation.<init>(UserGroupInformation.java:652) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:843) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:802) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:675) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:285) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:281) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.User.getCurrent(User.java:185) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.UserProvider.getCurrent(UserProvider.java:88) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.common.HBaseClient.<init>(HBaseClient.java:43) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.bolt.AbstractHBaseBolt.prepare(AbstractHBaseBolt.java:75) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.bolt.HBaseBolt.prepare(HBaseBolt.java:109) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.daemon.executor$fn__5044$fn__5057.invoke(executor.clj:791) ~[storm-core-1.1.0.jar:1.1.0]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.1.0.jar:1.1.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
 
This appears to be caused by a version incompatibility: hadoop-common 2.8.0 calls KerberosUtil.hasKerberosTicket, a method that an older hadoop-auth jar pulled in transitively (via the HBase client) does not have.
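
If you do want to stay on 2.8.0, one plausible alternative (my assumption, not from the original post) is to pin the transitive hadoop-auth dependency to the same version, so hadoop-common 2.8.0 finds the method it expects:

<dependencyManagement>
    <dependencies>
        <!-- assumption: force the transitive hadoop-auth to match hadoop-common 2.8.0 -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>2.8.0</version>
        </dependency>
    </dependencies>
</dependencyManagement>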


Key 2: log4j-over-slf4j.jar AND slf4j-log4j12.jar conflict


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/geekp/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.8/log4j-slf4j-impl-2.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/geekp/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]


....

SLF4J: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError. 
SLF4J: See also http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.

[Thread-22-HbaseBolt-executor[1 1]] ERROR o.a.s.util - Async loop died!
java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosTicket(Ljavax/security/auth/Subject;)Z
    at org.apache.hadoop.security.UserGroupInformation.<init>(UserGroupInformation.java:652) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:843) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:802) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:675) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:285) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:281) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.User.getCurrent(User.java:185) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.UserProvider.getCurrent(UserProvider.java:88) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.common.HBaseClient.<init>(HBaseClient.java:43) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.bolt.AbstractHBaseBolt.prepare(AbstractHBaseBolt.java:75) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.bolt.HBaseBolt.prepare(HBaseBolt.java:109) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.daemon.executor$fn__5044$fn__5057.invoke(executor.clj:791) ~[storm-core-1.1.0.jar:1.1.0]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.1.0.jar:1.1.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
71976 [Thread-22-HbaseBolt-executor[1 1]] ERROR o.a.s.d.executor - 
java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosTicket(Ljavax/security/auth/Subject;)Z
    at org.apache.hadoop.security.UserGroupInformation.<init>(UserGroupInformation.java:652) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:843) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:802) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:675) ~[hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:285) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.<init>(User.java:281) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.User.getCurrent(User.java:185) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.hadoop.hbase.security.UserProvider.getCurrent(UserProvider.java:88) ~[hbase-common-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.common.HBaseClient.<init>(HBaseClient.java:43) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.bolt.AbstractHBaseBolt.prepare(AbstractHBaseBolt.java:75) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.hbase.bolt.HBaseBolt.prepare(HBaseBolt.java:109) ~[storm-hbase-1.1.0.jar:1.1.0]
    at org.apache.storm.daemon.executor$fn__5044$fn__5057.invoke(executor.clj:791) ~[storm-core-1.1.0.jar:1.1.0]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:482) [storm-core-1.1.0.jar:1.1.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
71976 [Thread-26-kafkaSpout-executor[4 4]] INFO  o.a.s.k.s.KafkaSpout - Initialization complete
71992 [Thread-22-HbaseBolt-executor[1 1]] ERROR o.a.s.util - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.1.0.jar:1.1.0]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn__5642$fn__5643.invoke(worker.clj:759) [storm-core-1.1.0.jar:1.1.0]
    at org.apache.storm.daemon.executor$mk_executor_data$fn__4863$fn__4864.invoke(executor.clj:274) [storm-core-1.1.0.jar:1.1.0]
    at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.1.0.jar:1.1.0]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

Process finished with exit code 1


The fix is simply not to pull in slf4j-log4j12; exclude it in the pom file:


<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>


Key 3: the HBase configuration file hbase-site.xml from the server
This issue is really, especially important; it bothered me for a whole day.

My configuration file on the server cluster is like this

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master,slave1,slave2</value>
    </property>
    <property>
        <name>hbase.master.info.bindAddress</name>
        <value>0.0.0.0</value>
    </property>
    <property>
        <name>hbase.master.info.port</name>
        <value>16010</value>
    </property>
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
</configuration>


When I downloaded the configuration file, I changed the hostnames in it to the corresponding IP addresses. But the data still would not go through, and after a lot of troubleshooting I found the real reason.
Very important: HBase does not rely solely on my local copy of the file. It uses it to reach the master, but it then asks the cluster for the addresses of the region servers, and on my cluster the slave nodes are registered by hostname (slave1 and slave2, as written in the configuration under the HBase installation path, e.g. conf/zoo.cfg). The client gets those hostnames back and tries to resolve them on my Windows machine. But! The corresponding IP addresses were not in my hosts file, so the client could not resolve the slave nodes, and that was the real culprit behind the failed writes to HBase!

The solution is to add the hostname-to-IP mappings in C:\Windows\System32\drivers\etc\hosts.
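
For example (assuming the cluster hostnames map to the IPs listed at the top of this post; your addresses may differ):

172.17.11.85  master
172.17.11.86  slave1
172.17.11.87  slave2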


A note to my future self: when writing cluster configuration files in the future, wherever an IP address can be written, write the IP address!
