The Hadoop cluster deployment was covered earlier because we need HDFS: Storm data is written to HDFS for offline storage, and Hadoop then reads it back from HDFS for analytical processing.
So we needed to integrate storm-hdfs, and ran into many problems along the way. Some of them are discussed online, but the suggested solutions did not work in practice, so I am writing them up here, both as notes for myself and to help anyone stuck on the same problems.
Integrating storm-hdfs starts with writing a topology and submitting it to Storm. For the source code I followed http://shiyanjun.cn/archives/934.html
I packaged the topology and deployed it to Storm. The deployment itself succeeded, but the Storm UI showed an error, and the log on the worker node contained the following:
2015-11-13T15:58:13.119+0800 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: Error preparing HdfsBolt: No FileSystem for scheme: hdfs
    at org.apache.storm.hdfs.bolt.AbstractHdfsBolt.prepare(AbstractHdfsBolt.java:109) ~[stormjar.jar:na]
    at backtype.storm.daemon.executor$fn__4722$fn__4734.invoke(executor.clj:692) ~[storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:461) ~[storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.bolt.HdfsBolt.doPrepare(HdfsBolt.java:86) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.bolt.AbstractHdfsBolt.prepare(AbstractHdfsBolt.java:105) ~[stormjar.jar:na]
    ... 4 common frames omitted
2015-11-13T15:58:13.120+0800 b.s.d.executor [ERROR]
java.lang.RuntimeException: Error preparing HdfsBolt: No FileSystem for scheme: hdfs
    at org.apache.storm.hdfs.bolt.AbstractHdfsBolt.prepare(AbstractHdfsBolt.java:109) ~[stormjar.jar:na]
    at backtype.storm.daemon.executor$fn__4722$fn__4734.invoke(executor.clj:692) ~[storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:461) ~[storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449) ~[stormjar.jar:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.bolt.HdfsBolt.doPrepare(HdfsBolt.java:86) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.bolt.AbstractHdfsBolt.prepare(AbstractHdfsBolt.java:105) ~[stormjar.jar:na]
    ... 4 common frames omitted
2015-11-13T15:58:13.194+0800 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__5102$fn__5103.invoke(worker.clj:495) [storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.daemon.executor$mk_executor_data$fn__4555$fn__4556.invoke(executor.clj:240) [storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:473) [storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
The error says there is no file system registered for the hdfs scheme. At first I thought my Hadoop cluster deployment was at fault and HDFS had not started properly. I spent half a day checking and found nothing wrong with the cluster, which ruled that out. After a long search online I finally found a solution: https://github.com/ptgoetz/storm-hdfs
Here's what it says:
When packaging the topology we used the maven-assembly-plugin, because the jar has to include all of its dependencies. But that plugin overwrites files with the same name under META-INF. Hadoop 2.x looks up its FileSystem implementations through the service files under META-INF/services, so once the entry for HDFS is lost, the hdfs scheme can no longer be resolved at run time. The fix is to package with the maven-shade-plugin instead:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>1.4</version>
    <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass></mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
I added this plugin block to pom.xml. Put the fully qualified name of the class containing your main method inside <mainClass></mainClass>.
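To see why overwriting files under META-INF matters, here is a minimal sketch in plain Java. It is not code from Maven or storm-hdfs; the class ServiceFileMerge and its methods are hypothetical names, and the two provider lists are simplified stand-ins for the META-INF/services/org.apache.hadoop.fs.FileSystem files that different Hadoop jars ship. It only contrasts "one copy wins" with "all copies are concatenated", which is the behaviour the ServicesResourceTransformer above is there to provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simulates two ways of handling same-named META-INF/services files when building a fat jar. */
public class ServiceFileMerge {

    /**
     * One copy wins: only a single file's provider lines survive.
     * Which copy wins depends on the tool; this sketch arbitrarily keeps the last one.
     */
    public static List<String> overwrite(List<List<String>> serviceFiles) {
        if (serviceFiles.isEmpty()) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(serviceFiles.get(serviceFiles.size() - 1));
    }

    /** Concatenation: every provider line from every file is kept, duplicates dropped. */
    public static List<String> concatenate(List<List<String>> serviceFiles) {
        Set<String> merged = new LinkedHashSet<String>();
        for (List<String> file : serviceFiles) {
            merged.addAll(file);
        }
        return new ArrayList<String>(merged);
    }

    public static void main(String[] args) {
        // Simplified, assumed contents of the same service file in two different jars.
        List<String> fromHdfsJar = Arrays.asList("org.apache.hadoop.hdfs.DistributedFileSystem");
        List<String> fromCommonJar = Arrays.asList("org.apache.hadoop.fs.LocalFileSystem");
        List<List<String>> files = Arrays.asList(fromHdfsJar, fromCommonJar);

        System.out.println("overwrite:   " + overwrite(files));
        System.out.println("concatenate: " + concatenate(files));
    }
}
```

With the overwrite strategy the DistributedFileSystem line is gone from the merged jar, which matches the "No FileSystem for scheme: hdfs" symptom; with concatenation both providers remain.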
After repackaging I thought the problem was completely solved, but things were not going to be that smooth. This time the deployment itself failed with:
Exception in thread "main" java.lang.ExceptionInInitializerError
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at backtype.storm.config__init.__init5(Unknown Source)
    at backtype.storm.config__init.<clinit>(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at clojure.lang.RT.loadClassForName(RT.java:2098)
    at clojure.lang.RT.load(RT.java:430)
    at clojure.lang.RT.load(RT.java:411)
    at clojure.core$load$fn__5018.invoke(core.clj:5530)
    at clojure.core$load.doInvoke(core.clj:5529)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.core$load_one.invoke(core.clj:5336)
    at clojure.core$load_lib$fn__4967.invoke(core.clj:5375)
    at clojure.core$load_lib.doInvoke(core.clj:5374)
    at clojure.lang.RestFn.applyTo(RestFn.java:142)
    at clojure.core$apply.invoke(core.clj:619)
    at clojure.core$load_libs.doInvoke(core.clj:5417)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:621)
    at clojure.core$use.doInvoke(core.clj:5507)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at backtype.storm.command.config_value$loading__4910__auto__.invoke(config_value.clj:16)
    at backtype.storm.command.config_value__init.load(Unknown Source)
    at backtype.storm.command.config_value__init.<clinit>(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at clojure.lang.RT.loadClassForName(RT.java:2098)
    at clojure.lang.RT.load(RT.java:430)
    at clojure.lang.RT.load(RT.java:411)
    at clojure.core$load$fn__5018.invoke(core.clj:5530)
    at clojure.core$load.doInvoke(core.clj:5529)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.lang.Var.invoke(Var.java:415)
    at backtype.storm.command.config_value.<clinit>(Unknown Source)
Caused by: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
    at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:286)
    at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:239)
    at java.util.jar.JarVerifier.processEntry(JarVerifier.java:317)
    at java.util.jar.JarVerifier.update(JarVerifier.java:228)
    at java.util.jar.JarFile.initializeVerifier(JarFile.java:348)
    at java.util.jar.JarFile.getInputStream(JarFile.java:415)
    at sun.misc.URLClassPath$JarLoader$2.getInputStream(URLClassPath.java:775)
    at sun.misc.Resource.cachedInputStream(Resource.java:77)
    at sun.misc.Resource.getByteBuffer(Resource.java:160)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:436)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at backtype.storm.utils.LocalState.<clinit>(LocalState.java:35)
    ...
I was stuck here for a long time. Without the maven-shade-plugin the topology could at least be deployed; with it, deployment failed right away. After half a day of searching, the cause turned out to be packaging again.
When the maven-shade-plugin builds the jar, it also copies the files from the dependencies' META-INF directories into it, so the final jar's META-INF ends up with several extra *.SF signature files. Those signatures belong to the original dependency jars and no longer match the contents of the merged jar, so verification fails with: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
Solution:
Add the following to the maven-shade-plugin configuration in the pom file (for more information: http://blog.csdn.net/defonds/article/details/43233131):
<filters>
    <filter>
        <artifact>*:*</artifact>
        <excludes>
            <exclude>META-INF/*.SF</exclude>
            <exclude>META-INF/*.DSA</exclude>
            <exclude>META-INF/*.RSA</exclude>
        </excludes>
    </filter>
</filters>
After adding this, the problem is solved.
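To check whether a packaged jar still carries leftover signature files, something like the following sketch can help. It uses only the JDK; the class name SignatureFileScanner and its methods are hypothetical, and the matching rule (files directly under META-INF ending in .SF, .DSA or .RSA) is my reading of the three exclude patterns above, not code taken from the shade plugin.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Lists signature-related entries (META-INF/*.SF, *.DSA, *.RSA) found in a jar. */
public class SignatureFileScanner {

    private static final String META_INF = "META-INF/";

    /** Returns true for entries directly under META-INF that end in .SF, .DSA or .RSA. */
    public static boolean isSignatureEntry(String entryName) {
        String upper = entryName.toUpperCase(Locale.ROOT);
        if (!upper.startsWith(META_INF)) {
            return false;
        }
        String rest = upper.substring(META_INF.length());
        if (rest.contains("/")) {
            return false; // nested directories such as META-INF/services/ are not matched
        }
        return rest.endsWith(".SF") || rest.endsWith(".DSA") || rest.endsWith(".RSA");
    }

    /** Opens the jar as a plain zip archive so that no signature verification is triggered. */
    public static List<String> findSignatureEntries(File jar) throws IOException {
        List<String> found = new ArrayList<String>();
        try (ZipFile zip = new ZipFile(jar)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isSignatureEntry(name)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SignatureFileScanner <path-to-jar>");
            return;
        }
        for (String name : findSignatureEntries(new File(args[0]))) {
            System.out.println(name);
        }
    }
}
```

Run it against the topology jar after packaging; if it prints nothing, the filter has removed the signature files.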
With this package the deployment succeeded, but the trouble was not over: the Storm UI showed an error again, so I went back to the worker node's log for the details:
java.lang.NoSuchFieldError: IBM_JAVA
    at org.apache.hadoop.security.UserGroupInformation.getOSLoginModuleName(UserGroupInformation.java:303) ~[stormjar.jar:na]
    at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:348) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.common.security.HdfsSecurityUtil.login(HdfsSecurityUtil.java:36) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.bolt.AbstractHdfsBolt.prepare(AbstractHdfsBolt.java:104) ~[stormjar.jar:na]
    at backtype.storm.daemon.executor$fn__4722$fn__4734.invoke(executor.clj:692) ~[storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:461) ~[storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
2015-11-13T16:55:45.732+0800 b.s.d.executor [ERROR]
java.lang.NoSuchFieldError: IBM_JAVA
    at org.apache.hadoop.security.UserGroupInformation.getOSLoginModuleName(UserGroupInformation.java:303) ~[stormjar.jar:na]
    at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:348) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.common.security.HdfsSecurityUtil.login(HdfsSecurityUtil.java:36) ~[stormjar.jar:na]
    at org.apache.storm.hdfs.bolt.AbstractHdfsBolt.prepare(AbstractHdfsBolt.java:104) ~[stormjar.jar:na]
    at backtype.storm.daemon.executor$fn__4722$fn__4734.invoke(executor.clj:692) ~[storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:461) ~[storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
2015-11-13T16:55:45.823+0800 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__5102$fn__5103.invoke(worker.clj:495) [storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.daemon.executor$mk_executor_data$fn__4555$fn__4556.invoke(executor.clj:240) [storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:473) [storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
At first I had no idea what this error meant. Searching online, some people said it is caused by a missing hadoop-auth jar; for details see: http://stackoverflow.com/questions/22278620/the-ibm-java-error-for-running-jobs-in-hadoop-2-2-0
So I added this package to pom.xml. The version matters: 1.x and 2.x behave differently, and since the Hadoop pulled in for Storm is 2.x, this package must be 2.x as well. Here I used 2.7.1:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.7.1</version>
</dependency>
I expected that to fix it, but instead there were more errors: the previous error was still there, and a new one appeared:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.Log4jLoggerFactory
    at org.apache.log4j.Logger.getLogger(Logger.java:39) ~[log4j-over-slf4j-1.6.6.jar:1.6.6]
    at kafka.utils.Logging$class.logger(Logging.scala:24) ~[stormjar.jar:na]
    at kafka.network.BlockingChannel.logger$lzycompute(BlockingChannel.scala:35) ~[stormjar.jar:na]
    at kafka.network.BlockingChannel.logger(BlockingChannel.scala:35) ~[stormjar.jar:na]
    at kafka.utils.Logging$class.debug(Logging.scala:51) ~[stormjar.jar:na]
    at kafka.network.BlockingChannel.debug(BlockingChannel.scala:35) ~[stormjar.jar:na]
    at kafka.network.BlockingChannel.connect(BlockingChannel.scala:64) ~[stormjar.jar:na]
    at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44) ~[stormjar.jar:na]
    at kafka.consumer.SimpleConsumer.getOrMakeConnection(SimpleConsumer.scala:142) ~[stormjar.jar:na]
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69) ~[stormjar.jar:na]
    at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:124) ~[stormjar.jar:na]
    at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:79) ~[stormjar.jar:na]
    at storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:77) ~[stormjar.jar:na]
    at storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:67) ~[stormjar.jar:na]
    at storm.kafka.PartitionManager.<init>(PartitionManager.java:83) ~[stormjar.jar:na]
    at storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98) ~[stormjar.jar:na]
    at storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69) ~[stormjar.jar:na]
    at storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:135) ~[stormjar.jar:na]
    at backtype.storm.daemon.executor$fn__4654$fn__4669$fn__4698.invoke(executor.clj:565) ~[storm-core-0.9.4.jar:0.9.4]
    at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463) ~[storm-core-0.9.4.jar:0.9.4]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
2015-11-13T17:06:50.012+0800 b.s.d.executor [ERROR]
Back to searching for the cause. After half a day online, it was said to be a conflict between the log4j and slf4j-log4j12 packages (the trace shows the failing class being loaded from log4j-over-slf4j-1.6.6.jar); for details see: http://erichua.iteye.com/blog/1182090
So I modified the pom file to exclude the slf4j-log4j12 dependency from the newly added hadoop-auth package:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.7.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
After repackaging and deploying, the slf4j-log4j12 conflict was indeed resolved, but the earlier java.lang.NoSuchFieldError: IBM_JAVA was still there. Digging further, I found that adding hadoop-auth was not wrong. The IBM_JAVA field is defined in org.apache.hadoop.util.PlatformName in the hadoop-auth package, and the code needs it. But hadoop-core also contains a class named org.apache.hadoop.util.PlatformName, so at run time the program picks up the hadoop-core copy, looks for IBM_JAVA there, cannot find it, and fails. The fix is to exclude hadoop-core from hadoop-auth, and to exclude it everywhere else it might be pulled in as a dependency. So I modified pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.7.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
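When the same class name lives in more than one dependency, as with org.apache.hadoop.util.PlatformName here, it helps to find out which jars actually contain it before deciding what to exclude. The following is a rough sketch using only the JDK; DuplicateClassFinder and its methods are hypothetical names, and the jar paths passed on the command line are whatever your local Maven repository or build output contains.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

/** Reports which of the given jars contain a class with the given fully qualified name. */
public class DuplicateClassFinder {

    /** Converts a class name such as a.b.C into its jar entry path a/b/C.class. */
    public static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every jar in the list that has an entry for the class. */
    public static List<File> jarsContaining(String className, List<File> jars) throws IOException {
        String entry = toEntryName(className);
        List<File> hits = new ArrayList<File>();
        for (File jar : jars) {
            try (ZipFile zip = new ZipFile(jar)) {
                if (zip.getEntry(entry) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java DuplicateClassFinder <class-name> <jar> [<jar> ...]");
            return;
        }
        List<File> jars = new ArrayList<File>();
        for (int i = 1; i < args.length; i++) {
            jars.add(new File(args[i]));
        }
        // More than one line of output means the class is duplicated across jars.
        for (File jar : jarsContaining(args[0], jars)) {
            System.out.println(jar);
        }
    }
}
```

Pass org.apache.hadoop.util.PlatformName as the class name together with the candidate jars; if more than one jar is printed, the class loader may pick either copy, which is the situation described above.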
After that the package built and deployed smoothly, and all the problems were finally solved. It cost a lot of bitter tears, but hard work pays off: the storm-hdfs integration is complete.