The requirement: write to HDFS on a Hadoop cluster from a Flume collection machine that does not have Hadoop installed.
The Flume version here is 1.6.0; the Hadoop version is 2.7.1.
Copy the hdfs-site.xml and core-site.xml configuration files from the Hadoop cluster to the conf directory of the Flume installation, and copy hadoop-hdfs-2.7.1.jar to Flume's lib directory.
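For reference, the copy commands might look like the following — a minimal sketch, assuming the cluster configs live under /etc/hadoop/conf on a node named master1 and Hadoop is installed there at /usr/local/hadoop-2.7.1 (the node name and both paths are assumptions; the Flume path matches the error messages below):

scp master1:/etc/hadoop/conf/hdfs-site.xml /data/apache-flume-1.6.0-bin/conf/
scp master1:/etc/hadoop/conf/core-site.xml /data/apache-flume-1.6.0-bin/conf/
scp master1:/usr/local/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar /data/apache-flume-1.6.0-bin/lib/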
First, the Flume configuration file:
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = syslogtcp
# the IP of the local (collection) machine
a1.sources.r1.bind = 192.168.110.160
a1.sources.r1.port = 23003
a1.sources.r1.workerThreads = 10

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
a1.channels.c1.keep-alive = 6
a1.channels.c1.byteCapacityBufferPercentage = 20

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://clusterpc/test/flume/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
bin/flume-ng agent --conf conf --conf-file conf/flume-tcp-memory-hdfs.conf --name a1 -Dflume.root.logger=INFO,console
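Once the agent is up, a quick way to exercise the whole pipeline is to push one syslog-formatted line at the TCP source — a sketch, assuming nc (netcat) is installed and using the host/port from the config above:

echo "<13>Sep 19 16:20:00 testhost app: hello flume" | nc -w 1 192.168.110.160 23003

A file named events-* should then appear under /test/flume/<date>; since this machine has no Hadoop client, check from a cluster node with hadoop fs -ls /test/flume/.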
Second, the errors encountered:
1. Host name not found
2016-09-19 16:15:48,518 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: java.net.UnknownHostException: cluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.UnknownHostException: cluster
cluster is the nameservice name of the company's Hadoop cluster. The error occurs because the client cannot resolve the nameservice to actual NameNode addresses, which is why hdfs-site.xml (which defines it) must be copied into the flume/conf directory.
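For illustration, an HA nameservice is typically defined in hdfs-site.xml by entries like the following — the NameNode hostnames nn1host/nn2host are placeholders, and the nameservice ID (clusterpc here, taken from hdfs.path above) must match the cluster's actual nameservice:

<property>
  <name>dfs.nameservices</name>
  <value>clusterpc</value>
</property>
<property>
  <name>dfs.ha.namenodes.clusterpc</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.clusterpc.nn1</name>
  <value>nn1host:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.clusterpc.nn2</name>
  <value>nn2host:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.clusterpc</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>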
2. Directories created on the local filesystem instead of HDFS
java.io.IOException: Mkdirs failed to create /test/flume/16-09-19 (exists=false, cwd=file:/data/apache-flume-1.6.0-bin)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:450)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:78)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:69)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:246)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Fix: copy core-site.xml to the flume/conf directory.
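core-site.xml matters here because it carries fs.defaultFS; without it the Hadoop client falls back to the local filesystem (note cwd=file:/data/apache-flume-1.6.0-bin in the trace) and tries to create /test/flume locally. The relevant entry looks like:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://clusterpc</value>
</property>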
3. No FileSystem implementation for the hdfs scheme
java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Fix: copy hadoop-hdfs-2.7.1.jar to the flume/lib directory; it contains the DistributedFileSystem class that implements the hdfs:// scheme.
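Note that hadoop-hdfs alone may not be enough: the HDFS sink also needs hadoop-common, hadoop-auth, and their dependencies if they are not already on the Flume classpath. A sketch of copying them, assuming the standard Hadoop 2.7.1 tarball layout under $HADOOP_HOME:

cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar /data/apache-flume-1.6.0-bin/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.1.jar /data/apache-flume-1.6.0-bin/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar /data/apache-flume-1.6.0-bin/lib/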
4. Insufficient HDFS permissions. The user writing to HDFS is the user the Flume agent runs as on the collection machine (kafka in this case).
org.apache.hadoop.security.AccessControlException: Permission denied: user=kafka, access=WRITE, inode="/test/flume/16-09-19/events-.1474268726127.tmp":hadoop:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1698)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1682)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1665)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2517)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2452)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
Fix: grant write permission on the target path in HDFS:
hadoop fs -chmod -R 777 /test/
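If 777 is too broad, a narrower alternative is to pre-create the directory and hand ownership to the writing user (kafka here), run from any node with the Hadoop client:

hadoop fs -mkdir -p /test/flume
hadoop fs -chown -R kafka /test/flume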
5. Missing timestamp in the event headers
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:228)
    at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:432)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:380)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
The cause is that the event headers carry no timestamp, which the HDFS sink needs to expand the %y-%m-%d escape sequences in hdfs.path. Workaround: set a1.sinks.k1.hdfs.useLocalTimeStamp = true so the sink falls back to the local timestamp.
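An alternative is Flume's built-in timestamp interceptor, which stamps a timestamp header onto each event as it enters the agent at the source:

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp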