Flume: Remote Writes to HDFS


The requirement: write to HDFS on a remote Hadoop cluster from a Flume collection machine that does not have Hadoop installed.

The Flume version here is 1.6.0; the Hadoop version is 2.7.1.

Copy the hdfs-site.xml and core-site.xml configuration files from the Hadoop cluster into the conf directory of the Flume installation, and copy hadoop-hdfs-2.7.1.jar into Flume's lib directory.
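A minimal sketch of those setup steps, assuming the cluster configs live under the standard etc/hadoop directory; the namenode hostname and both installation paths are placeholders for your own layout:

    # Pull the two Hadoop config files onto the Flume machine
    scp namenode:/opt/hadoop-2.7.1/etc/hadoop/hdfs-site.xml /data/apache-flume-1.6.0-bin/conf/
    scp namenode:/opt/hadoop-2.7.1/etc/hadoop/core-site.xml /data/apache-flume-1.6.0-bin/conf/
    # Copy the HDFS client jar into Flume's classpath
    scp namenode:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar /data/apache-flume-1.6.0-bin/lib/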

I. Flume configuration file:

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # source: syslog over TCP, bound to the local IP
    a1.sources.r1.type = syslogtcp
    a1.sources.r1.bind = 192.168.110.160
    a1.sources.r1.port = 23003
    a1.sources.r1.workerThreads = 10

    # channel: in-memory buffering
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000000
    a1.channels.c1.transactionCapacity = 100000
    a1.channels.c1.keep-alive = 6
    a1.channels.c1.byteCapacityBufferPercentage = 20

    # sink: write to HDFS, bucketed by day
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://clusterpc/test/flume/%y-%m-%d
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
    a1.sinks.k1.hdfs.roundValue = 10
    a1.sinks.k1.hdfs.roundUnit = minute
    a1.sinks.k1.hdfs.useLocalTimeStamp = true

    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

Start the agent:

    bin/flume-ng agent --conf conf --conf-file conf/flume-tcp-memory-hdfs.conf --name a1 -Dflume.root.logger=INFO,console
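To smoke-test the pipeline (not part of the original article), one option is to push a syslog-formatted line at the source with netcat; the <13> prefix is the syslog priority field that the syslogtcp source parses:

    echo '<13>Sep 19 16:15:00 localhost test: hello flume' | nc 192.168.110.160 23003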

II. Errors encountered:

1. Host name not found

2016-09-19 16:15:48,518 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: java.net.UnknownHostException: cluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.UnknownHostException: cluster

Here "cluster" is the nameservice name of the company's Hadoop cluster. The error occurs because the client cannot resolve that nameservice, which is why hdfs-site.xml must be copied into the flume/conf directory.
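For reference, a nameservice is defined in hdfs-site.xml by entries like the following. This is a sketch for an HA nameservice called cluster; the property names are the standard HDFS HA client settings, while the namenode hostnames are invented placeholders:

    <property>
      <name>dfs.nameservices</name>
      <value>cluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.cluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster.nn1</name>
      <value>namenode1:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster.nn2</name>
      <value>namenode2:8020</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

Without these entries on the client side, the logical name is treated as a plain hostname and fails DNS resolution, which is exactly the UnknownHostException above.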

2. Mkdirs failed (writes going to the local filesystem)

java.io.IOException: Mkdirs failed to create /test/flume/16-09-19 (exists=false, cwd=file:/data/apache-flume-1.6.0-bin)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:450)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:...)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:78)
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:69)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:246)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Fix: copy core-site.xml into the flume/conf directory.
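The telltale detail is cwd=file:/data/apache-flume-1.6.0-bin: the client fell back to the local filesystem and tried to create /test/flume on the Flume machine's own disk. Copying core-site.xml, whose fs.defaultFS points at the cluster, resolved it; the relevant entry looks like this (a sketch, with the nameservice name assumed):

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://cluster</value>
    </property>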

3. No FileSystem for scheme: hdfs

java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Fix: copy hadoop-hdfs-2.7.1.jar into the flume/lib directory.
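This error means no FileSystem implementation is registered for the hdfs scheme; hadoop-hdfs-2.7.1.jar carries org.apache.hadoop.hdfs.DistributedFileSystem, which handles it. A sketch of the copy, with the Hadoop path as a placeholder (in the stock Hadoop 2.7.1 layout the jar sits under share/hadoop/hdfs):

    cp /opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar /data/apache-flume-1.6.0-bin/lib/

If similar ClassNotFoundException errors follow, other Hadoop client jars (e.g. hadoop-common, hadoop-auth) may need to be copied the same way.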

4. Insufficient HDFS permissions. The user writing to HDFS is the login user on the Flume collection machine (kafka here).

org.apache.hadoop.security.AccessControlException: Permission denied: user=kafka, access=WRITE, inode="/test/flume/16-09-19/events-.1474268726127.tmp":hadoop:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1698)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1682)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1665)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2517)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2452)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

Fix: grant write access on the target directory:

    hadoop fs -chmod -R 777 /test/
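chmod -R 777 is the bluntest fix. Two narrower alternatives (not from the original article) are to hand the directory to the writing user, or to set HADOOP_USER_NAME, which the HDFS client honors under simple (non-Kerberos) authentication:

    # Option 1: make the Flume machine's login user (kafka here) the owner
    hadoop fs -chown -R kafka /test/flume
    # Option 2: impersonate an authorized HDFS user when starting the agent
    HADOOP_USER_NAME=hadoop bin/flume-ng agent --conf conf \
        --conf-file conf/flume-tcp-memory-hdfs.conf --name a1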

5. Missing timestamp

java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:228)
    at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:432)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:380)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)

The cause: the event headers carry no timestamp, which the HDFS sink needs to expand the %y-%m-%d escape sequences in hdfs.path. Workaround: set a1.sinks.k1.hdfs.useLocalTimeStamp = true so the sink falls back to the local time.
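An equivalent fix (standard Flume, though not the one used here) is to stamp events at the source with the timestamp interceptor, which sets the timestamp header those escapes require:

    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = timestamp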
