Flume usage summary: sending data to Kafka, HDFS, Hive, HTTP, netcat, etc.


1. Source is HTTP, sink is logger; the data is printed to the console.

The conf configuration file (http.conf) is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = http            # receive data sent over HTTP
a1.sources.r1.bind = hadoop-master   # host name or IP of the machine running Flume
a1.sources.r1.port = 9000            # port
#a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = logger            # print the data to the console

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume with:

bin/flume-ng agent -c conf -f conf/http.conf -n a1 -Dflume.root.logger=INFO,console

The following log line indicates that Flume started successfully:

(lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: r1 started

Open another terminal and send data via HTTP POST:

curl -X POST -d '[{"headers":{"timestampe":"1234567","host":"master"},"body":"badou flume"}]' hadoop-master:9000

hadoop-master is the host name bound in the Flume configuration file, and 9000 is the bound port. In the window running Flume you will then see something like:

2018-06-12 08:24:04,472 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{timestampe=1234567, host=master} body: 62 61 64 6F 75 20 66 6C 75 6D 65   badou flume }

2. Source is netcat (TCP/UDP), sink is logger; the data is printed to the console.

The conf configuration file (netcat.conf) is as follows:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop-master   # host name or IP address to bind
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume:

bin/flume-ng agent -c conf -f conf/netcat.conf -n a1 -Dflume.root.logger=INFO,console

Then, at another terminal, use telnet to send data:

[root@hadoop-master ~]# telnet hadoop-master 44444
Trying 192.168.194.6...
Connected to hadoop-master.
Escape character is '^]'.

This output indicates that the connection to Flume succeeded. Type 12213213213 and 12321313 (the source answers each line with OK), and the corresponding events appear in the Flume window:

2018-06-12 08:38:51,129 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 31 32 32 31 33 32 31 33 32 31 33 0D   12213213213. }
2018-06-12 08:38:51,130 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 31 32 33 32 31 33 31 33 0D   12321313. }
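The HTTP POST above can also be scripted. The following is a minimal sketch, not part of the original walkthrough: it assumes the Python requests library is installed and that the HTTP-source agent from section 1 is listening on hadoop-master:9000 (host and port taken from http.conf).

import requests

# Flume's HTTP source (with its default JSON handler) expects a JSON array of
# events, each carrying a "headers" map and a "body" string.
events = [{"headers": {"timestampe": "1234567", "host": "master"},
           "body": "badou flume"}]

# Post the events to the HTTP source; a 200 response means they were accepted
# and will show up in the logger sink's console output.
resp = requests.post("http://hadoop-master:9000", json=events)
print(resp.status_code)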
3. Source is netcat/HTTP, sink is HDFS; the data is stored in HDFS.

The conf configuration file (hdfs.conf) is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop-master
a1.sources.r1.port = 44444
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^[0-9]*$
a1.sources.r1.interceptors.i1.excludeEvents = true

# Describe the sink
#a1.sinks.k1.type = logger
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs:/flume/events   # where the files are stored in HDFS
a1.sinks.k1.hdfs.filePrefix = events-        # file name prefix
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream       # store the data received from Flume as plain text

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

In HDFS, create the directory where the files will be stored: hadoop fs -mkdir /flume/events

Start Flume:

bin/flume-ng agent -c conf -f conf/hdfs.conf -n a1 -Dflume.root.logger=INFO,console

Send data to Flume over telnet (telnet hadoop-master 44444) and enter:

aaaaaaaa
bbbbbbb
ccccccccc
dddddddddd

List the files in HDFS with hadoop fs -ls /flume/events/; /flume/events now contains files such as:

-rw-r--r--  3 root supergroup       2018-06-05 06:02  /flume/events/events-.1528203709070
-rw-r--r--  3 root supergroup    5  2018-06-05 06:02  /flume/events/events-.1528203755556
-rw-r--r--  3 root supergroup       2018-06-05 06:03  /flume/events/events-.1528203755557
-rw-r--r--  3 root supergroup       2018-06-13 07:28  /flume/events/events-.1528900112215
-rw-r--r--  3 root supergroup  209  2018-06-13 07:29  /flume/events/events-.1528900112216
-rw-r--r--  3 root supergroup   72  2018-06-13 07:29  /flume/events/events-.1528900112217

View the contents of events-.1528900112216 with hadoop fs -cat /flume/events/events-.1528900112216:

aaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbcccccccccccccccccccddddddddddddddddeeeeeeeeeeeeeeeeeeeeefffffffffffffffffffffffgggggggggggggggggghhhhhhhhhhhhhhhhhhhhhhhiiiiiiiiiiiiiiiiiiijjjjjjjjjjjjjjjjjjj

For HTTP mode, simply change netcat to http in hdfs.conf and then send data with:

curl -X POST -d '[{"headers":{"timestampe":"1234567","host":"master"},"body":"badou flume"}]' hadoop-master:44444

In the HDFS file you will then see the content transmitted by the command above: badou flume.
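hdfs.conf also attaches a regex_filter interceptor to the source, which the walkthrough does not comment on: with excludeEvents = true, events whose body matches ^[0-9]*$ (purely numeric lines) are dropped before they reach the channel. The following is a rough illustration of that matching rule in plain Python using the re module; it is an assumption-laden sketch of the behaviour, not Flume code.

import re

# Same pattern as a1.sources.r1.interceptors.i1.regex in hdfs.conf.
pattern = re.compile(r"^[0-9]*$")

for body in ["12345", "aaaaaaaa", "abc123"]:
    # excludeEvents = true: an event whose body matches the regex is discarded.
    dropped = bool(pattern.match(body))
    print(body, "->", "dropped" if dropped else "kept")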
4. Source is netcat/HTTP, sink is Hive; the data is stored in Hive, with partitioned storage.

The conf configuration file (hive.conf) is as follows:

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop-master
a1.sources.r1.port = 44444

# Describe the sink
#a1.sinks.k1.type = logger
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://hadoop-master:9083
a1.sinks.k1.hive.database = default      # Hive database name
a1.sinks.k1.hive.table = flume_user1
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.hive.partition = 3
# In netcat mode the partition value can only be set statically, because netcat transmits the data
# positionally and cannot carry the value of a named field; here the age partition is fixed at 3.
#a1.sinks.k1.hive.partition = %{age}
# In HTTP or JSON mode the partition value can be set dynamically, because the value of age is
# carried in the event headers.
a1.sinks.k1.serializer.delimiter = " "
a1.sinks.k1.serializer.serdeSeparator = ' '
a1.sinks.k1.serializer.fieldnames = user_id,user_name
a1.sinks.k1.hive.txnsPerBatchAsk = 10
a1.sinks.k1.hive.batchSize = 1500

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Create the table in Hive:

create table flume_user1(user_id int, user_name string)
partitioned by (age int)
clustered by (user_id) into 2 buckets
stored as orc;

Add the following to hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>

Copy the jars under hcatalog/share/hcatalog in the Hive installation directory into Flume's lib directory.

Run Flume:

bin/flume-ng agent -c conf -f conf/hive.conf -n a1 -Dflume.root.logger=INFO,console

Open another window and start the metastore service:

hive --service metastore &

Open another client, connect to Flume via telnet (telnet hadoop-master 44444) and enter:

1 1
3 3

The following two rows of data then appear in Hive:

flume_user1.user_id   flume_user1.user_name   flume_user1.age
1                     1                       3
3                     3                       3

age is the value 3 set statically in hive.conf.

Now switch the Flume source to HTTP mode and let the Hive partition value be transmitted dynamically as a parameter. In hive.conf, change a1.sources.r1.type = netcat to a1.sources.r1.type = http, and change a1.sinks.k1.hive.partition = 3 to a1.sinks.k1.hive.partition = %{age}. Then start Flume:

bin/flume-ng agent -c conf -f conf/hive.conf -n a1 -Dflume.root.logger=INFO,console

In the reopened window, transfer data to Flume over HTTP:

curl -X POST -d '[{"headers":{"age":"109"},"body":"11 ligongong"}]' hadoop-master:44444

The following data can then be seen in Hive:

flume_user1.user_id   flume_user1.user_name   flume_user1.age
11                    ligongong               109

This shows that when data is sent to Hive over HTTP, the partition field is carried in the event headers, while the other fields are placed in the body and separated by the delimiter defined in hive.conf.
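To make the header-versus-body split concrete, here is a small illustrative sketch in plain Python (no Flume dependency; the field and delimiter values are copied from hive.conf above) that mimics how the event sent by the curl command is mapped onto the Hive columns.

# Event as posted with curl in HTTP mode.
event = {"headers": {"age": "109"}, "body": "11 ligongong"}

fieldnames = ["user_id", "user_name"]   # a1.sinks.k1.serializer.fieldnames
delimiter = " "                         # a1.sinks.k1.serializer.delimiter

# The partition value is resolved from the header referenced by %{age}.
partition_age = event["headers"]["age"]

# The body is split on the delimiter and mapped onto the declared field names.
row = dict(zip(fieldnames, event["body"].split(delimiter)))
row["age"] = partition_age

print(row)   # {'user_id': '11', 'user_name': 'ligongong', 'age': '109'}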
5. Use Avro to print the data to the console. Data can only be transferred between different agents through Avro. Two servers are used to demonstrate Avro: hadoop-master and hadoop-slave2. hadoop-master runs agent2, whose sink is avro and is configured to send its data to hadoop-slave2.

The Flume conf file on hadoop-master (push.conf) is as follows:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop-master
a2.sources.r1.port = 44444
a2.sources.r1.channels = c1

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.keep-alive = 10
a2.channels.c1.capacity = 100000
a2.channels.c1.transactionCapacity = 100000

# Describe the sink
a2.sinks.k1.type = avro                # avro sink
a2.sinks.k1.channel = c1
a2.sinks.k1.hostname = hadoop-slave2   # destination server the sink sends data to
a2.sinks.k1.port = 44444               # destination server's port

hadoop-slave2 runs agent1, whose source is avro. Its Flume configuration (pull.conf) is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = hadoop-slave2
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.keep-alive = 10
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 100000

Start Flume on hadoop-slave2 first, then on hadoop-master. The order matters; otherwise the following error is reported:

org.apache.flume.FlumeException: java.net.SocketException: Unresolved address

Start Flume on hadoop-slave2:

bin/flume-ng agent -c conf -f conf/pull.conf -n a1 -Dflume.root.logger=INFO,console

Start Flume on hadoop-master:

bin/flume-ng agent -c conf -f conf/push.conf -n a2 -Dflume.root.logger=INFO,console

Open another window, connect via telnet (telnet hadoop-master 44444) and send 11111aaaa. The hadoop-slave2 console then displays what was sent:

2018-06-14 06:43:00,686 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 31 31 31 31 31 61 61 61 61 0D   11111aaaa. }
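The telnet step can be scripted as well. Below is a minimal sketch using only Python's standard socket module; it assumes both agents above are running and that agent2's netcat source is listening on hadoop-master:44444 (as in push.conf).

import socket

# Connect to the netcat source of agent2 on hadoop-master (see push.conf).
with socket.create_connection(("hadoop-master", 44444)) as sock:
    # The netcat source treats every newline-terminated line as one event,
    # which agent2's avro sink then forwards to agent1 on hadoop-slave2.
    sock.sendall(b"11111aaaa\n")
    # The netcat source acknowledges each accepted line with "OK".
    print(sock.recv(16).decode().strip())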
6. Data is transmitted to Kafka via Flume, and from Kafka it can then be stored in HDFS and Hive.

The first thing to configure is Kafka (for Kafka configuration, refer to: 72466504). Start ZooKeeper on hadoop-master, hadoop-slave1 and hadoop-slave2, then start Kafka: enter the Kafka installation directory and execute:

./bin/kafka-server-start.sh config/server.properties &

Create a topic in Kafka:

bin/kafka-topics.sh --create --zookeeper hadoop-master:2181,hadoop-slave1:2181,hadoop-slave2:2181 --replication-factor 1 --partitions 2 --topic flume_kafka

List the topics in Kafka:

bin/kafka-topics.sh --list --zookeeper hadoop-master:2181,hadoop-slave1:2181,hadoop-slave2:2181

Start a Kafka consumer:

./kafka-console-consumer.sh --zookeeper hadoop-master:2181,hadoop-slave1:2181,hadoop-slave2:2181 --topic flume_kafka

Configure the Flume conf file: set the source type to exec, set the sink to org.apache.flume.sink.kafka.KafkaSink, and set the Kafka topic to the flume_kafka topic created above. The configuration is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# exec type: the source runs a command and reads its output
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/hadoop/flumehomework/flumecode/flume_exec_test.txt

# Set the Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.brokerList = hadoop-master:9092                    # Kafka broker address and port
a1.sinks.k1.topic = flume_kafka                                # Kafka topic
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder  # serialization

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start Flume. Whenever new data appears in /home/hadoop/flumehomework/flumecode/flume_exec_test.txt, Flume loads it into Kafka, where it is picked up by the consumer started above. For example, the consumer shows the following data from that file:

131,dry pasta
131,dry pasta
132,beauty
133,muscles joints pain relief
133,muscles joints pain relief
133,muscles joints pain relief
133,muscles joints pain relief
134,specialty wines champagnes
134,specialty wines champagnes
134,specialty wines champagnes
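Besides the console consumer, the events that the Kafka sink publishes to flume_kafka can be read with a short Python consumer. This is a minimal sketch under the assumption that the kafka-python package is installed and the broker at hadoop-master:9092 (from the configuration above) is reachable.

from kafka import KafkaConsumer

# Subscribe to the topic written by the Flume Kafka sink.
consumer = KafkaConsumer(
    "flume_kafka",
    bootstrap_servers="hadoop-master:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
)

# Each record's value is the body of one Flume event, i.e. one line of the tailed file.
for record in consumer:
    print(record.value.decode("utf-8"))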
