Flume installation
http://flume.apache.org/install
1, Upload the installation package.
2, Unzip it.
3, Set the JDK directory in the conf/flume-env.sh file. Note the JAVA_OPTS setting in the same file: if transferring a very large file causes a memory-overflow error, this option has to be adjusted (for example, by raising the JVM heap size).
4, Verify that the installation was successful: ./flume-ng version
5, Configure the environment variable: export FLUME_HOME=/home/apache-flume-1.6.0-bin

Source, channel and sink types in Flume

Flume sources
Source type | Description
Avro Source | Supports the Avro protocol (actually Avro RPC); built in
Thrift Source | Supports the Thrift protocol; built in
Exec Source | A UNIX command produces the data on standard output
JMS Source | Reads data from a JMS system (message, topic)
Spooling Directory Source | Monitors data changes in a specified directory
Twitter 1% Firehose Source | Continuously downloads Twitter data via the API; experimental
NetCat Source | Monitors a port and turns each text line flowing through the port into an event
Sequence Generator Source | Sequence generator data source; produces sequence data
Syslog Sources | Read syslog data and generate events; support both the UDP and TCP protocols
HTTP Source | Data source based on HTTP POST or GET; supports JSON and BLOB representations
Legacy Sources | Compatible with old Flume OG sources (0.9.x versions)

Flume channels
Channel type | Description
Memory Channel | Events are stored in memory
JDBC Channel | Events are stored in persistent storage; the current built-in support is Derby
File Channel | Events are stored in a disk file
Spillable Memory Channel | Events are stored in memory and on disk; when the in-memory queue is full they are persisted to disk files
Pseudo Transaction Channel | For testing purposes
Custom Channel | A custom channel implementation

Flume sinks
Sink type | Description
HDFS Sink | Writes data to HDFS
Logger Sink | Writes data to a log file
Avro Sink | Data is converted to Avro events and sent to the configured RPC port
Thrift Sink | Data is converted to Thrift events and sent to the configured RPC port
IRC Sink | Data is played back on IRC
File Roll Sink | Stores data in the local file system
Null Sink | Discards all data
HBase Sink | Writes data to the HBase database
Morphline Solr Sink | Sends data to a Solr search server (cluster)
ElasticSearch Sink | Sends data to an ElasticSearch server (cluster)
Kite Dataset Sink | Writes data to a Kite dataset; experimental
Custom Sink | A custom sink implementation

Case 1, a simple example
http://flume.apache.org/FlumeUserGuide.html#a-simple-example

Configuration file (simple.conf):
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

Start Flume:
flume-ng agent -n a1 -c conf -f simple.conf -Dflume.root.logger=INFO,console

Install telnet (yum install telnet) and connect with telnet localhost 44444; each line you type becomes an event. To exit telnet, press Ctrl+] and type quit.

Memory channel configuration
capacity: the maximum number of events the channel can hold (default 100)
transactionCapacity: the maximum number of events taken from a source or given to a sink per transaction (default 100)
keep-alive: the time (in seconds) allowed for adding an event to, or removing one from, the channel
byteCapacity: a limit on the total byte size of events in the channel, counting only the event body
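As a sketch of how these properties fit together, the channel block of simple.conf could be tuned like this (the keep-alive and byteCapacity values here are illustrative assumptions, not from the original):

############################################################
# Hypothetical tuning of the memory channel from case 1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# wait up to 5 s when the channel is full before failing the put
a1.channels.c1.keep-alive = 5
# cap the channel at roughly 8 MB of event bodies
a1.channels.c1.byteCapacity = 8000000
############################################################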
Case 2, two Flume agents forming a cluster

On the node01 server, configuration file (simple.conf2):
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444

# Describe the sink
# a1.sinks.k1.type = logger
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node2
a1.sinks.k1.port = 60000

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

On the node02 server, install Flume (steps omitted), configuration file (avro.conf):
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = node2
a1.sources.r1.port = 60000

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

First start Flume on node02:
flume-ng agent -n a1 -c conf -f avro.conf -Dflume.root.logger=INFO,console
Then start Flume on node01:
flume-ng agent -n a1 -c conf -f simple.conf2 -Dflume.root.logger=INFO,console
Open a telnet session to node01 port 44444 to test; the events appear on the node02 console.
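Besides telnet, you can also push a local file straight at node02's avro source with Flume's built-in avro client; the file path here is an assumption for illustration:

flume-ng avro-client -H node2 -p 60000 -F /home/test.log

Each line of the file becomes one event on the node02 console, which is a quick way to verify the avro source without going through node01.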
Case 3, exec source
http://flume.apache.org/FlumeUserGuide.html#exec-source

Configuration file (exec.conf):
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /home/flume.exec.log

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

Start Flume:
flume-ng agent -n a1 -c conf -f exec.conf -Dflume.root.logger=INFO,console

Create an empty file for the demo:
touch flume.exec.log
Loop to append data:
for i in {1..50}; do echo "$i hi flume" >> flume.exec.log; sleep 0.1; done

Case 4, spooling directory source
http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

Configuration file (spool.conf):
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

Start Flume:
flume-ng agent -n a1 -c conf -f spool.conf -Dflume.root.logger=INFO,console

Copy a file into the watched directory for the demo:
mkdir logs
cp flume.exec.log logs/

Case 5, HDFS sink
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

Configuration file (hdfs.conf) — only the sink block of the previous spool configuration changes:
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true

# Describe the sink
# a1.sinks.k1.type = logger
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bjsxt/flume/%Y-%m-%d/%H%M

## Roll a new file every 60 s or whenever the current file exceeds 10 KB:
# Number of events before rolling a new file; 0 = do not roll based on event count
a1.sinks.k1.hdfs.rollCount = 0
# Seconds before rolling a new file; 0 = do not roll based on time
a1.sinks.k1.hdfs.rollInterval = 60
# File size in bytes before rolling a new file; 0 = do not roll based on size
a1.sinks.k1.hdfs.rollSize = 10240
# If nothing is written for this many seconds, the currently open temporary file
# is closed and renamed to the target file
a1.sinks.k1.hdfs.idleTimeout = 3
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

## Generate a new directory every five minutes:
# Whether to round the timestamp down ("discarding", similar to rounding);
# if enabled, it affects all time-based escape sequences in the path except %t
a1.sinks.k1.hdfs.round = true
# The value to round down to
a1.sinks.k1.hdfs.roundValue = 5
# The unit of that value: second, minute or hour
a1.sinks.k1.hdfs.roundUnit = minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

Create the HDFS directory:
hadoop fs -mkdir /flume
Start Flume:
flume-ng agent -n a1 -c conf -f hdfs.conf -Dflume.root.logger=INFO,console
View the files in HDFS:
hadoop fs -ls /flume/...
hadoop fs -get /flume/...

Homework:
1, How does Flume collect data from a Java application, implemented over RPC? (See the sketch below.)
2, How would you do this in a project? Store logs under the /log/ directory, with a yyyyMMdd sub-directory for each day's data. (A path sketch follows the RPC example.)
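For the first homework question, here is a minimal sketch of a Java client that sends events to an agent's avro source through Flume's RPC client API; the host, port and message body are assumptions for illustration (in case 2 the avro source listens on node2:60000):

import java.nio.charset.Charset;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeRpcDemo {
    public static void main(String[] args) {
        // Connect to the avro source from case 2 (host/port are assumptions)
        RpcClient client = RpcClientFactory.getDefaultInstance("node2", 60000);
        try {
            // Wrap the application data in a Flume event and send it
            Event event = EventBuilder.withBody("hello from java", Charset.forName("UTF-8"));
            client.append(event);
        } catch (EventDeliveryException e) {
            // The event was not delivered; real code would reconnect and retry
            e.printStackTrace();
        } finally {
            client.close();
        }
    }
}

RpcClientFactory.getDefaultInstance returns an Avro RPC client; for higher throughput, appendBatch(List<Event>) sends several events per RPC call.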
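For the second homework question, one possible approach is to reuse the HDFS sink from case 5 and let the path escape sequences create one sub-directory per day; the namenode address and component names are assumptions, not a prescribed answer:

############################################################
# %Y%m%d expands to yyyyMMdd, so each day's events land in their own directory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://bjsxt/log/%Y%m%d
a1.sinks.k1.hdfs.fileType = DataStream
# Take the timestamp from the agent's clock instead of an event header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
############################################################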