First, pre-preparation: Linux command-line basics; one of Scala or Python; basic knowledge of Hadoop, Spark, Flume, Kafka, and HBase.

Second, the distributed log collection framework Flume.

Business background analysis: servers and web services generate large volumes of logs. How do we make use of them, and how do we import them into the cluster?
1. Shell scripts that batch-copy logs and then push them to HDFS: inefficient, poor fault tolerance, heavy network/disk IO, hard to monitor.
2. Flume: the key is writing the configuration file:
1) configure the Agent
2) configure the Source
3) configure the Channel
4) configure the Sink

1-netcat-mem-logger.conf: listen on a port for data
# example for source=netcat, channel=memory, sink=logger
# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Configure the sink
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
Start the agent:

flume-ng agent \
-n a1 \
-c conf \
-f ./1-netcat-mem-logger.conf \
-Dflume.root.logger=INFO,console
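With the agent running, you can check it end to end by sending text to the netcat source from another terminal; a minimal smoke test, assuming telnet or nc is installed on the machine:

# each line typed here should appear as an Event in the agent's console output
telnet localhost 44444

# or, with netcat:
echo "hello flume" | nc localhost 44444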
exec-mem-logger.conf: monitor (tail) a file

# Name the agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /opt/datas/flume_data/exec_tail.log

# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Configure the sink
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
flume-ng agent \
-n a1 \
-c conf \
-f ./4-exec-mem-logger.conf \
-Dflume.root.logger=INFO,console
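To see events flow, append lines to the tailed file from another terminal; a quick check using the path from the config above:

# the agent's console should print each appended line as a logger Event
echo "test line $(date)" >> /opt/datas/flume_data/exec_tail.log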
Log collection process:
1. Log server: start an Agent with exec-source, memory-channel, and avro-sink (pointing at the data server); it collects the log data and writes it to the data server.
2. Data server: start an Agent with avro-source, memory-channel, and logger-sink (or kafka-sink).
Conf1: exec-mem-avro.conf (on the log server)
# Name the agent
a1.sources = exec-source
a1.channels = memory-channel
a1.sinks = avro-sink

# Configure the source
a1.sources.exec-source.type = exec
a1.sources.exec-source.command = tail -f /opt/datas/log-collect-system/log_server.log

# Configure the channel
a1.channels.memory-channel.type = memory
a1.channels.memory-channel.capacity = 1000
a1.channels.memory-channel.transactionCapacity = 100

# Configure the sink
a1.sinks.avro-sink.type = avro
a1.sinks.avro-sink.hostname = localhost
a1.sinks.avro-sink.port = 44444

# Bind the source and sink to the channel
a1.sinks.avro-sink.channel = memory-channel
a1.sources.exec-source.channels = memory-channel
Conf2: avro-mem-logger.conf (on the data server)
# Name the components in this agent
a1.sources = avro-source
a1.channels = memory-channel
a1.sinks = logger-sink

# Configure the source
a1.sources.avro-source.type = avro
a1.sources.avro-source.bind = localhost
a1.sources.avro-source.port = 44444

# Configure the channel
a1.channels.memory-channel.type = memory
a1.channels.memory-channel.capacity = 1000
a1.channels.memory-channel.transactionCapacity = 100

# Configure the sink
a1.sinks.logger-sink.type = logger

# Bind the source and sink to the channel
a1.sinks.logger-sink.channel = memory-channel
a1.sources.avro-source.channels = memory-channel
(Very IMPORTANT!!!) Boot order: start avro-mem-logger.conf (the data server agent) before starting exec-mem-avro.conf, because the avro sink can only connect once the avro source is listening on port 44444.
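With both agents up (data server first), a simple end-to-end check is appending to the tailed file; the line should travel exec-source -> memory-channel -> avro-sink -> avro-source -> memory-channel -> logger-sink and print on the data server agent's console:

echo "end-to-end $(date)" >> /opt/datas/log-collect-system/log_server.log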
Third, visualization with Flume + Kafka + Spark Streaming + HBase (Part I)
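As a preview of the Kafka step, the logger-sink in avro-mem-logger.conf can be swapped for Flume's built-in Kafka sink. A minimal sketch, assuming Flume 1.7+ property names, a local broker at localhost:9092, and a made-up topic name flume-logs:

# hypothetical Kafka sink replacing logger-sink (Flume 1.7+ property names)
a1.sinks = kafka-sink
a1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka-sink.kafka.bootstrap.servers = localhost:9092
a1.sinks.kafka-sink.kafka.topic = flume-logs
a1.sinks.kafka-sink.kafka.flumeBatchSize = 100
a1.sinks.kafka-sink.channel = memory-channel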