1. Flume concept
Flume is a distributed, reliable, and highly available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.
Flume is currently a top-level Apache project.
Flume needs a Java runtime environment: Java 1.6 or later is required, and Java 1.7 is recommended.
Unzip the downloaded Flume installation package to the specified directory.
2. Important models in Flume
2.1.1. Flume Event:
A Flume event is a unit of data flow that consists of a byte payload and an optional set of string attributes.
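As a rough illustration of this definition, an event can be pictured as a payload plus a string-to-string map. The `Event` class below is a hypothetical Python stand-in written for this article, not Flume's Java API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Illustrative stand-in for a Flume event (not Flume's Java API)."""
    body: bytes                                             # the byte payload
    headers: Dict[str, str] = field(default_factory=dict)   # optional string attributes


# An event with one attribute, and one with no attributes at all.
event = Event(body=b"hello flume", headers={"host": "web01"})
bare = Event(body=b"no attributes")
print(event.headers, bare.headers)
```

The header name `host` and its value are made up for the example; Flume itself does not require any particular attribute.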
2.1.2. Flume Agent:
A Flume agent is a process that hosts the components through which events flow from an external source to the next destination. It contains a source, a channel, and a sink.
2.1.3. Source
A source consumes events delivered to it by an external source. The external source sends the data to the Flume source in a format that the Flume source recognizes.
2.1.4. Channel
A channel is a passive store that holds events until they are consumed by a Flume sink.
2.1.5. Sink
A sink is the point where data leaves the agent: it takes events from the channel and sends them to the specified external destination, such as an external data store.
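To make the division of labor concrete, here is a toy model of a single agent. It is only a sketch: `run_agent` is a hypothetical function, the channel is a plain in-memory queue, and the "external destination" is a Python list.

```python
from collections import deque


def run_agent(lines):
    """Toy agent: a source fills the channel, then a sink drains it."""
    channel = deque()      # the channel: a passive buffer between source and sink
    delivered = []         # stands in for the external destination

    # Source: turn each incoming line into an event and put it in the channel.
    for line in lines:
        channel.append(line.encode("utf-8"))

    # Sink: take events out of the channel, in order, and send them onward.
    while channel:
        delivered.append(channel.popleft())
    return delivered


print(run_agent(["first line", "second line"]))
```

In a real agent the source and sink run concurrently and the channel decouples their speeds; the sequential loop here only shows the direction of flow.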
2.2. Flume Flow Model
2.3. Flume features
2.3.1. Complex flows
Flume lets you build multi-hop flows, in which events pass through several agents before reaching their final destination. It also supports fan-out flows (one to many), fan-in flows (many to one), failover, and failure handling.
2.3.2. Reliability
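Data is transferred in transactions, which guarantees reliable delivery: an event is removed from a channel only after it has been stored in the next channel or in the final destination.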
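The idea can be sketched with a small model in which a batch of events is only dropped from the channel once delivery has succeeded. `ToyChannel` and its methods are hypothetical names for this illustration and do not correspond to Flume's actual channel API.

```python
from collections import deque


class ToyChannel:
    """Simplified transactional buffer; not Flume's actual Channel API."""

    def __init__(self):
        self._events = deque()

    def put_batch(self, events):
        # Append a batch of events to the end of the buffer.
        self._events.extend(events)

    def take_batch(self, deliver, max_events=100):
        """Hand up to max_events to `deliver`; restore them if delivery fails."""
        count = min(max_events, len(self._events))
        batch = [self._events.popleft() for _ in range(count)]
        try:
            deliver(batch)              # e.g. write to the external store
        except Exception:
            # Roll back: put the events back at the front, in original order.
            self._events.extendleft(reversed(batch))
            return False
        return True                     # commit: the events have left the channel

    def __len__(self):
        return len(self._events)
```

If `deliver` raises, the channel still holds every event, so a later attempt can retry the same batch.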
2.3.3. Recoverability
A channel can be backed by memory or by files. The memory channel is faster, but its events cannot be recovered after a failure; the file channel is slower but recoverable.
Introductory case
1. First write a configuration file:
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Describe the memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2. Start the agent with the Flume tool
$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
3. Send data
Use the telnet command on Windows to send data to port 44444 of the machine on which Flume is running.
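If telnet is not available, a few lines of Python can play the same role. This is a minimal sketch: `send_lines` is a hypothetical helper, and the default host `localhost` assumes the agent runs on the same machine.

```python
import socket


def send_lines(lines, host="localhost", port=44444):
    """Send each string as one newline-terminated line over TCP."""
    with socket.create_connection((host, port), timeout=5) as conn:
        for line in lines:
            # The NetCat source turns each line of text into one event.
            conn.sendall(line.encode("utf-8") + b"\n")
```

With the agent from the configuration above running, calling `send_lines(["hello flume"])` should make the logger sink print the event on the agent's console.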
4. Sources in detail
Here are some of the more important sources.
4.1. Avro Source
An Avro source listens on an Avro port and receives event streams from external Avro clients. With Avro sources you can build multi-hop flows, fan-out flows, fan-in flows, and similar topologies. It can also receive log data sent by the Avro client that ships with Flume.
4.1.1. Avro Source Property Description
Properties marked with ! are required.
!channels
!type: the component type name, must be "avro"
!bind: the host name or IP address to listen on
!port: the port to listen on
threads: maximum number of worker threads
selector.type
selector.*
interceptors: space-separated list of interceptors
interceptors.*
compression-type (default none): the compression type, either "none" or "deflate"; this value must match the compression type of the matching Avro source
ssl (default false): whether SSL encryption is enabled; if enabled, you must also configure a "keystore" and a "keystore-password"
keystore: the path to the Java keystore file used for SSL
keystore-password: the password of the Java keystore used for SSL
keystore-type (default JKS): the keystore type, either "JKS" or "PKCS12"
exclude-protocols (default SSLv3): space-separated list of SSL/TLS protocols to exclude; SSLv3 is always excluded in addition to the protocols specified
ipFilter (default false): set this to true to enable IP filtering for Netty
ipFilterRules: the IP filter rule expressions for Netty
Case:
Write the configuration file by modifying the one given above. Only the source section changes; the rest stays the same. The differences are as follows:
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
Start Flume:
./flume-ng agent --conf ./conf --conf-file ./conf/template2.conf --name a1 -Dflume.root.logger=INFO,console
Send log information to the specified machine via the Avro client provided by Flume:
./flume-ng avro-client --conf ../conf --host 0.0.0.0 --port 44444 --filename ./mydata/log1.txt
You will find that the log data is indeed collected.
4.2. Spooling Directory Source
This source lets you ingest data by placing files in a "spooling" directory. The source watches the directory and parses new files as they appear. The event parsing logic is pluggable. Once a file has been fully read into the channel, it is renamed or, optionally, deleted.
Note that a file must not be modified after it has been placed in the spooling directory; if it is, Flume reports an error. File names must not be reused either: if a file with a previously used name is placed in the directory, Flume also reports an error.
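The directory-watching behavior described above can be approximated in a few lines. This is a simplified sketch under stated assumptions, not how Flume implements it: `ingest_spool_dir` is a hypothetical function, it makes a single pass instead of watching continuously, and it treats each line as one event.

```python
import os


def ingest_spool_dir(spool_dir, suffix=".COMPLETED"):
    """Read every unprocessed file line by line, then mark it as done."""
    events = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if name.endswith(suffix) or not os.path.isfile(path):
            continue                          # already processed, or not a file
        done_path = path + suffix
        if os.path.exists(done_path):
            # A file name that was used before is treated as an error.
            raise FileExistsError("file name reused: " + name)
        with open(path, "rb") as f:
            for line in f:
                events.append(line.rstrip(b"\r\n"))   # one event per line
        os.rename(path, done_path)            # rename once fully read
    return events
```

Renaming the file only after it has been read completely is what allows a restart to tell finished files from pending ones.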
Property description (for brevity, only the essential properties are listed here; see the official documentation for the full list):
!channels
!type: the component type name, must be "spooldir"
!spoolDir: the directory from which files are read, that is, the spooling directory
fileSuffix (default .COMPLETED): suffix appended to files that have been completely processed
Case:
Write the configuration file by modifying the one given above. Only the source section changes; the rest stays the same. The differences are as follows:
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/park/work/apache-flume-1.6.0-bin/mydata
Start Flume:
./flume-ng agent --conf ./conf --conf-file ./conf/template4.conf --name a1 -Dflume.root.logger=INFO,console
Move a file into the specified directory: Flume picks up the file and processes each line in it as a log event.
4.3. NetCat Source
A NetCat source listens on a given port and turns each line of the received data into an event.
4.3.1. NetCat Source Property Description
!channels
!type: the component type name, must be "netcat"
!bind: the host name or IP address to bind to
!port: the port number to bind to
max-line-length (default 512): maximum length of a single line, in bytes
Case: the complete introductory example above uses a NetCat source.
4.4. HTTP Source
The HTTP source accepts Flume events through HTTP POST and GET requests. GET should be used for experimentation only.
The source relies on a pluggable "handler" to convert each request into events. The handler must implement the HTTPSourceHandler interface; it takes an HttpServletRequest object and returns a list of Flume events.
All events produced from one HTTP request are committed to the channel in a single transaction, which improves efficiency on channels such as the file channel.
If the handler throws an exception, the source returns HTTP status code 400.
If the channel is full and no more events can be added to it, the source returns HTTP status code 503 (temporarily unavailable).
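The status-code rules above can be summarized in a short decision function. This is only a sketch of the described behavior: `respond`, the list used as a channel, and the `capacity` argument are all hypothetical, and returning 200 on success is an assumption that the text above does not state.

```python
def respond(request_body, handler, channel, capacity):
    """Return the HTTP status code for one request, following the rules above."""
    try:
        events = handler(request_body)      # convert the request into events
    except Exception:
        return 400                          # the handler failed on this request
    if len(channel) + len(events) > capacity:
        return 503                          # channel full: temporarily unavailable
    channel.extend(events)                  # commit all events of the request together
    return 200                              # assumed success status
```

A client that receives 503 can retry later with the same payload, whereas a 400 indicates that the request itself needs to be fixed.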
4.4.1. HTTP Source Property Description
!type: the component type name, must be "http"
!port: the port to listen on
bind (default 0.0.0.0): the host name or IP address to listen on
handler (default org.apache.flume.source.http.JSONHandler): the handler class, which must implement the HTTPSourceHandler interface
handler.*: configuration parameters for the handler
selector.type
selector.*
interceptors
interceptors.*
enableSSL (default false): set to true to enable SSL. Note that the HTTP source does not support SSLv3.
excludeProtocols (default SSLv3): space-separated list of SSL/TLS protocols to exclude. SSLv3 is always excluded.
keystore: the location of the keystore file
keystorePassword: the keystore password
Case:
Write the configuration file by modifying the one given above. Only the source section changes; the rest stays the same. The differences are as follows:
# Describe/configure the source
a1.sources.r1.type = http
a1.sources.r1.port = 6666
Start Flume:
./flume-ng agent --conf ./conf --conf-file ./conf/template6.conf --name a1 -Dflume.root.logger=INFO,console
Send an HTTP request to the specified port by command:
curl -X POST -d '[{"headers": {"a": "a1", "b": "b1"}, "body": "hello~http~flume~"}]' http://0.0.0.0:6666
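The same request can be prepared from Python with the standard library. This is a minimal sketch: `build_request` is a hypothetical helper, the target `http://localhost:6666` assumes the agent runs locally on port 6666, and the payload uses the lowercase `headers` and `body` keys of the default JSON handler's format.

```python
import json
import urllib.request


def build_request(url, events):
    """Build a POST request whose body is a JSON array of events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


events = [{"headers": {"a": "a1", "b": "b1"}, "body": "hello~http~flume~"}]
request = build_request("http://localhost:6666", events)
# To actually send it (requires a running agent):
# urllib.request.urlopen(request, timeout=5)
```

Several events can be sent in one request by adding more objects to the `events` list.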