Flume (Big Data Series): Several Different Sources

Source: Internet
Author: User

1. Flume Concept

Flume is a distributed, reliable, and highly available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.

Flume is currently an Apache top-level project.

Flume requires a Java runtime environment: Java 1.6 or above is required, and Java 1.7 is recommended.

Unzip the downloaded Flume installation package to the specified directory.

2. Important Models in Flume

2.1.1. Flume Event

A Flume event is defined as a unit of data flow: a byte payload plus an optional set of string attributes (headers).
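This event model can be sketched in a few lines of Python. This is a hypothetical illustration of the structure, not Flume's actual Java `Event` class:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """Minimal model of a Flume event: a byte payload plus optional string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)  # optional string attributes

# A log line wrapped as an event carrying one attribute
e = Event(body=b"GET /index.html 200", headers={"host": "web01"})
print(e.headers["host"], len(e.body))
```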

2.1.2. Flume Agent

A Flume agent is a process that hosts the components through which events flow from an external source to the next destination. It contains sources, channels, and sinks.

2.1.3. Source

A source consumes events delivered to it by an external system. The external source sends data to the Flume source in a format that the Flume source recognizes as Flume events.

2.1.4. Channel

A channel is a passive store that holds events until they are consumed by a Flume sink.

2.1.5. Sink

A sink is the aggregation point for data and represents the location of external storage: it takes events from the channel and delivers them to the specified external target.
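Taken together, the three components form a simple pipeline. The toy sketch below shows the source → channel → sink flow in plain Python; the names and classes are illustrative only, not the Flume API:

```python
from collections import deque

class Channel:
    """Passive store: holds events until a sink consumes them."""
    def __init__(self):
        self._q = deque()
    def put(self, event):
        self._q.append(event)
    def take(self):
        return self._q.popleft() if self._q else None

def source(lines, channel):
    """A 'source': turns each input line into an event and puts it on the channel."""
    for line in lines:
        channel.put({"headers": {}, "body": line})

def sink(channel, out):
    """A 'sink': drains the channel to an external target (here, a list)."""
    while (event := channel.take()) is not None:
        out.append(event["body"])

ch = Channel()
collected = []
source(["log line 1", "log line 2"], ch)
sink(ch, collected)
print(collected)  # both lines delivered, in order
```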

2.2. Flume Flow Model

2.3. Flume Features

2.3.1. Complex Flows

Flume lets users build multi-hop flows in which events pass through several agents before reaching their final destination. It also supports fan-out flows (one to many), fan-in flows (many to one), failover, and failure handling.

2.3.2. Reliability

Transactional data transfer ensures the reliability of the data.

2.3.3. Recoverability

A channel can be backed by memory or by a file: the memory channel is faster but cannot recover events after a crash, while the file channel is slower but provides recoverability.
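The recoverability trade-off can be illustrated with a toy file-backed channel that appends each event to disk, so buffered events survive a process restart. This is an illustrative sketch only; Flume's real file channel uses a write-ahead log with checkpoints:

```python
import os
import tempfile

class FileChannel:
    """Toy durable channel: put appends a line to a file; take reads from the front."""
    def __init__(self, path):
        self.path = path
        self.offset = 0  # read position in bytes (not persisted in this toy version)
    def put(self, body: str):
        with open(self.path, "a") as f:
            f.write(body + "\n")
    def take(self):
        if not os.path.exists(self.path):
            return None
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            line = f.readline()
        if not line:
            return None
        self.offset += len(line)
        return line.decode().rstrip("\n")

path = os.path.join(tempfile.mkdtemp(), "channel.dat")
ch = FileChannel(path)
ch.put("event-1")
ch.put("event-2")

# Simulate a crash and restart: a fresh object over the same file still sees
# the buffered events, unlike an in-memory deque, which would have lost them.
ch2 = FileChannel(path)
print(ch2.take(), ch2.take())  # event-1 event-2
```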

Introductory Case

1. First write a configuration file:

# example.conf: single-node Flume configuration

# Name the components of agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Describe the memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. Start the agent with the Flume tool

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

3. Send data

Send data to port 44444 on the machine where Flume is running, for example via the telnet command on Windows.
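You can also send a line programmatically over a plain TCP socket. To keep the sketch self-contained, it stands in a tiny listener for the netcat source (which, by default, acknowledges each received line with "OK"); with a real agent you would simply connect to its host and port instead:

```python
import socket
import threading

def fake_netcat_source(server_sock, received):
    """Stand-in for Flume's netcat source: read one line, acknowledge with OK."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        received.append(data.decode().rstrip("\n"))
        conn.sendall(b"OK\n")

server = socket.socket()
server.bind(("127.0.0.1", 0))   # bind to any free port
server.listen(1)
port = server.getsockname()[1]
received = []
t = threading.Thread(target=fake_netcat_source, args=(server, received))
t.start()

# Client side: the same thing telnet would do
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"hello flume\n")
    reply = c.recv(16).decode().strip()

t.join()
server.close()
print(received[0], "->", reply)
```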

4. Sources in Detail

Here are some of the more important sources.

4.1. Avro Source

An Avro source listens on an Avro port and accepts event streams from external Avro clients. With Avro sources you can build multi-hop, fan-out, and fan-in flows. It can also accept log data sent by the Avro client that ships with Flume.

4.1.1. Avro Source Property Description

channels – (required)

type – (required) the component type name; must be "avro"

bind – (required) the host name or IP address to listen on

port – (required) the port to listen on

threads – maximum number of worker threads

selector.type – channel selector type (default "replicating")

selector.* – selector-specific configuration

interceptors – space-separated list of interceptors

interceptors.* – interceptor-specific configuration

compression-type (default none) – can be "none" or "deflate"; must match the compression type of the sending Avro client

ssl (default false) – whether to enable SSL; if enabled, a "keystore" and "keystore-password" must also be configured

keystore – path to the Java keystore file used for SSL

keystore-password – password of the Java keystore used for SSL

keystore-type (default JKS) – keystore type, either "JKS" or "PKCS12"

exclude-protocols (default SSLv3) – space-separated list of SSL/TLS protocols to exclude; SSLv3 is always excluded in addition to any protocols specified here

ipFilter (default false) – set to true to enable IP filtering for Netty

ipFilterRules – IP filter rule expression for Netty

Case:

Write the configuration file: modify the configuration file given above. Except for the source section, everything is the same. The differing part is as follows:

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template2.conf --name a1 -Dflume.root.logger=INFO,console

Send log information to the specified machine via the Avro client provided by Flume:

./flume-ng avro-client --conf ../conf --host 0.0.0.0 --port 44444 --filename ../mydata/log1.txt

You will see that the log lines are collected.

4.2. Spooling Directory Source

This source lets you place the files to be collected into an "auto-collect" (spooling) directory. The source monitors the directory and parses new files as they appear. The event-parsing logic is pluggable, and after a file has been fully read into the channel, it is renamed or, optionally, deleted directly.

Note that files placed in the collection directory must not be modified afterward; if a file is modified, Flume will report an error. Likewise, no two files may share the same name; if a file with a duplicate name is placed in the directory, Flume will report an error.

Property description (for brevity, only the required properties are listed here; see the official documentation for the full list):

channels – (required)

type – (required) must be "spooldir"

spoolDir – (required) the path of the directory to read files from, i.e. the "collection directory"

fileSuffix (default .COMPLETED) – suffix appended to fully processed files

Case:

Write the configuration file: modify the configuration file given above. Except for the source section, everything is the same. The differing part is as follows:

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/park/work/apache-flume-1.6.0-bin/mydata

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template4.conf --name a1 -Dflume.root.logger=INFO,console

Transfer files into the specified directory, and you will see that Flume collects them, processing each line of a file as a separate log event.
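The spooling behavior can be mimicked in a few lines: read each unprocessed file line by line as events, then rename the file with a completion suffix. This is an illustrative sketch only; Flume's real implementation also tracks state and detects modified files:

```python
import os
import tempfile

def spool_once(spool_dir, suffix=".COMPLETED"):
    """Process every unprocessed file: one event per line, then rename it."""
    events = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(suffix):
            continue                    # already ingested, skip
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            events.extend(line.rstrip("\n") for line in f)
        os.rename(path, path + suffix)  # mark the file as fully read
    return events

d = tempfile.mkdtemp()
with open(os.path.join(d, "log1.txt"), "w") as f:
    f.write("line a\nline b\n")

print(spool_once(d))  # ['line a', 'line b']
print(spool_once(d))  # [] -- the file was renamed, so it is not re-read
```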

4.3. NetCat Source

A NetCat source listens on a specified port and converts each line of received data into an event.

4.3.1. NetCat Source Property Description

channels – (required)

type – (required) must be "netcat"

bind – (required) the IP address or host name to bind to

port – (required) the port number to bind to

max-line-length (default 512) – maximum number of bytes per line

Case: see the complete introductory example above.

4.4. HTTP Source

An HTTP source accepts HTTP GET and POST requests and turns them into Flume events; the GET method should be used only for experimentation.

The source relies on a pluggable "handler" to convert a request into events. The handler must implement the HTTPSourceHandler interface, which accepts an HttpServletRequest object and returns a collection of Flume Event objects.

All events obtained from one HTTP request are committed to the channel in a single transaction, which allows channels such as the file channel to operate more efficiently.

If the handler throws an exception, the source returns HTTP status code 400.

If the channel is full and no more events can be added, the source returns HTTP status code 503, indicating that it is temporarily unavailable.
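The default JSONHandler expects the request body to be a JSON array of events, each with a "headers" map and a "body" string. A small sketch of parsing that format (a hypothetical parser, not Flume's own code):

```python
import json

def parse_events(payload: str):
    """Parse a JSONHandler-style request body into (headers, body) pairs."""
    events = json.loads(payload)
    return [(e.get("headers", {}), e.get("body", "")) for e in events]

payload = '[{"headers": {"a": "a1", "b": "b1"}, "body": "hello~http~flume~"}]'
events = parse_events(payload)
print(events[0][0]["a"], events[0][1])
```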

4.4.1. HTTP Source Property Description

type – (required) must be "http"

port – (required) the port to listen on

bind (default 0.0.0.0) – the host name or IP address to listen on

handler (default org.apache.flume.source.http.JSONHandler) – handler class; must implement the HTTPSourceHandler interface

handler.* – configuration parameters for the handler

selector.type – channel selector type (default "replicating")

selector.* – selector-specific configuration

interceptors – space-separated list of interceptors

interceptors.* – interceptor-specific configuration

enableSSL (default false) – set to true to enable SSL; note that HTTP does not support SSLv3

excludeProtocols (default SSLv3) – space-separated list of SSL/TLS protocols to exclude; SSLv3 is always excluded

keystore – location of the keystore file

keystorePassword – keystore password

Case:

Write the configuration file: modify the configuration file given above. Except for the source section, everything is the same. The differing part is as follows:

# Describe/configure the source
a1.sources.r1.type = http
a1.sources.r1.port = 6666

Start Flume:

./flume-ng agent --conf ../conf --conf-file ../conf/template6.conf --name a1 -Dflume.root.logger=INFO,console

Send an HTTP request to the specified port by command:

curl -X POST -d '[{"headers": {"a": "a1", "b": "b1"}, "body": "hello~http~flume~"}]' http://0.0.0.0:6666
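The same request can be made from Python. To keep the sketch self-contained, it posts to a tiny stand-in server that, like the HTTP source, returns 200 on success and 400 when the body cannot be parsed; with a real agent you would post to its configured port instead:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeHTTPSource(BaseHTTPRequestHandler):
    """Stand-in for Flume's HTTP source with the default JSON handler."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        try:
            json.loads(body)         # the handler parses the event list
            self.send_response(200)  # events accepted
        except ValueError:
            self.send_response(400)  # handler failed -> 400
        self.end_headers()
    def log_message(self, *args):    # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), FakeHTTPSource)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}"

# Same payload as the curl command above
events = [{"headers": {"a": "a1", "b": "b1"}, "body": "hello~http~flume~"}]
req = urllib.request.Request(url, data=json.dumps(events).encode(),
                             headers={"Content-Type": "application/json"})
status = urllib.request.urlopen(req).status
server.shutdown()
print(status)  # 200
```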

