The collection of user behavior data is a prerequisite for building a recommender system, and the Flume project under the Apache Foundation is tailored for distributed log collection. This is the first of the Flume research notes; it mainly introduces Flume's basic architecture. The next note will illustrate the deployment and use of Flume with an example.
Flume official document translation -- Flume 1.7.0 User Guide (unreleased version) (i)
Flume official document translation -- Flume 1.7.0 User Guide (unreleased version) (ii)

Flume Properties
Property Name | Default | Description
flume.call...
guarantees the reliability and security of the data transmission. III. Installing Hadoop and Flume. My experiment was performed on HDP 2.5.0, and Flume is included in the HDP installation as long as the Flume service is configured. For HDP installation steps, see "HAWQ Techn...
Readers who need to configure this can refer to "Configuration of a High-Availability Hadoop Platform". 3.2 Installation and configuration
Installation
First, we unzip the Flume installation package; the command looks like this: tar -zxvf apache-flume-1.5.2...
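For completeness, a minimal sketch of the unpacking step, assuming the standard Apache binary tarball name apache-flume-1.5.2-bin.tar.gz and an illustrative target directory:

    $ tar -zxvf apache-flume-1.5.2-bin.tar.gz
    $ mv apache-flume-1.5.2-bin /usr/local/flume   # target path is an assumption; adjust as needed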
agent. This is how single-hop message-delivery semantics in Flume provide end-to-end reliability for the flow. Flume uses a transactional approach to ensure the reliable delivery of events. The source and sink encapsulate the storage and retrieval of events in transactions provided by the channel. This ensures that a set of events is reliably passed from point to point within the flow. In the case of the multi-hop process...
The collector receives the data sent by the agents and delivers it to the specified target machine.
Note: Flume's dependency on Hadoop and ZooKeeper is only on their JAR packages; it does not require the Hadoop and ZooKeeper services to be started when Flume starts. III. Flume Distributed Environment Deployment. 1. Experimental scenario: operating syste...
smallest independent operating unit in Flume; an agent is a JVM. A single agent consists of three components: source, channel, and sink. II. Starting the Flume cluster. 1) First, start the Hadoop cluster (see the previous blog for details). 2) Second, install and configure Flume (all of the remaining steps need to be done on master), as sketched below: unzip the...
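A minimal sketch of step 2), assuming the standard binary tarball and illustrative paths (the install directory and JAVA_HOME value are assumptions):

    $ tar -zxvf apache-flume-*-bin.tar.gz -C /usr/local
    $ cd /usr/local/apache-flume-*-bin
    $ cp conf/flume-env.sh.template conf/flume-env.sh
    $ echo 'export JAVA_HOME=/usr/java/default' >> conf/flume-env.sh   # JAVA_HOME path is an assumption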
Label: Original: http://mp.weixin.qq.com/s?__biz=MjM5NzAyNTE0Ng==&mid=205526269&idx=1&sn=6300502dad3e41a36f9bde8e0ba2284d. Although I have always disapproved of building a system entirely out of open source software,
[Flume] An example of using Flume to deliver web logs to HDFS.
Create the directory where the logs are stored on HDFS:
    $ hdfs dfs -mkdir -p /test001/weblogsflume
Specify the log input directory:
    $ sudo mkdir -p /flume/weblogsmiddle
Allow the logs to be accessed by any user:
    $ sudo chmod a+w -R /flume
Set the configuration file contents:
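The configuration itself is cut off here; as a minimal sketch, such a config could pair a spooling-directory source reading /flume/weblogsmiddle with an HDFS sink writing to /test001/weblogsflume. The agent and component names (tier1, src1, ch1, sink1) are illustrative:

    # illustrative web-logs-to-HDFS agent
    tier1.sources = src1
    tier1.channels = ch1
    tier1.sinks = sink1
    # spooling-directory source: picks up completed log files dropped into the directory
    tier1.sources.src1.type = spooldir
    tier1.sources.src1.spoolDir = /flume/weblogsmiddle
    tier1.sources.src1.channels = ch1
    # in-memory channel
    tier1.channels.ch1.type = memory
    tier1.channels.ch1.capacity = 10000
    # HDFS sink: writes plain text files into the target directory
    tier1.sinks.sink1.type = hdfs
    tier1.sinks.sink1.hdfs.path = /test001/weblogsflume
    tier1.sinks.sink1.hdfs.fileType = DataStream
    tier1.sinks.sink1.channel = ch1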
Create example.conf in the conf directory:

    # example.conf: A single-node Flume configuration
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/configure the source
    # (avro source; log4j clients using org.apache.flume.clients.log4jappender.Log4jAppender send events to it)
    a1.sources.r1.type = avro
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    # Describe the sink
    a1.sinks.k1.type = logger
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
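The agent can then be launched with the standard flume-ng command (paths are relative to the Flume install directory):

    $ bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1 -Dflume.root.logger=INFO,console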
the basic unit of Flume data transfer. Flume sends data from source to destination as events. An event consists of an optional header and a byte array containing the data, which is opaque to Flume; the header holds a collection of key-value pairs, where each key is unique within a set. Headers can also be extended for use in context routing. Flume Trans...
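To make the header/body split concrete, this is the JSON representation accepted by Flume's HTTP source with its default JSONHandler; the header names and values here are purely illustrative:

    [{
      "headers": {"timestamp": "434324343", "host": "random_host.example.com"},
      "body": "random_body"
    }]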
Is Flume a good fit for your problem? If you need to ingest textual log data into Hadoop/HDFS, then Flume is the right fit for your problem, full stop. For other use cases, here are some guidelines: Flume is designed to transport and ingest regularly generated event data over relatively stable, potentially complex topologies. The notion of "event data" is very broadly defined. To Flume...
First, install Flume. It is recommended to use the same user that runs Hadoop to install Flume; this time the hadoop user is used to install Flume. http://douya.blog.51cto.com/6173221/1860390
To start the configuration:
1. Write the configuration file: vim flume_hdfs.conf

    # Define a memory channel called ch1 on agent1
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000
    agent1.channels.ch1.transactionCapacity = 100
    #agent1.channels.ch1.keep...
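The snippet breaks off after the channel definition; as a hedged sketch, such a file typically continues with a source and an HDFS sink. Here an exec source tailing an application log is assumed, and the log path and namenode address are assumptions:

    # tail a log file into the channel
    agent1.sources = s1
    agent1.sources.s1.type = exec
    agent1.sources.s1.command = tail -F /var/log/app.log
    agent1.sources.s1.channels = ch1
    # write events to HDFS as plain text
    agent1.sinks = k1
    agent1.sinks.k1.type = hdfs
    agent1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
    agent1.sinks.k1.hdfs.useLocalTimeStamp = true
    agent1.sinks.k1.hdfs.fileType = DataStream
    agent1.sinks.k1.channel = ch1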
specified place, for example HDFS. Note: the channel deletes the temporary data only after the sink has successfully delivered the data held in the channel, which guarantees the reliability and security of the data transmission. Generalized usage of Flume: the reason Flume is so versatile is that it supports multi-level Flume agents; that is, agents can be chained one after another,
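Chaining agents is done by pairing an Avro sink on the upstream agent with an Avro source on the downstream one; a minimal sketch, with illustrative agent names, hostname, and port:

    # upstream agent: forward events to the collector
    agentA.sinks.k1.type = avro
    agentA.sinks.k1.hostname = collector-host   # hostname is an assumption
    agentA.sinks.k1.port = 4545
    agentA.sinks.k1.channel = c1
    # downstream collector agent: receive events from upstream agents
    agentB.sources.r1.type = avro
    agentB.sources.r1.bind = 0.0.0.0
    agentB.sources.r1.port = 4545
    agentB.sources.r1.channels = c1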
when the data actually arrives at the destination. Flume uses transactional methods to ensure the reliability of the entire process of transmitting an event. The sink may remove an event from the channel only after the event has been stored in the channel of the next agent or deposited into an external data destination. This ensures that an event in the data flow, whether within a single agent or across multiple agents, is delivered reliably.
1. The source is HTTP and the sink is logger; the data is printed to the console. The conf configuration file is as follows:

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/configure the source
    # this setting means the source accepts data sent in over HTTP
    a1.sources.r1.type = http
    # the host or IP address of the machine running Flume
    a1.sources.r1.bind = hadoop-master
    # port
    a1.sources.r1.port = 9000
    #a1.sources.r1.fileheader = true
    # Describe the sink
    a1.sinks.k1.type = logger
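With this agent running, the HTTP source can be exercised with a plain curl call; the default JSONHandler expects a JSON array of events, and the hostname and port here match the config above:

    $ curl -X POST -H 'Content-Type: application/json' \
          -d '[{"headers": {}, "body": "hello flume"}]' \
          http://hadoop-master:9000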
There are many examples of failover on the Internet, but there are multiple approaches; personally, I follow the single-responsibility principle (a config sketch follows this list): 1. one machine runs one Flume agent; 2. an agent's downstream sink points to one Flume agent, rather than configuring multiple ports on one Flume agent ("impacts performance"); 3. configuring per machine, you can avoid a single crash...
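For reference, Flume's built-in alternative to hand-rolled failover is a sink group with the failover sink processor; a minimal sketch, assuming two already-defined sinks k1 and k2 on agent1:

    agent1.sinkgroups = g1
    agent1.sinkgroups.g1.sinks = k1 k2
    agent1.sinkgroups.g1.processor.type = failover
    # higher priority wins; k2 takes over only if k1 fails
    agent1.sinkgroups.g1.processor.priority.k1 = 10
    agent1.sinkgroups.g1.processor.priority.k2 = 5
    agent1.sinkgroups.g1.processor.maxpenalty = 10000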
1. Flume concept. Flume is a distributed, reliable, and highly available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Flume is currently a top-level Apache project. Flume needs a Java runtime environment: Java 1.6 or above is required, and Java 1.7 is recommended. Unzip the downloaded Flume installa...