Reprint: please indicate the original source: http://www.cnblogs.com/lighten/p/6830439.html

1. Introduction
This article mainly translates the official documentation and introduces some basic knowledge about Flume and how to set it up. Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store.
Official document parameter reference: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
Note the file format: hdfs.fileType defaults to SequenceFile, the Hadoop container format; set it to DataStream so the output can be read directly as plain text. (How to consume SequenceFile output I still do not know.)
Configuration file: hdfs.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.source
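The snippet above cuts off before the sink half. As a hedged sketch, the sink section of such an hdfs.conf might look like the following, reusing the a1/k1/c1 names above; the NameNode address, output path, and roll interval are assumptions, not values from the original post:

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# DataStream writes plain text that can be read directly (the point made above)
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# hypothetical destination; %Y-%m-%d needs a timestamp header, hence useLocalTimeStamp
a1.sinks.k1.hdfs.path = hdfs://namenode:9000/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.rollInterval = 60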
Flume collection process:
# Note: in this case Flume watches the directory /home/hadoop/flume_kafka and collects into Kafka.
Start the cluster, start Kafka, then start the agent:
flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/
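The command above is truncated. A plausible full invocation, assuming a config file named flume-kafka.conf and an agent named a1 (both hypothetical), would be:

flume-ng agent -c /home/hadoop/flume-1.7.0/conf \
  -f /home/hadoop/flume-1.7.0/conf/myconf/flume-kafka.conf \
  -n a1 -Dflume.root.logger=INFO,console

The -n value must match the agent name used inside the configuration file.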
Sqoop vs. Flume vs. HDFS:
- Sqoop: used to import data from structured data sources, such as an RDBMS.
- Flume: used to move bulk streaming data into HDFS.
- HDFS: the distributed file system the Hadoop ecosystem uses to store data.
Sqoop has a connector architecture: a connector knows how to connect to the appropriate data source.
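To make the connector idea concrete, here is a sketch of a typical Sqoop import; the JDBC URL, table, and target directory are illustrative only:

sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username reporter -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4

Sqoop picks the MySQL connector based on the JDBC URL and runs the import as parallel map tasks.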
…its inevitability. Since 2012 the term "big data" has been mentioned more and more, and we have now entered the big-data era. In this age of information explosion, the amount of data generated every day is enormous. Big data is about more than just data volume; it has four characteristics: large data volume, wide variety, low value density, and high timeliness. Given these characteristics, we need a system that:
1. Can store large amounts of data
2. Can quickly process large am
Complete real-time stream processing flow based on Flume + Kafka + Spark Streaming

1. Environment preparation: four test servers
- Spark cluster (three nodes): spark1, spark2, spark3
- Kafka cluster (three nodes): spark1, spark2, spark3
- ZooKeeper cluster (three nodes): spark1, spark2, spark3
- Log-receiving server: spark1
- Log-collection server: redis (this machine is normally used for Redis development; it is reused here for the log-collection test, so the hostname is left unchanged)

Log collection process:
Log
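The process description is cut off above. As a sketch under assumptions (the topic name and avro port are hypothetical; the broker list reuses the spark1-3 hosts named above), the agent on the log-receiving server could accept avro events and forward them to the Kafka cluster like this:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4545
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Kafka sink (Flume 1.7 property names)
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = spark1:9092,spark2:9092,spark3:9092
a1.sinks.k1.kafka.topic = weblogs
a1.sinks.k1.channel = c1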
When learning anything new in computing, the first step is to write a "Hello World"; likewise, Flume's "Hello World" is simply running it.
1. Flume basic outline
(1) What does Flume do? Flume is an Apache open-source project that collects data from different nodes and aggregates it into a central node. (2) Will data be
The most comprehensive history of Hadoop, Hadoop
The course mainly covers hands-on practice with Hadoop Sqoop, Flume, and Avro.
Target audience
1. This course suits students who have basic Java knowledge, some understanding of databases and SQL statements, and are skilled at using Linux sys
Flume architecture and core components:
(1) Source (collection): responsible for where data is collected from
(2) Channel (recording/buffering)
(3) Sink (output)
Official documents:
http://flume.apache.org/FlumeUserGuide.html
http://flume.apache.org/FlumeUserGuide.html#starting-an-agent
How to use Flume: the key is writing the configuration file:
(1) Configure the source
(2) Configure the channel
(3) Configure the sink
(4) String the above three components together (see the minimal sketch below)
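Step (4) is where beginners stumble, so here is essentially the minimal example from the user guide linked above: a netcat source, a memory channel, and a logger sink, strung together by the last two lines:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# the "stringing together": source -> channel, channel -> sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1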
…understand.
About the Flume source code: the source package can be downloaded from the Flume official website; read whichever parts you need, which helps greatly with configuration and usage.
About shell scripts: in if [[ $? -ne 0 ]], $? holds the exit status of the previously executed command. For example, if the previous command was a basic ls -l and it succeeds, $? is 0.
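A short self-contained illustration of that pattern (the directory path is arbitrary):

ls -l /some/dir
if [[ $? -ne 0 ]]; then
    echo "previous command failed" >&2
    exit 1
fi

Note that [[ ]] is bash syntax; in plain sh use [ "$?" -ne 0 ].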
…IP implementation. The test configuration is pasted below; the configurations are otherwise identical, so when using them just comment or uncomment the sinkgroup lines. This is the configuration of the collection node.

# Flume configuration file
agent1.sources = execSource
agent1.sinks = avroSink1 avroSink2
agent1.channels = fileChannel
# sink groups affect performance very much
#agent1.sinkgroups = avroGroup
#agent1.sinkgroups.avroGroup.sinks = avroSink1 avroSink2
# sink scheduling mode: load_balance or failover
#agent1.sinkgroups
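For reference, if the sinkgroup comments above were enabled in failover mode, the processor settings might look like the following sketch (the priorities and penalty are illustrative values, not from the original post):

agent1.sinkgroups = avroGroup
agent1.sinkgroups.avroGroup.sinks = avroSink1 avroSink2
agent1.sinkgroups.avroGroup.processor.type = failover
# higher priority wins; the other sink takes over on failure
agent1.sinkgroups.avroGroup.processor.priority.avroSink1 = 10
agent1.sinkgroups.avroGroup.processor.priority.avroSink2 = 5
agent1.sinkgroups.avroGroup.processor.maxpenalty = 10000

For load_balance mode, processor.type = load_balance with processor.selector = round_robin (or random) replaces the priority lines.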
Contents of this issue:
1. Flume on HDFS case review
2. Flume pushing data to Spark Streaming in practice
3. Analysis of the principle with diagrams

1. Flume on HDFS case review
The last lesson asked everyone to install and configure Flume and test data transmission; yesterday's task was to transfer data onto HDFS.
File configuration
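The file configuration itself is cut off here. For point 2 of this issue, the push model means Flume drives events at a listening Spark Streaming receiver over avro; a hedged fragment, where the hostname and port are assumptions that must match the address the receiver binds to:

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = spark-driver-host
a1.sinks.k1.port = 9999
a1.sinks.k1.channel = c1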
Flume and Sqoop are both Hadoop data-integration and collection systems, but they are positioned differently. Below is an introduction based on personal experience and understanding. Flume was developed by Cloudera and has two major product lines: Flume OG and Flume NG. The Flume OG architecture was overly complex and could lose data in use, so it was gi
Reprint: please specify the source: http://www.cnblogs.com/xiaodf/
Flume, as a log-collection tool, monitors a file directory or a single file; when new data is appended, it collects the new data and sends it to a message queue.
1 Installing and deploying Flume
To collect local data from a data node, every node needs the Flume tool installed to perform collection.
1.1 Download and install
Go to the official website to down
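The download step is cut off above. A hedged sequence of commands, assuming the 1.7.0 release used elsewhere on this page and an install directory of /home/hadoop (both assumptions):

wget http://archive.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
tar -xzf apache-flume-1.7.0-bin.tar.gz -C /home/hadoop/
export FLUME_HOME=/home/hadoop/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin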
Teacher Liaoliang's course: the 2016 Big Data Spark "Mushroom Cloud" action, a Spark Streaming job consuming Flume-collected Kafka data the direct way.
First, the basic background: Spark Streaming can get Kafka data in two ways, the receiver way and the direct way; this article describes the direct way. The flow is as follows:
1. The direct mode connects to the Kafka nodes directly to obtain data.
2. The direct-based approach periodically queries Kafka to obtain the latest
Flume Introduction
Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transmission, provided by Cloudera. Flume supports customizing data senders in the log system to collect data; Flume also provides t
Example 1: type avro. Create an avro.conf for testing under Flume's conf directory, with the following contents:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
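To try this example, first bind the sink to the channel (a1.sinks.k1.channel = c1, which the truncated snippet omits), then start the agent in one terminal and use the bundled avro-client in another to send a file's contents as events (the file path is arbitrary):

flume-ng agent -c conf -f conf/avro.conf -n a1 -Dflume.root.logger=INFO,console
flume-ng avro-client -H localhost -p 44444 -F /tmp/test.log

The logger sink should print each line of /tmp/test.log as an event on the agent's console.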
Use Apache Flume to read messages from a JMS message queue and write them to HDFS. The Flume agent configuration is as follows:
flume-agent.conf

# Name the components in this agent
agentHdfs.sources = jms_source
agentHdfs.sinks = hdfs_sink
agentHdfs.channels = mem_channel

# Describe/configure the source
agentHdfs.sources.jms_source.type = jms
# Bind to all interfaces
agentHdfs.sources.jms_source.initialCont
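The snippet is truncated at the JMS source properties. For completeness, a hypothetical sink-and-channel half (the HDFS path and capacities are assumed, not from the original post) could look like:

# Describe the sink
agentHdfs.sinks.hdfs_sink.type = hdfs
agentHdfs.sinks.hdfs_sink.hdfs.path = hdfs://namenode:9000/flume/jms
agentHdfs.sinks.hdfs_sink.hdfs.fileType = DataStream
agentHdfs.sinks.hdfs_sink.channel = mem_channel

# Use a channel which buffers events in memory
agentHdfs.channels.mem_channel.type = memory
agentHdfs.channels.mem_channel.capacity = 10000
agentHdfs.channels.mem_channel.transactionCapacity = 100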