Flume
Flume data sources and output modes: Flume can collect data from the console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog log system, supporting both TCP and UDP), exec (command execution), and other data sources; in our system the exec source is currently used for log capture. Flume data recipients (sinks) can be the console, text (file), DFS (HDFS file), RPC (Thrift-RPC), syslogTCP (the TCP syslog log system), and so on; in our system the data is received by Kafka. A typical Flume agent chains a source, a channel, and a sink, as in the configuration excerpt below.
...=flume_kafka
# serialization method
a1.sinks.k1.serializer.class=kafka.serializer.StringEncoder
# Use a channel which buffers events in memory
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=1000
# Bind the source and sink to the channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
Start Flume: as soon as there is data in /home/hadoop/flumehomework/flumecode/flume_exec_Test.txt, Flume will load it into Kafka.
...=sync
producer.sinks.r.custom.encoding=utf-8
producer.sinks.r.custom.topic.name=test
# Specify the channel the sink should use
producer.sinks.r.channel=c
# Each channel's type is defined.
producer.channels.c.type=memory
producer.channels.c.capacity=1000
producer.channels.c.transactionCapacity=100
#producer.channels.c.type=file
#producer.channels.c.checkpointDir=/home/checkdir
#producer.channels.c.dataDirs=/home/datadir
Validating the Flume and Kafka combination:
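The verification steps themselves are not included in this excerpt. One way to check that log lines are actually arriving in Kafka is to consume the topic and print what comes back; the following is a minimal Scala sketch using the Kafka new-consumer client. It assumes a broker at localhost:9092 (version 0.9 or later), a made-up consumer group name, and the topic test from the configuration above:

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object ValidateKafkaTopic {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("group.id", "flume-validation")        // hypothetical consumer group
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("auto.offset.reset", "earliest")       // read the topic from the beginning

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("test")) // topic name from the config above
    try {
      while (true) {
        // Poll for new records and print every log line that Flume pushed into Kafka
        val records = consumer.poll(1000L)
        for (record <- records.asScala) {
          println(s"partition=${record.partition()} offset=${record.offset()} value=${record.value()}")
        }
      }
    } finally {
      consumer.close()
    }
  }
}

Running this while appending lines to the monitored log file should print each line shortly after Flume ships it.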
...:2181'                                    # ZooKeeper cluster address for Kafka
    group_id => 'hdfs'                       # consumer group, not the same as the consumers on ELK
    topic_id => 'apiappwebcms-topic'         # topic
    consumer_id => 'logstash-consumer-10.10.8.8'  # consumer id, custom; I use the machine IP
    consumer_threads => 1
    queue_size => 200
    codec => 'json'
  }
}
output {
  # If one topic carries several kinds of logs, they can be extracted and stored separately on HDFS.
  if [type] == "Aping
...data, or deploy Kafka and execute the command:
bin/kafka-console-consumer.sh --zookeeper ... --topic SB --from-beginning
Parameters: --zookeeper specifies the address and port of ZooKeeper in your cluster; --topic must match the topic name we specified when pushing data in step B.
The method above only works from the shell command line, so how do we write a consumer with Spark? Assume you have downloaded the Spark 1.0 source and have already deployed an environment with SBT and Scala.
The Scala code is as follows:
package test
import jav...
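The original listing breaks off right after the imports. As a stand-in, here is a minimal sketch of such a consumer using the receiver-based KafkaUtils.createStream API that shipped with the spark-streaming-kafka module of that era; the ZooKeeper address, consumer group, number of receiver threads, and batch interval are illustrative assumptions, and the topic name simply reuses the one from the console-consumer example above:

package test

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    val zkQuorum = "localhost:2181"        // assumed ZooKeeper address
    val group    = "spark-consumer-group"  // assumed consumer group
    val topics   = Map("SB" -> 1)          // topic -> number of receiver threads

    val sparkConf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Receive (key, message) pairs from Kafka and keep only the message body
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topics).map(_._2)

    // Count words in every 2-second batch and print the result on the driver
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Build it with SBT against the matching spark-streaming-kafka dependency and submit it with spark-submit (or run with a local[*] master while testing).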
Flume download and documentation: http://flume.apache.org/
Flume installation:
$ tar zxvf apache-flume-1.4.0-bin.tar.gz
Flume start command:
$ bin/flume-ng agent --conf ...
Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transmission, provided by Cloudera. Flume supports customizing the various data senders in a log system to collect data, and it also provides simple processing of the data and the ability to write it to various (customizable) data recipients.
Using Flume to move data from Kafka to HDFS
The configuration file:
26. Preliminary use of the cluster
Design ideas of HDFS
- Design idea: divide and conquer. Large files and large batches of files are distributed across a large number of servers, so that massive data can be analyzed with a divide-and-conquer approach.
- Role in a big data system: provides data storage services for various distributed computing frameworks (such as MapReduce, Spark, Tez, ...).
- Key concepts: file splitting, replicated storage, metadata.
26.1 Using HDFS
1. View ...
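The excerpt stops at "1. View". Viewing what the cluster stores is typically the first step in using HDFS, so here is a minimal sketch of doing that programmatically from Scala with the Hadoop FileSystem API; the NameNode URI and the listed path are assumptions for illustration, and the hdfs dfs -ls shell command gives the same information:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsView {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Assumed NameNode address; replace with your cluster's fs.defaultFS value
    conf.set("fs.defaultFS", "hdfs://namenode:9000")

    val fs = FileSystem.get(conf)

    // List the root directory, similar to `hdfs dfs -ls /`
    fs.listStatus(new Path("/")).foreach { status =>
      val kind = if (status.isDirectory) "dir " else "file"
      println(s"$kind  ${status.getLen} bytes  ${status.getPath}")
    }

    fs.close()
  }
}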
Original link: Kafka in practice: Flume to Kafka
1. Overview
The previous posts introduced the development process of the whole Kafka project; today I will share how Kafka obtains its data source, that is, how data is produced into Kafka. Here is the outline for today:
Data sources
Flume to Kafka
...some disadvantages: to guarantee a normal leader election, the number of follower failures it can tolerate is relatively small. To tolerate 1 failed follower you must have at least 3 replicas, and to tolerate 2 failed followers you must have at least 5 replicas (with majority voting, tolerating f failures requires 2f + 1 replicas). In other words, to guarantee a high degree of fault tolerance in a production environment there must be many replicas, and a large number of replicas leads to a sharp decline in performance.
When it comes to messaging systems, Kafka is currently the hottest one, and our company also intends to use Kafka for unified collection of business logs. Here I share the specific configuration and usage based on my own practice. Kafka version: 0.10.0.1.
Update record: 2016.08.15, first draft.
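The detailed configuration from the original post is not part of this excerpt. As a rough illustration of the sending side of unified business-log collection with Kafka 0.10.x, here is a minimal Scala sketch using the 0.10 producer client; the broker address, topic name, key, and log payload are made-up values for illustration only:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object BusinessLogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker list
    props.put("acks", "all")                          // wait for the full in-sync replica set to acknowledge
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Hypothetical topic, key, and payload; real business logs would be serialized application events
      val record = new ProducerRecord[String, String]("business-log", "order-service",
        """{"level":"INFO","msg":"order created","ts":1471228800000}""")
      producer.send(record).get() // block until the broker acknowledges the write
    } finally {
      producer.close()
    }
  }
}

In practice sends would be asynchronous and batched rather than blocking on every record.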
As a component of the big data suite for cloud computing, ...
Kafka as a replacement for log aggregation. Log aggregation generally collects log files from servers and stores them in a centralized location (a file server or HDFS) for processing. Kafka, however, ignores the file details and abstracts them into a cleaner stream of log or event messages. This gives Kafka lower processing latency and...
...data and convert it into structured logs, stored in a data store (which can be a database, HDFS, etc.).
4. LinkedIn's Kafka
Kafka was open-sourced in December 2010. It is implemented in Scala, uses a variety of efficiency optimizations, has a relatively novel overall architecture (push/pull), and is better suited to heterogeneous clusters.
Design objectives: