Todo: Rewrite Flume's sink so that it calls Kafka's message producer (Producer) to send messages; implement the IRichSpout interface for Storm's spout so that it calls Kafka's message consumer (Consumer) to receive messages, then pass the data through several custom bolts that output custom content. Writing the KafkaSink: copy the following from $KAFKA_HOME/libs: kafka_2.10-0.8.2.1.jar, kafka-clients-0.8.2.1.jar, scala-library-2.10.
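A minimal, hedged sketch of such a custom Kafka sink, using the old 0.8 producer API that matches the jars listed above; the class name, topic, and property keys are illustrative, not the article's actual code:

    // Illustrative custom Flume sink that forwards events to Kafka with the 0.8 producer API.
    // Topic and broker list are read from the agent configuration; the defaults are assumptions.
    import java.util.Properties;

    import org.apache.flume.Channel;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.Transaction;
    import org.apache.flume.conf.Configurable;
    import org.apache.flume.sink.AbstractSink;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class KafkaSink extends AbstractSink implements Configurable {
        private Producer<String, String> producer;
        private String topic;

        @Override
        public void configure(Context context) {
            topic = context.getString("topic", "tracklog");          // assumed default topic
            Properties props = new Properties();
            props.put("metadata.broker.list", context.getString("brokerList", "localhost:9092"));
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            producer = new Producer<String, String>(new ProducerConfig(props));
        }

        @Override
        public Status process() {
            Channel channel = getChannel();
            Transaction tx = channel.getTransaction();
            tx.begin();
            try {
                Event event = channel.take();
                if (event == null) {
                    tx.commit();
                    return Status.BACKOFF;
                }
                // Send the event body to Kafka as a plain UTF-8 string message.
                producer.send(new KeyedMessage<String, String>(topic, new String(event.getBody(), "UTF-8")));
                tx.commit();
                return Status.READY;
            } catch (Exception e) {
                tx.rollback();
                return Status.BACKOFF;
            } finally {
                tx.close();
            }
        }
    }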
Structure: Nginx -> Flume -> Kafka -> Flume -> Kafka (because a cross-datacenter transfer is involved, another Flume hop was added between the two Kafka clusters, which is painful). Phenomenon: at the second layer, writing to the Kafka topic
Background: With the Kafka message bus in place, data from every system can be aggregated at the Kafka nodes; the next task is to maximize the value of that data and let the data "speak". Environment preparation: a Kafka server, plus a CDH 5.8.3 server with the Flume, Solr, Hue, HDFS, and ZooKeeper services installed. Flume provides a scalable, real-time data transmission channel, Morphline
The data source used in the previous article read data from a socket, which is a bit unorthodox; the serious way is to take the data from Kafka or another message queue. The main supported sources, according to the official website, are as follows. Data can be acquired either by push or by pull. First, Spark Streaming integration with Flume: 1. the push approach; the more recommended approach is the pull meth
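As a hedged illustration of the pull approach (not code from the article; the SparkSink address SPARK1:41414 and the 5-second batch interval are assumptions), a Java sketch using spark-streaming-flume might look like:

    // Pull (polling) integration: Flume writes into a SparkSink, Spark Streaming polls it.
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.flume.FlumeUtils;
    import org.apache.spark.streaming.flume.SparkFlumeEvent;

    public class FlumePullExample {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("FlumePullExample").setMaster("local[2]");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            // Poll events from the Flume SparkSink listening on SPARK1:41414 (assumed address).
            JavaReceiverInputDStream<SparkFlumeEvent> events =
                    FlumeUtils.createPollingStream(jssc, "SPARK1", 41414);

            // Print the event bodies of each micro-batch.
            events.map(e -> new String(e.event().getBody().array(), "UTF-8")).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }

For pull mode the Flume agent must also be configured with org.apache.spark.streaming.flume.sink.SparkSink, so that Spark Streaming can poll events out of the agent instead of receiving them as a push.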
1. Background information
Many of the company's platforms generate large volumes of logs every day (typically streaming data, such as search-engine page views and queries). Processing these logs requires a dedicated logging system, and in general such a system needs to have the following characteristics:
(1) build a bridge between the application systems and the analysis systems, and decouple them from each other;
(2) support near-real-time online analysis systems as well as offline analysis sys
agent.channels.m1.checkpointDir = /opt/modules/apache-flume-1.5.2-bin/tracklog-kafka/checkpoint
agent.channels.m1.dataDirs = /opt/modules/apache-flume-1.5.2-bin/tracklog-kafka/datadir
agent.channels.m1.transactionCapacity = 1000000
agent.channels.m1.capacity = 1000000
agent.channels.m1.checkpointInterval = 30000
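The snippet shows only the file-channel settings; a hedged sketch of how the rest of such an agent could be wired (the exec source, topic, broker address, and custom sink class name are assumptions, since Flume 1.5.2 has no built-in Kafka sink):

    agent.sources = s1
    agent.channels = m1
    agent.sinks = k1
    # Tail the access log into the file channel (source definition is an assumption).
    agent.sources.s1.type = exec
    agent.sources.s1.command = tail -F /opt/logs/tracklog.log
    agent.sources.s1.channels = m1
    # File channel; the checkpoint/data directories shown above belong here.
    agent.channels.m1.type = file
    # Custom Kafka sink; the class name is hypothetical (see the KafkaSink sketch earlier).
    agent.sinks.k1.type = com.example.flume.KafkaSink
    agent.sinks.k1.topic = tracklog
    agent.sinks.k1.brokerList = localhost:9092
    agent.sinks.k1.channel = m1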
Second, getting the data into Kafka. The collect topic above needs
Flume crawls log data in real time and uploads it to Kafka
1. ZooKeeper is already configured on Linux; start ZooKeeper first:
sbin/zkServer.sh start
(sbin/zkServer.sh status shows the startup state.) Running jps, you can see the QuorumPeerMain process.
2. Start Kafka; ZooKeeper must be started before Kafka:
bin/
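The command is cut off here; on a standard Kafka distribution the broker is usually started from the Kafka home directory with:

    bin/kafka-server-start.sh config/server.properties &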
Complete real-time stream processing flow based on Flume + Kafka + Spark Streaming (a Spark Streaming consumer sketch follows the server list below)
1. Environment preparation: four test servers
Spark cluster (three nodes): SPARK1, SPARK2, SPARK3
Kafka cluster (three nodes): SPARK1, SPARK2, SPARK3
ZooKeeper cluster (three nodes): SPARK1, SPARK2, SPARK3
Log receiving server: SPARK1
Log collection server: Redis (this machine is used for Redis development; now
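As a hedged sketch of the Spark Streaming side of such a Flume -> Kafka -> Spark Streaming pipeline (broker addresses, topic name, and batch interval are assumptions), a direct-stream consumer for Spark 1.x with Kafka 0.8 might look like:

    // Direct (receiver-less) Kafka consumption in Spark Streaming; names are placeholders.
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class LogStreamingJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("FlumeKafkaSparkStreaming");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // Broker list and topic name are assumptions; adjust to the real cluster.
            Map<String, String> kafkaParams = new HashMap<String, String>();
            kafkaParams.put("metadata.broker.list", "SPARK1:9092,SPARK2:9092,SPARK3:9092");
            Set<String> topics = new HashSet<String>(Arrays.asList("tracklog"));

            JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            // For the sketch, just print the log lines of each micro-batch.
            messages.map(t -> t._2()).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }

The direct approach reads offsets from Kafka itself rather than through a receiver, which is usually the preferred choice for a log pipeline like this one.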
For descriptions in English, you can see... Environment introduction: CentOS 7.3, JDK 1.8, CDH 5.14.0. 1. Package the project with Maven, generating two jar packages. 2. Because Flume was installed as part of the CDH integration, put these two jars under /usr/lib; with a normal installation you would instead copy the two jar packages into lib under the Flume installation directory. 3. Go to the CDH management page to co
Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transmission, provided by Cloudera. Flume supports customizing all kinds of data senders in the log system to collect data, and at the same time provides the ability to do simple processing of the data and write it to various (customizable) data receivers.
Using
Tomcat produces the logs; Flume crawls the logs and sinks them into Kafka
Package the finished web project into a war package: in Eclipse, export it directly via Export; in IDEA, add a new Artifact (archive) entry under Artifacts, select the directory where the web project resides, and build.
Put the war package into Tomcat's webapps directory on Linux; when Tomcat is started from bin, the war package is automatically unpacked and can then be accessed from the browser. Note t
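A hedged sketch of the Flume agent for this Tomcat-to-Kafka case, assuming the built-in KafkaSink of Flume 1.6+; the log path, topic name, and broker address are placeholders:

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    # Tail the Tomcat log produced by the deployed war (path is an assumption).
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /opt/tomcat/logs/catalina.out
    a1.sources.r1.channels = c1
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    # Built-in Kafka sink shipped with Flume 1.6+.
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.topic = tomcat-log
    a1.sinks.k1.brokerList = localhost:9092
    a1.sinks.k1.channel = c1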
The Kafka source guarantees an at-least-once message retrieval policy; duplicates can exist when the source is started. The Kafka source also provides key.deserializer (org.apache.kafka.common.serialization.StringSerializer) and value.deserializer (default value org.apache.kafka.common.serialization.ByteArraySerializer); modifying these parameters is not recommended.
Enabled properties: an example of subscribing to a comma-separated list of topics:
tier1.sources.source
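Continuing that truncated snippet, a configuration along the lines of the Flume 1.7+ Kafka source documentation (broker address, group id, and topic names are placeholders) would be:

    tier1.sources = source1
    tier1.channels = channel1
    tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.source1.channels = channel1
    tier1.sources.source1.batchSize = 1000
    tier1.sources.source1.batchDurationMillis = 2000
    tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
    tier1.sources.source1.kafka.consumer.group.id = flume
    # Comma-separated list of topics to subscribe to.
    tier1.sources.source1.kafka.topics = topic1, topic2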
Flume acquisition process. Note: in this case Flume listens on the directory /home/hadoop/flume_kafka and collects into Kafka. Start the cluster: start Kafka, then start the agent: flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/
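The start command above is cut off; a typical full form (the config file name flume-kafka.conf and agent name a1 here are hypothetical, not from the original) looks like:

    flume-ng agent -c . -f /home/hadoop/flume-1.7.0/conf/myconf/flume-kafka.conf -n a1 -Dflume.root.logger=INFO,console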
As the title suggests, this is only a small part of the real-time architecture.
Download the latest version of Flume: apache-flume-1.6.0-bin.tar.gz
Unzip it and modify conf/flume-conf.properties (the file name can be whatever you like).
What I have implemented so far is reading data from a directory and writing it to Kafka; there is plenty of material about the principles on the Internet, so here I only connect t
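A hedged sketch of what that flume-conf.properties might contain for a directory-to-Kafka setup, assuming Flume 1.6's built-in KafkaSink; the spool directory, topic, and broker address are placeholders:

    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1
    # Watch a directory for new files and ship each line as an event.
    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /data/logs/incoming
    agent1.sources.src1.channels = ch1
    agent1.channels.ch1.type = memory
    agent1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.sink1.topic = dir-logs
    agent1.sinks.sink1.brokerList = localhost:9092
    agent1.sinks.sink1.channel = ch1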
First, prerequisites: a Linux command-line foundation; one of Scala or Python; basic knowledge of Hadoop, Spark, Flume, Kafka, and HBase. Second, the distributed log collection framework Flume. Business-scenario analysis: servers and web services generate large numbers of logs; how can they be used, and how can a large volume of logs be imported into the cluster? 1. Shell scripts run in batches and then upload to HDFS: not hig