Sqoop
Flume
HDFS
Sqoop is used to import data from structured data sources, such as an RDBMS.
Flume is used to move bulk streaming data into HDFS.
HDFS is the distributed file system that stores data in the Hadoop ecosystem.
Sqoop has a connector architecture: each connector knows how to connect to its data source and fetch the data.
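As a rough illustration of the connector idea, a Sqoop import names its source through a JDBC connect string, and Sqoop picks the matching connector from it. The sketch below assumes a MySQL database; the host `db-host`, database `shop`, table `orders`, user `etl`, password file, and HDFS target directory are all hypothetical placeholders. If Sqoop is not installed it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: import one RDBMS table into HDFS with Sqoop.
# All names below (db-host, shop, orders, etl, paths) are hypothetical placeholders.
set -euo pipefail

# The JDBC URL determines which connector Sqoop uses (MySQL here).
# --password-file keeps the password off the command line.
# --target-dir is the HDFS output directory; --num-mappers sets parallel map tasks.
args=(
  import
  --connect "jdbc:mysql://db-host:3306/shop"
  --username etl
  --password-file /user/etl/.db-password
  --table orders
  --target-dir /data/shop/orders
  --num-mappers 4
)

if command -v sqoop >/dev/null 2>&1; then
  sqoop "${args[@]}"
else
  # Dry run: show the command when Sqoop is not available on this machine.
  echo "sqoop ${args[*]}"
fi
```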
1. Create a file named example under flume/conf and write the following configuration into it:
# Configure agent1 (agent1 is the agent name)
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Configure source1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /usr/bigdata/flume/conf/test/Hmbbs
agent1.sources.source1.channels = channel1
agent1.sources.source1.fileHeader = false
agent1.sources.so
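For reference, a minimal complete agent built around the same spooldir source could look like the sketch below. The memory channel sizes and the logger sink are illustrative assumptions, not the original author's settings, and the file is written to conf/example relative to the current directory. Files dropped into the spool directory should not be modified afterwards.

```shell
#!/usr/bin/env bash
# Sketch: a self-contained agent1 config, spooldir source -> memory channel -> logger sink.
# Channel sizes and the logger sink are assumed values for testing, not the original config.
set -euo pipefail
mkdir -p conf
cat > conf/example <<'EOF'
# Name the components of agent1
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Spooling-directory source: ingests complete files placed in spoolDir
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /usr/bigdata/flume/conf/test/Hmbbs
agent1.sources.source1.fileHeader = false
agent1.sources.source1.channels = channel1

# In-memory buffer between source and sink
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 100

# Logger sink: prints events to the agent log, handy for a first test
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = channel1
EOF
# Start from the Flume home directory:
# bin/flume-ng agent --conf conf --conf-file conf/example --name agent1 -Dflume.root.logger=INFO,console
```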
3 Implementing the Architecture
A schema of the implementation architecture is shown in the following figure:
3.1 Producer layer analysis
Services on the PaaS platform are assumed to be deployed in Docker containers. To meet the non-functional requirements, a separate process is responsible for collecting logs, so log collection does not intrude on the service framework or its processes. Flume NG is used for log collection; this open s
Flume mainly supports the following monitoring methods:
JMX monitoring
To enable JMX monitoring, modify the JAVA_OPTS environment variable in the flume-env.sh file as follows:
export JAVA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5445 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
Ganglia monitoring
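Ganglia reporting is switched on the same way, through system properties on the agent JVM. A minimal sketch, assuming a gmond listening at the placeholder address gmond-host:8649:

```shell
#!/usr/bin/env bash
# Sketch: make a Flume agent report its counters to Ganglia (e.g. in conf/flume-env.sh).
# "gmond-host:8649" is a placeholder; flume.monitoring.hosts takes a comma-separated host:port list.
export JAVA_OPTS="${JAVA_OPTS:-} -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=gmond-host:8649"
# The same properties can instead be passed on the command line:
# bin/flume-ng agent -n a1 -c conf -f conf/a1.conf \
#   -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=gmond-host:8649
```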
data loss. Use tail -F (note that the F is uppercase);
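The point about the uppercase F: `tail -F` follows the file by name and retries, so it keeps reading after log rotation, while `tail -f` stays attached to the old file descriptor. A sketch of an exec source using it, with a hypothetical log path and a logger sink for testing. Keep in mind that the exec source itself cannot tell the tailed process when an event failed to reach the channel.

```shell
#!/usr/bin/env bash
# Sketch: exec source that follows a log file with "tail -F".
# The log path and component names are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/tail-exec.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# -F = follow by name and retry, so rotated files keep being read
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
EOF
```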
2. About the channel:
1. For collection nodes, we recommend the new composite SpillableMemoryChannel; for aggregation nodes, we recommend the memory channel, depending on the actual data volume. In general, the memory channel is recommended for Flume agents whose data volume exceeds … MB per minute (the file channel processes about 2 MB/s, which may vary with machin
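The two recommendations might translate into channel definitions like the sketch below. Agent names, capacities, and directories are assumed values, and only the channel sections are shown (sources and sinks omitted). Note that the Flume user guide labels SpillableMemoryChannel as experimental, so test it before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: channel sections only, for a collection agent and an aggregation agent.
# Agent names, sizes and directories are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/channels.conf <<'EOF'
# Collection node: events stay in memory and overflow to
# file-channel storage on disk when the memory queue is full
collector.channels = c1
collector.channels.c1.type = SPILLABLEMEMORY
collector.channels.c1.memoryCapacity = 10000
collector.channels.c1.overflowCapacity = 1000000
collector.channels.c1.checkpointDir = /data/flume/checkpoint
collector.channels.c1.dataDirs = /data/flume/data

# Aggregation node: plain memory channel for throughput
aggregator.channels = c1
aggregator.channels.c1.type = memory
aggregator.channels.c1.capacity = 100000
aggregator.channels.c1.transactionCapacity = 1000
EOF
```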
How is Flume's built-in monitoring integrated? Many people have asked this question. Currently, you can use graphical monitoring tools such as Cloudera Manager and Ganglia, fetch JSON strings from a browser, or write custom reporters for other monitoring systems. What does the monitoring information contain? Statistics for each component, such as the number of events successfully received, the number of events successfully sent, and the nu
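The "JSON from the browser" option is Flume's HTTP reporting. A minimal sketch, assuming port 34545 (any free port works):

```shell
#!/usr/bin/env bash
# Sketch: expose each component's counters as JSON over HTTP.
# Port 34545 is an arbitrary choice.
export JAVA_OPTS="${JAVA_OPTS:-} -Dflume.monitoring.type=http -Dflume.monitoring.port=34545"
# With the agent running, fetch the counters:
#   curl http://localhost:34545/metrics
# The JSON is keyed by component (e.g. "SOURCE.r1", "CHANNEL.c1", "SINK.k1") and holds
# counters such as EventReceivedCount, EventAcceptedCount and EventDrainSuccessCount.
```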
Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and transporting large volumes of logs. Flume supports customizing various data senders in the logging system to collect data, and it can perform simple processing on the data and write it to various (customizable) data receivers. Currently belong
Apache Flume restart script. Restarting Apache Flume on a regular basis can leave multiple processes running because the old ones are not killed cleanly, so write a restart script. (echo -e is used to print the text in red; search online for shell color escape codes to find many more.)
cat obi-track_restart.sh
#!/bin/bash
pid=$(lsof -i:8787 | grep java | awk '{print $2}')
if [ -n "${pid}" ]; then
echo -e "############\033[31m kill ${pid}\033[0m#############"
for i in "${p
Log4j: output logs directly to Flume
This jar is a utility provided in Cloudera's CDH distribution; with it, log4j can be configured to send logs directly to Flume, which makes log collection easy. In CDH 5.3.0 it is flume-ng-log4jappender-1.5.0-cdh5.3.0-jar-with-dependencies.jar, located in /opt/cloudera/parcels/CDH/lib/flume-ng/tools/.
Specific usage examples
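A sketch of how the pieces usually fit together: the appender jar goes on the application's classpath, log4j.properties routes the logger to Log4jAppender, and the Flume agent listens with an Avro source. The host flume-host and port 41414 are placeholders, and only the Avro source section of the agent is shown.

```shell
#!/usr/bin/env bash
# Sketch: application-side log4j.properties plus the matching Flume Avro source.
# "flume-host" and port 41414 are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/log4j.properties <<'EOF'
log4j.rootLogger = INFO, flume
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = flume-host
log4j.appender.flume.Port = 41414
# Do not throw in the application if the Flume agent is unreachable
log4j.appender.flume.UnsafeMode = true
EOF
cat > conf/avro-in.conf <<'EOF'
# Agent side (source section only): Log4jAppender talks to an Avro source
a1.sources = r1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1
EOF
```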
1. Install the JDK: refer to the JDK installation guide here.
2. Install ZooKeeper: refer to the "Fully distributed" section of my ZooKeeper installation tutorial.
3. Install Kafka: refer to the "Fully distributed build" section of my Kafka installation tutorial.
4. Install Flume: refer to my Flume installation tutorial.
5. Configure Flume
5.1. Configure kafka-s.cfg
$ cd /software/flume/conf/
First, the requirement
Use Flume to collect file contents on Linux and send them to the Kafka cluster.
Environment: the ZooKeeper cluster and Kafka cluster are already installed.
Second, configure Flume
Download Flume from the official website. The blogger uses Flume 1.6.0.
Official address: http://flume.apache.org/download.html
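For Flume 1.6.0, an agent that tails a file into Kafka could be configured roughly as below. The log path, topic name, and broker addresses are hypothetical. The property names are the 1.6 ones; from Flume 1.7 the sink uses kafka.topic, kafka.bootstrap.servers, and kafka.producer.acks instead.

```shell
#!/usr/bin/env bash
# Sketch: exec (tail -F) -> memory channel -> Kafka sink, Flume 1.6 property names.
# Log path, topic and brokers are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/exec-kafka.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Kafka sink as named in Flume 1.6
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = app-logs
a1.sinks.k1.brokerList = kafka1:9092,kafka2:9092,kafka3:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 100
a1.sinks.k1.channel = c1
EOF
```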
1. Overview
Flume is a high-performance, highly available distributed log collection system from Cloudera.
The core of Flume is to collect data from a data source and send it to a destination. To ensure that delivery succeeds, Flume first caches the data before sending it to the destination, and only after the data has actually arrived at the destination does it delete its own cached da
Document location: http://flume.apache.org/FlumeUserGuide.html#system-requirements
Java Runtime Environment - Java 1.8 or later
Memory - Sufficient memory for configurations used by sources, channels or sinks
Disk Space - Sufficient disk space for configurations used by channels or sinks
Directory Permissions - Read/Write permissions for directories us
TODO: rewrite Flume's sink to call Kafka's producer to send messages; in Storm, implement the IRichSpout interface in a spout that calls Kafka's consumer to receive messages, then pass them through several custom bolts to output custom content.
Writing the KafkaSink
Copy the following jars from $KAFKA_HOME/lib:
kafka_2.10-0.8.2.1.jar
kafka-clients-0.8.2.1.jar
scala-library-2.10.4.jar
to $FLUME_HOME/lib
Create a new project in Eclipse
Structure: Nginx -> Flume -> Kafka -> Flume -> Kafka (because of a cross-datacenter issue, a Flume agent sits between the two Kafka clusters, which is a pain.)
Symptom: in the second layer, the Kafka topic being written is the same as the Kafka topic being read; manually setting the sink topic does not take effect.
Enable the debug log:
Source instantiation:
APR 19:24:03,146 INFO [conf-file-poller-0] (org.apache.flume.sour
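One likely cause that fits this symptom: in Flume 1.6 the KafkaSource stamps each event with a "topic" header holding the source topic, and the KafkaSink sends to that header's topic when present, ignoring its configured topic. A common workaround, sketched below under that assumption, is a static interceptor on the second-layer source that overwrites the header. The agent name a2 and the topic downstream-topic are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumption: the "topic" header from KafkaSource overrides the sink topic).
# Overwrite that header on the second-layer source; a2 and downstream-topic are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/layer2-interceptor.conf <<'EOF'
a2.sources.r1.interceptors = i1
a2.sources.r1.interceptors.i1.type = static
a2.sources.r1.interceptors.i1.key = topic
a2.sources.r1.interceptors.i1.value = downstream-topic
# Replace the existing header instead of keeping it (the default keeps it)
a2.sources.r1.interceptors.i1.preserveExisting = false
EOF
```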
Flume writes to HDFS in the HDFSEventSink.process method, and the path is created by BucketPath. Analyzing its source code (ref.: http://caiguangguang.blog.51cto.com/1652935/1619539) shows that this can be implemented with %{} variable substitution: you only need to take the time field of the event (the local time in the Nginx log) and pass it into hdfs.path.
The specific implementation is as follows:
1. In the KafkaSource process method, add:
dt = KafkaSourceUtil.getdatem
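On the sink side, the %{} substitution only needs the header name. The sketch assumes the modified source puts the Nginx log date into a header named dt (a hypothetical name, mirroring the variable above); the HDFS directory is also a placeholder, and only the sink section is shown.

```shell
#!/usr/bin/env bash
# Sketch: HDFS sink whose directory comes from an event header via %{...}.
# Assumes events carry a header named "dt" with the log date; names and paths are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/hdfs-header-path.conf <<'EOF'
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# %{dt} is replaced with the value of the event's "dt" header
a1.sinks.k1.hdfs.path = /data/nginx/%{dt}
a1.sinks.k1.hdfs.filePrefix = access
a1.sinks.k1.hdfs.fileType = DataStream
EOF
```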
Reposted from: http://www.cnblogs.com/lxf20061900/p/4014281.html
In Flume NG, the path of the HDFS sink (parameter "hdfs.path", which must not be empty) and the file prefix (parameter "hdfs.filePrefix") support time escape sequences resolved from the event timestamp, so directories and file prefixes can be created automatically by time.
In practice, it was found that Flume's built-in parsi
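For comparison, the built-in time escapes look like this. The directory layout and the 10-minute bucket are illustrative choices; useLocalTimeStamp makes the sink use the agent's clock instead of the event's "timestamp" header, which matters for sources that do not set one.

```shell
#!/usr/bigdata/env bash
# Sketch: time-bucketed HDFS directories and prefixes via escape sequences.
# Paths and the 10-minute rounding are hypothetical choices.
set -euo pipefail
mkdir -p conf
cat > conf/hdfs-time-path.conf <<'EOF'
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# %Y%m%d/%H%M are filled from the event timestamp
a1.sinks.k1.hdfs.path = /flume/events/%Y%m%d/%H%M
a1.sinks.k1.hdfs.filePrefix = events-%Y%m%d
# Round the timestamp down to 10-minute buckets: one directory per 10 minutes
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Use the agent's local clock instead of the event "timestamp" header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF
```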
Flume architecture and core components
(1) Source: collection, determines where data is collected from
(2) Channel: records (buffers) the data
(3) Sink: output
Official documents
http://flume.apache.org/FlumeUserGuide.html
http://flume.apache.org/FlumeUserGuide.html#starting-an-agent
Ideas for using Flume
The key to using Flume is writing the configuration file:
(1) Configure the source
(2) Configure the channel
(3) Configure the sink
(4) Chain the above three comp
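The four steps map directly onto a configuration file. The sketch below uses a netcat source and logger sink so it can be tried without any other system; the port 44444 and sizes are illustrative. Note the wiring asymmetry in step (4): a source lists "channels" (it may feed several), while a sink names exactly one "channel".

```shell
#!/usr/bin/env bash
# Sketch: the four configuration steps as one minimal agent.
# Port and capacities are illustrative values.
set -euo pipefail
mkdir -p conf
cat > conf/netcat-logger.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# (1) Source: read newline-terminated text from localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# (2) Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# (3) Sink: log events to the console
a1.sinks.k1.type = logger

# (4) Wiring: a source lists channels, a sink names one channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
# Try it: bin/flume-ng agent -n a1 -c conf -f conf/netcat-logger.conf -Dflume.root.logger=INFO,console
# then, in another terminal: nc localhost 44444
```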
Capture a directory to HDFS
Capturing a directory with Flume requires the HDFS cluster to be running.
vi spool-hdfs.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# Note: a file name must not be reused in the monitored directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/Logs2
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hd
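A possible complete version of such a spool-to-HDFS agent is sketched below; the HDFS path, channel sizes, and roll thresholds are assumptions, not the original values. Spooled files carry no timestamp header, so the sink uses the local clock for the date escape, and DataStream writes plain text instead of the default SequenceFile.

```shell
#!/usr/bin/env bash
# Sketch: spooldir -> memory channel -> HDFS with file rolling.
# HDFS path, sizes and roll thresholds are hypothetical.
set -euo pipefail
mkdir -p conf
cat > conf/spool-hdfs.conf <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/Logs2
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/spool/%y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Plain text output instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Roll every 60 s or at 128 MB (134217728 bytes); 0 disables event-count rolling
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
EOF
```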
Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting large volumes of logs. Consider its model: each Flume agent provides one Flume service, and each agent has three members: source, channel, and sink. As shown, data is fetched by the source and sent to the channel; the channel acts like a buffer, and the sink reads data from the channel.