Flume (4) Practical Environment Construction: Source (spooldir) + Channel (file) + Sink (HDFS) mode

First, overview:
In a real-world production environment, you will typically need to pour logs from web servers such as Tomcat and Apache into HDFS for analysis. The configuration below shows how to achieve this requirement.
Second, the configuration file:
#agent1 name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

#spooling directory
#set source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/opt/flumetest/data
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
agent1.sources.source1.interceptors.i1.type=timestamp

#set sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=/home/hdfs/flume/logs
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=Text
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%y-%m-%d

#set channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/opt/flumetest/cp/point
agent1.channels.channel1.dataDirs=/opt/flumetest/cp
Third, execute the following command:
Ensure that the folders defined in the above configuration file already exist before executing.
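For example, assuming the paths from the configuration above, the local directories and the HDFS target directory could be created like this:

# local spooling directory for the source
mkdir -p /opt/flumetest/data
# checkpoint and data directories for the file channel
mkdir -p /opt/flumetest/cp/point
# target directory in HDFS for the sink
hadoop fs -mkdir -p /home/hdfs/flume/logs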
bin/flume-ng agent -n agent1 -c conf -f study/logs2hdfs.conf -Dflume.root.logger=DEBUG,console
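To give the source something to pick up, copy a finished log file into the spooling directory (the file name and path here are hypothetical). Note that the Spooling Directory Source expects files to be complete and immutable once they appear in the directory, so copy or move closed files rather than writing to them in place:

# copy a completed log file into the spool directory (hypothetical source path)
cp /var/log/tomcat/access.log /opt/flumetest/data/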
The list of logs in the source folder is as follows:
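After Flume has ingested a file, the Spooling Directory Source renames it with a .COMPLETED suffix by default, so a listing of the spool directory (with hypothetical file names) might look like this:

$ ls /opt/flumetest/data
access.log.COMPLETED
catalina.out.COMPLETED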
Fourth, viewing the data in HDFS:
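For example, the files written by the sink can be listed and inspected with the standard HDFS shell commands:

# list the files the sink has written
hadoop fs -ls /home/hdfs/flume/logs
# print their contents
hadoop fs -cat /home/hdfs/flume/logs/*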
You can see that the data files are very small. This is caused by the configuration: hdfs.rollInterval in the sink configuration is set to a very short time interval (1 second), so the sink rolls to a new HDFS file every second. It can be adjusted according to your requirements.
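For example, to roll by file size instead of every second, the sink's standard roll properties could be set as follows (the 128 MB value is illustrative; setting a roll property to 0 disables that trigger):

# never roll based on elapsed time
agent1.sinks.sink1.hdfs.rollInterval=0
# roll once a file reaches 128 MB
agent1.sinks.sink1.hdfs.rollSize=134217728
# never roll based on event count
agent1.sinks.sink1.hdfs.rollCount=0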