I. Introduction
I have recently been working on big data analysis, in which Flume handles the data-collection part, so I deliberately spent some time understanding Flume's working principles and mechanisms. My personal approach to a new system is to first get a rough grasp of its basic principles, then read the source code to understand some of its key implementation parts, and finally try modifying some of it to deepen my understanding. There is already plenty of material online about Flume's principles, so here I describe how I set up my source-analysis environment.
II. Environment
1. apache-flume-1.6.0-src
2. CentOS 7.0
3. Eclipse Java EE Kepler
4. jdk-6u45-linux-x64.rpm
5. Source Insight
6. Oracle VirtualBox
III. Analysis Method
Install the JDK in CentOS, enable Eclipse's remote debugging, and trace and analyze the code step by step. This is more efficient than reading the source directly in Source Insight, although Source Insight remains useful during the analysis as an aid for examining the reference relationships between classes.
IV. Analysis Steps
1. Install the JDK and Flume in CentOS; the installation process is not described further here. CentOS and Flume are installed inside VirtualBox, with the IP address 192.168.1.11.
2. Set Flume's startup parameters: open apache-flume-1.6.0-src\bin\flume-ng with a text editor such as Notepad and edit it, mainly the following line:
JAVA_OPTS="-Xmx20m"
should be changed to
JAVA_OPTS="-Xmx20m -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,suspend=y"
This sets the remote-debugging port to 8888. For details on the remote-debugging parameters, see here.
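For reference, a breakdown of what each JDWP option in the edited line does (the values are the same as above):

```shell
# -Xmx20m             : cap the heap at 20 MB (the script's original value)
# -Xdebug             : enable debugging support (needed on older JVMs such as JDK 6)
# transport=dt_socket : debug over a TCP socket
# address=8888        : listen for the debugger on port 8888
# server=y            : the Flume JVM acts as the JDWP server
# suspend=y           : block before main() until a debugger attaches,
#                       so early breakpoints in Application.java are not missed
JAVA_OPTS="-Xmx20m -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,suspend=y"
```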
3. Configure Eclipse's remote debugging, as shown in the figure below:
4. In Eclipse Java EE Kepler, import the Flume source code as an existing Maven project, as shown in the figure below:
5. In Eclipse, set breakpoints in ...\apache-flume-1.6.0-src\flume-ng-node\src\main\java\org\apache\flume\node\Application.java, then start Flume in CentOS to remotely debug Flume's startup process. Application.java thus serves as the entry point from which the whole Flume source code is traced and analyzed.
Note: start Flume first, then attach Eclipse for debugging; otherwise the debugger will not be able to connect.
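The required order can be sketched as follows; note that the agent name a1 and the configuration file example.conf are placeholders for illustration, not values from the setup above:

```shell
# 1. On the CentOS VM (192.168.1.11), start Flume first. With suspend=y
#    the JVM prints "Listening for transport dt_socket at address: 8888"
#    and waits for a debugger before running main().
#    (agent name "a1" and conf file "example.conf" are placeholders)
bin/flume-ng agent --conf conf --conf-file conf/example.conf --name a1

# 2. Only after Flume is listening, launch the Eclipse "Remote Java
#    Application" debug configuration pointed at 192.168.1.11:8888.
```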
V. Frequently Asked Questions
Importing the Maven project into Eclipse produces a number of errors; some common workarounds are listed below:
1. The most common problem is caused by the firewall: maven.twttr.com and some libraries hosted on Google cannot be downloaded. After trying several workarounds, the best solution (see here) is to add the following repository, inside the <repositories> element of the POM file:
<repository>
  <id>maven.tempo-db.com</id>
  <url>http://maven.oschina.net/service/local/repositories/sonatype-public-grid/content/</url>
</repository>
2. For version issues, see here.
3. For the tools.jar problem, see here.
4. For the "Plugin execution not covered by lifecycle configuration" problem, see here.
5. An error like "AvroFlumeOGEvent cannot be resolved" appears, as shown in the figure below, meaning the corresponding class cannot be found.
This happens because Avro is used: the POM's generate-sources phase must be run so that the avro-maven-plugin generates the corresponding Java files; then add the generated source directory to the build path to resolve the problem.
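A minimal sketch of running that phase from the command line, assuming Maven is installed and the commands are run from the directory containing the Flume source:

```shell
# Run Maven's generate-sources phase so that plugins bound to it
# (including avro-maven-plugin) emit their generated Java classes
# under target/generated-sources in each module.
cd apache-flume-1.6.0-src
mvn generate-sources
```

After the phase completes, refresh the projects in Eclipse so the generated source directories are picked up.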
VI. Precautions
1. Compatibility between JDK versions is poor, so to read the source code smoothly you must know which JDK version each piece of software requires, e.g. the JDK required by Eclipse and the JDK required by Flume; otherwise strange, hard-to-solve problems will appear.
2. Flume builds on many mature open-source systems, such as Avro, Netty, and Maven, and it also intersects with systems such as Kafka, so during the analysis you need to understand something about these related open-source projects.