Hadoop source code learning notes (1)
-- Finding the main functions and reading the Configuration class
In the previous installment, we briefly looked at what Hadoop is and how to use it. Drawn in by this open-source project, we now turn to how it is implemented.
Let me state up front that my background is in .NET and I am a little unfamiliar with Java, so these notes will occasionally detour into Java itself, including design patterns and the like. Thank you for your patience.
Throughout the learning process we mainly work in Eclipse, with the debugging environment built in Eclipse as described earlier.
While browsing the source code earlier, we found several main-function entry points. Here is the plan:
- FsShell main entry: org.apache.hadoop.fs.FsShell
- NameNode main entry: org.apache.hadoop.hdfs.server.namenode.NameNode
- DataNode main entry: org.apache.hadoop.hdfs.server.datanode.DataNode
- JobTracker main entry: org.apache.hadoop.mapred.JobTracker
- TaskTracker main entry: org.apache.hadoop.mapred.TaskTracker
We will study them in this order; other components such as SecondaryNameNode will come later.
Likewise, we split the content in two: the first step is HDFS, the second step is MapReduce.
Before studying HDFS, let's look at the relationship among the client, the NameNode, and the DataNodes:
Among them, the NameNode is the client's primary interface and its only point of contact for metadata. It is mainly responsible for managing the file-name directory tree and the mapping from data blocks to DataNodes.
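To make the division of labor concrete, here is a minimal sketch of my own (not from the Hadoop source) of a client operation that involves only the NameNode, using the standard FileSystem API and assuming a running cluster whose address is picked up from the configuration:
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.FileStatus;
- import org.apache.hadoop.fs.FileSystem;
- import org.apache.hadoop.fs.Path;
-
- public class ListRoot {
-     public static void main(String[] args) throws Exception {
-         Configuration conf = new Configuration();   // reads fs.default.name
-         FileSystem fs = FileSystem.get(conf);       // client-side handle to HDFS
-         // Listing is pure metadata, so only the NameNode is consulted;
-         // no DataNode takes part until actual file contents are read or written
-         for (FileStatus status : fs.listStatus(new Path("/"))) {
-             System.out.println(status.getPath());
-         }
-     }
- }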
With that background, let's get the programs running first.
In Eclipse we can easily locate each module's main function, but launching them directly is still awkward, so for debugging convenience we create three entry classes of our own.
These self-built entry classes exist purely for convenience; their code is:
FsShellEnter.java
- import org.apache.hadoop.fs.FsShell;
-
- public class FsShellEnter {
-
-     public static void main(String[] args) throws Exception {
-         FsShell.main(new String[] {"-ls"});
-     }
- }
NameNodeEnter.java
- public class NameNodeEnter {
-
-     public static void main(String[] args) throws Exception {
-         org.apache.hadoop.hdfs.server.namenode.NameNode.main(args);
-     }
- }
DataNodeEnter.java
- public class DataNodeEnter {
-
-     public static void main(String[] args) {
-         org.apache.hadoop.hdfs.server.datanode.DataNode.main(args);
-     }
- }
Running:
First, open a terminal and start the NameNode: $ bin/hadoop namenode
In Eclipse, open FsShellEnter.java and click Run, and the console shows the directory listing served by the running NameNode.
Conversely, open NameNodeEnter.java in Eclipse and click Run:
the console prints a stream of startup messages, indicating that the NameNode is running normally.
Then open a terminal and enter $ bin/hadoop fs -ls, and the listing appears there as well.
So the pairing works in both directions: the command-line shell against an Eclipse-hosted NameNode, and the Eclipse-hosted shell against a command-line NameNode.
Of course, no operation on file contents is involved here, so the DataNode is never exercised; you can try that on your own.
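For those who want to try, here is a minimal sketch (my own, not from the Hadoop source; the path and text are arbitrary) that writes and reads a small file, which forces actual block data through a DataNode:
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.fs.FSDataInputStream;
- import org.apache.hadoop.fs.FSDataOutputStream;
- import org.apache.hadoop.fs.FileSystem;
- import org.apache.hadoop.fs.Path;
-
- public class DataNodeExercise {
-     public static void main(String[] args) throws Exception {
-         Configuration conf = new Configuration();
-         FileSystem fs = FileSystem.get(conf);
-         Path p = new Path("/tmp/datanode-test.txt");   // arbitrary test path
-
-         // Writing file contents streams block data to a DataNode
-         FSDataOutputStream out = fs.create(p);
-         out.writeUTF("hello, datanode");
-         out.close();
-
-         // Reading it back pulls the block from a DataNode again
-         FSDataInputStream in = fs.open(p);
-         System.out.println(in.readUTF());
-         in.close();
-     }
- }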
Open any of these main functions and you will see that a Configuration object is created right at the start, so let's look at what this class is like.
First, recall how we have used it:
- Configuration conf = new Configuration();
- String name = conf.get("fs.default.name");
- System.out.println(name);
Both the name and this snippet tell us that the Configuration class reads the configuration files; here the program fetches the value of fs.default.name from them (in a typical pseudo-distributed setup, something like hdfs://localhost:9000).
Now look at its constructors:
- public Configuration() {
-     this(true);
- }
-
- public Configuration(boolean loadDefaults) {
-     this.loadDefaults = loadDefaults;
-     if (LOG.isDebugEnabled()) {
-         LOG.debug(StringUtils.stringifyException(new IOException("Config()")));
-     }
-     synchronized (Configuration.class) {
-         REGISTRY.put(this, null);
-     }
- }
No real work happens here: the constructor mainly records the loadDefaults flag (true by default) and registers the new instance in REGISTRY.
Then observe the get function:
- public String get(String name) {
-     return substituteVars(getProps().getProperty(name));
- }
-
- private synchronized Properties getProps() {
-     if (properties == null) {
-         properties = new Properties();
-         loadResources(properties, resources, quietMode);
-         if (overlay != null)
-             properties.putAll(overlay);
-     }
-     return properties;
- }
get() first fetches the raw value with getProps().getProperty(name), then passes it through substituteVars(), a regular-expression helper that expands ${...} variable references in the returned value. Inside getProps(), the Properties field (a Hashtable subclass) is checked: if it is null, it is created and initialized by loadResources(); otherwise it is returned as-is. getProperty() then simply looks the value up by its key.
Clearly, lazy loading is at work here: the configuration files are not read when the object is created, but only when a value is first needed.
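Two details are easy to verify with a tiny sketch of my own (the key names are invented): constructing a Configuration parses no files, and substituteVars() expands references to other keys:
- import org.apache.hadoop.conf.Configuration;
-
- public class ExpandDemo {
-     public static void main(String[] args) {
-         Configuration conf = new Configuration();    // the constructor itself parses no XML
-         conf.set("base.dir", "/tmp/demo");           // invented key, kept in the overlay
-         conf.set("data.dir", "${base.dir}/data");    // the value refers to another key
-         // substituteVars() expands ${base.dir}, so this prints /tmp/demo/data
-         System.out.println(conf.get("data.dir"));
-     }
- }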
Let's go one level deeper and see how the loadResources function does the initial loading:
- private void loadResources(Properties properties,
-                            ArrayList resources,
-                            boolean quiet) {
-     if (loadDefaults) {
-         for (String resource : defaultResources) {
-             loadResource(properties, resource, quiet);
-         }
-
-         // support the hadoop-site.xml as a deprecated case
-         if (getResource("hadoop-site.xml") != null) {
-             loadResource(properties, "hadoop-site.xml", quiet);
-         }
-     }
-
-     for (Object resource : resources) {
-         loadResource(properties, resource, quiet);
-     }
- }
As the code shows, the resources in defaultResources are loaded first, then the deprecated hadoop-site.xml if it is present, and finally any explicitly added resources.
And defaultResources itself:
- static {
-     ...
-     addDefaultResource("core-default.xml");
-     addDefaultResource("core-site.xml");
- }
-
- public static synchronized void addDefaultResource(String name) {...}
So by default, the core-default.xml and core-site.xml files are loaded.
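Since addDefaultResource() is public, a program can register its own resource the same way. A hypothetical sketch (my-default.xml and my.custom.key are invented; the file would have to sit on the classpath in Hadoop's XML format):
- import org.apache.hadoop.conf.Configuration;
-
- public class CustomDefaultDemo {
-     public static void main(String[] args) {
-         // Register before creating any Configuration, just like the static block above
-         Configuration.addDefaultResource("my-default.xml");
-         Configuration conf = new Configuration();
-         System.out.println(conf.get("my.custom.key"));
-     }
- }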
Now we can open these XML files and look inside:
They show that a configuration file stores key-value pairs, with a description attached to each item explaining what it is for; a program can therefore read any value by its key.
Meanwhile, core-site.xml is our own configuration file, and a closer look reveals that it repeats some of the items found in core-default.xml. During loading, the defaults are read before the site file, and when both define the same key, the later value overwrites the earlier one.
In other words, if hadoop.tmp.dir is not configured in the site file, the default directory under /tmp/... is used.
The same goes for Hadoop's other settings: look the item up in core-default.xml, then either modify it there directly or copy the entry into core-site.xml and change it there.
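A quick way to see which value wins is simply to print it (a sketch assuming a standard Hadoop classpath; the output depends on your own core-site.xml):
- import org.apache.hadoop.conf.Configuration;
-
- public class PrecedenceDemo {
-     public static void main(String[] args) {
-         Configuration conf = new Configuration();
-         // Prints the core-site.xml value if the key is overridden there;
-         // otherwise the core-default.xml default /tmp/hadoop-${user.name},
-         // with ${user.name} expanded by substituteVars()
-         System.out.println(conf.get("hadoop.tmp.dir"));
-     }
- }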
Continuing through Configuration's other methods:
There are many get variants, each returning a value of a different type, which makes it convenient to consume settings directly.
There is also a pile of set functions. These modify only the in-memory copy and are never written back to the files; thanks to them, a Configuration can even work without any configuration file at all.
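Both points fit in one small sketch of my own (the keys are invented; getInt and getBoolean are two of the typed getters):
- import org.apache.hadoop.conf.Configuration;
-
- public class TypedAndSetDemo {
-     public static void main(String[] args) {
-         Configuration conf = new Configuration(false);   // false: skip the default XML files entirely
-         conf.set("demo.threads", "4");                   // in-memory only, never written to disk
-         conf.set("demo.verbose", "true");
-         int threads = conf.getInt("demo.threads", 1);           // typed getter with a fallback default
-         boolean verbose = conf.getBoolean("demo.verbose", false);
-         System.out.println(threads + " " + verbose);            // prints: 4 true
-     }
- }
Passing false to the constructor clears loadDefaults, so nothing at all is read from the classpath; everything the object knows comes from set().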