Document directory
- Format namenode
- Solution 1:
- Solution 2:
Note: This covers switching the version from 0.21.0 to 0.20.205.0 (or vice versa) when the built-in upgrade command cannot be used. Many of the operations in this article are best written as scripts; doing them by hand is too tedious.
Please credit the source when reprinting. Thank you; getting this to work was genuinely tiring.
Test environment
The test uses three machines:
Namenode/secondary namenode: 192.168.1.39 slave039 (this node also has an Internet-facing address, 114.212.190.92).
Datanodes: 192.168.1.33 slave033
192.168.1.34 slave034
Besides root, the hadoop user group contains the hadoop account plus three users: user001, user002, and user003, each with the account name as its password. Some files were uploaded under each account and a few simple MapReduce jobs were run.
Format namenode
- We know that formatting regenerates the metadata files stored on the namenode.
- Running the namenode -format command changes the contents of hadoop_dir on the namenode, but leaves hadoop_dir on the datanodes untouched. In other words, if I have backed up the fsimage file from hadoop_dir/dfs/name/current on the namenode, I can still recover my file data (see the backup sketch below).
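A minimal backup sketch (the source path follows the 0.21.0 layout given later in this article; the backup names ~/fsimage.bak and ~/name_current.bak are my own, hypothetical choices):
# on the namenode, before touching anything
cp /home/hadoop/hadoop_dir/dfs/name/current/fsimage ~/fsimage.bak
# keeping a copy of the whole metadata directory does not hurt either
cp -r /home/hadoop/hadoop_dir/dfs/name/current ~/name_current.bak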
Replacement plans
Required for both solutions:
- First, create a new folder tmp205 (under hadoop_dir/hadoop_d in my setup). Do not reuse the original tmp folder, which can cause a series of problems; in fact, this folder does not even need to be preserved across the version switch.
Solution 1:
Keep the hadoop_dir folder and only replace the hadoop_install folder. Note that you must not run the format command here, otherwise the corresponding data can no longer be found.
Then modify the edits, VERSION, and other files under dfs/name/current on the namenode so that the metadata is recognized; that is, the new version can tell which data existed in the old version and how large it is. Frustratingly, although this solution is theoretically feasible, I did not finish it because too many files need to be modified, so the end result was that I could only see which data exists but could not read it.
Still, this solution should be workable!
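For anyone who wants to attempt Solution 1, it helps to know what lives in the namenode metadata directory; a quick look (the file and field names below are from the 0.20/0.21 era and may differ slightly between versions):
ls /home/hadoop/hadoop_dir/dfs/name/current
# typically: VERSION  edits  fsimage  fstime
cat /home/hadoop/hadoop_dir/dfs/name/current/VERSION
# typically shows namespaceID, cTime, storageType=NAME_NODE, layoutVersion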
Solution 2:
With this solution, a hadoop_d folder is created on each node for hadoop namenode -format, and then the fsimage file backed up from the original hadoop_dir folder (hadoop_dir/dfs/name/current/fsimage) is copied into the new name directory.
Note that with this solution's configuration the datanode data files still live in hadoop_dir, while the logs and PID files live in the new hadoop_d folder.
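In outline, the namenode side of this solution comes down to the following (paths follow the layout given below; ~/fsimage.bak is the hypothetical backup name from the sketch above):
hadoop namenode -format                                           # fresh metadata under hadoop_d/dfs/name
cp ~/fsimage.bak /home/hadoop/hadoop_d/dfs/name/current/fsimage   # restore the old fsimage over the new one
# then fix the layout version and namespaceID as described in the Replacement Process below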
Here is the directory layout I used for each version:
0.20.205.0
hadoop.tmp.dir: /home/hadoop/hadoop_dir/tmp205
HADOOP_LOG_DIR: /home/hadoop/hadoop_d/log
HADOOP_PID_DIR: /home/hadoop/hadoop_d/pids
dfs.name.dir: /home/hadoop/hadoop_d/dfs/name
dfs.data.dir: /home/hadoop/hadoop_dir/dfs/data,/data/hadoop_dir/dfs/data
mapred.local.dir: /home/hadoop/hadoop_dir/mapred/local,/data/hadoop_dir/mapred/local
mapred.system.dir: /home/hadoop/hadoop_dir/mapred/system
0.21.0
hadoop.tmp.dir: /home/hadoop/hadoop_dir/tmp21
HADOOP_LOG_DIR: /home/hadoop/hadoop_dir/log
HADOOP_PID_DIR: /home/hadoop/hadoop_dir/pids
dfs.namenode.name.dir: /home/hadoop/hadoop_dir/dfs/name
dfs.datanode.data.dir: /home/hadoop/hadoop_dir/dfs/data,/data/hadoop_dir/dfs/data
mapreduce.cluster.local.dir: /home/hadoop/hadoop_dir/mapred/local,/data/hadoop_dir/mapred/local
mapred.jobtracker.system.dir: /home/hadoop/hadoop_dir/mapred/system
Replacement Process
- 1. Back up the fsimage file!
- Add the new folders
mkdir ~/hadoop_d
mkdir dfs; mkdir log; mkdir mapred; mkdir tmp205; mkdir tmp21
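Solution 2 needs the hadoop_d folder on every node, not just the namenode; a minimal sketch for the two datanodes (assuming passwordless ssh as the hadoop user, which the scp commands later in this article also rely on):
for node in 192.168.1.33 192.168.1.34; do
  ssh hadoop@$node 'mkdir -p ~/hadoop_d/dfs ~/hadoop_d/log ~/hadoop_d/mapred ~/hadoop_d/tmp205 ~/hadoop_d/tmp21'
done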
- Configuration
tar -zxvf hadoop-0.20.205.0.tar.gz (put this package in the hadoop_install directory and extract it there, giving a new folder under hadoop_install)
Modify the configuration files (backups of the original configuration files already exist).
Files that need manual modification:
hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_LOG_DIR=/home/hadoop/hadoop_d/log
export HADOOP_PID_DIR=/home/hadoop/hadoop_d/pids
masters:
192.168.1.39
slaves:
192.168.1.33
192.168.1.34
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<!-- <value>/tmp/hadoop-${user.name}</value> -->
<value>/home/hadoop/hadoop_dir/tmp205</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<!-- <value>file:///</value> -->
<value>hdfs://192.168.1.39:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
URI's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The URI's authority is used to
determine the host, port, etc. for a FileSystem.</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop_d/dfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table (fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop_dir/dfs/data,/data/hadoop_dir/dfs/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>755</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://192.168.1.39:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/hadoop_dir/mapred/local,/data/hadoop_dir/mapred/local</value>
<description>The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to spread disk I/O.
Directories that do not exist are ignored.
!! Warning: there is a problem with setting the value like this on all the nodes; on the jobtracker
(here the secondary master, 192.168.1.38) the latter dir "/data/hadoop_dir/mapred/local" should not
be included.
</description>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/hadoop_dir/mapred/system</value>
<description>The directory where MapReduce stores control files.
</description>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<!-- <value>${hadoop.tmp.dir}/mapred/staging</value> -->
<value>/user/${user.name}/mapred/staging</value>
<description>The root of the staging area for users' job files.
In practice, this should be the directory where users' home
directories are located (usually /user).
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>6</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4048m</value>
<description>Java opts for the task tracker child processes.
The following symbol, if present, will be interpolated: @taskid@ is replaced
by the current TaskID. Any other occurrences of '@' will go unchanged.
For example, to enable verbose gc logging to a file named for the taskid in
/tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
The configuration variable mapred.child.ulimit can be used to control the
maximum virtual memory of the child processes.
</description>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop_dir/slaves.exclude</value>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>/home/hadoop/hadoop_dir/slaves.exclude</value>
</property>
</configuration>
- Modify the .bash_profile file, i.e. update the environment variables. Change
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_HOME=/home/hadoop/hadoop_install/hadoop-0.21.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
to
export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_HOME=/home/hadoop/hadoop_install/hadoop-0.20.205.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
Then reload it: source .bash_profile
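A quick sanity check that the switch took effect for the current shell (hadoop version reports the build of whichever binary is now first on the PATH):
which hadoop     # should point into .../hadoop-0.20.205.0/bin
hadoop version   # should report 0.20.205.0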
Copy the configured files to the other users' home directories and to the datanodes:
cp .bash_profile /home/user001
cp .bash_profile /home/user002
cp .bash_profile /home/user003
scp .bash_profile hadoop@192.168.1.33:/home/hadoop
scp .bash_profile hadoop@192.168.1.34:/home/hadoop
scp -r /home/hadoop/hadoop_install hadoop@192.168.1.33:/home/hadoop
scp -r /home/hadoop/hadoop_install hadoop@192.168.1.34:/home/hadoop
- Log out of the current user and log in again. After running stop-all.sh, execute hadoop namenode -format.
- On the namenode, replace the hadoop_d/dfs/name/current/fsimage file (overwrite it with the backed-up copy), set the layout version to -32 in hadoop_d/dfs/name/current/VERSION, and record the namespaceID value as X. On each datanode, change the layout version to -32 in hadoop_dir/dfs/data/current/VERSION and /data/hadoop_dir/dfs/data/current/VERSION, and change the namespaceID to X.
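Since this VERSION edit has to be repeated for every data directory on every datanode, it is best scripted, as the note at the top of this article suggests. A minimal sketch, assuming the keys in the VERSION files are layoutVersion and namespaceID and that passwordless ssh is set up for the hadoop user:
# read the namespaceID assigned by the freshly formatted namenode
NSID=$(grep '^namespaceID=' /home/hadoop/hadoop_d/dfs/name/current/VERSION | cut -d= -f2)
# patch layout version and namespaceID in every data directory on every datanode
for node in 192.168.1.33 192.168.1.34; do
  for d in /home/hadoop/hadoop_dir/dfs/data/current /data/hadoop_dir/dfs/data/current; do
    ssh hadoop@$node "sed -i 's/^layoutVersion=.*/layoutVersion=-32/' $d/VERSION; \
      sed -i 's/^namespaceID=.*/namespaceID=$NSID/' $d/VERSION"
  done
done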
Note:
If you want to trace which paths get opened during startup, you can use:
strace -o output.txt -f -e open start-dfs.sh
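Once the cluster has been started again, a quick way to check that the old data is really back (the /user/user001 path is just an example; use whatever the test users had actually uploaded):
start-all.sh
hadoop dfsadmin -report       # all datanodes should register under the new namespaceID
hadoop fsck /                 # blocks referenced by the restored fsimage should not be reported missing
hadoop fs -ls /user/user001   # files uploaded before the switch should be listed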