How to preserve data and logs when switching Hadoop cluster versions

Document directory
  • Format namenode
  • Solution 1:
  • Solution 2:


Note: this covers switching between 0.21.0 and 0.20.205.0 (in either direction), where the built-in upgrade command cannot be used. Many of the operations in this article are best written as scripts; doing them by hand is far too tedious.

Please credit the source when reprinting, thank you. Getting this to work was genuinely tiring.

Test setup

Three machines were used for the test:

Namenode/secondary namenode: 192.168.1.39 slave039 (this node also has an Internet-facing address, 114.212.190.92).

Datanodes: 192.168.1.33 slave033

192.168.1.34 slave034

In addition to root, the hadoop user group contains the user hadoop plus three others: user001, user002, and user003 (the password for user001-user003 is the account name). Some files were uploaded under each account and a few simple MapReduce jobs were run.

 

Format namenode
  1. We know that formatting regenerates the metadata files previously stored on the namenode.
  2. Running the namenode -format command changes the contents of hadoop_dir on the namenode but leaves hadoop_dir on the datanodes untouched; in other words, as long as the fsimage file under hadoop_dir/dfs/data/current on the namenode has been backed up, the file data can still be recovered (a backup sketch follows this list).
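A minimal backup sketch, assuming the paths used in this article; note that on a stock 0.20/0.21 install the namenode image normally lives under dfs.name.dir (here that would be .../dfs/name/current), so point the copy at wherever fsimage actually sits on your namenode:

# On the namenode, before any re-format: copy the whole current/ directory
# so that fsimage, edits and VERSION stay together.
BACKUP=/home/hadoop/fsimage_backup   # hypothetical backup location
mkdir -p "$BACKUP"
cp -a /home/hadoop/hadoop_dir/dfs/name/current "$BACKUP/"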

 

Replacement plans

Prerequisite for both solutions:

  1. First, create a tmp205 folder (the layout below places it under hadoop_dir). Do not reuse the original tmp folder, which can cause a series of problems. In fact, this folder does not need to be preserved across the version switch.

 

 

Solution 1:

Keep the hadoop_dir folder and only replace the hadoop_install folder. Note that the format command must not be run, otherwise the corresponding data can no longer be found.

Then modify the edits, VERSION, and other files under dfs/name/current on the namenode so that the metadata is recognized; that way we can tell what data existed under the original version and how large it is. Frustratingly, although this solution is theoretically feasible, it was not completed here because too many files need to be adjusted, so in the end you can only see which data exists but cannot read it.

Still, this solution should be workable!

Solution 2:

This solution creates a new hadoop_d folder on each node for hadoop namenode -format to use, and then copies the fsimage file (hadoop_dir/dfs/data/current/fsimage) over from the original hadoop_dir folder.

 

Note how this solution is laid out: the datanode data files stay in hadoop_dir, while the logs and PID files go into the new hadoop_d folder.

Here is the directory layout used with each version:

0.20.205.0

hadoop.tmp.dir      /home/hadoop/hadoop_dir/tmp205
HADOOP_LOG_DIR      /home/hadoop/hadoop_d/log
HADOOP_PID_DIR      /home/hadoop/hadoop_d/PIDs
dfs.name.dir        /home/hadoop/hadoop_d/dfs/name
dfs.data.dir        /home/hadoop/hadoop_dir/dfs/data, /data/hadoop_dir/dfs/data
mapred.local.dir    /home/hadoop/hadoop_dir/mapred/local, /data/hadoop_dir/mapred/local
mapred.system.dir   /home/hadoop/hadoop_dir/mapred/system

 

0.21.0

hadoop.tmp.dir               /home/hadoop/hadoop_dir/tmp21
HADOOP_LOG_DIR               /home/hadoop/hadoop_dir/log
HADOOP_PID_DIR               /home/hadoop/hadoop_dir/PIDs
dfs.namenode.name.dir        /home/hadoop/hadoop_dir/dfs/name
dfs.datanode.data.dir        /home/hadoop/hadoop_dir/dfs/data, /data/hadoop_dir/dfs/data
mapreduce.cluster.local.dir  /home/hadoop/hadoop_dir/mapred/local, /data/hadoop_dir/mapred/local
mapred.jobtracker.system.dir /home/hadoop/hadoop_dir/mapred/system

 

Replacement Process
  1. Back up the fsimage file!
  2. Add new folders

mkdir ~/hadoop_d

mkdir dfs; mkdir log; mkdir mapred; mkdir tmp205; mkdir tmp21;
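A consolidated sketch of this step, run as the hadoop user on every node. It assumes the subdirectories go under ~/hadoop_d, as the commands above suggest; note, though, that the layout tables place tmp205 and tmp21 under hadoop_dir, so create them wherever your hadoop.tmp.dir actually points:

# create the new directory tree for the version switch
mkdir -p ~/hadoop_d
cd ~/hadoop_d && mkdir -p dfs log PIDs mapred tmp205 tmp21   # PIDs added here because HADOOP_PID_DIR points at it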

 

  3. Configuration

tar -zxvf hadoop-0.20.205.0.tar.gz (place this tarball in the hadoop_install directory, so that it extracts into a new folder under hadoop_install)

 

Modify the configuration files (backups of the original configuration files already exist).

 

Manual modification required:

hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_LOG_DIR=/home/hadoop/hadoop_d/log
export HADOOP_PID_DIR=/home/hadoop/hadoop_d/PIDs

masters:

192.168.1.39

slaves:

192.168.1.33

192.168.1.34

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <!-- <value>/tmp/hadoop-${user.name}</value> -->
  <value>/home/hadoop/hadoop_dir/tmp205</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <!-- <value>file:///</value> -->
  <value>hdfs://192.168.1.39:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  URI's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The URI's authority is used to
  determine the host, port, etc. for a FileSystem.</description>
</property>

</configuration>

 

 

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/hadoop_d/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table (fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/hadoop_dir/dfs/data,/data/hadoop_dir/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.</description>
</property>

<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>755</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>

</configuration>

 

 

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://192.168.1.39:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/hadoop_dir/mapred/local,/data/hadoop_dir/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate
  data files. May be a comma-separated list of
  directories on different devices in order to spread disk I/O.
  Directories that do not exist are ignored.

  !! Warning: there is a problem with using this value on all the nodes; on the jobtracker
  (here the secondary master, 192.168.1.38) the second directory "/data/hadoop_dir/mapred/local" should not
  be included.</description>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/hadoop_dir/mapred/system</value>
  <description>The directory where MapReduce stores control files.</description>
</property>

<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <!-- <value>${hadoop.tmp.dir}/mapred/staging</value> -->
  <value>/user/${user.name}/mapred/staging</value>
  <description>The root of the staging area for users' job files.
  In practice, this should be the directory where users' home
  directories are located (usually /user).</description>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4048m</value>
  <description>Java opts for the task tracker child processes.
  The following symbol, if present, will be interpolated: @taskid@ is replaced
  by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
        -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc

  The configuration variable mapred.child.ulimit can be used to control the
  maximum virtual memory of the child processes.</description>
</property>

<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop_dir/slaves.exclude</value>
</property>

<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/hadoop_dir/slaves.exclude</value>
</property>

</configuration>

 

  4. Modify the .bash_profile file, i.e. repoint the environment variables

export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_HOME=/home/hadoop/hadoop_install/hadoop-0.21.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Change this to:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0
export HADOOP_HOME=/home/hadoop/hadoop_install/hadoop-0.20.205.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Then reload it: source .bash_profile
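A quick check (a suggestion, not part of the original procedure) that the switch took effect:

# reload the profile and confirm the 0.20.205.0 binaries are first on the PATH
source ~/.bash_profile
which hadoop     # should resolve under hadoop_install/hadoop-0.20.205.0/bin
hadoop version   # should report 0.20.205.0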

 

Copy the updated file to the other users' home directories and to the datanodes:

cp .bash_profile /home/user001
cp .bash_profile /home/user002
cp .bash_profile /home/user003

scp .bash_profile hadoop@192.168.1.33:/home/hadoop
scp .bash_profile hadoop@192.168.1.34:/home/hadoop
scp -r /home/hadoop/hadoop_install hadoop@192.168.1.33:/home/hadoop
scp -r /home/hadoop/hadoop_install hadoop@192.168.1.34:/home/hadoop
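With more datanodes, the per-host lines above can be collapsed into a small loop (same hosts and paths, just a sketch):

for node in 192.168.1.33 192.168.1.34; do
  scp ~/.bash_profile "hadoop@$node:/home/hadoop"
  scp -r /home/hadoop/hadoop_install "hadoop@$node:/home/hadoop"
done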

 

 

  5. Log out of the current user and log back in. After stop-all.sh, run hadoop namenode -format (a command sketch follows).
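A sketch of the commands for this step, assuming the old daemons are stopped before the new dfs.name.dir is formatted:

# after logging back in so the new .bash_profile takes effect
stop-all.sh               # make sure all old daemons are down
hadoop namenode -format   # formats dfs.name.dir, i.e. /home/hadoop/hadoop_d/dfs/name here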

 

  6. Replace the hadoop_d/dfs/name/current/fsimage file on the namenode (overwrite it with the backed-up copy), set the layout version to -32 in hadoop_d/dfs/name/current/VERSION, and record the namespaceID value as X. Then, on each datanode, set the layout version to -32 and namespaceID to X in both hadoop_dir/dfs/data/current/VERSION and /data/hadoop_dir/dfs/data/current/VERSION (a sketch of this edit follows).
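A hedged sketch of this edit in shell; the VERSION contents shown in the comments follow the usual 0.20.x format, but check your own files, since the exact fields can differ:

# On the namenode: note the namespaceID after formatting.
# A typical namenode VERSION file looks roughly like:
#   namespaceID=1234567890
#   cTime=0
#   storageType=NAME_NODE
#   layoutVersion=-32
grep namespaceID /home/hadoop/hadoop_d/dfs/name/current/VERSION

# On every datanode: make the existing data directories claim the same
# namespaceID and the 0.20.205.0 layout version (-32, as stated above).
NSID=1234567890   # replace with the value printed on the namenode
for d in /home/hadoop/hadoop_dir/dfs/data/current /data/hadoop_dir/dfs/data/current; do
  sed -i "s/^namespaceID=.*/namespaceID=$NSID/" "$d/VERSION"
  sed -i "s/^layoutVersion=.*/layoutVersion=-32/" "$d/VERSION"
done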

 

 

 

 

Note:

If you want to trace which paths are opened during startup, you can use:

strace -o output.txt -f -e open start-dfs.sh
