Hadoop + HBase cluster data migration

Data migration or backup is an issue that every company will face sooner or later. The official HBase website provides several solutions for data migration; here we recommend Hadoop distcp, which is well suited to migrating large volumes of data and to migration between cross-version clusters.

Version

Hadoop 2.7.1

HBase 0.98.12

Today, while using Hadoop distcp to migrate HBase data between two clusters of the same version, I ran into the following problem:

The error occurs because the size of a source file does not match the size of the corresponding target file, and the cause was not immediately clear. Searching the Internet for similar errors turned up no directly comparable cases; the closest matches looked like the following:

The error above indicates that the CRC file checksum does not match and the file sizes are inconsistent; after three retries the error was still the same. So I tried to find the answer in the official Hadoop documentation, and on the distcp page found an -update parameter, which the official website explains as follows:

What does it mean?

It means that during a re-copy, if the size, block size, or checksum of a source file differs from that of the target file, the target file is forcibly overwritten with the source file.

Note: do not use it casually. Be cautious with it, because it can change the directory layout under the target path.

For example:

Suppose the data of cluster A is to be migrated to cluster B, and the HBase directory structure is the same on both clusters:

The directory to be migrated on cluster A is as follows:

/data/01/
/data/01/b
/data/01/c
/data/01/d
/data/01/e

Ideally, the directory migrated to on cluster B is identical to that on cluster A:

/data/01/
/data/01/b
/data/01/c
/data/01/d
/data/01/e

However, if -update is used, the result is likely to be the following directory structure:

/data/01
/data/
/data/b
/data/c
/data/d
/data/e
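
If you are not sure which of the two layouts you ended up with after a copy, a recursive listing of the target makes it obvious. A small optional check, using the example path above:

hadoop fs -ls -R /data/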

This behavior is actually described in the documentation for -update: when the option is used, distcp copies the contents of the source directory into the target directory rather than the source directory itself, while keeping the copied data byte-for-byte identical. So although the directories end up misplaced, the data itself is correct. There is a simple trick to avoid the problem: if you already know that a job will run into this situation, write out the full target path up front, so that you do not have to move files into the correct directories by hand afterwards. For example, my original migration command was as follows:

hadoop distcp hdfs://10.0.0.100:8020/hbase/data/default/ETLDB hdfs://10.0.0.101:8020/hbase/data/default

This migrates the data correctly. If -update is used, however, the following path should be used instead. Note that the table name is appended to the target path; if that directory does not exist on the target, distcp will create it.

hadoop distcp -update hdfs://10.0.0.100:8020/hbase/data/default/ETLDB hdfs://10.0.0.101:8020/hbase/data/default/ETLDB
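
After the copy finishes, it is worth confirming that the source and target hold the same number of files and bytes. An optional check with standard HDFS commands, using the paths from the example above:

# prints directory count, file count, and total bytes for each path
hadoop fs -count hdfs://10.0.0.100:8020/hbase/data/default/ETLDB
hadoop fs -count hdfs://10.0.0.101:8020/hbase/data/default/ETLDB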

Imagine an HBase table with more than 10,000 regions: that would mean 10,000 misplaced directories to move back into place. A script could automate the cleanup, but it would take a long time, and nobody can guarantee that the script itself is free of mistakes, so fixing things up after the fact is not recommended.
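
For completeness, the kind of cleanup script mentioned above would look roughly like the following. This is only a minimal sketch, assuming the region directories were dropped one level too high, directly under /hbase/data/default instead of under /hbase/data/default/ETLDB, and that nothing else lives under that namespace directory (the table name and paths are just the ones from this example):

# move every misplaced entry back under the table directory,
# skipping the table directory itself
hadoop fs -ls /hbase/data/default | awk 'NR>1 {print $NF}' | grep -v '/ETLDB$' | while read -r dir; do
  hadoop fs -mv "$dir" /hbase/data/default/ETLDB/
done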

After the migration is complete, start the HBase cluster and run the following two commands to repair the metadata; otherwise the HBase cluster will not recognize the newly migrated table:

./hbase hbck -fix
./hbase hbck -repairHoles
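
Once hbck has finished, a quick way to confirm that the cluster sees the migrated table again is to run hbck without arguments (which only reports inconsistencies) and to list the tables from the HBase shell. An optional check, using the example table name from above:

./hbase hbck
echo "list" | ./hbase shell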

Summary:

(1) If you hit a problem you cannot make sense of, first search Google for a similar exception. If nothing turns up, read the distcp parameter documentation on the official website, and make sure the documentation version matches your Hadoop version; otherwise some parameters may be deprecated or unsupported.

(2) If distcp on a large directory fails with an IO exception such as "xxx file not exist", try reducing the number of directories copied in each run (a rough sketch of this follows below). If it still fails, go back to method (1) and track down the cause. In most cases, copying a smaller batch of directories at a time is more likely to succeed.
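
As an illustration of point (2), one way to reduce the size of each run is to launch a separate distcp job per table directory. A rough sketch, run from the source cluster and assuming the namenode addresses from the example above, with each table in its own directory under /hbase/data/default:

# copy each table directory under the default namespace in its own distcp job
for table in $(hadoop fs -ls /hbase/data/default | awk 'NR>1 {print $NF}'); do
  hadoop distcp "hdfs://10.0.0.100:8020${table}" hdfs://10.0.0.101:8020/hbase/data/default/
done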

Reference:

http://hadoop.apache.org/docs/r2.7.1/hadoop-distcp/DistCp.html

