Migrating HBase Data Between Hadoop Clusters

Last updated: 2014-06-15. Source: Internet

Uploaded by: User


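In day-to-day operations you will often need to migrate or copy HBase data from one cluster to another, and quite a few things can go wrong along the way.

Below are the steps and workarounds I used.

Prerequisite: both clusters must run the same HBase version. Otherwise unpredictable problems can occur and the migration may fail.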
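One way to check the prerequisite is to capture the output of `bin/hbase version` on each cluster and compare the reported versions. The following is a minimal sketch, not part of the original procedure; it assumes the output contains a token of the form `HBase <version>`, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the first "HBase x.y.z" version from captured `hbase version` output.
# Assumption: the output contains a token of the form "HBase <version>".
extract_hbase_version() {
  printf '%s\n' "$1" | grep -o 'HBase [0-9][0-9A-Za-z.+-]*' | head -n 1 | cut -d' ' -f2
}

# Succeed only when both captured outputs report the same, non-empty version.
same_hbase_version() {
  local src dst
  src=$(extract_hbase_version "$1")
  dst=$(extract_hbase_version "$2")
  [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Usage: run `bin/hbase version 2>&1` on each cluster, capture the text, then:
#   same_hbase_version "$SRC_OUTPUT" "$DST_OUTPUT" || echo "HBase versions differ" >&2
```

Comparing captured text keeps the check usable even when the two clusters cannot reach each other.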

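When the two clusters cannot communicate with each other, first copy the table's HBase data files from the source cluster to the local file system.

The steps are as follows:

From the Hadoop directory, run the following command to copy the table directory to a local path.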

bin/hadoop fs -copyToLocal /hbase/tab_keywordflow /home/test/xiaochenbak
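Because the local copy has to be carried to the other cluster by hand, it can help to pack it into a single archive with a checksum first. This is a rough sketch under assumptions: the function names and archive path are hypothetical, and it relies on GNU `tar` and `sha256sum` being available.

```shell
#!/usr/bin/env bash
# Pack a local table copy into a tarball and record its SHA-256 checksum.
# $1: directory produced by copyToLocal, $2: output archive path (hypothetical).
pack_table_copy() {
  local src_dir=$1 archive=$2
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" &&
    sha256sum "$archive" > "$archive.sha256"
}

# Verify the archive against its recorded checksum, then unpack it under $2.
# The checksum file stores the archive path as given, so keep that path stable.
unpack_table_copy() {
  local archive=$1 dest_parent=$2
  sha256sum -c "$archive.sha256" > /dev/null && tar -xzf "$archive" -C "$dest_parent"
}

# Usage:
#   pack_table_copy /home/test/xiaochenbak /home/test/xiaochenbak.tar.gz
```

A failed checksum stops the unpack step, so a truncated transfer is caught before anything is loaded into HDFS.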

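Next, transfer the local copy to a machine in the cluster you are migrating to. The destination in HDFS is the directory of your table.

If the target cluster already has files for this table, delete them first and then copy the new ones in.

bin/hadoop fs -rmr /hbase/tab_keywordflow

bin/hadoop fs -copyFromLocal /home/other/xiaochenbak /hbase/tab_keywordflow

Here /home/other/xiaochenbak is the local directory on the target cluster that holds the transferred files.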
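The delete-then-copy step can be wrapped so that the delete only runs when the table directory already exists. This is a sketch rather than the author's script: `restore_table_dir` and the `HADOOP` variable are hypothetical names, and it uses `hadoop fs -test -e`, which exits 0 when the path exists. On newer Hadoop releases `-rmr` is deprecated in favor of `-rm -r`.

```shell
#!/usr/bin/env bash
# Launcher to call; defaults to bin/hadoop, as in the commands above.
HADOOP=${HADOOP:-bin/hadoop}

# Replace a table directory in HDFS with a local copy.
# $1: local directory on the target cluster, $2: table directory in HDFS.
restore_table_dir() {
  local local_dir=$1 table_dir=$2
  # Remove the existing table directory only if it is present.
  if "$HADOOP" fs -test -e "$table_dir"; then
    "$HADOOP" fs -rmr "$table_dir" || return 1
  fi
  "$HADOOP" fs -copyFromLocal "$local_dir" "$table_dir"
}

# Usage (from the Hadoop directory on the target cluster):
#   restore_table_dir /home/other/xiaochenbak /hbase/tab_keywordflow
```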

Reset the table's region information in the .META. table:

bin/hbase org.jruby.Main /home/other/hbase/bin/add_table.rb /hbase/tab_keywordflow

/home/other/hbase/bin/add_table.rb is a Ruby script that can be run as shown above. Its contents are listed below; save them as add_table.rb.

#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Script adds a table back to a running hbase.
# Currently only works on if table data is in place.
#
# To see usage for this script, run:
#
# ${HBASE_HOME}/bin/hbase org.jruby.Main add_table.rb
#
include Java
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.regionserver.HRegion
import org.apache.hadoop.hbase.HRegionInfo
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.util.FSUtils
import org.apache.hadoop.hbase.util.Writables
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.commons.logging.LogFactory
# Name of this script
NAME = "add_table"
# Print usage for this script
def usage
  puts 'Usage: %s.rb TABLE_DIR [alternate_tablename]' % NAME
  exit!
end
# Get configuration to use.
c = HBaseConfiguration.new()
# Set hadoop filesystem configuration using the hbase.rootdir.
# Otherwise, we'll always use localhost though the hbase.rootdir
# might be pointing at hdfs location.
c.set("fs.default.name", c.get(HConstants::HBASE_DIR))
fs = FileSystem.get(c)
# Get a logger and a metautils instance.
LOG = LogFactory.getLog(NAME)
# Check arguments
if ARGV.size < 1 || ARGV.size > 2
  usage
end
# Get cmdline args.
srcdir = fs.makeQualified(Path.new(java.lang.String.new(ARGV[0])))
if not fs.exists(srcdir)
  raise IOError.new("src dir " + srcdir.toString() + " doesn't exist!")
end
# Get table name
tableName = nil
if ARGV.size > 1
  tableName = ARGV[1]
  raise IOError.new("Not supported yet")
else
  # If none provided use dirname
  tableName = srcdir.getName()
end
HTableDescriptor.isLegalTableName(tableName.to_java_bytes)
# Figure locations under hbase.rootdir
# Move directories into place; be careful not to overwrite.
rootdir = FSUtils.getRootDir(c)
tableDir = fs.makeQualified(Path.new(rootdir, tableName))
# If a directory currently in place, move it aside.
if srcdir.equals(tableDir)
  LOG.info("Source directory is in place under hbase.rootdir: " + srcdir.toString())
elsif fs.exists(tableDir)
  movedTableName = tableName + "." + java.lang.System.currentTimeMillis().to_s
  movedTableDir = Path.new(rootdir, java.lang.String.new(movedTableName))
  LOG.warn("Moving " + tableDir.toString() + " aside as " + movedTableDir.toString())
  raise IOError.new("Failed move of " + tableDir.toString()) unless fs.rename(tableDir, movedTableDir)
  LOG.info("Moving " + srcdir.toString() + " to " + tableDir.toString())
  raise IOError.new("Failed move of " + srcdir.toString()) unless fs.rename(srcdir, tableDir)
end
# Clean mentions of table from .META.
# Scan the .META. and remove all lines that begin with tablename
LOG.info("Deleting mention of " + tableName + " from .META.")
metaTable = HTable.new(c, HConstants::META_TABLE_NAME)
tableNameMetaPrefix = tableName + HConstants::META_ROW_DELIMITER.chr
scan = Scan.new((tableNameMetaPrefix + HConstants::META_ROW_DELIMITER.chr).to_java_bytes)
scanner = metaTable.getScanner(scan)
# Use java.lang.String doing compares. Ruby String is a bit odd.
tableNameStr = java.lang.String.new(tableName)
while (result = scanner.next())
  rowid = Bytes.toString(result.getRow())
  rowidStr = java.lang.String.new(rowid)
  if not rowidStr.startsWith(tableNameMetaPrefix)
    # Gone too far, break
    break
  end
  LOG.info("Deleting row from catalog: " + rowid)
  d = Delete.new(result.getRow())
  metaTable.delete(d)
end
scanner.close()
# Now, walk the table and per region, add an entry
LOG.info("Walking " + srcdir.toString() + " adding regions to catalog table")
statuses = fs.listStatus(srcdir)
for status in statuses
  next unless status.isDir()
  next if status.getPath().getName() == "compaction.dir"
  regioninfofile = Path.new(status.getPath(), HRegion::REGIONINFO_FILE)
  unless fs.exists(regioninfofile)
    LOG.warn("Missing .regioninfo: " + regioninfofile.toString())
    next
  end
  is = fs.open(regioninfofile)
  hri = HRegionInfo.new()
  hri.readFields(is)
  is.close()
  # TODO: Need to redo table descriptor with passed table name and then recalculate the region encoded names.
  p = Put.new(hri.getRegionName())
  p.add(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER, Writables.getBytes(hri))
  metaTable.put(p)
  LOG.info("Added to catalog: " + hri.toString())
end
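
After the script has run, a quick sanity check is to look for the table's rows in .META. and try to read the table from the HBase shell. The sketch below only generates the shell commands, so they can be reviewed before being piped into `bin/hbase shell`; the function name is hypothetical, and the `STARTROW`/`LIMIT` scan options are assumed to be supported by your HBase shell version. If the regions are listed in .META. but the table cannot be read, the regions may not be assigned yet, and restarting HBase may be needed.

```shell
#!/usr/bin/env bash
# Print HBase shell commands that inspect a table's catalog rows and count its data.
# $1: table name.
meta_check_commands() {
  local table=$1
  # Region rows in .META. start with "<table>,".
  printf "scan '.META.', {STARTROW => '%s,', LIMIT => 10}\n" "$table"
  printf "count '%s'\n" "$table"
}

# Usage (from the HBase directory on the target cluster):
#   meta_check_commands tab_keywordflow | bin/hbase shell
```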

That is my approach. If the clusters can communicate with each other, things are even easier: as you would expect, just use scp.
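
When the clusters can reach each other, the transfer might look like the sketch below. The host name, user, and NameNode URIs are hypothetical placeholders. The second function uses `hadoop distcp`, a separate Hadoop tool the original text does not mention, as an alternative that copies directly between the two HDFS instances without a local hop. Setting `RUN=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Set RUN=echo for a dry run that only prints the commands.
RUN=${RUN:-}
HADOOP=${HADOOP:-bin/hadoop}

# Copy the local table copy to a host in the target cluster with scp.
# $1: local directory, $2: user@host (hypothetical), $3: remote directory.
copy_to_target_host() {
  local local_dir=$1 target=$2 remote_dir=$3
  $RUN scp -r "$local_dir" "$target:$remote_dir"
}

# Alternative (not in the original article): copy HDFS to HDFS with distcp.
# $1: source NameNode URI, $2: target NameNode URI, $3: table directory.
distcp_table() {
  local src_nn=$1 dst_nn=$2 table_dir=$3
  $RUN "$HADOOP" distcp "$src_nn$table_dir" "$dst_nn$table_dir"
}

# Usage:
#   RUN=echo copy_to_target_host /home/test/xiaochenbak user@target-host /home/other/xiaochenbak
```

Either way, the .META. reset with add_table.rb is still needed on the target cluster afterwards.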
