We ran distcp on a CDH4 Hadoop cluster to copy data from a CDH5 Hadoop cluster to CDH4, using the following command:
hadoop distcp -update -skipcrccheck hftp://cdh5:50070/xxxx hdfs://cdh4/xxx
When a file is very large, the copy fails with an error like this:
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - Caused by: java.io.IOException: Got EOF but currentPos = 2278825984 < filelength = 3486427523
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at org.apache.hadoop.hdfs.ByteRangeInputStream.update(ByteRangeInputStream.java:172)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.DataInputStream.read(DataInputStream.java:149)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
2017-12-15 10:47:24,506 INFO execute.BulkLoadHbase - at java.io.FilterInputStream.read(FilterInputStream.java:107)
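The exception message carries two byte counts: how far the read got (currentPos) and the expected file length. To see how much of the file was transferred before the stream ended, they can be pulled out with a short shell snippet. This is only a minimal sketch: the function name `eof_progress` is hypothetical, and it assumes the message has exactly the format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): extract currentPos and filelength
# from a "Got EOF but currentPos = N < filelength = M" message and print
# the bytes read, the expected length, and the percentage read.
eof_progress() {
    printf '%s\n' "$1" |
        sed -n 's/.*currentPos = \([0-9][0-9]*\) < filelength = \([0-9][0-9]*\).*/\1 \2/p' |
        awk '{ printf "%s %s %.1f%%\n", $1, $2, ($1 / $2) * 100 }'
}

# Message text copied from the log above.
eof_progress 'java.io.IOException: Got EOF but currentPos = 2278825984 < filelength = 3486427523'
# prints: 2278825984 3486427523 65.4%
```

In this case the read stopped after roughly two thirds of the file, so the failure is a truncated transfer rather than a missing or empty source file.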
From what I found, reading the source through webhdfs instead of hftp solves the problem. The command is:
hadoop distcp -update -skipcrccheck webhdfs://cdh5:50070/xxxx hdfs://cdh4/xxx
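Since the only change is the scheme of the source URI, existing hftp source paths can be rewritten mechanically. The following is a minimal sketch, assuming WebHDFS is enabled on the source cluster and served on the same host and port as in the command above. `to_webhdfs` is a hypothetical helper name, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a leading hftp:// scheme to webhdfs://,
# leaving host, port, and path untouched. Other URIs pass through unchanged.
to_webhdfs() {
    printf '%s\n' "$1" | sed 's|^hftp://|webhdfs://|'
}

src=$(to_webhdfs 'hftp://cdh5:50070/xxxx')

# Print the distcp command instead of executing it, so it can be reviewed first.
echo "hadoop distcp -update -skipcrccheck $src hdfs://cdh4/xxx"
```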