HBase: the problem with creating HTable instances from multiple threads

I have recently been writing the HBase plugin for wormhole, which requires implementing an hbase reader and an hbase writer. During testing it failed with the following error:

2013-07-08 09:30:02,568 [pool-2-thread-1] org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1631) WARN client.HConnectionManager$HConnectionImplementation - Failed all from region=t1,,1373246892580.877bb26da1e4aed541915870fa924224., hostname=test89.hadoop, port=60020
java.util.concurrent.ExecutionException: java.io.IOException: Call to test89.hadoop/10.1.77.89:60020 failed on local exception: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.1.77.84:51032 remote=test89.hadoop/10.1.77.89:60020]. 59999 millis timeout left.
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1601)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1453)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:936)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:783)
    at com.dp.nebula.wormhole.plugins.common.HBaseClient.flush(HBaseClient.java:121)
    at com.dp.nebula.wormhole.plugins.writer.hbasewriter.HBaseWriter.commit(HBaseWriter.java:112)
    at com.dp.nebula.wormhole.engine.core.WriterThread.call(WriterThread.java:52)
    at com.dp.nebula.wormhole.engine.core.WriterThread.call(WriterThread.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Call to test89.hadoop/10.1.77.89:60020 failed on local exception: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.1.77.84:51032 remote=test89.hadoop/10.1.77.89:60020]. 59999 millis timeout left.
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1030)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
    at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
    at com.sun.proxy.$Proxy5.multi(Unknown Source)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1430)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1428)
    at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:215)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1437)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1425)
    ... 5 more
Caused by: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.1.77.84:51032 remote=test89.hadoop/10.1.77.89:60020]. 59999 millis timeout left.
2013-07-08 09:30:03,579 [pool-2-thread-6] com.dp.nebula.wormhole.engine.core.WriterThread.call(WriterThread.java:56) ERROR core.WriterThread - Exception occurs in writer thread!
com.dp.nebula.wormhole.common.WormholeException: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@b7c96a9 closed
    at com.dp.nebula.wormhole.plugins.writer.hbasewriter.HBaseWriter.commit(HBaseWriter.java:114)
    at com.dp.nebula.wormhole.engine.core.WriterThread.call(WriterThread.java:52)
    at com.dp.nebula.wormhole.engine.core.WriterThread.call(WriterThread.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@b7c96a9 closed
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:877)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:857)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1568)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1453)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:936)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:783)
    at com.dp.nebula.wormhole.plugins.common.HBaseClient.flush(HBaseClient.java:121)
    at com.dp.nebula.wormhole.plugins.writer.hbasewriter.HBaseWriter.commit(HBaseWriter.java:112)
    ... 7 more

Wormhole's reader and writer each start their own ThreadPoolExecutor. The failure happens in the writer's flush phase, i.e. the final batch insert. My reader gives every thread its own HTable instance and works fine, while the writer shares a singleton HBaseClient and relies on a ThreadLocal to give each thread a local HTable object, so that is where the bug is likely to be. The simplest workaround would be to drop the singleton HBaseClient on the writer side, which should make the problem disappear, but leaving the root cause unexplained was too unsatisfying for me.
It was only after reading the source code of HTable and HBaseAdmin that I found some clues.

public HTable(Configuration conf, final byte[] tableName) throws IOException {
    this.tableName = tableName;
    this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
    if (conf == null) {
        this.connection = null;
        return;
    }
    this.connection = HConnectionManager.getConnection(conf);
    this.configuration = conf;
    int maxThreads = conf.getInt("hbase.htable.threads.max", Integer.MAX_VALUE);
    if (maxThreads == 0) {
        maxThreads = 1; // is there a better default?
    }
    long keepAliveTime = conf.getLong("hbase.htable.threads.keepalivetime", 60);
    ((ThreadPoolExecutor) this.pool).allowCoreThreadTimeOut(true);
    this.finishSetup();
}

Every HTable instance holds an HConnection object, which is responsible for connecting to ZooKeeper and then to the HBase cluster (for example locating regions in the cluster, caching region locations, and re-resolving them after a region moves). HConnection objects are managed by HConnectionManager:

public static HConnection getConnection(Configuration conf) throws ZooKeeperConnectionException {
    HConnectionKey connectionKey = new HConnectionKey(conf);
    synchronized (HBASE_INSTANCES) {
        HConnectionImplementation connection = HBASE_INSTANCES.get(connectionKey);
        if (connection == null) {
            connection = new HConnectionImplementation(conf, true);
            HBASE_INSTANCES.put(connectionKey, connection);
        }
        connection.incCount();
        return connection;
    }
}

HConnectionManager keeps a static LRU map, HBASE_INSTANCES, as a cache. The key is an HConnectionKey, which contains the username and a selected set of properties extracted from the conf that was passed in; the value is HConnectionImplementation, the concrete implementation of HConnection. Because every caller passes in the same conf, they all end up with the same HConnectionImplementation, and each call finishes with connection.incCount(), which increments the client reference count by one.
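To make the sharing concrete, here is a minimal sketch (assuming the 0.94-era client API quoted above and a reachable cluster configuration; the class and its name are mine, not from the original code): two getConnection calls with the same Configuration return the same cached instance, and each call bumps the reference count, so each caller owes one matching release.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;

public class SharedConnectionDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Same Configuration -> same HConnectionKey -> same cached HConnectionImplementation.
        HConnection c1 = HConnectionManager.getConnection(conf);
        HConnection c2 = HConnectionManager.getConnection(conf);
        System.out.println(c1 == c2); // true: both point at the cached instance

        // Each getConnection() call incremented the reference count, so the
        // connection is only really torn down after a matching number of releases.
        HConnectionManager.deleteConnection(conf, true);
        HConnectionManager.deleteConnection(conf, true);
    }
}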

public void close() throws IOException {
    if (this.closed) {
        return;
    }
    flushCommits();
    if (cleanupPoolOnClose) {
        this.pool.shutdown();
    }
    if (cleanupConnectionOnClose) {
        if (this.connection != null) {
            this.connection.close();
        }
    }
    this.closed = true;
}

When an HTable is closed, it first calls flushCommits, which pushes the List<Put> in its write buffer to the server in one shot via the connection's processBatch method, and then enters the connection-closing logic. That logic is also driven by the reference count: the count is decremented first, and only when it reaches 0 (or the connection is stale) is the connection actually closed and removed from HBASE_INSTANCES.

Shutdown steps:

1. Close the HMasterInterface

2. Close each of the HRegionInterfaces

3. Close the ZooKeeper watcher

HConnectionImplementation connection = HBASE_INSTANCES.get(connectionKey);
if (connection != null) {
    connection.decCount();
    if (connection.isZeroReference() || staleConnection) {
        HBASE_INSTANCES.remove(connectionKey);
        connection.close(stopProxy);
    } else if (stopProxy) {
        connection.stopProxyOnClose(stopProxy);
    }
} else {
    LOG.error("Connection not found in the list, can't delete it "
        + "(connection key=" + connectionKey + "). May be the key was modified?");
}


HBaseAdmin works the same way internally: it also holds a reference to an HConnection, so it can be regarded as sharing the HConnection with HTable.
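A hedged illustration of that sharing (again assuming the 0.94-era API and a reachable cluster; the table name "t1" and the class are just for illustration): constructing an HBaseAdmin from the same Configuration goes through the same HConnectionManager cache, so it adds one more reference to the very connection the HTables are using.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class SharedAdminDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf); // getConnection(conf): reference count +1
        HTable table = new HTable(conf, "t1");   // same cached HConnection: reference count +1

        table.close();                           // reference count -1
        admin.close();                           // reference count -1; the shared connection is
                                                 // really closed only when the count reaches 0
    }
}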

With this analysis in hand, and looking back at the earlier error message "java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@b7c96a9 closed", the problem had to be in the client reference count. Sure enough, in my HBaseClient code the initialization step creates a singleton HBaseClient, which in turn creates one HBaseAdmin (shared by all threads), and each thread creates a thread-local HTable object when it starts:

private HTable getHTable() throws IOException {
    HTable htable = threadLocalHtable.get();
    if (htable == null) {
        htable = new HTable(conf, tableName);
        htable.setAutoFlush(autoFlush);
        htable.setWriteBufferSize(writeBufferSize);
        threadLocalHtable.set(htable);
    }
    return htable;
}

But my close method was written incorrectly: it closed the HBaseAdmin many times over (once per thread):

public synchronized void close() throws IOException {
    HTable table = threadLocalHtable.get();
    if (table != null) {
        table.close();
        table = null;
        threadLocalHtable.remove();
    }
    if (admin != null) {
        admin.close();
    }
}

Consider a scenario with 10 writer threads. The reference count is then 11 (10 HTables plus 1 HBaseAdmin), and each thread's close decrements it twice, once for its own HTable and once for the shared admin. The first 5 threads close without trouble. When the 6th thread closes its HTable the count drops to 0, so it closes the HConnection, and its subsequent admin close has no effect. But the remaining 4 threads may well still be in the middle of their flush at that moment, and with the HConnection already gone the flush cannot possibly complete, so the exception above is thrown. A toy trace of this arithmetic follows the flush snippet below.

public void flush() throws IOException {
    if (getPutBuffer().size() > 0) {
        getHTable().put(getPutBuffer());
        clearPutBuffer();
    }
}
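To make the reference-count arithmetic in that scenario concrete, here is a toy trace in plain Java (a model of the counting only, not HBase code): the count starts at 11 and each thread's buggy close decrements it twice.

import java.util.concurrent.atomic.AtomicInteger;

public class RefCountTrace {
    public static void main(String[] args) {
        // 10 HTables + 1 shared HBaseAdmin
        AtomicInteger refCount = new AtomicInteger(11);
        for (int thread = 1; thread <= 10; thread++) {
            // HTable.close() for this thread's table
            if (refCount.decrementAndGet() == 0) {
                System.out.println("thread " + thread + ": count hit 0 on HTable.close(),"
                        + " HConnection closed, threads " + (thread + 1) + "-10 can no longer flush");
                break;
            }
            // admin.close() called again by the same thread
            if (refCount.decrementAndGet() == 0) {
                System.out.println("thread " + thread + ": count hit 0 on admin.close(),"
                        + " HConnection closed, threads " + (thread + 1) + "-10 can no longer flush");
                break;
            }
        }
    }
}

Running it prints that the count hits 0 on the 6th thread's HTable.close(), exactly the point where the 4 remaining threads lose the connection mid-flush.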

Once the cause was clear, the fix was obvious: after closing the HBaseAdmin, set the reference to null so it cannot be closed a second time:

if (admin != null) {
    admin.close();
    admin = null;
}
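Putting the two snippets together, the writer-side close then looks roughly like this (a sketch only, using the field names from the code above):

public synchronized void close() throws IOException {
    // Each thread releases its own HTable: one decrement of the shared
    // HConnection reference count per writer thread.
    HTable table = threadLocalHtable.get();
    if (table != null) {
        table.close();
        threadLocalHtable.remove();
    }
    // Release the shared HBaseAdmin exactly once: after the first close the
    // field is null, so later callers skip it instead of decrementing the
    // reference count a second time.
    if (admin != null) {
        admin.close();
        admin = null;
    }
}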


This problem bothered me for several days and I could not find anything about it online; only after reading the HBase source code did I make progress and solve it. It seems that when you really hit a problem, you have to understand the underlying code.
