Broken pipe errors when accessing HBase with happybase --- two "earth-shattering" bugs


Background
When reading from and writing to HBase through the thrift interface with happybase, the client starts failing with a Broken pipe error. The access pattern looks roughly like the sketch below; the troubleshooting steps follow it.
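This is a minimal illustration only, not the original script; the host, port, table name, column family, and row keys are made-up examples:

    import happybase

    # Connect to the HBase thrift server (host/port are examples).
    connection = happybase.Connection('cdh-master-slave1', port=9090)
    table = connection.table('example_table')

    # Ordinary reads/writes/scans like these would intermittently fail
    # with "Broken pipe" once the server had silently closed the socket.
    table.put(b'row-1', {b'cf:col1': b'value1'})
    row = table.row(b'row-1')
    for key, data in table.scan(limit=10):
        print(key, data)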

1. Check the HBase thrift server logs:
    Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
    17/05/12 18:08:41 INFO util.VersionInfo: HBase 1.2.0-cdh5.10.1
    17/05/12 18:08:41 INFO util.VersionInfo: Source code repository file:///data/jenkins/workspace/generic-package-centos64-7-0/topdir/BUILD/hbase-1.2.0-cdh5.10.1 revision=Unknown
    17/05/12 18:08:41 INFO util.VersionInfo: Compiled by jenkins on Mon Mar 20 02:46:09 PDT 2017
    17/05/12 18:08:41 INFO util.VersionInfo: From source with checksum c6d9864e1358df7e7f39d39a40338b4e
    17/05/12 18:08:41 INFO thrift.ThriftServerRunner: Using default thrift server type
    17/05/12 18:08:41 INFO thrift.ThriftServerRunner: Using thrift server type threadpool
    17/05/12 18:08:42 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
    17/05/12 18:08:42 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    17/05/12 18:08:42 INFO impl.MetricsSystemImpl: HBase metrics system started
    17/05/12 18:08:42 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    17/05/12 18:08:42 INFO http.HttpRequestLog: Http request log for http.requests.thrift is not defined
    17/05/12 18:08:42 INFO http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
    17/05/12 18:08:42 INFO http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter)
    17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context thrift
    17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
    17/05/12 18:08:42 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
    17/05/12 18:08:42 INFO http.HttpServer: Jetty bound to port 9095
    17/05/12 18:08:42 INFO mortbay.log: jetty-6.1.26.cloudera.4
    17/05/12 18:08:42 WARN mortbay.log: Can't reuse /tmp/Jetty_0_0_0_0_9095_thrift____.vqpz9l, using /tmp/Jetty_0_0_0_0_9095_thrift____.vqpz9l_5120175032480185058
    17/05/12 18:08:43 INFO mortbay.log: Started [email protected]:9095
    17/05/12 18:08:43 INFO thrift.ThriftServerRunner: starting TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 300000ms; min worker threads=128, max worker threads=1000, max queued requests=1000
    ...
    17/05/08 15:05:51 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x645132bf connecting to ZooKeeper ensemble=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181
    17/05/08 15:05:51 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181 sessionTimeout=60000 watcher=hconnection-0x645132bf0x0, quorum=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181, baseZNode=/hbase
    17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Opening socket connection to server cdh-slave3/192.168.10.219:2181. Will not attempt to authenticate using SASL (unknown error)
    17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.10.23:43170, server: cdh-slave3/192.168.10.219:2181
    17/05/08 15:05:51 INFO zookeeper.ClientCnxn: Session establishment complete on server cdh-slave3/192.168.10.219:2181, sessionid = 0x35bd74a77802148, negotiated timeout = 60000
    [[email protected] example]$
    17/05/08 15:32:50 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x35bd74a77802148
    17/05/08 15:32:51 INFO zookeeper.ZooKeeper: Session: 0x35bd74a77802148 closed
    17/05/08 15:32:51 INFO zookeeper.ClientCnxn: EventThread shut down
    17/05/08 15:38:53 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xb876351 connecting to ZooKeeper ensemble=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181
    17/05/08 15:38:53 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181 sessionTimeout=60000 watcher=hconnection-0xb8763510x0, quorum=cdh-master-slave1:2181,cdh-slave2:2181,cdh-slave3:2181, baseZNode=/hbase
    17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Opening socket connection to server cdh-master-slave1/192.168.10.23:2181. Will not attempt to authenticate using SASL (unknown error)
    17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.10.23:35526, server: cdh-master-slave1/192.168.10.23:2181
    17/05/08 15:38:53 INFO zookeeper.ClientCnxn: Session establishment complete on server cdh-master-slave1/192.168.10.23:2181, sessionid = 0x15ba3ddc6cc90d4, negotiated timeout = 60000

Initial guess: some timeout configured on the HBase side is dropping the connection.

2. Check the official documentation — no obviously relevant timeout parameter turned up.

3. Google for similar problems

This search turned up a very similar issue:

HBASE-14926: Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading

Type: Bug
Status: RESOLVED
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16
Fix Version/s: 2.0.0, 1.2.0, 1.3.0, 0.98.17
Component/s: Thrift
Labels: None
Hadoop Flags: Reviewed
Release Note: Adds a timeout to server read from clients. Adds new config hbase.thrift.server.socket.read.timeout for setting read timeout on server socket in milliseconds. Default is 60000.

Description:
Thrift server is hung. All worker threads are doing this:

    "thrift-worker-0" daemon prio=10 tid=0x00007f0bb95c2800 nid=0xf6a7 runnable [0x00007f0b956e0000]
       java.lang.Thread.State: RUNNABLE
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:152)
            at java.net.SocketInputStream.read(SocketInputStream.java:122)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
            at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
            - locked <0x000000066d859490> (a java.io.BufferedInputStream)
            at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
            at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
            at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
            at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
            at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
            at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
            at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
            at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
            at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
            at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

They never recover. I don't have client side logs. We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout", but this patch only got applied to the fb branch (and thrift has changed since then).

Source: https://issues.apache.org/jira/browse/HBASE-14926
4. Google "hbase.thrift.server.socket.read.timeout"

One of the results describes exactly the same symptom:

Problem background: the test environment is a three-node Hadoop cluster running hadoop-2.7.3, HBase-1.2.4 and zookeeper-3.4.9. Data is written into HBase through the thrift C++ interface; each run starts out fine and then begins to fail after a while. The same setup on the previously used hbase-0.94.27, with the same configuration, never showed this problem.

Fix for the thrift error: a packet capture shows the HBase server responding with an RST, which breaks the connection. Starting the server with bin/hbase thrift start -threadpool shows a readTimeout of 60 s. Testing confirmed the failures are tied to this setting: it had never been configured explicitly, and the code shows 60000 ms is the default used whenever the property is absent. Adding the property to conf/hbase-site.xml fixes it:

    <property>
        <name>hbase.thrift.server.socket.read.timeout</name>
        <value>6000000</value>
        <description>unit: milliseconds</description>
    </property>

Source: http://blog.csdn.net/wwlhz/article/details/56012053

After adding this property and restarting the HBase thrift service, the error no longer appeared.
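A quick way to convince yourself the read timeout was the culprit is to let a connection sit idle for longer than the old 60 s default and then use it again. This is a hedged sketch, not from the original article; the host, table name and row key are made-up:

    import time
    import happybase

    connection = happybase.Connection('cdh-master-slave1', port=9090)
    table = connection.table('example_table')
    table.put(b'probe-row', {b'cf:col1': b'x'})

    # Stay idle past the old 60 s server-side read timeout.
    time.sleep(90)

    # Before the fix this call tended to die with a Broken pipe /
    # transport error; with the larger timeout it should succeed.
    print(table.row(b'probe-row'))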

5. The source code confirms where the default comes from:
    // https://github.com/apache/hbase/blob/master/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
    ...
    public static final String THRIFT_SERVER_SOCKET_READ_TIMEOUT_KEY =
        "hbase.thrift.server.socket.read.timeout";
    public static final int THRIFT_SERVER_SOCKET_READ_TIMEOUT_DEFAULT = 60000;
    ...
    int readTimeout = conf.getInt(THRIFT_SERVER_SOCKET_READ_TIMEOUT_KEY,
        THRIFT_SERVER_SOCKET_READ_TIMEOUT_DEFAULT);
    TServerTransport serverTransport = new TServerSocket(
        new TServerSocket.ServerSocketTransportArgs().
            bindAddr(new InetSocketAddress(listenAddress, listenPort)).
            backlog(backlog).
            clientTimeout(readTimeout));
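For completeness, the client side of the same socket has its own timeout knob: happybase's Connection accepts a timeout argument in milliseconds that bounds how long the Python side blocks on the thrift socket. A small illustrative example (the host and the 120000 ms value are examples, not a recommendation):

    import happybase

    # Give the client socket an explicit timeout (milliseconds) so a hung
    # server surfaces as a timeout instead of blocking indefinitely.
    connection = happybase.Connection('cdh-master-slave1', port=9090,
                                      timeout=120000)
    print(connection.tables())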

Problem solved~~~

6. But was the problem really solved?

Not quite. After a while we found that a long continuous scan would get its connection dropped again after roughly 20-plus minutes. Another painful round of searching showed this to be an issue with this HBase version: it treats every thrift connection as idle, whether it is actually in use or not, and closes it after hbase.thrift.connection.max-idletime. So I set that property to 31104000 (one year). If you are on CDH, this should be configured through the management console instead.
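Independently of the server-side settings, the client can also be made to tolerate a dropped socket instead of assuming the connection lives forever. The sketch below is not from the original article; it shows one defensive pattern (host, port, and table name are made-up, and the exception import depends on which thrift library your happybase build uses):

    import happybase

    try:
        from thriftpy2.transport import TTransportException
    except ImportError:  # older happybase releases ship thriftpy instead
        from thriftpy.transport import TTransportException

    HOST, PORT = 'cdh-master-slave1', 9090  # example values

    def scan_with_reconnect(table_name, retries=2, **scan_kwargs):
        """Scan a table, reopening the connection if the socket was dropped.

        Note: a retry restarts the scan from the beginning.
        """
        last_error = None
        for _ in range(retries + 1):
            connection = happybase.Connection(HOST, port=PORT)
            try:
                return list(connection.table(table_name).scan(**scan_kwargs))
            except (TTransportException, BrokenPipeError) as exc:
                last_error = exc  # stale/killed socket: reconnect and retry
            finally:
                connection.close()
        raise last_error

    rows = scan_with_reconnect('example_table', limit=100)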

General steps when running into a problem like this:

If you want to grow your skills along the way:
1. Read the logs, find where the error occurs, and get an initial fix on the problem.
2. Read the official documentation.
3. Google similar problems, or read the source code, to pin the problem down.

If you just need the problem fixed quickly:
1. Read the logs, find where the error occurs, and get an initial fix on the problem.
2. Google similar problems.
3. Read the official documentation, or the source code.

