Testing the impact of NFS on Hadoop (HDFS) clusters

Test environment and system information

$ uname -a
Linux 10.**.**.15 2.6.32-220.17.1.tb619.el6.x86_64 #1 SMP Fri Jun 8 13:48:13 CST 2012 x86_64 x86_64 x86_64 GNU/Linux

Hadoop and HBase version information:

hadoop-0.20.2-cdh3u4

hbase-0.90-adh1u7.1

 

10.**.**.12 is the NFS server and provides the NFS service.

10.**.**.15 mounts the NFS shared directory exported by 10.**.**.12 and acts as the HDFS namenode.
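For reference, mounting the NFS share on the namenode host would look roughly like the following (a sketch only; the export path and the default mount options are assumptions, not details recorded from this test environment):

$ sudo mount -t nfs 10.**.**.12:/export/nndata /u01/hbase/nndata/nfs

Whether the mount uses hard or soft semantics has a direct effect on the hang behaviour observed in the tests below.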

 

ganglia-5.rpm is used as the file for the file operations; it is about 3 MB in size.

 

The NFS-related configuration in hadoop/conf/hdfs-site.xml is as follows:

<property>
  <name>dfs.name.dir</name>
  <value>/u01/hbase/nndata/local,/u01/hbase/nndata/nfs</value>
</property>
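With a comma-separated dfs.name.dir, the namenode keeps an identical copy of its metadata (fsimage and edits) in each listed directory. A quick way to confirm that both copies are being maintained (assuming the standard name-directory layout) is to compare the two current directories:

$ ls -l /u01/hbase/nndata/local/current /u01/hbase/nndata/nfs/current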

 


NFS server service downtime

Stop the NFS service on the NFS server and check its status:

$ sudo service nfs status
rpc.svcgssd is stopped
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad is stopped

 

At this point, an HDFS put operation simply hangs; it does not exit.

 

After the NFS service is restarted, the HDFS put continues to hang. Re-executing the put operation, the command resumes after the hang times out and reports that the file already exists. Running:

$ sh hadoop/bin/hadoop fs -ls hdfs://10.**.**.15:9516/

shows an empty file with the same name in the directory.

 

$ tail -f hadoop-**-namenode-10.**.**.15.log shows no log output while the operation hangs; logs appear only after the put operation continues, at which point all of the logs for the period are written out in one batch, including the error messages for the failed put.
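To see where the namenode is blocked during such a hang, a JVM thread dump is one option (a sketch; it assumes jps/jstack are available on the namenode host and that the interesting frames mention the edit log classes):

$ jps | grep NameNode
$ jstack <namenode_pid> | grep -B 5 -A 20 FSEditLog

Threads stuck while syncing the edit log to the NFS name directory would confirm that the hang originates from the NFS mount.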

 

2012-10-23 11:22:38,956 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call create(/ganglia-4.rpm, rwxr-xr-x, DFSClient_-621134164, false, 3, 67108864) from 10.**.**.15:47771: output error
11:22:38,957 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9516 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1763)
        at org.apache.hadoop.ipc.Server.access$2000(Server.java:95)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:773)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:837)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1462)
......


2012-10-23 11:22:38,963 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:** (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /ganglia-5.rpm for DFSClient_382171631 on client 10.**.**.15, because this file is already being created by DFSClient_-1964937422 on 10.**.**.15
......


14:40:11,672 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getDatanodeReport(LIVE) from 10.**.**.15:54929: output error
14:40:11,672 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.**.**.12
14:40:11,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9516 caught: java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1763)
        at org.apache.hadoop.ipc.Server.access$2000(Server.java:95)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:773)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:837)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1462)
......

2012-10-23 14:40:11,672 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 8 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 4 1007521
14:40:12,152 INFO org.apache.hadoop.hdfs.server.namenode.GetImageServlet: Downloaded new fsimage with checksum: 444a843721bd52a951673a1ba7aecb37
14:40:12,154 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 10.**.**.12
2012-10-23 14:40:12,154 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 4 16

 

Once the NFS service is restored, the modification time of the hbase_home/nndata/share/current/edits file on the NFS server is updated again.
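A simple way to verify this is to compare the modification times of the edits files in the local and NFS name directories (the paths below follow the dfs.name.dir values configured above; adjust them if your layout differs):

$ stat -c '%y %n' /u01/hbase/nndata/local/current/edits /u01/hbase/nndata/nfs/current/edits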

After NFS is restored, re-executing the put completes successfully:

$ sh hadoop/bin/hadoop fs -put ~/dba-ganglia-gmetad-3.1.7-2.x86_64.rpm hdfs://10.**.**.15:9516/ganglia-5.rpm

 

The log information is as follows:

2012-10-23 11:31:08,794 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 25 Total time for transactions(ms): 3 Number of transactions batched in Syncs: 2 Number of syncs: 15 SyncTimes(ms): 10 676853
11:31:08,804 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /ganglia-5.rpm. blk_2675602071792190621_3890
11:31:08,855 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.**.**.**:38020 is added to blk_2675602071792190621_3890 size
......
11:31:08,860 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /ganglia-5.rpm from client DFSClient_-19034129
11:31:08,861 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /ganglia-5.rpm is closed by DFSClient_-19034129

 

When $ sudo service nfs stop is used to stop the NFS service, the namenode outputs the following. This line is not produced in response to the NFS service stopping; it comes from the regular periodic sync:

2012-10-23 11:33:54,815 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0

 

Query the safemode status of HDFS:

$ sh hadoop/bin/hadoop dfsadmin -safemode get
Safe mode is OFF

HDFS does not automatically enter safemode.
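If an operator wants to protect the namespace manually while the NFS copy of the metadata is unavailable, safemode can be entered and left explicitly (standard dfsadmin usage, not part of the original test):

$ sh hadoop/bin/hadoop dfsadmin -safemode enter
$ sh hadoop/bin/hadoop dfsadmin -safemode leave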

 


NFS server service failure

On the NFS server, run:

$ sudo killall -9 nfsd

Check the NFS status:

$ sudo service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 10677) is running...
nfsd is stopped
rpc.rquotad (pid 10645) is running...
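From the namenode host, the unresponsive mount itself can be confirmed without risking a hung shell by wrapping the check in a timeout (a convenience sketch; the 10-second limit is arbitrary):

$ timeout 10 ls /u01/hbase/nndata/nfs || echo "NFS mount is not responding"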

 

Run

$ sh hadoop/bin/hadoop dfsadmin -report

and

$ sh hadoop/bin/hadoop fs -put ~/dba-ganglia.rpm hdfs://10.**.**.15:9516/ganglia-13.rpm

to test a report and a put of the test file. Both hang, exactly as in test case 1: the report and put sessions stay in the hang state and never time out or exit.

In this case, the NFS service is restored automatically after a timeout period.

 


Test conclusion

1. When NFS fails, client reads of files stored on the datanodes are not affected (e.g. $ sh hadoop/bin/hadoop fs -cat hdfs://10.**.**.15:9516/11.txt can still read the file);

2. When NFS fails, client HDFS write operations hang and do not time out or exit.

3. After the NFS mount becomes unavailable (whether through service nfs stop or killall nfsd), HDFS write operations hang indefinitely. Once the NFS service is restored, the write operations continue and complete normally, and the detailed logs for this period are written out in one batch to hadoop_namenode.log after NFS returns to normal. The timeout configuration will be covered in subsequent tests; a sketch of the relevant NFS mount options follows below.
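The hang-forever behaviour seen above is determined largely by the NFS client mount options rather than by Hadoop itself. As a sketch of the knobs such a timeout test would likely cover (the export path and values are illustrative assumptions, not recommendations): a hard mount retries forever and produces exactly the indefinite hang observed here, while a soft mount returns an I/O error to the namenode once timeo/retrans are exhausted:

$ sudo mount -t nfs -o soft,timeo=100,retrans=3 10.**.**.12:/export/nndata /u01/hbase/nndata/nfs

Returning an error instead of hanging has its own risk: in this Hadoop version the namenode may drop a failed dfs.name.dir from its active list, so the trade-off between hanging and failing fast should itself be tested.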

 

From http://hi.baidu.com/richarwu/item/0c900469d48e9f2069105b9f
