Use sed and awk to obtain the latest dead DataNode information of the cluster
Because the remote desktop connection is poor, when the cluster has missing blocks you cannot log on to the remote desktop to check which nodes had their DataNode process die due to a restart. At the same time, simply running hdfs dfsadmin -report to look this up is inconvenient, because it prints far too much information. The following is a simple script implemented with sed and awk:
cat lastDeadNodes.sh
hdfs dfsadmin -report > all.log
# sed -n '/Dead/,$p' all.log > deadnodes.log
sed '1,/Dead/d' all.log > deadnodes.log
sed -i '/Rack/d' deadnodes.log
awk 'BEGIN{RS="\n\n\n"; ORS="\n"; FS="\n"; OFS="\t"} {print $2, $15}' deadnodes.log > last.log
dt=`date`
dt=`echo $dt | awk -F" " '{print $2" "$3}'`
grep "$dt" last.log
The following is the data format obtained by hdfs dfsadmin -report:
.......
Name: 10.39.0.185:50010 (10.39.0.185)
Hostname: 10.39.0.185
Rack:/YH11070028
Decommission Status: Normal
Configured Capacity: 46607198593024 (42.39 TB)
DFS Used: 22027374755910 (20.03 TB)
Non DFS Used: 0 (0 B)
DFS Remaining: 24579823837114 (22.36 TB)
DFS Used %: 47.26%
DFS Remaining %: 52.74%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used %: 100.00%
Cache Remaining %: 0.00%
Last contact: Wed Feb 25 23:14:43 CST 2015
Dead datanodes:
Name: 10.39.1.35:50010 (10.39.1.35)
Hostname: 10.39.1.35
Rack:/YH11070032
Decommission Status: Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used %: 100.00%
DFS Remaining %: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used %: 100.00%
Cache Remaining %: 0.00%
Last contact: Mon Jan 26 10:08:36 CST 2015
.......
From the output above we can see that in Hadoop 2.0 the entries for dead DN nodes appear after the "Dead datanodes:" line, so the key to the script is extracting all the lines that follow that line.
The following describes the script:
1. Run the hdfs dfsadmin -report command to get the cluster's DN information and write it to the all.log file; it includes both the live and the dead DNs.
2. Extract the dead DN node information, implemented with sed:
# sed -n '/Dead/,$p' all.log > deadnodes.log    # this variant keeps the matched line "Dead datanodes:" in the output
sed '1,/Dead/d' all.log > deadnodes.log         # this variant outputs everything from the line after the match to the end, dropping the matched line itself
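To see the difference between the two forms concretely, here is a minimal check against a throwaway three-line file (demo.txt is just an illustration, not part of the script):
printf 'a\nDead datanodes:\nb\n' > demo.txt
sed -n '/Dead/,$p' demo.txt    # prints "Dead datanodes:" and "b" -- the matched line is kept
sed '1,/Dead/d' demo.txt       # prints only "b" -- everything through the matched line is deleted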
3. After step 2 we have all the dead DN information. However, testing showed that some DN entries do not contain a "Rack:" line. Since this line is not needed anyway, and a record layout that varies from entry to entry would break the fixed field positions used in the next step, simply delete it:
sed -i '/Rack/d' deadnodes.log
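As a side note, if only these two fields are ever needed, a grep-based sketch can skip the Rack cleanup entirely, because it selects lines by name rather than by position (assuming every dead-node entry contains both lines, in this order):
grep -E '^(Hostname|Last contact):' deadnodes.log | paste - -    # paste joins each pair of lines with a tab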
4. After step 3 completes, the data format is as follows:
Name: 10.39.1.35:50010 (10.39.1.35)
Hostname: 10.39.1.35
Decommission Status: Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used %: 100.00%
DFS Remaining %: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used %: 100.00%
Cache Remaining %: 0.00%
Last contact: Mon Jan 26 10:08:36 CST 2015
Name: 10.39.6.197:50010 (10.39.6.197)
Hostname: 10.39.6.197
Decommission Status: Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used %: 100.00%
DFS Remaining %: 0.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used %: 100.00%
Cache Remaining %: 0.00%
Last contact: Mon Jan 19 18:23:56 CST 2015
In fact, the only values we need are Hostname and Last contact. This step is implemented with awk, processing multiple lines as one record:
awk 'BEGIN{RS="\n\n\n"; ORS="\n"; FS="\n"; OFS="\t"} {print $2, $15}' deadnodes.log > last.log
Several awk keywords are used: RS/ORS/FS/OFS
RS: Record Separator (input record separator)
ORS: Output Record Separator
FS: Field Separator (input field separator)
OFS: Output Field Separator
The awk script means: three consecutive linefeeds act as the record separator (RS = "\n\n\n"), so everything from "Name:" to "Last contact:" is processed as one record. Processed records are separated by a single linefeed (ORS = "\n"). Within an input record the field separator is a linefeed (FS = "\n"), while the fields of an output record are separated by a tab (OFS = "\t"). In each record we only care about Hostname and Last contact, which are the 2nd and 15th fields, so we just print $2 and $15 ({print $2, $15}).
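Note that a multi-character RS such as "\n\n\n" is a gawk extension; POSIX awk only guarantees a single-character RS. A more portable sketch uses paragraph mode (RS = ""), where any run of blank lines separates records, and scans the fields by name instead of relying on fixed positions, which would also make the Rack deletion in step 3 unnecessary (this is an alternative, not the original script):
awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
{
    host = ""; last = ""
    for (i = 1; i <= NF; i++) {        # scan every line of the record
        if ($i ~ /^Hostname:/)     host = $i
        if ($i ~ /^Last contact:/) last = $i
    }
    if (host != "") print host, last   # skip records without a Hostname line
}' deadnodes.log > last.log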
See the following data format after processing:
Hostname: 10.39.1.35 Last contact: Mon Jan 26 10:08:36 CST 2015
Hostname: 10.39.6.197 Last contact: Mon Jan 19 18:23:56 CST 2015
Hostname: 10.39.5.80 Last contact: Sat Feb 07 03:59:20 CST 2015
Hostname: 10.39.4.247 Last contact: Wed Feb 25 17:27:51 CST 2015
Hostname: 10.39.6.199 Last contact: Mon Feb 02 10:42:21 CST 2015
Hostname: 10.39.7.55 Last contact: Thu Feb 26 00:26:17 CST 2015
Hostname: 10.39.0.218 Last contact: Thu Feb 12 07:18:54 CST 2015
Hostname: 10.39.0.208 Last contact: Mon Feb 09 12:22:13 CST 2015
Hostname: 10.39.4.235 Last contact: Thu Jan 01 08:00:00 CST 1970
Hostname: 10.39.4.243 Last contact: Thu Jan 01 08:00:00 CST 1970
5. Finally, we only care about the DN nodes that died today:
dt=`date`
dt=`echo $dt | awk -F" " '{print $2" "$3}'`
grep "$dt" last.log
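One caveat: the default output of date pads a single-digit day with a space (for example "Feb  7"), while the Last contact timestamps in the report are zero-padded (for example "Feb 07" in the sample above), so the grep can miss nodes that died early in the month. A safer sketch builds the pattern with an explicit format (%b and %d are standard date format specifiers):
dt=`date +"%b %d"`    # e.g. "Feb 07", zero-padded to match the report
grep "$dt" last.log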
Result:
Hostname: 10.39.7.55 Last contact: Thu Feb 26 00:26:17 CST 2015
Summary:
This script relies on a few special uses of sed and awk that are worth writing down so they can be looked up easily later.
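For reference, here is the whole script again in one piece, with comments and the zero-padded date fix from step 5 folded in (the gawk-dependent RS = "\n\n\n" is kept from the original; swap in the paragraph-mode variant from step 4 if portability matters):
#!/bin/sh
# lastDeadNodes.sh -- list the DataNodes that died today
hdfs dfsadmin -report > all.log                  # full cluster report: live + dead DNs
sed '1,/Dead/d' all.log > deadnodes.log          # keep only the lines after "Dead datanodes:"
sed -i '/Rack/d' deadnodes.log                   # drop the optional Rack: lines so field positions stay fixed
awk 'BEGIN{RS="\n\n\n"; ORS="\n"; FS="\n"; OFS="\t"} {print $2, $15}' deadnodes.log > last.log
dt=`date +"%b %d"`                               # e.g. "Feb 26"; zero-padded day to match the report
grep "$dt" last.log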