Monitor hard disk status with smartd in CentOS

Source: Internet
Author: User

Compared with the processor and memory, the hard disk is the slowest subsystem on a server, the most likely to become a performance bottleneck, and the most fragile component. It sits farthest from the processor, and accessing it involves mechanical operations such as spindle rotation and head seeking, so it is prone to failure. For a VPS service provider and system administrator, a failed hard disk is the most feared problem, which is why monitoring disk health and getting early warnings matters so much. Over almost a year and a half, hard disks have failed on our PC servers with no warning signs beforehand. Our SUN servers have fared better: many of their SATA/SCSI disks have been running for five years, so it seems brand-name servers are expensive for a reason. Some time ago VPSee read a paper published by Google, "Failure Trends in a Large Disk Drive Population", which confirmed our experience. Its conclusion, as we read it, is that S.M.A.R.T. detects only about 60% of failing disks, so S.M.A.R.T. monitoring results cannot be relied on completely.

Hard disks on the market today all support S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology). The smartmontools package uses this feature to monitor hard disks. It contains two programs: smartctl, the command-line front end, and smartd, the daemon that runs in the background. smartmontools is not exclusive to Linux; it also supports systems such as BSD and Solaris.

Install smartmontools

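Install it on CentOS/Fedora (CentOS 4 shipped smartctl and smartd in the kernel-utils package; later releases use the smartmontools package):

# yum install smartmontools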

Install it on Debian/Ubuntu:

# apt-get install smartmontools

Use smartmontools

Before testing with smartmontools, check whether the hard disk supports SMART:

# smartctl -i /dev/sda
=== START OF INFORMATION SECTION ===
Device Model:     SEAGATE ST32500NSSUN250G 0741B58YP8
Serial Number:    5QE58YP8
Firmware Version: 3.AZK
User Capacity:    250,056,000,000 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jul 22 22:39:07 2010 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If the last line above reads Disabled instead of Enabled, turn SMART support on:

# smartctl -s on /dev/sda
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
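
The check-then-enable step can be scripted. The following is a minimal sketch, not part of smartmontools: the function names smart_state and ensure_smart_enabled are hypothetical, and the parsing assumes the ATA-style "SMART support is:" lines shown in the sample output above.

```shell
#!/bin/sh
# smart_state: read `smartctl -i` output on stdin and print one of
# "enabled", "disabled", or "unavailable" (hypothetical helper).
smart_state() {
    awk '
        /^SMART support is: *Enabled/  { state = "enabled" }
        /^SMART support is: *Disabled/ { state = "disabled" }
        END { print (state == "" ? "unavailable" : state) }
    '
}

# ensure_smart_enabled DEVICE: run `smartctl -s on` only when SMART
# is reported as disabled; fail if no SMART status line was found.
ensure_smart_enabled() {
    dev=$1
    case "$(smartctl -i "$dev" | smart_state)" in
        enabled)  echo "$dev: SMART already enabled" ;;
        disabled) smartctl -s on "$dev" ;;
        *)        echo "$dev: SMART not available" >&2; return 1 ;;
    esac
}
```

Run as root, `ensure_smart_enabled /dev/sda` would leave an already enabled disk untouched and enable SMART only when needed.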

Check the disk's health status. If the result below is anything other than PASSED, be on alert and back up all data immediately, because the disk may fail at any time. Note that the reverse does not hold: PASSED does not mean the disk is 100% safe, but a result other than PASSED means there is definitely a problem:

# smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
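
For unattended use, the verdict line can be extracted and turned into an exit status. This is a rough sketch under stated assumptions: health_result and check_health are hypothetical names, and the pattern matches only the ATA-style result line shown above (other drive types may word it differently).

```shell
#!/bin/sh
# health_result: read `smartctl -H` output on stdin and print only the
# verdict, e.g. "PASSED" (hypothetical helper; prints nothing if the
# result line is absent).
health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# check_health DEVICE: succeed only when the verdict is exactly PASSED.
check_health() {
    dev=$1
    result=$(smartctl -H "$dev" | health_result)
    if [ "$result" = "PASSED" ]; then
        echo "$dev: PASSED"
    else
        echo "$dev: health check returned '${result:-no result}', back up now" >&2
        return 1
    fi
}
```

A cron job could call `check_health /dev/sda` and alert on a non-zero exit status. As noted above, a PASSED result is still no guarantee that the disk is healthy.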

Run a short self-test:

# smartctl -t short /dev/sda
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Thu Jul 22 22:51:00 2010
Use smartctl -X to abort test.

After starting the self-test, wait a moment. You can then view its progress and result with the following command:

# smartctl -l selftest /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20949         -
# 2  Short offline       Completed without error       00%     20947         -
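
Entry 1 in this log is the most recent test, so a script only needs to look at that line. The sketch below assumes the log layout shown above; latest_selftest_status is a hypothetical name, and anything other than "Completed without error" (including a test still in progress) is reported as not-ok.

```shell
#!/bin/sh
# latest_selftest_status: read `smartctl -l selftest` output on stdin and
# print "ok", "not-ok", or "no-entry" for log entry number 1
# (hypothetical helper).
latest_selftest_status() {
    awk '
        /^# *1 / {
            if ($0 ~ /Completed without error/) print "ok"; else print "not-ok"
            found = 1
            exit
        }
        END { if (!found) print "no-entry" }
    '
}

# Example (run as root):
#   smartctl -l selftest /dev/sda | latest_selftest_status
```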

To run a long self-test instead (it takes much longer, so running it in the early hours of the morning is recommended):

# smartctl -t long /dev/sda

View the error log:

# smartctl -l error /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

Configure smartmontools

On CentOS/Fedora:

# vi /etc/smartd.conf
# /etc/init.d/smartd restart

On Debian/Ubuntu:

# vi /etc/default/smartmontools
# vi /etc/smartd.conf
# /etc/init.d/smartmontools restart

You can edit the smartmontools configuration file so that smartd checks the hard disk's health on a regular schedule, much like a periodic medical checkup. A clean checkup does not mean there is no disease, since many conditions go undetected by the examination. This matches what Google's hard disk report said: S.M.A.R.T. detects only about 60% of failing disks, just as a checkup catches only some of the patients who are actually ill.
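
As an illustration of such a schedule, a single line in /etc/smartd.conf could look like the sketch below. The device name, the test times, and the mail recipient are assumptions to adapt, not values from the original setup. Also note that if the stock file contains an active DEVICESCAN line, smartd ignores everything after it, so that line would need to be commented out first.

```
# /etc/smartd.conf (example entry; adjust device, schedule, and address)
#   -a       monitor all SMART attributes, health status, and error logs
#   -o on    enable automatic offline data collection
#   -S on    enable attribute autosave
#   -s ...   short self-test every day at 02:00, long self-test on Saturdays at 03:00
#   -m root  mail warnings to the local root account
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
```

After saving the file, restart the daemon with the init script shown above so that smartd picks up the new schedule.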
