S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is a data-safety technology widely used in hard disks. While the disk is working, the monitoring system analyzes the status of the motor, circuits, platters, and heads. When an abnormality occurs it issues a warning, and some drives also automatically reduce their speed and back up data. As early as the 1990s, people realized that the data on a hard disk is worth more than the disk itself, and they were eager for a technology that could predict hard disk faults and protect data relatively safely. S.M.A.R.T. came into being as a result. Although the mean time between failures (MTBF) of most hard disks has now reached 30,000 to 50,000 hours or more, for many users, especially commercial users, a single ordinary hard disk failure can still have catastrophic consequences. That is why S.M.A.R.T. is still in use today.
What is S.M.A.R.T.?
The technology was first developed by Compaq and was later revised jointly by hard drive manufacturers such as IBM, Seagate, Fujitsu, and Quantum. It combines Compaq's IntelliSafe diagnostic technology with features of IBM's PFA (Predictive Failure Analysis) detection technology. In May 1995, Compaq submitted the IntelliSafe technical standard report (SFF-8035i) to the Small Form Factor (SFF) Committee. Revision 1.0 (SFF-8035r2) followed in January 1996 and revision 1.3 (SFF-8055) in June 1996. Together with IBM and other companies, Compaq then formally applied to the SFF Committee to have IntelliSafe added to the ATA-3 industry standard, where it was officially renamed S.M.A.R.T.

As an industry standard, S.M.A.R.T. specifies the conditions that hard drive manufacturers must meet. The main ones are:

The drive provides complete S.M.A.R.T. parameter and attribute settings.
S.M.A.R.T. works normally on the specified system platform.
The BIOS can detect whether the device supports S.M.A.R.T., display the relevant information, and distinguish valid from invalid S.M.A.R.T. information.
Users are allowed to enable and disable the S.M.A.R.T. function.
S.M.A.R.T. determines the device's working status and issues the corresponding corrective instructions or warnings.

When both the hard disk and the operating system support S.M.A.R.T. and the technology is enabled, S.M.A.R.T. can display an English warning message on the screen: "Warning: Immediately backup your data and replace your hard disk drive, a failure may be imminent."
Where is S.M.A.R.T. information stored, and how does it work?
S.M.A.R.T. information is kept in the service area of the hard disk. This area usually occupies several dozen physical tracks at the start of physical surface 0, where the manufacturer writes the drive's internal management programs. Besides the S.M.A.R.T. information table, the service area also holds the low-level formatting routines, encryption and decryption routines, self-monitoring routines, and automatic repair routines. Monitoring software queries the drive with a command named "SMART RETURN STATUS" (command code B0h). End users are not allowed to modify the information.
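As a rough illustration of how monitoring software might interpret the drive's answer to the return-status command, the sketch below decodes the two signature bytes the drive leaves in its LBA Mid and LBA High registers. The register values and the feature code DAh come from the ATA command set rather than from this article, and the function name is hypothetical. Actually sending the command requires operating-system-specific pass-through calls that are not shown here.

```python
# Minimal sketch: decode the result of SMART RETURN STATUS.
# Assumption (from the ATA command set, not from this article): the command
# is issued as command B0h with feature DAh, and the drive answers by
# leaving a two-byte signature in the LBA Mid / LBA High registers.

SMART_COMMAND = 0xB0          # command code quoted in the article
SMART_RETURN_STATUS = 0xDA    # feature (subcommand) code, per the ATA spec

SIGNATURE_OK = (0x4F, 0xC2)        # no threshold exceeded
SIGNATURE_FAILING = (0xF4, 0x2C)   # a threshold has been exceeded


def interpret_return_status(lba_mid: int, lba_high: int) -> str:
    """Map the returned register pair to a human-readable status."""
    signature = (lba_mid, lba_high)
    if signature == SIGNATURE_OK:
        return "ok"
    if signature == SIGNATURE_FAILING:
        return "threshold exceeded"
    return "unknown"


if __name__ == "__main__":
    print(interpret_return_status(0x4F, 0xC2))
    print(interpret_return_status(0xF4, 0x2C))
```

Keeping the decoding separate from the device access makes the logic easy to check without real hardware.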
How is the S.M.A.R.T. information table organized?
The S.M.A.R.T. standard encodes the S.M.A.R.T. information table in binary so that it can be detected and processed normally. S.M.A.R.T. commands are divided into master commands and subcommands. The master commands mainly report whether the device supports S.M.A.R.T. features or a specified command. The subcommands provide the detection information of a device that supports S.M.A.R.T. These commands are mainly written by the device manufacturer, and some professional hard drive maintenance tools can examine the device through these codes.
Viewing hard disk health status through software
The principle of S.M.A.R.T. is to monitor hard disk attributes such as data throughput performance, motor spin-up time, and seek error rate, compare the attribute values with standard values, infer whether the disk is likely to fail, and give a prompt that helps the user avoid data loss. S.M.A.R.T. therefore defines specific detection parameters. Because hard disks differ in structure, performance, and market positioning, vendors may add their own S.M.A.R.T. detection parameters to suit their products, beyond the parameters in the ATA-3 standard. Ordinary users can view these parameters with common system tools (such as AIDA32) to learn the "health" status of the disk. Below, we use the S.M.A.R.T. detection parameters of a Seagate hard disk as an example to analyze the meaning of the main parameters. As shown in Figure 2, the S.M.A.R.T. detection parameters are arranged in seven columns: ID detection code, attribute description, threshold, attribute value, maximum error value (worst), actual value (data), and attribute status.
ID detection code
The ID detection code is not fixed across vendors. A manufacturer can use different ID codes as needed, or add and remove ID codes according to the number of detection parameters. For example, on Western Digital products the ID code "04" denotes Start/Stop Count (power-on count), whereas Fujitsu describes the parameter under the same code as "number of times the spindle motor is activated".
Attribute description
The attribute description is the name of the monitored item. Vendors can add or remove items as they see fit, and because the ATA standard is continually updated, different product lines of the same brand may also differ. All of them, however, must cover the major items that S.M.A.R.T. specifies (vendors name the items differently, but the items themselves are essentially the same):

Read Error Rate: rate of read errors
Start/Stop Count: number of starts and stops (also called power-on count)
Reallocated Sector Count: number of reallocated sectors
Spin Up Retry Count: number of spin-up retries (that is, retries when starting the disk)
Drive Calibration Retry Count: number of drive calibration retries
Ultra DMA CRC Error Rate: Ultra DMA checksum error rate
Multi-Zone Error Rate: multi-zone error rate
Vendor-specific: vendor-specific items

Note that attribute descriptions differ between manufacturers and products. Users only need to understand what the monitored values indicate, without a deep understanding of each item's exact definition.
Threshold
The threshold is the reliability limit for an attribute, specified by the hard disk manufacturer and calculated with a specific formula. If an attribute value falls below its threshold, the hard disk is becoming unreliable and the data stored on it is at risk of being lost. How reliability attribute values are composed and how large they are varies from disk to disk. Note that the ATA standard only defines some of the S.M.A.R.T. parameters and does not prescribe specific values; each vendor sets the thresholds according to the characteristics of its own products. As a result, the results of a manufacturer's own testing software may differ considerably from those of testing software that runs under Windows (such as AIDA32). We recommend treating the manufacturer's software as authoritative, because Windows performs far more disk activity while booting than DOS does, which can make S.M.A.R.T. values fluctuate more than the values detected under DOS.

Take Raw Read Error Rate as an example. The formula for this parameter is:

10 × log10((number of sectors transferred between host and disk × 512 × 8) / number of sectors re-read)

The factor 512 × 8 converts the number of sectors into the number of bits transferred. The value is only calculated while the amount of data transferred lies between 10^10 and 10^12. Once Windows has started and the amount of data transferred between host and disk reaches 10^12 or more, the value is reset. This is why some values fluctuate considerably across operating environments and detection programs.
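The read error rate formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the 10^10 to 10^12 window is interpreted here as a count of bits, and returning None outside that window (or when nothing has been re-read) is a choice made for this sketch rather than documented drive behavior.

```python
import math

BITS_PER_SECTOR = 512 * 8     # converts a sector count to bits
WINDOW_MIN_BITS = 10 ** 10    # assumed lower bound of the measuring window
WINDOW_MAX_BITS = 10 ** 12    # assumed upper bound; the value resets here


def read_error_rate_value(sectors_transferred: int, sectors_reread: int):
    """Return 10 * log10(bits transferred / sectors re-read), or None.

    None is returned when the transfer count is outside the measuring
    window or when no sector has been re-read (the ratio is undefined).
    """
    bits = sectors_transferred * BITS_PER_SECTOR
    if not (WINDOW_MIN_BITS <= bits < WINDOW_MAX_BITS):
        return None
    if sectors_reread == 0:
        return None
    return 10 * math.log10(bits / sectors_reread)


if __name__ == "__main__":
    # 10^7 sectors is about 4.1 * 10^10 bits, inside the window.
    print(read_error_rate_value(10 ** 7, 4096))
```

Because the value depends on how much data has been transferred since the last reset, two tools that sample at different moments can report noticeably different numbers for the same disk.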
Attribute value
The attribute value is the maximum normal value preset when the hard disk leaves the factory, generally in the range 1 to 253. The maximum is usually 100 (IBM, Quantum, Fujitsu) or 253 (Samsung). There are exceptions: some Western Digital models have used two different attribute values, set to 200 in early production and later changed to 100.
Maximum error value
The maximum error value (Worst) is the most abnormal value recorded while the hard disk has been operating. It is calculated over the disk's cumulative running time and is refreshed continually as the disk runs; the closer it is to the threshold, the less healthy the disk. S.M.A.R.T. decides whether the disk is in a normal state by comparing this value with the threshold. On a new hard disk it equals the maximum attribute value, but it decreases with use or whenever errors occur. A large value therefore indicates a reliable disk, while a smaller value indicates an increased likelihood of failure.
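One way to picture the worst value is as a running minimum over the attribute values the disk has reported. The sketch below illustrates that idea. The function names and sample readings are hypothetical, and real drives maintain this value internally rather than in host software.

```python
from itertools import accumulate


def worst_history(values):
    """Return the running minimum: the 'worst' value after each reading."""
    return list(accumulate(values, min))


def reached_threshold(worst: int, threshold: int) -> bool:
    """True once the worst value has fallen to the threshold or below."""
    return worst <= threshold


if __name__ == "__main__":
    readings = [100, 98, 99, 95, 97]   # made-up normalized readings
    history = worst_history(readings)
    print(history)                      # the worst value never recovers
    print(reached_threshold(history[-1], threshold=20))
```

The running minimum never rises again, which matches the description above: the value starts at the maximum and only moves toward the threshold.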
Actual value
The actual value (Data) is the raw count recorded for each monitored item; for many items it is cumulative. For example, for Start/Stop Count in Figure 3 the cumulative actual value is 436, meaning the hard disk has been powered on and started 436 times since it was first used.
Attribute status
The attribute status (Status) is the state of each attribute as determined by S.M.A.R.T. after comparing and analyzing the preceding values. It is the most direct indication of the "health" of the hard disk. S.M.A.R.T. defines three states: normal, warning, and fault or error. Which state applies is closely tied to the attribute's pre-failure/advisory bit:

If pre-failure/advisory bit = 0 and the attribute value is greater than the threshold, the normal "OK" flag is displayed.
If pre-failure/advisory bit = 0 and the attribute value is greater than the threshold but close to it, the warning flag "!" is displayed.
If pre-failure/advisory bit = 1 and the attribute value is smaller than the threshold, a fault or error is reported and the "!" flag is displayed.

In Figure 2, the "OK" flag appears in two forms: "value is normal" and "always passing". "Value is normal" means that the S.M.A.R.T. value of the item is normal and the hard disk is not faulty. "Always passing" means that the item merely records a parameter and has no pass/fail criterion. Power-On Time Count, for example, only records how long the hard disk has been powered on. Such a parameter always passes and is not used to judge the condition of the disk, so it is displayed as "OK: always passing".

The following example uses the Start/Stop Count parameter with ID "04" to tie the seven columns together. As shown in Figure 2, the attribute value specified for this parameter is 100. The normalized value is calculated with the formula:

100 - (number of power-ons during the life of the hard disk / 1024)

The maximum error value (worst) is this value computed over the disk's cumulative operation. A new hard disk has been powered on 0 times, so the result is 100 - 0/1024 = 100 and the worst value equals the attribute value. The worst value falls as the number of power-ons increases. The threshold set by the manufacturer is 20, so when the power-on count reaches 81920 (100 - 81920/1024 = 20), the worst value equals the threshold and the system prompts the user to back up data. As long as the power-on count stays below 81920, the worst value remains above the threshold of 20. The power-on count (actual data value) in the figure is 107, so the worst value is approximately 100 and the status is "OK: value is normal".

Note that the value of every parameter is derived from a specific formula. As a user, you only need to watch the relationship between the "worst" and "threshold" values and pay attention to the attribute status to get a general picture of the disk's health.
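The Start/Stop Count arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the warning margin of 10 points is an assumption made only for illustration, since the article does not say how close to the threshold a value must be before a warning appears. The pre-failure/advisory bit is left out to keep the example small.

```python
def start_stop_value(power_on_count: int, initial: int = 100,
                     divisor: int = 1024) -> float:
    """Normalized value: initial - power_on_count / divisor."""
    return initial - power_on_count / divisor


def count_at_threshold(threshold: int, initial: int = 100,
                       divisor: int = 1024) -> int:
    """Power-on count at which the normalized value reaches the threshold."""
    return (initial - threshold) * divisor


def classify(value: float, threshold: float, warn_margin: float = 10) -> str:
    """Rough three-state status; warn_margin is an assumed, made-up figure."""
    if value <= threshold:
        return "fault"
    if value - threshold <= warn_margin:
        return "warning"
    return "ok"


if __name__ == "__main__":
    value = start_stop_value(107)        # the count shown in the figure
    print(round(value, 2), classify(value, threshold=20))
    print(count_at_threshold(20))        # count at which the value hits 20
```

With the figures from the text, a count of 107 leaves the value just under 100, far above the threshold of 20, which is why the status reads as normal.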
S.M.A.R.T. technology in SCSI systems
The hard drive field has two standards, ATA and SCSI, and S.M.A.R.T. supports both product families. Some parameter settings differ, however, and for key parameters S.M.A.R.T. on SCSI hard disks is more complex than on ATA hard disks. In actual operation, S.M.A.R.T. involves the host more on ATA/IDE systems than on SCSI systems, while fault determination on SCSI is more specialized and accurate. The following lists only some of the parameters specific to SCSI hard disks:

Primary Temp: operating temperature of the hard disk
Secondary Temp: operating temperature around the PCB
Min and Max Temp: minimum and maximum operating temperature of the hard disk within a period of time
Velocity Observer Count: number of times the servo deviated from the specified track within a period of time
12 V: 12 V power supply voltage
5 V: 5 V power supply voltage
MR Res: MR head resistance
Sectors Read: number of sectors read from the hard disk within a period of time
Sectors Written: number of sectors written to the hard disk within a period of time

In the ATA/IDE environment, software on the host interprets the alarm signal that the hard disk returns for the S.M.A.R.T. "report status" command. The host polls the hard disk to check the status of this command. If the status shows that a failure is imminent, the host passes the alarm to the end user or system administrator, who can then schedule downtime to back up data and replace the hard disk. Besides evaluating the "report status" result, the host system can also evaluate the attributes and alarm reports.

In a SCSI environment, S.M.A.R.T. only reports "good" or "failing". The hard disk itself makes the determination, and the host then notifies the user to take action. The SCSI standard defines a detection bit: when the hard disk judges that its reliability is compromised, it sets this bit, and the end user or system administrator is notified to take appropriate measures.
Prediction results of S.M.A.R.T.
Hard drive failures are usually divided into two categories: unpredictable and predictable.

Unpredictable failures are typically sudden electronic and mechanical faults that occur in an instant, for example when an accidental knock while the disk is powered on makes a head strike the platter, or when an excessive current surge destroys a chip or circuit. In such cases the disk has usually stopped working before S.M.A.R.T. can register any decline in performance. The rate of unpredictable failures can only be reduced through improvements in quality, design, process, and manufacturing, and through correct handling during use (advances in hard disk shock protection, for instance, have effectively reduced the probability of physical failure caused by vibration).

Predictable failures are characterized by parameters that change over time before the disk becomes completely unusable. Real-time monitoring technologies such as S.M.A.R.T. exploit this by watching the attributes in order to predict and analyze faults and recommend preventive action. Such failures include both software and hardware faults; many mechanical faults, for example, are regarded as typical predictable failures. S.M.A.R.T. is useful for this kind of failure: before the fault occurs, it can send a notification reminding the user to back up data. According to research data, of the hard drive failures that S.M.A.R.T. can predict, about 60% are mechanical and about 40% are soft faults. As S.M.A.R.T. and related technologies mature, more types of failure will become predictable and preventive measures will become more effective.

Readers who do not want to use S.M.A.R.T. can disable it under the "Advanced BIOS Setup" option in the BIOS.
Answers to specific questions
1. Does S.M.A.R.T. still work after a RAID array is set up?
After the user sets up a RAID array, S.M.A.R.T. still works, provided that the RAID card's controller chip supports S.M.A.R.T. The S.M.A.R.T. alarm of a RAID card is similar to the error indication for a hard disk in normal operation: when an alarm is triggered, the drive indicator light (usually red) of the corresponding module lights up as a warning.

2. Why can't I monitor the S.M.A.R.T. status of an external hard drive with a USB interface?
The system treats an external hard disk with a USB interface as a USB device, and S.M.A.R.T. information cannot be monitored because the USB standard does not provide for it. The hard disk itself still records its S.M.A.R.T. status, but because it is attached as a USB peripheral, the system does not monitor that status.

3. Does the S.M.A.R.T. function affect system performance?
A hard disk collects S.M.A.R.T. information in two ways. The first is online collection: based on information gathered during actual operation, the disk updates its own S.M.A.R.T. data in real time or at specified intervals. For example, if an ATA hard disk encounters an uncorrectable error while writing to a sector, it promptly records the event in its S.M.A.R.T. data. For a SCSI hard disk whose S.M.A.R.T. update cycle is set to 4 minutes, the disk collects the relevant information over those 4 minutes, writes it to the S.M.A.R.T. data area, and then starts tracking for the next cycle. Online collection does not affect system performance.

The second is offline collection. It runs when the hard disk receives certain commands from the host, while the disk is in the idle state or in an error-correction state. The disk then performs a large number of operations to test its own health, which delays normal requests from the host. Offline collection may therefore degrade system performance.

4. Does S.M.A.R.T. record information periodically?
On a SCSI hard disk, S.M.A.R.T. information is recorded periodically, generally with a period between 4 and 120 minutes. The period is set when the disk leaves the factory and can only be changed with specialized software. On ATA hard disks, S.M.A.R.T. information is not recorded periodically.
Summary
After nearly nine years of development, S.M.A.R.T. has become an indispensable part of the ATA and SCSI specifications. Hard drive manufacturers' current research and development in data protection also builds on S.M.A.R.T. As the analysis in this article shows, S.M.A.R.T. provides passive detection and early warning, and the newer data protection technologies derived from it add active repair capabilities. As hard drive technology advances, we have reason to believe that S.M.A.R.T. will offer even more protection for user data.