We’ve written several articles on the performance gains you get from replacing your hard drive with an SSD, and we certainly advocate doing that at the earliest opportunity. In general, however, how do you know when to replace your hard drive? Read on for the answer.
When Failure is Predicted
Windows 10 build 20226 or later
Go to Settings > System > Storage > Manage Disks and Volumes
Select your drive, then click Properties
On the following screen, you’ll see a drive health section showing whether your drive is healthy or needs attention. Note that healthy drives can still fail at any time. This section is a good guideline, but it’s not a catch-all and will never replace a good backup.
Older versions of Windows
Open a Command Prompt as administrator
Type the following command and press Enter:
wmic /namespace:\\root\wmi path MSStorageDriver_FailurePredictStatus
If the resulting PredictFailure column says anything but FALSE, you should consider replacing the drive as soon as possible.
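If you would rather check this from a script, the sketch below parses the text the wmic command prints. It assumes wmic's usual fixed-width table, where each column's values start under that column's header. The function names are our own, and capturing the output (for example with Python's subprocess module) is left to you.

```python
import re


def parse_fixed_width(output: str) -> list[dict[str, str]]:
    """Split a fixed-width table (header row + data rows) into dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0]
    # Each column is assumed to start where its header word starts.
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    names = header.split()
    rows = []
    for line in lines[1:]:
        values = []
        for i, start in enumerate(starts):
            end = starts[i + 1] if i + 1 < len(starts) else None
            values.append(line[start:end].strip())
        rows.append(dict(zip(names, values)))
    return rows


def drives_predicting_failure(wmic_output: str) -> list[str]:
    """Return the InstanceName of every drive whose PredictFailure is not FALSE.

    Anything other than FALSE (including a blank value) is flagged,
    matching the advice above.
    """
    return [
        row.get("InstanceName", "")
        for row in parse_fixed_width(wmic_output)
        if row.get("PredictFailure", "").upper() != "FALSE"
    ]
```

An empty result means every drive reported FALSE. Any name returned is a drive worth replacing soon.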
Linux
Linux has some amazing tools for checking the health of your hard drive from the command line. First, install smartmontools (the command below is for Debian- and Ubuntu-based distributions):
$ sudo apt-get update && sudo apt-get install smartmontools
After installation, run the following command, where <device> is the name of the drive you want to check. The first SATA drive is typically sda (that is, /dev/sda), while NVMe drives show up as nvme0n1, nvme1n1, and so on.
$ sudo smartctl -a /dev/<device>
To get a list of the hard drive devices in your system use:
$ lsblk --all | grep disk
Once you’ve run the smartctl command, look for the Reallocated_Sector_Ct line (or Reallocated_Event_Count) and check its RAW_VALUE, the last column on the line. This attribute is an indicator of bad sectors: it counts sectors that could not be reliably read from or written to and had to be reallocated. A value greater than zero means some space on your drive has become unusable, and you should consider replacing the drive if this number keeps increasing.
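Watching whether the count keeps increasing is easier to automate than to eyeball. Below is a minimal sketch that pulls the raw reallocation counts out of saved smartctl output and compares two readings. It assumes the standard ATA attribute table, where RAW_VALUE is the tenth whitespace-separated field. The function names are hypothetical.

```python
import re

# Attribute names as they appear in smartctl's ATA attribute table.
REALLOCATION_ATTRIBUTES = ("Reallocated_Sector_Ct", "Reallocated_Event_Count")


def reallocated_counts(smartctl_output: str) -> dict[str, int]:
    """Return the RAW_VALUE of each reallocation attribute found.

    Assumes the ATA attribute table layout (RAW_VALUE is field 10).
    NVMe drives print a different health log, so they yield {}.
    """
    counts: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] in REALLOCATION_ATTRIBUTES:
            # Keep only the leading integer in case extra text follows.
            match = re.match(r"\d+", fields[9])
            if match:
                counts[fields[1]] = int(match.group())
    return counts


def is_growing(previous: dict[str, int], current: dict[str, int]) -> bool:
    """True if any reallocation count rose between two readings."""
    return any(current[name] > previous.get(name, 0) for name in current)
```

For example, save the output of `sudo smartctl -A /dev/<device>` once a month. Then pass the last two readings through reallocated_counts and check them with is_growing.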
When Storage Density is a Concern
The goal here is consolidation and rack-space density. If several small drives can be replaced by one larger drive, it often makes sense to swap them. This frees up drive bays in servers, desktops, NAS units, and SANs, and it improves energy efficiency. When fewer drive bays are needed, less rack space is needed, and overall equipment costs go down.
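In addition, the probability that at least one drive in the system fails goes down when drives are removed. For example, take 10 drives, each with a 1% chance of failing per year, so each has a 0.99 (99%) chance of staying healthy. Assuming the drives fail independently of one another, the probability that all of them stay healthy is 0.99 multiplied by itself once per drive: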
(0.99 healthy)^(10 drives) = 0.9044 or just over 90% probability that all drives will remain healthy over the year.
Contrast that with the scenario where only 5 drives are in use:
(0.99 healthy)^(5 drives) = 0.951 or just over 95% probability that all drives will remain healthy over the year.
What does this mean? The more drives you have, the higher the chance that at least one of them fails in a given period. Note that in the 10-drive scenario, we are not saying one drive is guaranteed to fail that year. Whether a drive actually fails is something you only observe after the fact. What we are saying is that the chance of at least one failure is 1 - 0.9044 ≈ 0.096, or roughly 10%. With five drives, it is about 5% (1 - 0.951).
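The arithmetic above is easy to reproduce for any drive count. The sketch below assumes, as the example does, that every drive has the same annual failure rate and that failures are independent. Real drives in the same chassis often share heat and power conditions, so treat the result as an approximation.

```python
def prob_all_healthy(annual_failure_rate: float, drives: int) -> float:
    """Chance that every drive survives the year (independent failures)."""
    return (1.0 - annual_failure_rate) ** drives


def prob_any_failure(annual_failure_rate: float, drives: int) -> float:
    """Chance that at least one drive fails during the year."""
    return 1.0 - prob_all_healthy(annual_failure_rate, drives)


for n in (10, 5):
    print(f"{n} drives: all healthy {prob_all_healthy(0.01, n):.4f}, "
          f"at least one failure {prob_any_failure(0.01, n):.4f}")
# prints:
# 10 drives: all healthy 0.9044, at least one failure 0.0956
# 5 drives: all healthy 0.9510, at least one failure 0.0490
```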
We know that there are good reasons to run many drives. Redundancy, storage capacity, and performance are the biggest of them. But when cost is a factor, fewer drives are cheaper to maintain: they need less rack space and fewer drive bays, cost less in replacements, and use less electricity. Go green with your equipment. The next time you shop for hard drives, consider larger, initially more expensive drives over many smaller, cheaper ones.
Overall Failure Replacement Guidelines
Hard drives have finite lifespans. While they are very reliable in the right environment, many factors affect how long they last, including:
- Cumulative hours of use
- Power on/off cycles
- Read/write cycles
- Sustained heat
- Power supply and power quality
- Type of drive (server/NAS/desktop/laptop)
All of these, and more, affect the life of a drive.
For more information, Backblaze publishes a quarterly Drive Stats report on drive reliability on its blog. It’s an interesting read, so be sure to check it out.
Stats of note from the Backblaze Q2 2021 report
SSDs were half as likely to fail as mechanical drives given the same usage patterns, hours, and so on.
The annualized failure rate (AFR) was between 1% and 6%, depending on the age of the drives. It stands to reason that as drives get older, the AFR increases.
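AFR is a rate per drive-year, not a simple percentage of drives that failed. This matters because fleets add and retire drives throughout the year. Backblaze's reports describe it, as we understand them, as failures divided by drive-years of operation (drive-days / 365), expressed as a percentage. The sketch below implements that definition. The fleet numbers are invented for illustration.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return AFR in percent: failures per drive-year of service, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical fleet: 100 drives running a full year (36,500 drive-days), 3 failures.
print(f"{annualized_failure_rate(3, 100 * 365):.1f}%")  # prints 3.0%
```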
In 2013, Backblaze published an article on how long the hard drives in its data centers last. The conclusion was that drives either fail very early, because of defects, or later, from wear. Drives that survive the early period typically start to fail around the 3-4 year mark. Backblaze estimated the median drive lifespan at around 6 years.
We can expect drives to last longer today, and SSD technology has also improved since 2013.
If a drive doesn’t give you problems early on, you can reasonably expect about 4 years of moderate to heavy use from it. After that, failure rates rise to the point where it makes sense to replace the drive proactively.
We advocate replacing mechanical hard drives as soon as possible because of the performance and reliability benefits. If you already have an SSD and your usage is typical, replacing it at 4 years or older is, in our opinion, a reasonable timeline.
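To pull this article's guidelines together, here is a hypothetical checklist function. Its name, its parameters, and the 4-year threshold simply mirror the rules of thumb above. They are not a standard or a vendor recommendation.

```python
def replacement_reasons(age_years: float, is_ssd: bool,
                        predicts_failure: bool,
                        reallocations_growing: bool) -> list[str]:
    """Collect reasons to replace a drive, per this article's rules of thumb.

    The 4-year threshold is the article's opinion, not a hard limit.
    """
    reasons = []
    if predicts_failure:
        reasons.append("drive health check predicts a failure")
    if reallocations_growing:
        reasons.append("reallocated sector count keeps increasing")
    if not is_ssd:
        reasons.append("mechanical drive: upgrade to an SSD when possible")
    if age_years >= 4:
        reasons.append("drive is 4 or more years old")
    return reasons
```

An empty list means none of the guidelines apply yet. It does not mean the drive cannot fail, so keep your backups current either way.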
The second significant reason to replace drives is improved storage density, which lets you store the data you have on fewer drives. Not only does this save on electricity, it also reduces the equipment real estate needed in your data center and servers.
As drive reliability keeps improving, it is conceivable that within the next 4-6 years, a drive will be expected to last the usable lifetime of the computer it’s installed in.
Check out more of our How-To’s for additional great tips like this one.