Replacing a failed drive in a NetApp SAN should follow a fixed sequence of hardware and data-protection checks, so that the aggregate is not exposed to a second failure and reconstruction does not cause avoidable performance impact.
Before replacement
- Confirm exact failed disk identity (shelf, bay, serial)
- Check aggregate/RAID protection state
- Verify no additional disks are degraded or predicting failure
- Ensure support-approved replacement part is available
1) Validate failure in ONTAP
storage disk show -broken
storage disk show -container-type broken
system health alert show
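The three checks above can be gathered in one pass from an admin workstation. This is a minimal Python sketch, assuming the cluster management interface accepts key-based SSH for the account used. The host `cluster1-mgmt`, the user `admin`, and the helper names are hypothetical and not part of ONTAP.

```python
import subprocess

# Commands taken verbatim from step 1.
VALIDATION_COMMANDS = [
    "storage disk show -broken",
    "storage disk show -container-type broken",
    "system health alert show",
]


def build_ssh_argv(host, user, ontap_command):
    """Return the argv list for running one ONTAP command over SSH."""
    # BatchMode makes ssh fail instead of prompting, so key-based auth is assumed.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", ontap_command]


def collect_validation_output(host, user, runner=subprocess.run):
    """Run each validation command and return a {command: stdout} dict."""
    results = {}
    for command in VALIDATION_COMMANDS:
        completed = runner(
            build_ssh_argv(host, user, command),
            capture_output=True,
            text=True,
            timeout=60,
            check=True,  # raise if ssh or the command exits non-zero
        )
        results[command] = completed.stdout
    return results
```

Calling `collect_validation_output("cluster1-mgmt", "admin")` returns the raw text of each command so it can be attached to the change record before any hardware is touched.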
2) Identify physical slot
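storage shelf location-led modify -shelf-name <shelf-id> -led-status on
storage disk set-led -disk <disk-name> -action on
The first command lights the shelf's location LED; the second lights the LED on the drive itself. Use both to avoid removing the wrong drive.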
3) Replace disk physically
- Follow ESD-safe handling
- Remove failed drive only
- Insert the replacement and confirm it is fully seated
4) Confirm disk recognition and reconstruction
storage disk show
storage aggregate show-status -aggregate <aggr-name>
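Monitor RAID reconstruction until it completes, and postpone other risky maintenance (firmware updates, further disk pulls, takeover/giveback) until then. If a hot spare was available, ONTAP normally begins reconstruction onto the spare as soon as the disk fails, and the replacement drive becomes the new spare.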
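Monitoring can be scripted as a simple polling loop. The Python sketch below assumes that the output of `storage aggregate show-status` contains the word "reconstruct" while a rebuild is in progress; check this against the output on your ONTAP release. The function names and default intervals are illustrative, and `fetch_status` is a caller-supplied function that returns the command's text.

```python
import time


def is_reconstructing(status_text):
    """Assumption: in-progress rebuilds are reported with the word 'reconstruct'."""
    return "reconstruct" in status_text.lower()


def wait_for_reconstruction(fetch_status, interval_s=300, max_polls=288,
                            sleep=time.sleep):
    """Poll until fetch_status() no longer reports reconstruction.

    Returns the number of polls made, or raises TimeoutError once
    max_polls checks have all reported a rebuild in progress.
    """
    for attempt in range(1, max_polls + 1):
        if not is_reconstructing(fetch_status()):
            return attempt
        sleep(interval_s)
    raise TimeoutError("reconstruction still reported after max_polls checks")
```

A first poll taken immediately after inserting the drive may return before reconstruction has started, so the result is worth confirming by reading the `show-status` output once by hand.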
5) Validate final health
system health status show
event log show -time >1h
- No active hardware/RAID alerts
- Aggregate returns to protected state
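The final health check can be turned into a pass/fail signal for a change ticket. The Python sketch below assumes that `system health status show` prints a short table whose last non-empty line is the status value, with `ok` meaning healthy. The function name is hypothetical, and the parsing should be verified against real output before relying on it.

```python
def health_status_is_ok(output):
    """Return True when the last non-empty line of the output reads 'ok'."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Any other value (for example 'degraded') is treated as not healthy.
    return bool(lines) and lines[-1].lower() == "ok"
```

This deliberately treats anything other than a plain `ok`, including empty output, as a failure, so an unexpected result prompts a manual look rather than a silent pass.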