Part of my $dayjob as a sysadmin is to monitor all things.
I’ll be publishing my home-made nagios checks on github in the near future.
Here is the first one. It uses the Web API of DDN's SFA12K, a storage platform (it might work on the SFA10K too, but I haven't tried).
The check is available here: https://github.com/martbhell/nagios-checks/tree/master/plugins/check_ddn
Unfortunately, the Python Egg (the library / API bindings) still doesn't seem to be available online, so one has to ask DDN Support for it.
It's not perfect: it makes many assumptions, and there's plenty of room for improvement, such as refactoring and moving the username/password out of a variable in the script.
But making it work for you shouldn't be too hard. If you have any questions, comment here or on github :)
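For anyone writing a similar check, the part every Nagios plugin shares is the output convention: print a one-line status and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows only that part. It does not use the DDN API; the component names and the "OK"/"DEGRADED"/"FAILED" health strings are hypothetical placeholders for whatever the bindings actually return.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical health strings; the real values depend on DDN's (non-public) API bindings.
KNOWN_STATES = ("OK", "DEGRADED", "FAILED")


def evaluate(component_states):
    """Turn {component name: health string} into a Nagios (exit code, message) pair."""
    if not component_states:
        return UNKNOWN, "UNKNOWN - no components returned"
    failed = sorted(n for n, s in component_states.items() if s == "FAILED")
    degraded = sorted(n for n, s in component_states.items() if s == "DEGRADED")
    unknown = sorted(n for n, s in component_states.items() if s not in KNOWN_STATES)
    # Report the most severe problem first.
    if failed:
        return CRITICAL, "CRITICAL - failed: " + ", ".join(failed)
    if degraded:
        return WARNING, "WARNING - degraded: " + ", ".join(degraded)
    if unknown:
        return UNKNOWN, "UNKNOWN - unrecognised state: " + ", ".join(unknown)
    return OK, f"OK - {len(component_states)} components healthy"


def main(component_states):
    """Print the status line for Nagios and exit with the matching code."""
    code, message = evaluate(component_states)
    print(message)
    sys.exit(code)
```

In a real check, `main()` would be called with the states fetched from the controller; the severity ordering (CRITICAL before WARNING before UNKNOWN) is a design choice, not something the DDN API dictates.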
You are monitoring the SMART values of your disks, right? They're usually a really good indicator of a drive's health.
Today I thought I'd check the SMART values of the disks in my desktop (while checking whether I had smartd notifications turned on).
Lo and behold, the Load_Cycle_Count (LCC) on the 3TB WD disk I have was really high, much higher than its Power_Cycle_Count. It turns out this is quite an old problem, so there are a few posts about it on the Internets.
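If you want to keep an eye on this ratio yourself, a minimal sketch is to parse the attribute table that `smartctl -A` prints and compare the two raw values. This assumes the usual ATA attribute layout (a header line starting with `ID#` and RAW_VALUE as the tenth column); the device path is only an example, and smartctl normally needs root.

```python
import subprocess


def parse_smart_attributes(text):
    """Return {ATTRIBUTE_NAME: raw value as int} from `smartctl -A` output."""
    attrs = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True  # the attribute table starts after this header
            continue
        if not in_table:
            continue
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip blank lines or anything that isn't a table row
        try:
            # Raw values like "35 (Min/Max 20/45)" split so that field 9 is "35".
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            pass  # non-numeric raw values (e.g. "0/0") are ignored
    return attrs


def read_smart_attributes(device):
    """Run smartctl on a device such as "/dev/sda" and parse its attribute table."""
    # No check=True: smartctl uses non-zero exit bits for warnings even when output is fine.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


def load_cycles_per_power_cycle(attrs):
    """How many head load/unload cycles happen per power cycle."""
    return attrs["Load_Cycle_Count"] / max(attrs["Power_Cycle_Count"], 1)
```

Something like `load_cycles_per_power_cycle(read_smart_attributes("/dev/sda"))` gives the ratio; a value far above 1 is the symptom described above.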
According to the Interwebs, the specified maximum is 300k load cycles. `smartctl -a` says I'm already at 218602 after 9302 power-on hours (387 days, but I power off the computer at night).
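Taking the 300k figure at face value, a quick back-of-the-envelope calculation with the numbers above shows how much headroom is left if the drive keeps parking its heads at the same rate:

```python
RATED_LOAD_CYCLES = 300_000  # the 300k maximum quoted above
load_cycles = 218_602        # Load_Cycle_Count reported by smartctl -a
power_on_hours = 9_302       # Power_On_Hours reported by smartctl -a

cycles_per_hour = load_cycles / power_on_hours
remaining = RATED_LOAD_CYCLES - load_cycles
hours_left = remaining / cycles_per_hour

print(f"{cycles_per_hour:.1f} load cycles per power-on hour")  # 23.5
print(f"{remaining} cycles left, roughly {hours_left:.0f} power-on hours at this rate")  # 81398, 3464
```

That's about one head park every two and a half minutes, and only a few thousand power-on hours before reaching the rated figure.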
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD30EZRX-00DC0B0
For Windows there's wdidle3.exe, a DOS program that one can put on a bootable floppy (…) and boot a computer from to change the drive's idle3 (head-parking) timer.
Fortunately I run Linux (Ubuntu 14.10 since yesterday), and there's a tool called idle3ctl, which one can grab from here: http://idle3-tools.sourceforge.net/
I got the latest source code and compiled it myself, because there had been some updates.