Monitoring the health of ESXi servers is key to keeping the virtual infrastructure fully working and the status of the servers under control.
Nagios is the monitoring system I use most for the networks I manage. To check VMware ESXi 4.x/5.0 servers, there is a great plugin called check_esxi_hardware.py, written by Claudio Kuenzler and also mentioned in the VMware community.
The information reported by the plugin is the same as that shown in the vSphere Client under Configuration tab > Health Status.
Prerequisites
To use the plugin, the Nagios server requires the following components to be installed: Python and the pywbem Python extension. Both are installed in the procedure below.
Procedure
On the Nagios server, install Python using yum.
# yum install python
Using the wget command, download the Python pywbem extension to the system. Quote the URL so that the shell does not interpret the & characters.
# wget "http://downloads.sourceforge.net/project/pywbem/pywbem/pywbem-0.7/pywbem-0.7.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fpywbem%2Ffiles%2F&ts=1332321760&use_mirror=freefr"
Unpack the downloaded file using the tar command.
# tar -vxzf pywbem-0.7.0.tar.gz
Install the pywbem extension by running setup.py from the console.
# cd pywbem-0.7.0
# python setup.py install
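Before moving on, it can be useful to confirm that pywbem is actually importable by the Python interpreter. The following is a minimal sketch, assuming a Python 3 interpreter (importlib.util.find_spec is not available in older Python 2 releases); the helper name module_available is hypothetical and not part of the plugin.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "pywbem" is the module name the extension installs under.
    if module_available("pywbem"):
        print("pywbem is installed")
    else:
        print("pywbem is missing, repeat the installation step")
```

If the script reports that pywbem is missing, check that setup.py was run with the same interpreter that Nagios will use to execute the plugin.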
Download the check_esxi_hardware.py plugin and copy the file to the directory /usr/lib/nagios/plugins.
# wget http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.py
# cp check_esxi_hardware.py /usr/lib/nagios/plugins/
Once the plugin has been copied, move to the plugins directory and make the installed copy of check_esxi_hardware.py executable.
# cd /usr/lib/nagios/plugins
# chmod 755 check_esxi_hardware.py
The correct syntax to check the ESXi server is the following:
./check_esxi_hardware.py -H IP_address_esxi -U username -P password -V vendor
The username must exist on the ESXi host and be a member of the root group. Since using the root user is not recommended for security reasons, create a dedicated account with the vSphere Client.
Testing the plugin
When the installation completes, test the plugin to verify that it works correctly. For instance, to test the health status of an HP server, type the following command from the console:
# ./check_esxi_hardware.py -H esxi1 -U username -P password -V hp
If everything works as expected, you should receive a message similar to the one shown in the picture above.
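Besides the text message, Nagios plugins report their result through the process exit code, following the standard plugin convention: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below shows one way to run the manual test from Python and translate the exit code. It assumes Python 3.7 or later and uses only the -H, -U, -P and -V options shown above; the function names build_command, state_name and run_check are hypothetical helpers, not part of the plugin.

```python
import subprocess

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(plugin, host, user, password, vendor):
    """Build the argument list using the options from the syntax above."""
    return [plugin, "-H", host, "-U", user, "-P", password, "-V", vendor]


def state_name(exit_code):
    """Map a plugin exit code to its Nagios state name."""
    return NAGIOS_STATES.get(exit_code, "UNKNOWN")


def run_check(plugin, host, user, password, vendor):
    """Run the plugin and return (state name, first line of its output)."""
    result = subprocess.run(
        build_command(plugin, host, user, password, vendor),
        capture_output=True,
        text=True,
    )
    lines = result.stdout.strip().splitlines()
    return state_name(result.returncode), lines[0] if lines else ""
```

For example, run_check("./check_esxi_hardware.py", "esxi1", "username", "password", "hp") would return the state together with the status line printed by the plugin.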
To make the hardware check automatic, the correct command must be defined in Nagios.
define command {
    command_name check_esxi_hardware
    command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$
}
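To see how Nagios turns this definition into a real command line, recall that $USER1$ usually holds the plugins directory (set in resource.cfg), $HOSTADDRESS$ is the address of the monitored host, and $ARG1$, $ARG2$, $ARG3$ are taken from the service's check_command value, where arguments are separated by the ! character. The following sketch only imitates that substitution for illustration; expand_command is a hypothetical helper, and the host address, user name and password are made-up sample values.

```python
def expand_command(command_line, check_command, host_address, user1):
    """Imitate Nagios macro substitution for a simple command definition."""
    # check_command has the form: command_name!arg1!arg2!...
    args = check_command.split("!")[1:]
    macros = {"$USER1$": user1, "$HOSTADDRESS$": host_address}
    for index, value in enumerate(args, start=1):
        macros["$ARG%d$" % index] = value
    expanded = command_line
    for macro, value in macros.items():
        expanded = expanded.replace(macro, value)
    return expanded


if __name__ == "__main__":
    line = ("$USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ "
            "-U $ARG1$ -P $ARG2$ -V $ARG3$")
    # Sample values only: 192.0.2.10, nagios and secret are placeholders.
    print(expand_command(line, "check_esxi_hardware!nagios!secret!hp",
                         "192.0.2.10", "/usr/lib/nagios/plugins"))
```

With these sample values, a service whose check_command is check_esxi_hardware!nagios!secret!hp ends up running the same command that was tested manually, with the user, password and vendor supplied per service.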
The monitoring system is now able to display the hardware health status of the configured servers in the network. For additional configuration options, check the plugin author's website.
When the hardware status of ESXi servers is properly monitored, the time needed to identify a problem is reduced, minimizing the risk of a service interruption.