Differences

This shows you the differences between two versions of the page.

check_commands:check_esxi_hardware [2019/10/24 14:11]
Robbie Ferguson
check_commands:check_esxi_hardware [2019/10/24 14:21] (current)
Robbie Ferguson
Line 171: Line 171:
 --- ---
  
-Q: I have several ESXi hosts behind the same IP (NAT). How can I use the check_esxi_hardware?​+**Q:** I have several ESXi hosts behind the same IP (NAT). How can I use the check_esxi_hardware?​
    
- A: Since version 20160531 it is possible to manually define the CIM port (which defaults to 5989). So if you set up port forwarding (DNAT) you can now monitor all ESXi servers behind the same NAT-address. The parameter you want in this case is "​-C"​ (or --cimport). +**A:** Since version 20160531 it is possible to manually define the CIM port (which defaults to 5989). So if you set up port forwarding (DNAT) you can now monitor all ESXi servers behind the same NAT-address. The parameter you want in this case is "​-C"​ (or --cimport). 
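To illustrate the answer above, here is a minimal dry-run sketch that prints one plugin invocation per forwarded CIM port. The NAT host name, the port list, the credentials and the plugin path are placeholders, not values from this page. It assumes the plugin's usual `-H`, `-U` and `-P` options alongside the `-C` option described in the answer.

```shell
#!/bin/sh
# Hypothetical values: replace with your NAT address and your own DNAT port mapping.
NAT_HOST="nat.example.com"
CIM_PORTS="5989 5990 5991"   # assumed: one forwarded port per ESXi host behind the NAT

# Print (do not execute) the plugin call for a single forwarded CIM port.
# Arguments: host, user, password, CIM port
build_check_cmd() {
    printf './check_esxi_hardware.py -H %s -U %s -P %s -C %s\n' "$1" "$2" "$3" "$4"
}

# One command line per ESXi host, distinguished only by the CIM port.
for port in $CIM_PORTS; do
    build_check_cmd "$NAT_HOST" "root" "PASSWORD" "$port"
done
```

In a monitoring configuration, each forwarded port would typically become its own service or host object, so that every ESXi server behind the NAT address is reported separately.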
-  + 
-  +--- 
---- --- --- --- --- --- --- --- --- --- --- --- --- --- + 
-  +**Q:** Is the plugin compatible with ESXi 6.x? 
-Q: Is the plugin compatible with ESXi 6.x? + 
-  +**A:** Yes. Please note that starting with ESXi 6.5 you might have to enable the CIM/WBEM services first, as they are disabled by default. Refer to [[https://​kb.vmware.com/​s/​article/​2148910|https://​kb.vmware.com/​s/​article/​2148910]]
- A: Yes. Please note that starting with ESXi 6.5 you might have to enable the CIM/WBEM services first, as they are disabled by default. Refer to <a href="https://​kb.vmware.com/​s/​article/​2148910" target="​_blank">​https://​kb.vmware.com/​s/​article/​2148910</a>+ 
-  +{{ :​check_commands:​308-cim-server-service.png?​direct&​400 |}} 
-  <​div class="​thumbnail"><​a href="/​graph/​news/​308-cim-server-service.png" rel="​lightbox"><​img src="/​graph/​news/​308-cim-server-service_small.png"​ alt="​sfcbd-watchdog cim service"​ /></​a></​div> ​ + 
-  +--- 
---- --- --- --- --- --- --- --- --- --- --- --- --- --- + 
-  +**Q:** I can't execute the plugin and get the following error message. Permissions are correct however (e.g. 755). 
-Q: I can't execute the plugin and get the following error message. Permissions are correct however (e.g. 755). + 
-  +<code>​execvpe(/​usr/​lib64/​nagios/​plugins/​check_esxi_hardware.py) failed: Permission denied</​code>​ 
-  <p class="​consoletext"​> execvpe(/​usr/​lib64/​nagios/​plugins/​check_esxi_hardware.py) failed: Permission denied + 
-  +**A:** This error comes from SELinux. You need to write an allow rule for it. 
-A: This error comes from SELinux. You need to write an allow rule for it. + 
-  +--- 
---- --- --- --- --- --- --- --- --- --- --- --- --- --- + 
-  +**Q:** The plugin reports the following problem with memory, but no memory hardware issues can be found on the server: 
-Q: The plugin reports the following problem with memory, but no memory hardware issues can be found on the server: + 
-  +<code>​CRITICAL : Memory - Server: HP ProLiant DL380p Gen8 s/n....</​code>​ 
-  <p class="​consoletext"​>​CRITICAL : Memory - Server: HP ProLiant DL380p Gen8 s/n.... + 
-  +**A:** It is possible that an alert needs to be cleared in the server's IPMI log first. To do that, you need to log in to your ESXi server with SSH and run the following commands: 
-A: It is possible that an alert needs to be cleared in the server's IPMI log first. To do that, you need to log in to your ESXi server with SSH and run the following commands: + 
-  +<code>​localcli hardware ipmi sel clear 
-  <p class="​consoletext">​esxiserver ~ # <span class="​consolecommand"​>​localcli hardware ipmi sel clearesxiserver ~ # <span class="​consolecommand">​/​sbin/​services.sh restart ​+/​sbin/​services.sh restart</​code>​ 
 This might affect other CIM entries as well. So it's a wise idea to clear the IPMI system event log (sel) first before investigating further. This might affect other CIM entries as well. So it's a wise idea to clear the IPMI system event log (sel) first before investigating further.
-  
---- --- --- --- --- --- --- --- --- --- --- --- --- --- 
-  
-Q: Certain hardware elements show incorrect health/​operational states, e.g. "​Cooling Unit 1 Fans": ​ 
-  <p class="​consoletext">​20190205 00:​26:​26Element Name = Cooling Unit 1 Fans20190205 00:​26:​26Element HealthState = 1020190205 00:26:26 Global exit set to WARNING 
  
-A: Certain server models might show false hardware alarms when these particular hardware elements were disabled in BIOS, are idle or have disabled sensors. From the <a href="https://​support.hpe.com/​hpsc/​doc/​public/​display?​docId=emr_na-a00053955en_us" target="​_blank"​ title="">​HP FAQ</a>:  +--- 
-  <p class="​quote"​>PR 2157501: You might see false hardware health alarms due to disabled or idle Intelligent Platform Management Interface (IPMI) sensors. Disabled IPMI sensors, or sensors that do not report any data, might generate false hardware health alarms.  + 
-In this case it makes sense to ignore these elements using the -i parameter.  +**Q:** Certain hardware elements show incorrect health/​operational states, e.g. "​Cooling Unit 1 Fans":​ 
---- --- --- --- --- --- --- --- --- --- --- --- --- --- + 
-  +<​code>​20190205 00:26:26 
-Q: The check_esxi_hardware plugin is not working (anymore) since ESXi 6.7 U2/U3 on DELL servers.  +Element Name = Cooling Unit 1 Fans 
-A: The issue seems to be the "OpenMange" VIB. This can be verified by checking the list of installed VIB's on an ESXi server:  +20190205 00:26:26 
-  <p class="​consoletext">​esxiserver ~ # <span class="​consolecommand"​>esxcli software vib listNameVersionVendor[...]OpenManage9.3.0.ESXi670-3465Dell[...] ​ +Element HealthState = 1020190205 00:26:26 
-After uninstalling the OpenManage VIB, the plugin works again. According to DELL, ESXi 6.7 U2 is <a title=""​ target="​_blank"​ href="https://​www.dell.com/​support/​article/​ch/​de/​chdhs1/​sln311238/​openmanage-integration-for-vmware-vcenter?​lang=en">not yet officially supported</​a> ​(as of July 2019) by OpenManage:  +Global exit set to WARNING</​code>​ 
-  <p class="​quote"​>​OpenManage Integration for VMware vCenter v4.3.1 (Initial 4.3 Download) (4.3.1 Release Notes) (4.3 Manuals)Does not add official 6.7 U2 support (support for 6.7 U2 will come in the fall with the next major release)  + 
-See also official ​<a href="https://​kb.vmware.com/​s/​article/​74696" target="​_blank"​ title="">​VMware KB 74696</​a> ​entry for this.  +**A:** Certain server models might show false hardware alarms when these particular hardware elements were disabled in BIOS, are idle or have disabled sensors. From the [[https://​support.hpe.com/​hpsc/​doc/​public/​display?​docId=emr_na-a00053955en_us|HP FAQ]]
-Update October 15th 2019: OMSA 9.3.1 fixes this issue. ​ + 
---- --- --- --- --- --- --- --- --- --- --- --- --- --- +<WRAP center round box 80%>PR 2157501: You might see false hardware health alarms due to disabled or idle Intelligent Platform Management Interface (IPMI) sensors. Disabled IPMI sensors, or sensors that do not report any data, might generate false hardware health alarms.</​WRAP>​ 
-  + 
-Q: I am using Icinga 2 and getting the following error message in the check output:  +In this case it makes sense to ignore these elements using the -i parameter. 
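As a sketch of the `-i` usage mentioned for falsely alarming elements, the snippet below joins element names into a comma-separated ignore list and prints the resulting plugin call as a dry run. It assumes `-i` takes a comma-separated list of element names as they appear in the verbose output. The host name, the credentials and the second element name used in testing are hypothetical.

```shell
#!/bin/sh
# Join element names (one per argument) into the comma-separated list for -i.
join_ignore_list() {
    list=""
    for element in "$@"; do
        if [ -z "$list" ]; then
            list="$element"
        else
            list="$list,$element"
        fi
    done
    printf '%s\n' "$list"
}

# "Cooling Unit 1 Fans" is the element name from the example output above.
IGNORE=$(join_ignore_list "Cooling Unit 1 Fans")

# Print (do not execute) the plugin call; host and credentials are placeholders.
printf './check_esxi_hardware.py -H %s -U %s -P %s -i "%s"\n' \
    "esxi.example.com" "root" "PASSWORD" "$IGNORE"
```

Element names often contain spaces, so the list is quoted as a single argument when it is passed to the plugin.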
-  <div class="​thumbnail"><​img src="/​graph/​news/​308-killed.png"​ alt="​check_esxi_hardware Icinga 2 killed"​ /> </​div>​  + 
-  <p class="​consoletext"><​timeout exceeded.=""><​terminated by=""​ signal=""​ 9=""​ (killed).=""​ /></​timeout>​  +--- 
- A: This timeout comes from Icinga 2 itself and means that the plugin'​s process was killed during its runtime. You should increase the timeout of the Service object or of the <a title=""​ target="​_blank"​ href="​https://​icinga.com/​docs/​icinga2/​latest/​doc/​09-object-types/#​checkcommand">​CheckCommand object</​a>​. The default is 1 minute, some servers with a lot of CIM sensors might need longer to respond.  + 
- +**Q:** The //check_esxi_hardware// plugin is not working (anymore) since ESXi 6.7 U2/U3 on DELL servers. 
 + 
 +**A:** The issue seems to be the "OpenManage" VIB. This can be verified by checking the list of installed VIB's on an ESXi server: 
 + 
 +<code>esxcli software vib list</​code>​ 
 + 
 +After uninstalling the OpenManage VIB, the plugin works again. According to DELL, ESXi 6.7 U2 is [[https://​www.dell.com/​support/​article/​ch/​de/​chdhs1/​sln311238/​openmanage-integration-for-vmware-vcenter?​lang=en|not yet officially supported]] (as of July 2019) by OpenManage:​ 
 + 
 +<WRAP center round box 80%>​OpenManage Integration for VMware vCenter v4.3.1 (Initial 4.3 Download) (4.3.1 Release Notes) (4.3 Manuals)Does not add official 6.7 U2 support (support for 6.7 U2 will come in the fall with the next major release)</​WRAP>​ 
 + 
 +See also official ​[[https://​kb.vmware.com/​s/​article/​74696|VMware KB 74696]] entry for this.  
 + 
 +Update October 15th 2019: OMSA 9.3.1 fixes this issue.
  • check_commands/check_esxi_hardware.txt
  • Last modified: 2019/10/24 14:21
  • by Robbie Ferguson