Monitoring UPS Status with NUT and Xymon
The instructions and scripts that follow are intended to monitor UPS readings (AC input/output voltages, battery voltage, UPS load %, battery charge %, and UPS status) from a NUT (Network UPS Tools) server and report them back to a Xymon monitoring server for logging, graphing and alerting.
Here you will find the most current versions of our:
- Instructions on integrating, monitoring, graphing and alerting on the output from the several scripts below with a Xymon server
- xymon_nut_scripts.tgz - Xymon/NUT scripts to monitor and alert on UPS readings
We have also written several other scripts that you may find useful. You may find them HERE
Basic NUT configuration
The full documentation for installing and configuring NUT (Network UPS Tools) to monitor your UPS may be found on the official NUT website HERE. What follows are some basic instructions on configuring an already installed and functioning NUT server so that our Xymon scripts can query the UPS readings from it.
There are many types and models of UPSes, so obviously your configurations may differ. The following configurations and scripts were specifically written to monitor an APC 1000VA Smart-UPS.
Not all UPSes provide all the readings that this model does, and some may provide more. Some parts of the scripts may need to be modified to get the required reading(s) from the UPS, and some UPSes might not be compatible with these scripts at all. YMMV.
In our scripts we will be using the upsc program that is part of the NUT suite to query the UPS readings from the upsd NUT server daemon.
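As a rough illustration of how a script can ask upsd for a single reading, here is a minimal sketch. The helper name read_ups_var is hypothetical and is not taken from the scripts in the tarball; the UPS and host names are the ones used in this article. When upsc is given a variable name after the UPS identifier, it prints only that variable's value.

```shell
#!/bin/sh
# Path to the NUT client binary; override it to suit your system.
upsc="${upsc:-/usr/bin/upsc}"

# read_ups_var UPS HOST VARIABLE   (hypothetical helper name)
# Prints the value of one NUT variable, or nothing if the query fails.
read_ups_var() {
    "$upsc" "$1@$2" "$3" 2>/dev/null
}

# Example (needs a reachable upsd, so it is left commented out):
#   read_ups_var apc1000 host.example.com battery.charge
```

Keeping the binary path in a variable mirrors the upsc setting that the scripts expose for editing.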
NUT Config Files
First, we start with the ups.conf file. This file is where you configure all the UPSes that this system will be monitoring directly. These are usually attached to serial ports, but USB devices and SNMP devices are also supported.
The only thing we need to know from this config file is the name of the UPS (in brackets). The rest may be different depending on your UPS, and will have no impact on the scripts.
    --[snip]--
    # UPS definition from ups.conf
    [apc1000]
        driver = apcsmart
        port = /dev/ttyUSB0
        cable = 940-0095B
        desc = "APC 1000 UPS"
From that UPS definition in ups.conf we can see that our UPS's name is apc1000. This is the name we will use in the Xymon scripts below.
Next, we need to modify the upsd.conf file. This file contains access control data, and by default it allows access to the UPS readings only from localhost.
Since we will be using the FQDN of the NUT server in our scripts for readability, and upsd.conf only allows access from localhost by default, our attempts to connect to the NUT server daemon on its ethernet IP address will be refused.
Knowing this, we will need to make the following modifications to this file:
This is the original (default), which allows access to the UPS readings only from localhost.
    ACL all 0.0.0.0/0
    ACL localhost 127.0.0.1/32
    ACCEPT localhost
    REJECT all
Here is our modified version of upsd.conf:
    ACL all 0.0.0.0/0
    ACL localhost 127.0.0.1/32
    ACL xymonserver 192.168.1.1/32
    ACCEPT localhost xymonserver
    REJECT all
We have added an ACL called "xymonserver" defined as the IP address of our Xymon server - use the real IP address of your server here of course.
And finally we have added the "xymonserver" ACL to the end of the ACCEPT line.
These changes will allow us to use NUT's upsc program to communicate with the upsd NUT server daemon on the server's ethernet IP address.
Restart the upsd daemon:
- Now test that our upsd.conf ACL allows us access, using the upsc command like so:
    $ /usr/bin/upsc apc1000@host.example.com
    battery.alarm.threshold: 0
    battery.charge: 100.0
    battery.charge.restart: 00
    battery.date: MM/DD/YY
    battery.packs: 000
    battery.runtime: 3060
    battery.runtime.low: 120
    battery.voltage: 28.01
    battery.voltage.nominal: 024
    driver.name: apcsmart
    driver.parameter.cable: 940-0095B
    driver.parameter.pollinterval: 2
    driver.parameter.port: /dev/ttyUSB0
    driver.version: 2.2.2
    driver.version.internal: 1.99.8
    input.frequency: 60.25
    input.quality: FF
    input.sensitivity: L
    input.transfer.high: 132
    input.transfer.low: 103
    input.transfer.reason: S
    input.voltage: 122.2
    input.voltage.maximum: 122.8
    input.voltage.minimum: 122.2
    output.voltage: 122.2
    output.voltage.nominal: 115
    ups.delay.shutdown: 020
    ups.delay.start: 000
    ups.firmware: 60.8.D
    ups.id: UPS_IDEN
    ups.load: 037.9
    ups.mfr: APC
    ups.mfr.date: MM/DD/YY
    ups.model: SMART-UPS 1000
    ups.serial: xxxxxxxxxxxxx
    ups.status: OL
    ups.temperature: 036.4
    ups.test.interval: 1209600
    ups.test.result: NO
Excellent! This APC UPS reports quite a bit about itself to NUT. The readings that our scripts will monitor are battery.charge, battery.voltage, input.voltage, output.voltage, ups.load and ups.status. Now we are ready to start working with the Xymon configuration and scripts.
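To look at only the readings of interest, the full upsc listing can be filtered. The sketch below assumes the six variables that correspond to the readings named in the introduction; the function name monitored_readings is hypothetical, and other UPS models may expose different variable names.

```shell
#!/bin/sh
# monitored_readings   (hypothetical helper name)
# Reads "name: value" lines on stdin, as printed by upsc, and keeps only the
# six readings assumed to be monitored. Requiring ": " right after the name
# stops battery.charge from also matching battery.charge.restart, and so on.
monitored_readings() {
    grep -E '^(input\.voltage|output\.voltage|battery\.charge|battery\.voltage|ups\.load|ups\.status): '
}

# Example (needs a reachable upsd, so it is left commented out):
#   /usr/bin/upsc apc1000@host.example.com | monitored_readings
```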
Install the custom external scripts
- Copy the tgz file below into the ~xymon/server/ext directory.
- Extract the six bash shell scripts
tar xvzf xymon_nut_scripts.tgz
- Set the ownership and execution permissions on the scripts
    chown xymon:xymon ~xymon/server/ext/xymon_nut_*.sh
    chmod +x ~xymon/server/ext/xymon_nut_*.sh
Edit each of the scripts to match your environment
- The scripts are pretty well documented, but you do need to modify a few of the pre-configured variables to get started:
- upsc - The location of the NUT upsc binary
- nutserver - The IP address or FQDN of the NUT server (host.example.com in our case)
- upslist - A space-separated list of UPSes connected to the NUT server
- yellowtest & redtest - The yellow and red threshold values for each of the tests. In the case of the xymon_nut_ups-status.sh script, these are strings to test for, not numeric thresholds.
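To make the difference between numeric thresholds and string tests concrete, here is a minimal sketch of both kinds of check. It is an assumption about how such tests can be written, not a copy of the logic in the tarball scripts; the function names and the example threshold values are hypothetical. OL, OB and LB are the standard NUT status flags for on line, on battery and low battery.

```shell
#!/bin/sh
# color_below VALUE YELLOWTEST REDTEST   (hypothetical helper name)
# For readings where lower is worse, such as battery charge.
# awk does the comparison because the readings are decimals like 100.0.
color_below() {
    awk -v v="$1" -v y="$2" -v r="$3" 'BEGIN {
        if (v + 0 <= r + 0)      print "red"
        else if (v + 0 <= y + 0) print "yellow"
        else                     print "green"
    }'
}

# color_for_status STATUS YELLOWTEST REDTEST   (hypothetical helper name)
# For ups.status, the thresholds are substrings to look for, not numbers.
color_for_status() {
    case "$1" in
        *"$3"*) echo red ;;
        *"$2"*) echo yellow ;;
        *)      echo green ;;
    esac
}

# Example calls with made-up thresholds:
#   color_below 40 50 25            (a charge of 40%)
#   color_for_status "OB LB" OB LB  (on battery and low battery)
```

A reading where higher is worse, such as UPS load, would need the comparisons reversed, and a voltage test would need both an upper and a lower bound.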
Tell Xymon (hobbitlaunch) to start running the new scripts
- Tell Xymon to start running the new external scripts by adding these lines to ~xymon/server/etc/hobbitlaunch.cfg:
    [involtage]
        ENVFILE /usr/local/xymon/server/etc/hobbitserver.cfg
        CMD $BBHOME/ext/xymon_nut_in-voltage.sh
        INTERVAL 5m

    [outvoltage]
        ENVFILE /usr/local/xymon/server/etc/hobbitserver.cfg
        CMD $BBHOME/ext/xymon_nut_out-voltage.sh
        INTERVAL 5m

    [batcharge]
        ENVFILE /usr/local/xymon/server/etc/hobbitserver.cfg
        CMD $BBHOME/ext/xymon_nut_bat-charge.sh
        INTERVAL 5m

    [batvoltage]
        ENVFILE /usr/local/xymon/server/etc/hobbitserver.cfg
        CMD $BBHOME/ext/xymon_nut_bat-voltage.sh
        INTERVAL 5m

    [upsload]
        ENVFILE /usr/local/xymon/server/etc/hobbitserver.cfg
        CMD $BBHOME/ext/xymon_nut_ups-load.sh
        INTERVAL 5m

    [upsstatus]
        ENVFILE /usr/local/xymon/server/etc/hobbitserver.cfg
        CMD $BBHOME/ext/xymon_nut_ups-status.sh
        INTERVAL 5m
- Wait a few minutes and you should see new columns called in-voltage, out-voltage, bat-charge, bat-voltage, ups-load & ups-status on your Xymon page.
- Click on each of the icons and you should see the new reading(s) with a timestamp of the latest update similar to the image below:
Let's get to the graphing!
- It is a two step process to get Xymon to begin graphing our data.
- First we need to tell Xymon to start putting our new UPS readings into RRD (Round Robin Database) files
- Next we need to tell Xymon how we want our graphs to look, and what info we would like printed on them
- Edit ~xymon/server/etc/hobbitserver.cfg.
- Find the TEST2RRD line and add our new tests to the end like so:
That tells Xymon to map our new columns called in-voltage, out-voltage, bat-charge, bat-voltage & ups-load to rrd files, and that Xymon should send the data for these tests through the built-in NCV (name-colon-value) module.
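As an illustration only, the edited TEST2RRD line might look like the following. The leading "..." stands for whatever entries your file already contains, which should be left unchanged; only the five column=ncv mappings are new.

```shell
# Sketch of the edited line in hobbitserver.cfg.
# "..." is a placeholder for the entries already present on your TEST2RRD line.
TEST2RRD="...,in-voltage=ncv,out-voltage=ncv,bat-charge=ncv,bat-voltage=ncv,ups-load=ncv"
```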
- Tell Xymon to also include our new UPS graphs on the trends page by adding them to the GRAPHS definition line in the hobbitserver.cfg file like so:
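A sketch of the edited GRAPHS line, under the same convention: "..." is a placeholder for the graph names already on the line in your file, and the five names appended match the new column names.

```shell
# Sketch of the edited line in hobbitserver.cfg.
# "..." is a placeholder for the graph names already present on your GRAPHS line.
GRAPHS="...,in-voltage,out-voltage,bat-charge,bat-voltage,ups-load"
```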
- Then add the following lines to the end of your hobbitserver.cfg file.
    NCV_in-voltage="*:GAUGE"
    NCV_out-voltage="*:GAUGE"
    NCV_bat-charge="*:GAUGE"
    NCV_bat-voltage="*:GAUGE"
    NCV_ups-load="*:GAUGE"
That tells Xymon to create a series of RRD files, one for each NCV pair reported by each of the xymon_nut_*.sh scripts, except the xymon_nut_ups-status.sh script, since that test neither tests for nor reports numeric values.
The data sets will be of type GAUGE since all of our UPS readings go up and down and are not always-increasing counter type readings.
The RRD files that Xymon creates will be called ~xymon/data/rrd/host.example.com/column-name.rrd, where each column-name is one of our five numeric tests.
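For reference, an NCV pair is simply a "name : value" line in the body of a status message. The sketch below builds such a message for one reading. It is an assumption about the general shape of the report, not a copy of the tarball scripts, and the function name ncv_status_message is hypothetical. The data set name apc1000 is the UPS name, which is what later shows up as the name inside the RRD file.

```shell
#!/bin/sh
# ncv_status_message HOST COLUMN COLOR UPS VALUE   (hypothetical helper name)
# Prints a Xymon status message whose body holds one name-colon-value line.
# Xymon expects the dots in the host name to be written as commas here.
ncv_status_message() {
    host=$1 column=$2 color=$3 ups=$4 value=$5
    machine=$(printf '%s' "$host" | tr '.' ',')
    printf 'status %s.%s %s %s\n\n%s : %s\n' \
        "$machine" "$column" "$color" "$(date)" "$ups" "$value"
}

# Sending it (needs the Xymon environment, so it is left commented out):
#   "$BB" "$BBDISP" "$(ncv_status_message host.example.com in-voltage green apc1000 121.5)"
```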
Xymon needs to be restarted to pick up the new configuration changes. Alternatively, you may just kill any running hobbitd_rrd and hobbitd_channel processes, and Xymon will restart them using the new settings from hobbitserver.cfg.
- Wait a few minutes and then you should verify that the five new rrd files exist:
    $ ls -l ~xymon/data/rrd/host.example.com/*.rrd
    -rw-r--r-- 1 xymon xymon 19640 Jul 13 16:57 in-voltage.rrd
    -rw-r--r-- 1 xymon xymon 19640 Jul 13 16:54 out-voltage.rrd
    -rw-r--r-- 1 xymon xymon 19640 Jul 13 16:51 bat-charge.rrd
    -rw-r--r-- 1 xymon xymon 19640 Jul 13 16:51 bat-voltage.rrd
    -rw-r--r-- 1 xymon xymon 19640 Jul 13 16:56 ups-load.rrd
- And then we can also verify that the expected data is in these files. We should verify three values: the name <name>, type of test <type>, and last value received <last_ds>:
    $ rrdtool dump ~xymon/data/rrd/host.example.com/in-voltage.rrd | grep "name\|type\|last_ds"
    <name> apc1000 </name>
    <type> GAUGE </type>
    <last_ds> 121.5 </last_ds>
So far, so good. You may check the other rrd files if you would like to confirm that they are OK too.
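Checking the other files by hand gets repetitive, so here is a small sketch that runs the same check over all five. The function names rrd_summary and check_all_rrds are hypothetical; the column names and the three tags are the ones used in this article.

```shell
#!/bin/sh
# rrd_summary   (hypothetical helper name)
# Reads "rrdtool dump" XML on stdin and prints the data set name, type and
# last value on one line, with the surrounding tags and whitespace removed.
rrd_summary() {
    grep -E '<(name|type|last_ds)>' |
        sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        paste -s -d ' ' -
}

# check_all_rrds DIR   (hypothetical helper name)
# Prints one summary line per numeric test found in DIR.
check_all_rrds() {
    for column in in-voltage out-voltage bat-charge bat-voltage ups-load; do
        printf '%s: ' "$column"
        rrdtool dump "$1/$column.rrd" | rrd_summary
    done
}

# Example (needs rrdtool and the RRD files, so it is left commented out):
#   check_all_rrds ~xymon/data/rrd/host.example.com
```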
Now if you view each of the new UPS status pages, you will notice that there are still no graphs being drawn. That is OK, because we haven't configured Xymon's hobbitgraph.cfg file to define how Xymon is supposed to graph our new columns yet.
- Edit ~xymon/server/etc/hobbitgraph.cfg and add the following lines: (see explanation below)
    [in-voltage]
        TITLE Input AC Voltage
        YAXIS VAC
        DEF:apc1000=in-voltage.rrd:apc1000:AVERAGE
        LINE1.5:apc1000#00CCCC:apc1000
        COMMENT:\n
        GPRINT:apc1000:LAST: apc1000 \: %5.1lf%s (cur)
        GPRINT:apc1000:MAX: \: %5.1lf%s (max)
        GPRINT:apc1000:MIN: \: %5.1lf%s (min)
        GPRINT:apc1000:AVERAGE: \: %5.1lf%s (avg)\n

    [out-voltage]
        TITLE Output AC Voltage
        YAXIS VAC
        DEF:apc1000=out-voltage.rrd:apc1000:AVERAGE
        LINE1.5:apc1000#00CCCC:apc1000
        COMMENT:\n
        GPRINT:apc1000:LAST: apc1000 \: %5.1lf%s (cur)
        GPRINT:apc1000:MAX: \: %5.1lf%s (max)
        GPRINT:apc1000:MIN: \: %5.1lf%s (min)
        GPRINT:apc1000:AVERAGE: \: %5.1lf%s (avg)\n

    [bat-charge]
        TITLE Battery % Charge
        YAXIS %
        DEF:apc1000=bat-charge.rrd:apc1000:AVERAGE
        LINE2:apc1000#00CCCC:apc1000
        COMMENT:\n
        GPRINT:apc1000:LAST: apc1000 \: %5.1lf%s (cur)
        GPRINT:apc1000:MAX: \: %5.1lf%s (max)
        GPRINT:apc1000:MIN: \: %5.1lf%s (min)
        GPRINT:apc1000:AVERAGE: \: %5.1lf%s (avg)\n

    [bat-voltage]
        TITLE Battery VDC
        YAXIS VDC
        DEF:apc1000=bat-voltage.rrd:apc1000:AVERAGE
        LINE1.5:apc1000#FF0000:apc1000
        COMMENT:\n
        GPRINT:apc1000:LAST: apc1000 \: %5.1lf%s (cur)
        GPRINT:apc1000:MAX: \: %5.1lf%s (max)
        GPRINT:apc1000:MIN: \: %5.1lf%s (min)
        GPRINT:apc1000:AVERAGE: \: %5.1lf%s (avg)\n

    [ups-load]
        TITLE UPS Load %
        YAXIS %
        DEF:apc1000=ups-load.rrd:apc1000:AVERAGE
        LINE1.5:apc1000#00CC00:apc1000
        COMMENT:\n
        GPRINT:apc1000:LAST: apc1000 \: %5.1lf%s (cur)
        GPRINT:apc1000:MAX: \: %5.1lf%s (max)
        GPRINT:apc1000:MIN: \: %5.1lf%s (min)
        GPRINT:apc1000:AVERAGE: \: %5.1lf%s (avg)\n
- The hobbitgraph.cfg entries explained:
    [column-name] : Defines the name of the graph. It must match the name of the status column for the graph to appear on that test's status page
    TITLE/YAXIS   : Define the graph's title and y-axis legend
    DEF           : Defines the alias, the name of the rrd file to read, and the dataset to look for
    LINE1.5       : Draws a line of thickness "1.5"
    GPRINT        : These four lines print the Last, Maximum, Minimum and Average values in the rrd file for our tests
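Since the five stanzas differ only in column name, title, axis label and color, they can also be generated. The sketch below prints one stanza in the layout used in this article; the function name graph_stanza is hypothetical, and it always emits LINE1.5, so the line width would need adjusting by hand where a different thickness is wanted.

```shell
#!/bin/sh
# graph_stanza COLUMN TITLE YAXIS UPS COLOR   (hypothetical helper name)
# Prints one hobbitgraph.cfg stanza. In an unquoted here-document the
# sequences \n and \: pass through literally, which is what rrdtool expects.
graph_stanza() {
    column=$1 title=$2 yaxis=$3 ups=$4 color=$5
    cat <<EOF
[$column]
        TITLE $title
        YAXIS $yaxis
        DEF:$ups=$column.rrd:$ups:AVERAGE
        LINE1.5:$ups#$color:$ups
        COMMENT:\n
        GPRINT:$ups:LAST: $ups \: %5.1lf%s (cur)
        GPRINT:$ups:MAX: \: %5.1lf%s (max)
        GPRINT:$ups:MIN: \: %5.1lf%s (min)
        GPRINT:$ups:AVERAGE: \: %5.1lf%s (avg)\n
EOF
}

# Example: print the stanza for the UPS load graph.
graph_stanza ups-load 'UPS Load %' % apc1000 00CC00
```

Review the generated text before appending it to hobbitgraph.cfg, since a stanza whose name does not match a status column will not be shown.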
- We're done!
- Now, reload the page and you should start seeing something similar to the image below: