Cody,
When I was at Purdue, I tried monitoring HTCondor servers (i.e., not
execute nodes) with Nagios. I eventually removed the checks because
they didn't add value. The condor_master does a good job of making
sure the daemons are running. I did get alerts for the schedd checks,
but they turned out to be false alarms when the schedd was just too
busy to answer the condor_q from Nagios. (I suppose that's an issue in
itself, but it wasn't what we were checking for.)
I guess the point of this story is to ask what exactly you want to
check and why. Knowing that makes it easier to offer guidance.
Thanks,
BC