So, on your 1-node pool, you get 8 daemon ads, but none are published to ganglia. That means either the ads do not match any metric definition or the machine names do not match the hosts that ganglia is monitoring. You increased GANGLIAD_VERBOSITY to 10. So, unless /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics is empty, you are not matching machine names.
First, make sure that the 00_default_metrics is not empty.
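One quick way to do that check from a shell is sketched below. The helper name metrics_file_has_content is made up for illustration and is not part of HTCondor; it only tests that the file exists and has a size greater than zero.

```shell
# Hypothetical helper (not an HTCondor tool): succeeds only if the
# given file exists and is non-empty.
metrics_file_has_content() {
    if [ -s "$1" ]; then
        echo "ok: $1 has content"
    else
        echo "empty or missing: $1"
        return 1
    fi
}
```

You could call it as metrics_file_has_content "$(condor_config_val GANGLIAD_METRICS_CONFIG_DIR)/00_default_metrics", which should resolve to the path shown in your log.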
Then, I recommend setting
    GANGLIA_SEND_DATA_FOR_ALL_HOSTS = true
This setting is used to inject metrics for hosts not being monitored by ganglia (typically Windows hosts or other hosts without a local gmond). You may see the HTCondor metrics appear under a different hostname. Whenever the HTCondor gangliad propagates metrics to hosts not monitored by ganglia, it needs to send the heartbeats for those hosts. Sending 0 heartbeats is not, in and of itself, indicative of a problem.
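If it helps, here is a rough sketch of applying the setting from a shell. The helper enable_all_hosts is hypothetical (not an HTCondor command); it appends the line to whatever config file you pass it, unless that exact line is already there. It assumes you pass the path of your own condor_config.local.

```shell
# Hypothetical helper (not part of HTCondor): append the setting to the
# given config file unless an identical line is already present.
enable_all_hosts() {
    conf_file="$1"
    line="GANGLIA_SEND_DATA_FOR_ALL_HOSTS = true"
    if grep -qxF "$line" "$conf_file" 2>/dev/null; then
        echo "already set in $conf_file"
    else
        printf '%s\n' "$line" >> "$conf_file"
        echo "appended to $conf_file"
    fi
}
```

After running it against your local config file, condor_reconfig should make the daemons re-read their configuration, and condor_config_val GANGLIA_SEND_DATA_FOR_ALL_HOSTS should then report true. If the gangliad does not appear to pick up the change, restarting it is a reasonable next step.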
Hopefully, that will get you a little further along. Let us know what you find.
...Tim
On 03/09/2015 12:59 PM, Ricardo Oda wrote:
Hello,
I'm trying to learn how to set up ganglia to monitor a condor pool.
I'm currently working on localhost to make things easier. I configured ganglia and it's working to monitor this 1-node cluster. The default metrics of gmond.conf are working fine and appear on the web frontend, but I'm having trouble getting the condor metrics.
In the GangliadLog I have:
03/09/15 14:20:28 ******************************************************
03/09/15 14:20:28 ** condor_gangliad (CONDOR_GANGLIAD) STARTING UP
03/09/15 14:20:28 ** /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/libexec/condor_gangliad
03/09/15 14:20:28 ** SubsystemInfo: name=GANGLIAD type=DAEMON(12) class=DAEMON(1)
03/09/15 14:20:28 ** Configuration: subsystem:GANGLIAD local:<NONE> class:DAEMON
03/09/15 14:20:28 ** $CondorVersion: 8.3.4 Mar 02 2015 BuildID: 304666 $
03/09/15 14:20:28 ** $CondorPlatform: x86_64_Ubuntu14 $
03/09/15 14:20:28 ** PID = 8922
03/09/15 14:20:28 ** Log last touched 3/9 14:20:11
03/09/15 14:20:28 ******************************************************
03/09/15 14:20:28 Using config source: /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor_config
03/09/15 14:20:28 Using local config sources:
03/09/15 14:20:28    /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/local.xxxx/condor_config.local
03/09/15 14:20:28 config Macros = 58, Sorted = 58, StringBytes = 1697, TablesBytes = 2136
03/09/15 14:20:28 CLASSAD_CACHING is ENABLED
03/09/15 14:20:28 Daemon Log is logging: D_ALWAYS D_ERROR
03/09/15 14:20:28 Daemoncore: Listening at <0.0.0.0:45401> on TCP (ReliSock) and UDP (SafeSock).
03/09/15 14:20:28 DaemonCore: command socket at <xxx.xxx.xx.xx:45401>
03/09/15 14:20:28 DaemonCore: private command socket at <xxx.xxx.xx.xx:45401>
03/09/15 14:20:28 Testing /usr/bin/gmetric
03/09/15 14:20:28 Loading libganglia libganglia.so
03/09/15 14:20:28 Will use libganglia to interact with ganglia.
03/09/15 14:20:28 Will perform stats publication every GANGLIAD_INTERVAL=60 seconds.
03/09/15 14:20:28 Reading metric definitions from /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d/00_default_metrics
03/09/15 14:20:48 Starting update...
03/09/15 14:20:48 Ganglia is monitoring 1 hosts
03/09/15 14:20:48 Got 8 daemon ads
03/09/15 14:20:48 Heartbeats sent: 0
03/09/15 14:21:08 Starting update...
03/09/15 14:21:08 Heartbeats sent: 0
Here are my configs of ganglia:
$ condor_config_val -dump | grep -i ganglia
DAEMON_LIST = COLLECTOR MASTER NEGOTIATOR SCHEDD STARTD GANGLIAD
GANGLIA_CONFIG = /etc/ganglia/gmond.conf
GANGLIA_GMETRIC = /usr/bin/gmetric
GANGLIA_GSTAT_COMMAND = gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649
GANGLIA_LIB = libganglia.so
GANGLIA_LIB64_PATH = /lib64,/usr/lib64,/usr/local/lib64
GANGLIA_LIB_PATH = /lib,/usr/lib,/usr/local/lib
GANGLIA_SEND_DATA_FOR_ALL_HOSTS = false
GANGLIAD = $(LIBEXEC)/condor_gangliad
GANGLIAD_INTERVAL = 60
GANGLIAD_LOG = $(LOG)/GangliadLog
GANGLIAD_METRICS_CONFIG_DIR = /home/condor/condor-8.3.4-x86_64_Ubuntu14-unstripped/etc/condor/ganglia.d
GANGLIAD_PER_EXECUTE_NODE_METRICS = true
GANGLIAD_REQUIREMENTS =
GANGLIAD_VERBOSITY = 10
MAX_GANGLIAD_LOG = $(MAX_DEFAULT_LOG)
The GANGLIAD daemon is running, but I think it's not transmitting its monitoring data to ganglia. Do I have to do something to include the condor default metrics in ganglia?
I'm not sure why, but I keep getting "Heartbeats sent: 0". I would appreciate some help.
Thanks in advance,
Ricardo Oda
_______________________________________________ HTCondor-users mailing list To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/
--
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736