Hi,

I am running as user root. I did try using the full path to gstat. It is /bin/gstat, as installed by the rpm.
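The exec errors in this thread report errno=2 (ENOENT), which means the file named by the program word could not be found, even when the full path /bin/gstat is used. The minimal Python sketch below can check the exact program string that condor_gangliad is given. It is not part of HTCondor, and the function name check_command is hypothetical. It assumes, without confirmation from this thread, that the program is the first whitespace-separated word of the configured command. It reports whether that word resolves to an existing executable, as `which gstat` does for a bare name. It also lists any non-ASCII characters in the word, such as typographic quotes, because those would make an otherwise correct path fail.

```python
import os
import shutil
from typing import List


def check_command(command: str) -> List[str]:
    """Return problems found with the program word of a command string.

    Assumption: the program is the first whitespace-separated word.
    """
    words = command.split()
    if not words:
        return ["command is empty"]
    program = words[0]
    problems = []

    # Characters such as U+201C (left double quotation mark) can slip in
    # when a command is pasted from an email or web page.
    bad = sorted({ch for ch in program if ord(ch) > 127})
    if bad:
        codes = " ".join(f"U+{ord(ch):04X}" for ch in bad)
        problems.append(f"non-ASCII characters in program name: {codes}")

    if os.sep in program:
        # A path: exec uses it as-is, so it must be an executable file.
        if not (os.path.isfile(program) and os.access(program, os.X_OK)):
            problems.append(f"{program!r} is not an existing executable file")
    elif shutil.which(program) is None:
        # A bare name: it has to be found on PATH, like `which` does.
        problems.append(f"{program!r} was not found on PATH")
    return problems


if __name__ == "__main__":
    cmd = "/bin/gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649"
    for problem in check_command(cmd) or ["no problems found"]:
        print(problem)
```

Paste the command exactly as it appears in the configuration (or in the log) into `cmd` to check it.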
Here is the clipping from GangliadLog:

07/28/21 22:14:28 Starting update...
07/28/21 22:14:28 my_popenv: Failed to exec “/bin/gstat, errno=2 (No such file or directory)
07/28/21 22:14:28 Failed to execute “/bin/gstat --mpifile --all --gmond_ip=127.0.0.1 --gmond_port=8649”: No such file or directory
07/28/21 22:14:28 Got 318 daemon ads
07/28/21 22:14:28 Heartbeats sent: 0
07/28/21 22:14:48 Starting update...
07/28/21 22:14:48 Heartbeats sent: 0
07/28/21 22:15:08 Starting update...
07/28/21 22:15:08 Heartbeats sent: 0

Apart from that, I see that "my_popenv: Failed to exec" appears in a few error reports related to HTCondor. For example,
https://www-auth.cs.wisc.edu/lists/htcondor-users/2016-November/msg00143.shtml

I do not know if any of those apply to my case.

- Nagaraj

On 2021-07-28 21:34, John M Knoeller wrote:
I wonder if the PATH of your interactive shell is unusual. (Are you really running commands as the user roo?) Try running this command:

which gstat

What does it return?

You could try configuring GANGLIA_GSTAT_COMMAND to have the full path to the gstat command by adding something like this to your condor configuration:

GANGLIA_GSTAT_COMMAND=/path/to/gstat --all --mpifile --gmond_ip=localhost --gmond_port=8649

-tj

-------------------------
FROM: Nagaraj Panyam <pn@xxxxxxxxxxx>
SENT: Wednesday, July 28, 2021 8:11 AM
TO: John M Knoeller <johnkn@xxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
SUBJECT: Re: [HTCondor-users] HTCondor and condor_ganglia issues

Hi,

I have the following issues that I need help with.

About my setup: I have a Ganglia gmetad that handles the regular metrics (cpu, mem, etc.) sent by the gmonds on the execute nodes. This part is fine. I now wish to add HTCondor to the same gmetad, and I need help. This gmetad is on the same host as the collector, so on this host I enabled condor_gangliad (gmetad, collector and condor_gangliad are all on the same host).

A) GangliadLog has the following set of lines repeating. The clip is pasted below. What is the my_popenv error about?

my_popenv: Failed to exec “gstat, errno=2 (No such file or directory)
Failed to execute “gstat --mpifile --all --gmond_ip=127.0.0.1 --gmond_port=8649”: No such file or directory
Got 329 daemon ads
Heartbeats sent: 0
Starting update...
Heartbeats sent: 0

When I run the gstat command, it shows output as below:

[roo@ce ~]# gstat --all --mpifile --gmond_ip=127.0.0.1 --gmond_port=8649
wn06.my.domain:128
wn05.my.domain:128
wn04.my.domain:128
wn03.my.domain:128
wn02.my.domain:128
wn01.my.domain:128
wn08.my.domain:64
wn07.my.domain:64
localhost.localdomain:8

B) Is condor_gangliad a routine "data source" for Ganglia's gmetad? What should be the "data_source" declaration in gmetad.conf? I have a gmond that listens on 8649 for the metrics from the execute nodes.
The host running the collector itself appears as "localhost" (see above). I tried to understand this from the tutorial video at https://research.cs.wisc.edu/htcondor/tutorials/videos/2014/Ganglia.html [2], but I could not read the Ganglia screen shown in the video.

Thanks
Nagaraj

On 7/28/21 3:14 AM, John M Knoeller wrote:

That sounds like something outside of HTCondor is starting one of those condor_gangliad processes. What is the parent PID of each? Perhaps we can track back from there...

I don't really know what gstat is; let me ask around and see if any of my colleagues know.

-tj

-------------------------
FROM: pn <pn@xxxxxxxxxxx>
SENT: Tuesday, July 27, 2021 11:54 AM
TO: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
CC: John M Knoeller <johnkn@xxxxxxxxxxx>
SUBJECT: Re: [HTCondor-users] HTCondor and condor_ganglia issues

More about the condor_gangliad process: I stopped condor (systemctl stop), and after that condor_gangliad was still there. I then killed it and restarted condor after adding GANGLIAD to DAEMON_LIST. Sure enough, condor_gangliad was one of the processes. But strangely, in less than a second a second condor_gangliad appeared.

[root@simclu-ce ~]# ps -ea|grep gangliad
2592326 ? 00:00:00 condor_gangliad
2592334 ? 00:00:00 condor_gangliad

Would it be because I have a wrong configuration?

Secondly, GangliadLog has this error:

07/27/21 21:40:23 my_popenv: Failed to exec “gstat, errno=2 (No such file or directory)
07/27/21 21:40:23 Failed to execute “gstat --all --mpifile --gmond_ip=192.168.55.79 --gmond_port=8652”: No such file or directory

What file is it complaining about? I replaced "gstat" with "/bin/gstat" and the error shows up again: "Failed to exec "/bin/gstat, .."

- Nagaraj

On 2021-07-27 21:15, John M Knoeller wrote:

I'm not sure why the condor_gangliad would be running if you did not add it to your daemon list. But the error is because you need to put GANGLIAD in your daemon list, not GANGLIA_D.
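John's question about the parent PID of each condor_gangliad process can be answered on Linux from /proc. The Python sketch below is Linux-specific and its function names are hypothetical. It reads /proc/<pid>/stat, where the command name sits in parentheses and the parent PID is the field after the process state. It then prints each condor_gangliad process together with its parent's name.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, int]:
    """Parse a /proc/<pid>/stat line into (pid, command name, parent pid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    fields = line[close_paren + 1:].split()
    # fields[0] is the process state, fields[1] is the parent PID.
    return pid, comm, int(fields[1])


def read_stat(pid: int) -> Tuple[int, str, int]:
    with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as f:
        return parse_stat(f.read())


def find_processes(name: str) -> List[Tuple[int, str, int]]:
    """Return (pid, comm, ppid) for every process whose comm equals name.

    The kernel truncates comm to 15 characters; "condor_gangliad" is
    exactly 15, so it matches in full.
    """
    if not os.path.isdir("/proc"):
        return []  # not Linux, or /proc is not mounted
    results = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            info = read_stat(int(entry))
        except (OSError, ValueError):
            continue  # the process exited or its stat was unreadable
        if info[1] == name:
            results.append(info)
    return results


if __name__ == "__main__":
    for pid, comm, ppid in find_processes("condor_gangliad"):
        try:
            parent = read_stat(ppid)[1]
        except (OSError, ValueError):
            parent = "?"
        print(f"{comm} pid={pid} ppid={ppid} parent={parent}")
```

On most Linux systems, `ps -o pid,ppid,cmd -C condor_gangliad` gives the same information directly. A parent other than condor_master would fit John's suspicion that something outside HTCondor starts the extra process.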
Instructions for how to handle the case where the gmetad is on a different machine than the condor_collector are here: Monitoring - HTCondor Manual 9.1.0 documentation [1]

-tj

-------------------------
FROM: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Nagaraj Panyam <pn@xxxxxxxxxxx>
SENT: Tuesday, July 27, 2021 6:34 AM
TO: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
SUBJECT: [HTCondor-users] HTCondor and condor_ganglia issues

Hi,

I am trying to configure HTCondor's Ganglia monitoring. In that context, I see something I do not understand.

Firstly, I see the process condor_gangliad even though it is not in the DAEMON_LIST (config_val_dump shows DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR SCHEDD). Is this expected?

Secondly, when I specifically add GANGLIA_D to DAEMON_LIST in the condor config file, the error given below shows up in MasterLog. Where do I add the executable path? We have CONDOR_VERSION = 8.9.13.

GANGLIA_D is in the DAEMON_LIST parameter, but there is no executable path for it defined in the config files!
ERROR "Must have the path to GANGLIA_D defined." at line 1606 in file /var/lib/condor/execute/slot1/dir_19111/userdir/.tmp9djsO9/BUILD/condor-8.9.13/src/condor_master.V6/masterDaemon.cpp

Thirdly, after resolving the above issues, what is the scheme to hook up HTCondor's monitoring to the existing Ganglia? We will have condor_gangliad on the same machine as the collector, and Ganglia's gmetad running on a different host.
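The MasterLog error above comes from a naming mistake: as John points out, the daemon name is GANGLIAD, not GANGLIA_D. The hypothetical Python helper below checks a DAEMON_LIST value, for example the output of `condor_config_val DAEMON_LIST`, for that mistake. It assumes that entries are separated by commas and/or whitespace and compares names case-insensitively. Both are simplifications, not a statement of HTCondor's exact parsing rules.

```python
import re
from typing import List


def parse_daemon_list(value: str) -> List[str]:
    """Split a DAEMON_LIST value into names, accepting commas and/or whitespace."""
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


def check_gangliad(value: str) -> List[str]:
    """Return notes about how condor_gangliad is (or is not) listed."""
    names = [name.upper() for name in parse_daemon_list(value)]
    notes = []
    if "GANGLIAD" not in names:
        notes.append("GANGLIAD is not in DAEMON_LIST")
    for name in names:
        # Catch near-misses such as GANGLIA_D, which condor_master does not
        # recognize as the condor_gangliad daemon.
        if name != "GANGLIAD" and "GANGLIA" in name:
            notes.append(f"{name} looks like a misspelling of GANGLIAD")
    return notes


if __name__ == "__main__":
    value = "MASTER COLLECTOR NEGOTIATOR SCHEDD GANGLIA_D"
    for note in check_gangliad(value) or ["DAEMON_LIST looks fine for condor_gangliad"]:
        print(note)
```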
Thanks
Nagaraj

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/

Links:
------
[1] https://htcondor.readthedocs.io/en/latest/admin-manual/monitoring.html?highlight=gangliad#ganglia
[2] https://research.cs.wisc.edu/htcondor/tutorials/videos/2014/Ganglia.html