Hi Everyone,

My apologies if this has been asked already, or if I missed a notification. I have searched and not found any references to this question.

There appears to be a new delay before condor_status reflects a change in core availability for the pool when HTCondor on a machine is stopped. I am pretty sure that delay was not there before.

Example: If I have 80 cores (16 cores on each of 5 VMs that are only running startds) and they are all up, then condor_status correctly shows 80 cores. If I then shut down HTCondor on one of the VMs,
a ps shows that the condor processes are gone, but condor_status does not update for many minutes (as many as 10 or 15) to show that the number of cores is down to 64. I believe that this is new behavior in 8.6 (we are currently running 8.6.6). I double-checked in our 8.4 pool before we updated it, and I am pretty sure that it did not have that behavior, meaning a shutdown
of HTCondor on a VM in a pool was immediately reflected in condor_status.

Is this behavior expected? Is there a better way (other than the ps) to determine what cores are really there, with a reliable immediate answer? We have been troubleshooting some issues which have required
a number of shutdowns and startups, and it has become an issue (really just a pain in the... - there are other ways to tell) that the condor_status result is not a true current reflection of the status of the pool. Did I miss a new knob or a new command?
:)

Thank You -- Mary

Mary Romelfanger
Deputy Branch Manager
Data Systems Branch
Phone 410-338-6708
Space Telescope Science Institute
3700 San Martin Drive
Baltimore, MD 21218