
Re: [HTCondor-users] Best Practice for Reporting Execute Node Status and Configuration Issues



Hi Gabriel,

You could set up an HTCondor cron job that runs a health check script every few minutes. The health check could try to look up your mounts with a timeout and print a ClassAd attribute that the START expression then evaluates. Something along the lines of [1], where the attribute "NODE_IS_HEALTHY" is evaluated in the START expression and is set by a `nodehealth.sh` script that the startd runs every three minutes.

See also
https://htcondor.readthedocs.io/en/23.0/admin-manual/daemon-cron.html

Cheers,
  Thomas



[1]

STARTD_CRON_NODEHEALTH_EXECUTABLE = /etc/condor/tests/nodehealth.sh
## the script prints "NODE_IS_HEALTHY = true" or "NODE_IS_HEALTHY = false" to stdout

STARTD_CRON_NODEHEALTH_PERIOD = 180s
STARTD_CRON_NODEHEALTH_MODE = Periodic
STARTD_CRON_JOBLIST = NODEHEALTH


START = (NODE_IS_HEALTHY =?= true) && ...
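
For illustration, a minimal sketch of what such a `nodehealth.sh` could look like. This is not the actual script, just one way to do the mount lookup with a timeout. The mount points (/cvmfs, /home), the NODEHEALTH_MOUNTS variable, the 10 second limit and the function names are all assumptions made up for the example; adjust them to whatever your jobs depend on.

```shell
#!/bin/sh
# Sketch of a health check for STARTD_CRON (paths and names are hypothetical).

# Return 0 if the path can be looked up within 10 seconds, non-zero otherwise.
# A hanging or missing file system makes stat time out or fail.
check_path() {
    timeout 10 stat "$1" >/dev/null 2>&1
}

# Print a single ClassAd assignment on stdout for the startd to pick up.
# Reports false as soon as one of the given paths cannot be looked up.
node_health() {
    for path in "$@"; do
        if ! check_path "$path"; then
            echo "NODE_IS_HEALTHY = false"
            return 0
        fi
    done
    echo "NODE_IS_HEALTHY = true"
}

# Assumed default mount points; override with NODEHEALTH_MOUNTS="/a /b".
# Left unquoted on purpose so the list is split into separate arguments.
node_health ${NODEHEALTH_MOUNTS:-/cvmfs /home}
```

Once the startd has run the script, something like `condor_status -af Machine NODE_IS_HEALTHY` should show the value each node advertises. Note that `stat` only tests that the path answers; if an unmounted directory would still exist on your nodes, you would want a stricter test there.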
