If you want to be able to run a command and force the STARTD to re-check the URL, you can set up your STARTD_CRON hook as a OneShot hook with reconfig_rerun enabled. STARTD_CRON hooks of this type run when the daemon starts up and again on every reconfig, so condor_reconfig becomes your re-check command.
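For example, the configuration might look like this (the job name CHECKMETA, the script path, and the Drain attribute are placeholders for whatever your script publishes):

    # one-shot hook, re-run at daemon start and on every condor_reconfig
    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) CHECKMETA
    STARTD_CRON_CHECKMETA_EXECUTABLE = /usr/local/bin/check_metadata.sh
    STARTD_CRON_CHECKMETA_MODE = OneShot
    STARTD_CRON_CHECKMETA_RECONFIG_RERUN = true
    # the script prints e.g. "Drain = true" followed by a line containing
    # only "-"; the attribute lands in the slot ad, where START can test it
    # (combine with your existing START expression as needed)
    START = (Drain =!= true)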
If you are willing to run an actual daemon that does the checking, you could also consider a "continuous" cron hook. This is a STARTD_CRON hook that runs all of the time and writes a new ClassAd to stdout whenever it wants to update the STARTD. To do that, set STARTD_CRON_<JobName>_MODE to WaitForExit and have your daemon write to stdout only when it wants the STARTD to change state. When your hook writes "- update:true" to stdout, the STARTD will act on the output even though the hook has not exited.
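A minimal sketch of that setup; all names, the metadata URL, and the JSON layout are assumptions about your environment:

    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) METAWATCH
    STARTD_CRON_METAWATCH_EXECUTABLE = /usr/local/bin/meta_watch.sh
    STARTD_CRON_METAWATCH_MODE = WaitForExit

and the hook itself:

    #!/bin/sh
    # meta_watch.sh -- long-running hook: poll the metadata URL and emit
    # a ClassAd update only when the drain flag changes
    URL="http://169.254.169.254/openstack/latest/meta_data.json"
    last=""
    while true; do
        # crude extraction of a "drain" key; adjust to your metadata layout
        if curl -sf "$URL" | tr -d ' "' | grep -q 'drain:true'; then
            cur=true
        else
            cur=false
        fi
        if [ "$cur" != "$last" ]; then
            echo "Drain = $cur"
            echo "- update:true"   # startd applies the ad without waiting for exit
            last="$cur"
        fi
        sleep 60
    done

As in the OneShot case, a START expression such as START = (Drain =!= true) then keeps new jobs away while Drain is true.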
See http://research.cs.wisc.edu/htcondor/manual/v8.7/4_4Hooks.html#52841 for the syntax.

-tj

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Sveinung Rundhovde

Hi,
I am setting up a system with an HTCondor pool running on OpenStack. I am trying to create a mechanism for draining workers by setting a value in their metadata to true (the metadata is made available to the VM via a URL). It should also be possible to make them start accepting jobs again by resetting this value.
So far I have come up with a couple of solutions that work, but not as well as I would like.
The first is to use job hooks: I set PREPARE_JOB and JOB_EXIT hooks for the starter, pointing at a script that sets the START parameter to false if the metadata value is set to true. This script spawns a daemon that checks the metadata regularly and sets START back to true once the metadata is set back to false.
There are, however, a few issues with this solution. The PREPARE_JOB hook is executed after the job has already arrived on the execute node, so even if START is set to false at that point, the job will still run. I was able to solve this by making the script return a non-zero exit value, thereby causing the job to be aborted (jobs are aborted if the PREPARE_JOB hook exits with a non-zero value). This works, but it is a bit "hacky", and jobs will be sent to machines only to be aborted there.
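For concreteness, the wiring is roughly this; the keyword, the paths, and the check_drain_flag helper are placeholders, and the prepare script assumes the startd accepts runtime configuration changes (condor_config_val -rset) from this host:

    STARTER_JOB_HOOK_KEYWORD = DRAIN
    DRAIN_HOOK_PREPARE_JOB = /usr/local/bin/drain_prepare.sh
    DRAIN_HOOK_JOB_EXIT = /usr/local/bin/drain_exit.sh

    #!/bin/sh
    # drain_prepare.sh -- PREPARE_JOB hook sketch; check_drain_flag stands
    # in for the real metadata query
    if check_drain_flag; then
        # stop matching further jobs, then abort this one
        condor_config_val -rset "START = false" >/dev/null
        condor_reconfig -daemon startd
        exit 1   # non-zero exit makes the starter abort the job
    fi
    exit 0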
Another solution I tried was using cron jobs. I set one up to run periodically and update the START value. The issue here is that there is a delay before draining starts. Of course the period can be set low, but that puts more load on the system.
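That attempt was along these lines (the job name, script path, and five-minute period are illustrative):

    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) DRAINPOLL
    STARTD_CRON_DRAINPOLL_EXECUTABLE = /usr/local/bin/check_drain.sh
    STARTD_CRON_DRAINPOLL_MODE = Periodic
    STARTD_CRON_DRAINPOLL_PERIOD = 5m
    # check_drain.sh prints "Drain = true" or "Drain = false",
    # then a line containing only "-"
    START = (Drain =!= true)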
Is there a better way to do this?