> Hi,
> I am using SLURM nodes to create pools of HTCondor workers, and I am running a separate service that watches `condor_q` and executes `sbatch` or `scancel` on demand.
Hi Seung:
This is a great approach. Informally, we call this technique of
running HTCondor execution point services as jobs under SLURM (or
other batch systems) "glidein", or "gliding in to SLURM", and it
is the basis of the OSG: https://osg-htc.org/
> What I am trying to do is pass a runtime constraint for a task to HTCondor so that it schedules the task onto a SLURM node with enough life left (enough wallclock time remaining).
> For example, if a task has an estimated runtime of more than 1 hour, I want HTCondor to schedule it only onto SLURM nodes with more than 1 hour of lifetime remaining.
The first thing you want to do is have the condor_startd advertise the absolute time at which it thinks it will go away. Adding the following to the startd config file will do so:
AliveUntil = some_utc_time_in_seconds_when_this_ep_will_vanish
STARTD_ATTRS = AliveUntil
Obviously, your startup script will have to calculate the unix time to put into the "AliveUntil" line.
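As a minimal sketch of that calculation: the glidein's batch script can turn the SLURM walltime limit into an absolute Unix time and append the attribute to a local config file. The `HH:MM:SS` argument and the `CONDOR_CONFIG_EXTRA` variable are assumptions here; in a real setup you might instead parse the limit out of `scontrol show job $SLURM_JOB_ID`, and write into whatever local config file your startd actually reads.

```shell
#!/bin/bash
# Sketch: compute AliveUntil for a glidein startd.
# Assumption: the walltime limit arrives as an HH:MM:SS string in $1.

# Convert HH:MM:SS to seconds; 10# forces base 10 so "08"/"09" parse.
walltime_to_seconds() {
    IFS=: read -r h m s <<EOF
$1
EOF
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

limit_secs=$(walltime_to_seconds "${1:-02:00:00}")

# Absolute Unix time at which this execution point will vanish.
alive_until=$(( $(date +%s) + limit_secs ))

# CONDOR_CONFIG_EXTRA is a hypothetical name for a local config file
# the startd reads; adapt it to your deployment.
config_file="${CONDOR_CONFIG_EXTRA:-/tmp/glidein_extra.config}"
cat >> "$config_file" <<EOF
AliveUntil = $alive_until
STARTD_ATTRS = \$(STARTD_ATTRS) AliveUntil
EOF
```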
Then, when the startd boots, it will advertise an AliveUntil
custom ClassAd attribute which you can use for matchmaking in your
jobs; e.g., a job submit file could look like:
Requirements = Target.AliveUntil > (time() + 3600)
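Putting that line in context, a submit file for a job that estimates it needs two hours might look like the sketch below (the executable name and the 7200-second estimate are just illustrative):

```
# Hypothetical job; adjust executable/arguments to your workload.
executable  = my_analysis.sh
arguments   = input.dat

# Match only EPs that will still be alive two hours from now.
# AliveUntil is the custom attribute the glidein startd advertises.
Requirements = Target.AliveUntil > (time() + 7200)

queue
```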
Let us know how this goes,
-greg
> Has anyone done this? Any ideas would be appreciated.
> Thank you!
> Best regards,
> Seung
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with the subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/