> Hi,
> I am using SLURM nodes to create pools of HTCondor workers, and I am running a separate service that watches `condor_q` and executes `sbatch` or `scancel` on demand.
Hi Seung:
This is a great approach. Informally, we call this technique of
running HTCondor execution point services as jobs under SLURM (or
other batch systems) "glidein", or "gliding in to SLURM", and it
is the basis of the OSG: https://osg-htc.org/
> What I am trying to do is pass a runtime constraint for a task to HTCondor so that it schedules the task onto a SLURM node with enough life left (enough wallclock time remaining).
> For example, if a task has an estimated runtime of more than 1 hour, I want HTCondor to schedule it only onto SLURM nodes with more than 1 hour of lifetime remaining.
The first thing you want to do is have the condor_startd advertise the absolute time at which it thinks it will go away. Adding the following to the startd config file will do so:
AliveUntil = some_utc_time_in_seconds_when_this_ep_will_vanish
STARTD_ATTRS = AliveUntil
Obviously, your startup script will have to calculate the unix time to put into the "AliveUntil" line.
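As a minimal sketch of that calculation: the glidein's batch script can turn the SLURM walltime limit into an absolute Unix time and append the attribute to a local config file. The `HH:MM:SS` argument and the `CONDOR_CONFIG_EXTRA` variable are assumptions here; in a real setup you might instead parse the limit out of `scontrol show job $SLURM_JOB_ID`, and write into whatever local config file your startd actually reads.

```shell
#!/bin/bash
# Sketch: compute AliveUntil for a glidein startd.
# Assumption: the walltime limit arrives as an HH:MM:SS string in $1.

# Convert HH:MM:SS to seconds; 10# forces base 10 so "08"/"09" parse.
walltime_to_seconds() {
    IFS=: read -r h m s <<EOF
$1
EOF
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

limit_secs=$(walltime_to_seconds "${1:-02:00:00}")

# Absolute Unix time at which this execution point will vanish.
alive_until=$(( $(date +%s) + limit_secs ))

# CONDOR_CONFIG_EXTRA is a hypothetical name for a local config file
# the startd reads; adapt it to your deployment.
config_file="${CONDOR_CONFIG_EXTRA:-/tmp/glidein_extra.config}"
cat >> "$config_file" <<EOF
AliveUntil = $alive_until
STARTD_ATTRS = \$(STARTD_ATTRS) AliveUntil
EOF
```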
Then, when the startd boots, it will advertise an AliveUntil
custom ClassAd attribute which you can use for matchmaking in your
jobs; e.g., a job submit file could look like:
Requirements = Target.AliveUntil > (time() + 3600)
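Putting that line in context, a submit file for a job that estimates it needs two hours might look like the sketch below (the executable name and the 7200-second estimate are just illustrative):

```
# Hypothetical job; adjust executable/arguments to your workload.
executable  = my_analysis.sh
arguments   = input.dat

# Match only EPs that will still be alive two hours from now.
# AliveUntil is the custom attribute the glidein startd advertises.
Requirements = Target.AliveUntil > (time() + 7200)

queue
```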
Let us know how this goes,
-greg
> Has anyone done this? Any ideas would be appreciated.
> Thank you!
> Best regards,
> Seung
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with the subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/