
Re: [HTCondor-users] Dynamic slots and limits by job profiles



Hi Charles,

In our local pool, we give each machine one "IoHeavy" resource and ask users whose jobs are disk IO heavy to use "request_ioheavy = 1" in their jobs. Here's a post from the archive with more details:

https://www-auth.cs.wisc.edu/lists/htcondor-users/2022-September/msg00013.shtml

Maybe you could do something similar, where you give your machines a number of "simulation resources" and "rendering resources" depending on the machine's total resources, and then have your users request these resources for these types of jobs? E.g. to allow 10% of CPUs to be available at a time for simulation jobs and 20% of CPUs to be available at a time for rendering jobs:

MACHINE_RESOURCE_Simulation = INT(0.1 * $(DETECTED_CPUS))
MACHINE_RESOURCE_Rendering = INT(0.2 * $(DETECTED_CPUS))

Then jobs could specify either request_simulation = 1 or request_rendering = 1 (or you could even have submit transforms add RequestSimulation = 1 or RequestRendering = 1 to the jobs if you think you can classify these jobs in your access point config).
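To see what the two INT(...) lines above work out to on different machines, here is a small Python sketch of the same arithmetic (multiply, then truncate toward zero). The function name and the list of CPU counts are hypothetical and chosen only for illustration; this mimics the arithmetic and is not HTCondor code.

```python
def resource_count(fraction: float, detected_cpus: int) -> int:
    """Mimic INT(fraction * DETECTED_CPUS): multiply, then truncate toward zero."""
    return int(fraction * detected_cpus)


# Hypothetical machine sizes, for illustration only.
for cpus in (4, 8, 16, 32, 64):
    sim = resource_count(0.1, cpus)      # 10% of CPUs for simulation jobs
    render = resource_count(0.2, cpus)   # 20% of CPUs for rendering jobs
    print(f"{cpus:>3} CPUs -> Simulation={sim}, Rendering={render}")
```

One thing this makes visible: with truncation, a machine with fewer than 10 CPUs ends up with 0 Simulation resources, so a job with request_simulation = 1 would presumably never match it. If small machines should still take one such job, the fractions or the rounding would need adjusting.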

Jason

On Thu, Mar 23, 2023 at 5:27 AM Charles Goyard <cgoyard@xxxxxxx> wrote:
Hi all,

Once again, I'm requesting some help!


On our HTCondor pool (v10.0), we have dynamic slots enabled.

Smaller machines can accept 1 job, medium ones accept 2 jobs, and the
higher-end computers accept up to 4 jobs.

Say our users run 3 types (or profiles) of jobs:

- Compositing
- Rendering
- Simulation

The jobs declare a property with the job type (in an env var for example).

Running multiple Compositing tasks at once does not cause a problem, but
having 4 Rendering or Simulation jobs at once is not really great.

What I would like to achieve is to define an expression on the execute
nodes that says:

- accept at most one Simulation job at once.
- accept at most two Rendering jobs at once.
- accept any number of Compositing jobs.

So for example a single execute node could run:

sim.  render  comp.  total
--------------------------
1     2       1      4
1     1       2      4
1     0       3      4
0     2       2      4
0     1       3      4
0     0       4      4


Since there are only 6 cases, I'm OK with building a super-long
expression :).
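As a quick check that the six cases are exactly what the two caps imply, the following Python sketch enumerates every mix that fills 4 job slots with at most one Simulation job and at most two Rendering jobs. The constant and function names are hypothetical; this only checks the counting and is not an HTCondor expression.

```python
MAX_SIMULATION = 1   # at most one Simulation job per node
MAX_RENDERING = 2    # at most two Rendering jobs per node
TOTAL_JOBS = 4       # a high-end node accepts up to 4 jobs


def allowed_mixes(total: int = TOTAL_JOBS) -> list[tuple[int, int, int]]:
    """Return (sim, render, comp) tuples that fill `total` jobs within the caps."""
    mixes = []
    for sim in range(MAX_SIMULATION, -1, -1):
        for render in range(MAX_RENDERING, -1, -1):
            comp = total - sim - render  # Compositing fills whatever is left
            if comp >= 0:
                mixes.append((sim, render, comp))
    return mixes


for sim, render, comp in allowed_mixes():
    print(sim, render, comp, sim + render + comp)
```

Because Compositing is uncapped, the two caps alone determine the table; no per-case expression is needed to describe the allowed mixes.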

I found a recipe that looks like this kind of thing, but for static
slots, here: https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToReserveSlotForSpecialJobs

But I can't get my head around how to achieve this setup with dynamic
slots. How can I get information on the type of job running at a given
time on an execute node? (This sounds a bit like my question about how
to count the number of dynamic slots, discussed in December.)

Thanks,

--
Charles