
Re: [HTCondor-users] Question regarding machine specific defaults for non-standard resources



Hi Tim,

A couple of possible solutions are to configure either a Submit Transform or a Submit Requirement on the APs where users submit jobs. The transform route is tricky, but it could check for jobs that want GPUs without specifying a requested amount of GPUMemory, set the requested memory to the target slot's TotalSlotGPUMemory, and add a clause to the job requirements that checks for the slot having GPUMemoryMB.

The easier and probably better solution is to create a submit requirement that checks whether a job requesting a GPU has also set a request_GPUMemoryMB value. If it has not, the job submission fails, which will hopefully train users to specify the desired amount of GPUMemory, even if that is all of the memory.
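
As a rough sketch of the submit requirement, something along these lines could go into the AP configuration. The name GPUMemSet and the reason text are made up for illustration, and I am assuming that request_GPUMemoryMB ends up in the job ad as RequestGPUMemoryMB and is simply absent when the user does not set it, so please verify this against a test job before relying on it:

```
# Hypothetical requirement name; appended to any requirements already configured
SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) GPUMemSet

# Accept the job if it requests no GPUs, or if it also sets request_GPUMemoryMB.
# Assumes the job ad attribute is named RequestGPUMemoryMB.
SUBMIT_REQUIREMENT_GPUMemSet = (RequestGPUs =?= undefined) || (RequestGPUs == 0) || (RequestGPUMemoryMB =!= undefined)

# Message shown to the user when the submission is rejected
SUBMIT_REQUIREMENT_GPUMemSet_REASON = "Jobs requesting GPUs must also set request_GPUMemoryMB"
```

With this in place, a submit file that has request_gpus = 1 but no request_GPUMemoryMB line should be rejected with the reason above, while adding for example request_GPUMemoryMB = 32000 should let it through.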

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Voigtländer, Tim Aike (ETP) <tim.voigtlaender@xxxxxxx>
Sent: Thursday, May 2, 2024 8:34 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Question regarding machine specific defaults for non-standard resources
 

Hi all,

I'm currently in the process of setting up a GPU node to accept multiple jobs per GPU (by using a shared device memory pool).
This works well so far, but I was wondering if there is an intended way to set the default value of a custom machine resource (`GPUMemoryMB` in my case).
It seems to default to 0 if the `request_GPUMemoryMB` is not set in the submission file, but I need it to instead default to the max value (32000 in this case).
As we have multiple nodes with different types of GPUs, the default also needs to be configurable for each machine individually.
There seem to be some resources with such defaults like `JOB_DEFAULT_REQUESTDISK`, but I think those are exceptions.
I've attached the relevant configs I'm currently using and would be happy for advice on this topic.

Cheers and thanks,
Tim Voigtländer