It was added to condor_submit in 23.5, and it exhibited this exact partial failure mode until 23.8, when the STARTD d-slot creation code was changed to handle require_gpus expressions that reference job attributes.
-tj
From: Anderson, Stuart B. <sba@xxxxxxxxxxx>
Sent: Friday, November 8, 2024 2:51 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Strange behavior with GPU match making in 24.0.1

> On Nov 8, 2024, at 11:46 AM, John M Knoeller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
> A STARTD that is older than 24.0 does not handle the new first-class
> gpus submit commands like gpus_minimum_memory correctly, so jobs will match to slots that exist, but it will fail to create a new dynamic slot when the job is using one of those commands.
>
> The fix is to upgrade your execute nodes.

I thought this was added back in 23.5.x?

Thanks.

--
Stuart Anderson
sba@xxxxxxxxxxx
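To find which execute nodes fall on the wrong side of the version cutoff discussed in this thread, one option is to dump each slot's CondorVersion attribute (for example with `condor_status -af Machine CondorVersion`) and check it. Below is a minimal Python sketch of that check, assuming the attribute starts with `$CondorVersion: X.Y.Z`. The helpers `parse_condor_version` and `needs_upgrade` are hypothetical and not part of HTCondor. The default cutoff is 23.8, as stated in the top reply; pass `fixed_in=(24, 0)` to follow the more conservative "older than 24.0" advice instead.

```python
import re

# Cutoff per the top reply in this thread: 23.8 is where the STARTD d-slot
# creation code was changed. Assumption: comparing (major, minor) is enough.
FIXED_IN = (23, 8)

# Assumed format: the CondorVersion attribute begins "$CondorVersion: X.Y.Z".
_VERSION_RE = re.compile(r"\$CondorVersion:\s*(\d+)\.(\d+)\.(\d+)")


def parse_condor_version(version_string):
    """Return (major, minor, patch) as ints, or None if no version is found."""
    match = _VERSION_RE.search(version_string)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_upgrade(version_string, fixed_in=FIXED_IN):
    """True if the STARTD predates fixed_in, False if not, None if unparseable."""
    version = parse_condor_version(version_string)
    if version is None:
        return None
    # Tuple comparison: (23, 5) < (23, 8) and (23, 0) < (23, 8), but (24, 0) is not.
    return version[:2] < fixed_in


if __name__ == "__main__":
    # Hypothetical CondorVersion values, trimmed to the version field.
    samples = [
        "$CondorVersion: 23.0.12 $",
        "$CondorVersion: 23.5.2 $",
        "$CondorVersion: 24.0.1 $",
    ]
    for sample in samples:
        print(sample, parse_condor_version(sample), needs_upgrade(sample))
```

Comparing integer tuples avoids the string-comparison trap where "23.10" would sort before "23.8".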