Re: [HTCondor-users] All jobs stay idle in HTCondor-CE because of GPU requirement - but host of jobs have no GPUs
- Date: Fri, 9 Aug 2024 21:13:21 +0000
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] All jobs stay idle in HTCondor-CE because of GPU requirement - but host of jobs have no GPUs
I don't know why Undefined is creeping into the evaluation. The set of expressions being evaluated is a little complex and I must be missing something.
I think the biggest conclusion here is that maxMemory is expected to be a literal value by the CE. Also, maxMemory can be omitted and the CE's configuration will select a sane default (which you should set via default_maxMemory in your specific CE's configuration, given your small EPs).
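
For reference, a minimal sketch of a route entry that sets such a default, assuming the classic JOB_ROUTER_ENTRIES ClassAd syntax that the condor_ce_router_defaults excerpt quoted below relies on (the route name and the 1024 MB value are placeholders):

   JOB_ROUTER_ENTRIES @=jre
   [
     name = "My_Pool";           /* hypothetical route name */
     TargetUniverse = 5;
     default_maxMemory = 1024;   /* MB used when the incoming job carries no maxMemory */
   ]
   @jre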
- Jaime
> On Aug 9, 2024, at 2:46 PM, Marco Mambelli <marcom@xxxxxxxx> wrote:
>
> Jaime,
> I did some more digging and there may be a bug in eval_set_OriginalMemory.
> It seems that the culprit is the evaluation of OriginalMemory, which is used in the RequestMemory expression rewritten by HTCondor-CE:
> RequestMemory = ifThenElse(WantWholeNode =?= true, !isUndefined(TotalMemory) ? TotalMemory * 95 / 100 : JobMemory,OriginalMemory)
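>
> As a sketch of how this evaluates for these jobs (WantWholeNode prints as undefined in the output below, and =?= never yields undefined):
>
>   WantWholeNode =?= true                    /* false, since WantWholeNode is undefined */
>   ifThenElse(false, ..., OriginalMemory)    /* so RequestMemory reduces to OriginalMemory */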
>
> I did not change any configuration in the CE/Job router and I see in /usr/share/condor-ce/condor_ce_router_defaults:
> /* Note default memory request of 2GB */
> /* Note yet another nested condition allow pass attributes (maxMemory,xcount,jobtype,queue)
> via gWMS Factory described within ClassAd */
> eval_set_OriginalMemory = ifThenElse(maxMemory isnt undefined,
> maxMemory,
> ifThenElse(default_maxMemory isnt undefined,
> default_maxMemory,
> 2000));
>
> Somehow when maxMemory is evaluated, RequestMemory is considered undefined?!
>
> When "+maxMemory = RequestMemoryâ (and RequestMemory evaluates to 1 on the CE), OriginalMemory evaluates to 2000
> and all jobs stay idle forever on my small EP
>
> When "+maxMemory = (RequestMemory ?: 1)â (and RequestMemory still evaluates to 1 on the CE), OriginalMemory evaluates to 1 and jobs can run on my small EP
>
> Thanks,
> Marco
>
>
>
> [root@ce-workspace /]# condor_q -all -l | grep -i mem
> AutoClusterAttrs = "MachineLastMatchTime,Offline,RemoteOwner,RequestCpus,RequestDisk,RequestGPUs,RequestMemory,TotalJobRuntime,ConcurrencyLimits,FlockTo,Rank,Requirements,DiskUsage,GlideinCpusIsGood,JobCpus,JobGPUs,JobIsRunning,JobMemory,JobStatus,MATCH_EXP_JOB_GLIDEIN_Cpus,MATCH_EXP_JOB_GLIDEIN_GPUs,MATCH_EXP_JOB_GLIDEIN_Memory,OriginalCpus,OriginalGPUs,OriginalMemory,TotalCpus,TotalGPUs,TotalMemory,WantWholeNode"
> JOB_GLIDEIN_Memory = "$$(TotalMemory:0)"
> JobMemory = JobIsRunning ? int(MATCH_EXP_JOB_GLIDEIN_Memory) * 95 / 100 : OriginalMemory
> MATCH_EXP_JOB_GLIDEIN_Memory = "1763"
> MATCH_TotalMemory = 1763
> maxMemory = (RequestMemory ?: 1)
> MemoryProvisioned = 128
> OriginalMemory = 1
> remote_OriginalMemory = 1
> RequestMemory = ifThenElse(WantWholeNode =?= true, !isUndefined(TotalMemory) ? TotalMemory * 95 / 100 : JobMemory,OriginalMemory)
> [root@ce-workspace /]# condor_ce_q -all -l | grep -i mem
> maxMemory = (RequestMemory ?: 1)
> MemoryProvisioned = 128
> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
> [root@ce-workspace /]# condor_ce_q -all -l | grep -i image
> ImageSize = 32500
> ImageSize_RAW = 30240
> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
> [root@ce-workspace /]# condor_q -all -af maxMemory RequestMemory WantWholeNode JobMemory OriginalMemory ImageSize
> 1 1 undefined 1 1 32500
> [root@ce-workspace /]# condor_ce_q -all -af maxMemory RequestMemory WantWholeNode JobMemory OriginalMemory ImageSize
> 32 32 undefined undefined undefined 32500
>
>
>
>> On Aug 9, 2024, at 12:38 PM, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
>>
>> [EXTERNAL] – This message is from an external sender
>>
>> Normally, there's no value for RequestMemory that's too small. As the expression shows, the default for RequestMemory is the size of the executable, until the job runs for the first time, at which point it becomes the max amount of memory used on previous executions. Partitionable EPs will round up the memory allocated to the slot to some reasonable minimum amount.
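>>
>> As a worked example from the condor_ce_q output earlier in the thread (a sketch of the arithmetic only): with ImageSize = 32500 KB and MemoryUsage undefined, the stock expression gives
>>
>>   (32500 + 1023) / 1024 = 32 MB
>>
>> which matches the RequestMemory of 32 that the CE queue reports.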
>>
>> You have something that works, so I donât think we need to dig further.
>>
>> - Jaime
>>
>>> On Aug 9, 2024, at 10:27 AM, Marco Mambelli <marcom@xxxxxxxx> wrote:
>>>
>>> I think I found the problem:
>>> maxMemory is 1, and somehow that is too small to be considered by the CE:
>>>
>>> maxMemory = RequestMemory
>>> MemoryProvisioned = 128
>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>
>>> [root@ce-workspace /]# condor_ce_q -af maxMemory RequestMemory ImageSize MemoryUsage
>>> 1 1 100 undefined
>>> 1 1 100 undefined
>>> 1 1 100 undefined
>>>
>>> Replacing maxMemory with:
>>> maxMemory = ((RequestMemory ?: 1) > 100 ? RequestMemory : 100)
>>>
>>> That gets maxMemory evaluated to 100 and allows the jobs to run.
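>>>
>>> (Sketch of the arithmetic: ImageSize = 100 KB gives RequestMemory = (100 + 1023) / 1024 = 1 MB; then (1 ?: 1) > 100 is false, so the new expression falls back to 100.)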
>>>
>>> Marco
>>>
>>>
>>>
>>>> On Aug 9, 2024, at 10:11 AM, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
>>>>
>>>> [EXTERNAL] – This message is from an external sender
>>>>
>>>> You are looking at the unrouted jobs in the CE queue. You need to look at the routed jobs in the site queue (i.e. run condor_q).
>>>>
>>>> - Jaime
>>>>
>>>>> On Aug 8, 2024, at 11:49 PM, Marco Mambelli <marcom@xxxxxxxx> wrote:
>>>>>
>>>>> Initially I had a problem because I was omitting the plus (maxMemory instead of +maxMemory).
>>>>> But even after setting +maxMemory=RequestMemory, the jobs are still not matching:
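>>>>>
>>>>> (For reference, a minimal submit-file sketch of the plus syntax; a leading + is what injects an arbitrary attribute such as maxMemory into the job ad:)
>>>>>
>>>>>   # minimal sketch; executable name is a placeholder
>>>>>   universe   = vanilla
>>>>>   executable = sleep.sh
>>>>>   +maxMemory = 1000
>>>>>   queue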
>>>>>
>>>>> [root@ce-workspace /]# condor_ce_q
>>>>>
>>>>>
>>>>> -- Schedd: ce-workspace.glideinwms.org : <10.89.0.2:41921?... @ 08/09/24 04:22:29
>>>>> OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
>>>>> fermilab ID: 1 8/9 03:55 _ _ 1 1 1.0
>>>>> fermilab ID: 2 8/9 03:57 _ _ 1 1 2.0
>>>>> fermilab ID: 3 8/9 04:05 _ _ 1 1 3.0
>>>>>
>>>>> Total for query: 3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended
>>>>> Total for all users: 3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended
>>>>>
>>>>> [root@ce-workspace /]# condor_ce_q -l | grep -i mem
>>>>> maxMemory = RequestMemory
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>> maxMemory = RequestMemory
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>> maxMemory = RequestMemory
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>> [root@ce-workspace /]# condor_ce_q -l | grep -i size
>>>>> ExecutableSize = 100
>>>>> ExecutableSize_RAW = 87
>>>>> ImageSize = 100
>>>>> ImageSize_RAW = 87
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>>>> TransferInputSizeMB = 0
>>>>> ExecutableSize = 100
>>>>> ExecutableSize_RAW = 87
>>>>> ImageSize = 100
>>>>> ImageSize_RAW = 87
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>>>> TransferInputSizeMB = 0
>>>>> ExecutableSize = 100
>>>>> ExecutableSize_RAW = 87
>>>>> ImageSize = 100
>>>>> ImageSize_RAW = 87
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>>>>> TransferInputSizeMB = 0
>>>>> [root@ce-workspace /]# condor_ce_q
>>>>>
>>>>>
>>>>> -- Schedd: ce-workspace.glideinwms.org : <10.89.0.2:41921?... @ 08/09/24 04:23:53
>>>>> OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
>>>>> fermilab ID: 1 8/9 03:55 _ _ 1 1 1.0
>>>>> fermilab ID: 2 8/9 03:57 _ _ 1 1 2.0
>>>>> fermilab ID: 3 8/9 04:05 _ _ 1 1 3.0
>>>>>
>>>>> Total for query: 3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended
>>>>> Total for all users: 3 jobs; 0 completed, 0 removed, 3 idle, 0 running, 0 held, 0 suspended
>>>>>
>>>>> [root@ce-workspace /]# condor_ce_q -better 1.0
>>>>>
>>>>>
>>>>> -- Schedd: ce-workspace.glideinwms.org : <10.89.0.2:41921?...
>>>>> The Requirements expression for job 1.000 is
>>>>>
>>>>> (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>>
>>>>> Job 1.000 defines the following attributes:
>>>>>
>>>>> DiskUsage = 100
>>>>> ImageSize = 100
>>>>> RequestDisk = DiskUsage (kb)
>>>>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024) (mb)
>>>>>
>>>>> The Requirements expression for job 1.000 reduces to these conditions:
>>>>>
>>>>> Slots
>>>>> Step Matched Condition
>>>>> ----- -------- ---------
>>>>> [0] 0 TARGET.Arch == "X86_64"
>>>>> [1] 0 TARGET.OpSys == "LINUX"
>>>>> [3] 0 TARGET.Disk >= RequestDisk
>>>>> [5] 0 TARGET.Memory >= RequestMemory
>>>>>
>>>>>
>>>>> [root@ce-workspace /]# condor_status
>>>>> Name OpSys Arch State Activity LoadAv Mem ActvtyTime
>>>>>
>>>>> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 1763 0+00:29:22
>>>>>
>>>>> Total Owner Claimed Unclaimed Matched Preempting Drain Backfill BkIdle
>>>>>
>>>>> X86_64/LINUX 1 0 0 1 0 0 0 0 0
>>>>>
>>>>> Total 1 0 0 1 0 0 0 0 0
>>>>> [root@ce-workspace /]# condor_status -l grep Memory
>>>>> condor_status: unknown host grep
>>>>> [root@ce-workspace /]# condor_status -l | grep Memory
>>>>> ChildMemory = { }
>>>>> DetectedMemory = 1763
>>>>> MachineResources = "Cpus Memory Disk Swap GPUs"
>>>>> Memory = 1763
>>>>> TotalMemory = 1763
>>>>> TotalSlotMemory = 1763
>>>>> TotalVirtualMemory = 6690348
>>>>> VirtualMemory = 0
>>>>> WithinResourceLimits = (MY.Cpus > 0 && TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 && TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk && (TARGET.RequestGPUs =?= undefined || MY.GPUs >= TARGET.RequestGPUs))
>>>>> [root@ce-workspace /]# condor_status -l | grep Arch
>>>>> Arch = "X86_64"
>>>>> [root@ce-workspace /]# condor_status -l | grep OpSys
>>>>> OpSys = "LINUX"
>>>>> OpSysAndVer = "AlmaLinux9"
>>>>> OpSysLegacy = "LINUX"
>>>>> OpSysLongName = "AlmaLinux release 9.4 (Seafoam Ocelot)"
>>>>> OpSysMajorVer = 9
>>>>> OpSysName = "AlmaLinux"
>>>>> OpSysShortName = "AlmaLinux"
>>>>> OpSysVer = 904
>>>>> [root@ce-workspace /]# condor_status -l | grep Disk
>>>>> ChildDisk = { }
>>>>> Disk = 19881824
>>>>> MachineResources = "Cpus Memory Disk Swap GPUs"
>>>>> TotalDisk = 19881824
>>>>> TotalSlotDisk = 19881824.0
>>>>> WithinResourceLimits = (MY.Cpus > 0 && TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 && TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk && (TARGET.RequestGPUs =?= undefined || MY.GPUs >= TARGET.RequestGPUs))
>>>>> [root@ce-workspace /]# condor_ce_q -l | grep -i disk
>>>>> DiskUsage = 100
>>>>> DiskUsage_RAW = 89
>>>>> RequestDisk = DiskUsage
>>>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>> DiskUsage = 100
>>>>> DiskUsage_RAW = 89
>>>>> RequestDisk = DiskUsage
>>>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>> DiskUsage = 100
>>>>> DiskUsage_RAW = 89
>>>>> RequestDisk = DiskUsage
>>>>> Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory)
>>>>>
>>>>>> On Aug 8, 2024, at 1:37 PM, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>> [EXTERNAL] – This message is from an external sender
>>>>>>
>>>>>> The CE will look at maxMemory in the incoming job ad and default_maxMemory of the route in order to set RequestMemory of the routed job. If neither is set, then the value 2000 is used. Does that match your expectation? Note that the RequestMemory attribute of the incoming job is ignored (though if maxMemory references it, that is respected).
>>>>>>
>>>>>> If you send me the incoming and routed job ads and the route configuration, I can take a look to see what's happening.
>>>>>>
>>>>>> - Jaime
>>>>>>
>>>>>>> On Aug 7, 2024, at 7:14 PM, Marco Mambelli via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I think the GPU is a red herring.
>>>>>>> The problem is the HTCondor-CE bug/feature where even a sleep job asking for basically no memory is bumped by the CE to request 2 GB of memory.
>>>>>>> I was setting maxMemory in the glidein (<submit_attr name="maxMemory" value="RequestMemory"/>) but something is going wrong.
>>>>>>>
>>>>>>>> On Aug 7, 2024, at 6:05 PM, Marco Mambelli via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> [EXTERNAL] – This message is from an external sender
>>>>>>>>
>>>>>>>> Greetings,
>>>>>>>> HTCondor-CE 23.0.8 (HTCondor 23.8.1) seems to add a GPU requirement that causes all jobs to stay idle.
>>>>>>>> There is no GPU on the host and no GPU is mentioned in the job ClassAd on the submit host.
>>>>>>>> Yet on the HTCondor-CE there is a GPU requirement that is not matched, causing the jobs to stay idle.
>>>>>>>>
>>>>>>>> Below are the outputs of:
>>>>>>>> - condor_q -better -reverse
>>>>>>>> - condor_q -better
>>>>>>>> - condor_q -l | grep -i gpu
>>>>>>>>
>>>>>>>> All submitted jobs stay idle.
>>>>>>>> Any suggestion on how to troubleshoot this?
>>>>>>>> Any recent change involving GPU requirements?
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Marco
>>>>>>>>
>>>>>>>>
>>>>>>>> Command outputs:
>>>>>>>>
>>>>>>>>
>>>>>>>> [root@ce-workspace /]# condor_q -all -better -reverse -machine slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx 4
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Schedd: ce-workspace.glideinwms.org : <10.89.0.35:46367?...
>>>>>>>>
>>>>>>>> -- Slot: slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx : Analyzing matches for 1 Jobs in 1 autoclusters
>>>>>>>>
>>>>>>>> The Requirements expression for this slot is
>>>>>>>>
>>>>>>>> START &&
>>>>>>>> (WithinResourceLimits)
>>>>>>>>
>>>>>>>> START is
>>>>>>>> true
>>>>>>>>
>>>>>>>> WithinResourceLimits is
>>>>>>>> (MY.Cpus > 0 &&
>>>>>>>> TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 &&
>>>>>>>> TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 &&
>>>>>>>> TARGET.RequestDisk <= MY.Disk && (TARGET.RequestGPUs is undefined ||
>>>>>>>> MY.GPUs >= TARGET.RequestGPUs))
>>>>>>>>
>>>>>>>> This slot defines the following attributes:
>>>>>>>>
>>>>>>>> Cpus = 1
>>>>>>>> Disk = 17978392
>>>>>>>> GPUs = 0
>>>>>>>> Memory = 1763
>>>>>>>>
>>>>>>>> Job 4.0 has the following attributes:
>>>>>>>>
>>>>>>>> TARGET.RequestCpus = 1
>>>>>>>> TARGET.RequestDisk = 100
>>>>>>>> TARGET.RequestGPUs = undefined
>>>>>>>> TARGET.RequestMemory = 2000
>>>>>>>>
>>>>>>>> The Requirements expression for this slot reduces to these conditions:
>>>>>>>>
>>>>>>>> Clusters
>>>>>>>> Step Matched Condition
>>>>>>>> ----- -------- ---------
>>>>>>>> [6] 0 TARGET.RequestMemory <= MY.Memory
>>>>>>>> [12] 1 TARGET.RequestGPUs is undefined
>>>>>>>>
>>>>>>>> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx: Run analysis summary of 1 jobs.
>>>>>>>> 0 (0.00 %) match both slot and job requirements.
>>>>>>>> 0 match the requirements of this slot.
>>>>>>>> 1 have job requirements that match this slot.
>>>>>>>> [root@ce-workspace /]# condor_q -all -better 4
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Schedd: ce-workspace.glideinwms.org : <10.89.0.35:46367?...
>>>>>>>> The Requirements expression for job 4.000 is
>>>>>>>>
>>>>>>>> (RequestGpus ?: 0) >= (TARGET.Gpus ?: 0)
>>>>>>>>
>>>>>>>> Job 4.000 defines the following attributes:
>>>>>>>>
>>>>>>>> GlideinCpusIsGood = !isUndefined(MATCH_EXP_JOB_GLIDEIN_Cpus) && (int(MATCH_EXP_JOB_GLIDEIN_Cpus) =!= error)
>>>>>>>> JobGPUs = JobIsRunning ? int(MATCH_EXP_JOB_GLIDEIN_GPUs) : OriginalGPUs
>>>>>>>> JobIsRunning = (JobStatus =!= 1) && (JobStatus =!= 5) && GlideinCpusIsGood
>>>>>>>> JobStatus = 1
>>>>>>>> OriginalGPUs = undefined
>>>>>>>> RequestGpus = ifThenElse((WantWholeNode =?= true && OriginalGPUs =!= undefined),( !isUndefined(TotalGPUs) && TotalGPUs > 0) ? TotalGPUs : JobGPUs,OriginalGPUs)
>>>>>>>>
>>>>>>>> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx has the following attributes:
>>>>>>>>
>>>>>>>> TARGET.Gpus = 0
>>>>>>>> TARGET.TotalGPUs = 0
>>>>>>>>
>>>>>>>> The Requirements expression for job 4.000 reduces to these conditions:
>>>>>>>>
>>>>>>>> Slots
>>>>>>>> Step Matched Condition
>>>>>>>> ----- -------- ---------
>>>>>>>> [0] 0 TARGET.Gpus ?: 0
>>>>>>>> [1] 1 (RequestGpus ?: 0) >= (TARGET.Gpus ?: 0)
>>>>>>>>
>>>>>>>> No successful match recorded.
>>>>>>>> Last failed match: Wed Aug 7 22:45:27 2024
>>>>>>>>
>>>>>>>> Reason for last match failure: no match found
>>>>>>>>
>>>>>>>> 004.000: Run analysis summary ignoring user priority. Of 1 machines,
>>>>>>>> 0 are rejected by your job's requirements
>>>>>>>> 1 reject your job because of their own requirements
>>>>>>>> 0 match and are already running your jobs
>>>>>>>> 0 match but are serving other users
>>>>>>>> 0 are able to run your job
>>>>>>>>
>>>>>>>> WARNING: Be advised:
>>>>>>>> Job did not match any machines's constraints
>>>>>>>> To see why, pick a machine that you think should match and add
>>>>>>>> -reverse -machine <name>
>>>>>>>> to your query.
>>>>>>>>
>>>>>>>> [root@ce-workspace /]# condor_q -all -l 4 | grep -i gpu
>>>>>>>> AutoClusterAttrs = "MachineLastMatchTime,Offline,RemoteOwner,RequestCpus,RequestDisk,RequestGPUs,RequestMemory,TotalJobRuntime,ConcurrencyLimits,FlockTo,Rank,Requirements,DiskUsage,GlideinCpusIsGood,JobCpus,JobGPUs,JobIsRunning,JobMemory,JobStatus,MATCH_EXP_JOB_GLIDEIN_Cpus,MATCH_EXP_JOB_GLIDEIN_GPUs,MATCH_EXP_JOB_GLIDEIN_Memory,OriginalCpus,OriginalGPUs,OriginalMemory,TotalCpus,TotalGPUs,TotalMemory,WantWholeNode"
>>>>>>>> GlideinGPUsIsGood = !isUndefined(MATCH_EXP_JOB_GLIDEIN_GPUs) && (int(MATCH_EXP_JOB_GLIDEIN_GPUs) =!= error)
>>>>>>>> JOB_GLIDEIN_GPUs = "$$(ifThenElse(WantWholeNode is true, !isUndefined(TotalGPUs) ? TotalGPUs : JobGPUs, OriginalGPUs))"
>>>>>>>> JobGPUs = JobIsRunning ? int(MATCH_EXP_JOB_GLIDEIN_GPUs) : OriginalGPUs
>>>>>>>> OriginalGPUs = undefined
>>>>>>>> RequestGPUs = ifThenElse((WantWholeNode =?= true && OriginalGPUs =!= undefined),( !isUndefined(TotalGPUs) && TotalGPUs > 0) ? TotalGPUs : JobGPUs,OriginalGPUs)
>>>>>>>> Requirements = (RequestGpus ?: 0) >= (TARGET.Gpus ?: 0)
>>>>>>>> [root@ce-workspace /]# condor_ce_version
>>>>>>>> $HTCondorCEVersion: 23.0.8 $
>>>>>>>> $CondorVersion: 23.8.1 2024-06-27 BuildID: 742100 PackageID: 23.8.1-1 GitSHA: 8cf018d1 $
>>>>>>>> $CondorPlatform: x86_64_AlmaLinux9 $
>>>>>>>> [root@ce-workspace /]#