From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jaime Frey <jfrey@xxxxxxxxxxx>
Sent: Wednesday, June 8, 2022, 17:26
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Increase memory on release
I’ve tried this on my own machine (9.0.14), and MEMORY values print as expected with condor_q -autocluster. You can add the -long flag to see which attributes the schedd returns to condor_q:
% condor_q -autocluster -long
AutoClusterId = 2
DiskUsage = 5
JobCount = 4
JobIds = "2101.0 ... 2101.3"
Rank = 0.0
RequestCpus = 1
RequestDisk = DiskUsage
RequestMemory = ifthenelse(((LastHoldReasonCode != 34) || IsUndefined(MemoryProvisioned)),2048,4096)
Requirements = (false) && (TARGET.Arch == "arm64") && (TARGET.OpSys == "macOS") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)
ServerTime = 1654698108
You can alternatively add the -af flag to see how the RequestMemory attribute evaluates in the returned autocluster ad:
% condor_q -autocluster -af requestmemory
2048
I did notice a problem where the job universe isn’t always returned by the schedd, which should be a simple fix.
- Jaime
Hi Jaime,
I think it's only a display issue with condor_q --autocluster.
As you can see below, it's an autocluster of 15 jobs, and the Negotiator treats it as a single autocluster.
Using condor_version 9.0.1
Thanks
David.
Submit lines:
InitialMemorySize = 2048
IncreasedMemorySize = 4096
RequestMemory = ifthenelse(((LastHoldReasonCode != 34) || IsUndefined(MemoryProvisioned)), $(InitialMemorySize), $(IncreasedMemorySize))
periodic_release = (JobStatus == 5) && (HoldReasonCode == 34) && (MemoryProvisioned <= $(IncreasedMemorySize))
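For context, here is a minimal sketch of how these lines could sit in a complete submit description; the executable name and queue count are hypothetical, not from this thread. HoldReasonCode 34 means the job exceeded its requested memory, and LastHoldReasonCode keeps that code after periodic_release fires, which is what lets RequestMemory re-evaluate to the larger value when the job is rematched:

# Hypothetical executable; the memory lines below are the ones from this thread.
executable = my_job.sh
InitialMemorySize = 2048
IncreasedMemorySize = 4096
# First attempt requests 2048 MB; after a memory hold (code 34) has occurred
# and MemoryProvisioned is defined, the retry requests 4096 MB.
RequestMemory = ifthenelse(((LastHoldReasonCode != 34) || IsUndefined(MemoryProvisioned)), $(InitialMemorySize), $(IncreasedMemorySize))
# Release held jobs (JobStatus == 5) automatically, but only for memory holds
# and only while the provisioned memory is within the retry ceiling.
periodic_release = (JobStatus == 5) && (HoldReasonCode == 34) && (MemoryProvisioned <= $(IncreasedMemorySize))
queue 15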
--------------------------------
condor_q --autocluster output:
-- Schedd: fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo : <192.20.9.70:20123?... @ 05/29/22 10:04:31
ID COUNT UNIVERSE CPUS MEMORY DISK REQUIREMENTS
87 0 Vanilla 1 [????] 15360 TARGET.HasDocker && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)
dudu@fleetnetworks-sbmt01:~$
---------------------------------
Negotiator log:
05/29/22 10:04:20 Negotiating with guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx at <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0>
05/29/22 10:04:20 0 seconds so far for this submitter
05/29/22 10:04:20 0 seconds so far for this schedd
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 1 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.39.26:9618?addrs=192.3.39.26-9618&alias=shmo-server75.gorgo&noUDP&sock=startd_6181_3af9> slot1@xxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 2 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.29.56:9618?addrs=192.3.29.56-9618&alias=shmo-server1074.gorgo&noUDP&sock=startd_16567_2999> slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 3 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.32.131:9618?addrs=192.3.32.131-9618&alias=shmo-server95.gorgo&noUDP&sock=startd_1208_2a7e> slot1@xxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 4 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.53.112:9618?addrs=192.3.53.112-9618&alias=shmo-server10.gorgo&noUDP&sock=startd_12323_21a9> slot1@xxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 5 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.21.177:9618?addrs=192.3.21.177-9618&alias=shmo-server1086.gorgo&noUDP&sock=startd_24598_c2b0> slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 6 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.106.141:9618?addrs=192.3.106.141-9618&alias=GLUS171087.gorgo&noUDP&sock=startd_31737_854c> slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 7 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.30.44:9618?addrs=192.3.30.44-9618&alias=shmo-server152.gorgo&noUDP&sock=startd_432_e500> slot1@xxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 8 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.3.69:9618?addrs=192.3.3.69-9618&alias=NIR-SSD029.gorgo&noUDP&sock=startd_25279_deaf> slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 9 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.63.176:9618?addrs=192.3.63.176-9618&alias=NIR-069.gorgo&noUDP&sock=startd_6202_a276> slot1@xxxxxxxxxxxxx
05/29/22 10:04:20 Successfully matched with slot1@xxxxxxxxxxxxx
05/29/22 10:04:20 Request 02537.00000: autocluster 86 (request count 10 of 15)
05/29/22 10:04:20 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.59.148:9618?addrs=192.3.59.148-9618&alias=shmo-server1149.gorgo&noUDP&sock=startd_12655_db1a> slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:21 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:21 Request 02537.00000: autocluster 86 (request count 11 of 15)
05/29/22 10:04:21 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.64.122:9618?addrs=192.3.64.122-9618&alias=NIR-SSD017.gorgo&noUDP&sock=startd_27893_db3f> slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:21 Successfully matched with slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:21 Request 02537.00000: autocluster 86 (request count 12 of 15)
05/29/22 10:04:21 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.3.73:9618?addrs=192.3.3.73-9618&alias=NIR-SSD024.gorgo&noUDP&sock=startd_27636_9368> slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:21 Successfully matched with slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:21 Request 02537.00000: autocluster 86 (request count 13 of 15)
05/29/22 10:04:21 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.47.15:9618?addrs=192.3.47.15-9618&alias=shmo-server1168.gorgo&noUDP&sock=startd_13945_b046> slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:21 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx
05/29/22 10:04:21 Request 02537.00000: autocluster 86 (request count 14 of 15)
05/29/22 10:04:21 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.3.71:9618?addrs=192.3.3.71-9618&alias=NIR-SSD036.gorgo&noUDP&sock=startd_7313_b02e> slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:21 Successfully matched with slot1@xxxxxxxxxxxxxxxx
05/29/22 10:04:21 Request 02537.00000: autocluster 86 (request count 15 of 15)
05/29/22 10:04:21 Matched 2537.0 guest.dudu@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <192.20.9.70:20123?addrs=192.20.9.70-20123&alias=fleetnetworks-SBMT01.ORG.fleetnetworks.gorgo&noUDP&sock=schedd_2426173_74c0> preempting none <192.3.30.127:9618?addrs=192.3.30.127-9618&alias=shmo-server116.gorgo&noUDP&sock=startd_11345_4e09> slot1@xxxxxxxxxxxxxxxxxxxx
Thanks Jaime.
I will recreate this in the lab and provide the information.
Thanks
David
Can you tell us the expression you’re using, and the command and output with question marks?
- Jaime
Hi All.
I'm trying to increase the memory request for failed jobs.
It's actually working, using an if statement on the requested memory.
But the autocluster display is unable to handle it.
While looking at the autocluster queue, it shows question marks in the memory column (a query to confirm the hold codes is sketched after this message).
I personally think it's a very important feature.
Thanks
David
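For reference, the hold state that drives this expression can be inspected with standard condor_q options; nothing below is specific to this setup, and MemoryProvisioned is only defined once a job has actually run on a slot:

% condor_q -hold -af ClusterId ProcId HoldReasonCode HoldReason
% condor_q -constraint 'HoldReasonCode == 34' -af GlobalJobId MemoryProvisioned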
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with the subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/