Hello,
So, it looks like your job has matched. Could you send me the condor logs from the Execution Point off list?
I think that /var/log/condor/VMGahpLog and /var/log/condor/StartLog would be most interesting.
...Tim
Hi, here is the analysis you requested:
$ condor_q -better-analyze 63.0 -reverse -machine ep1ext.sel
-- Schedd: t450.sel : <10.10.0.47:9618?...
-- Slot: slot1@xxxxxxxxxx : Analyzing matches for 1 Jobs in 1 autoclusters
The Requirements expression for this slot is
START &&
(WithinResourceLimits)
START is
true
WithinResourceLimits is
(MY.Cpus > 0 &&
TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 &&
TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 &&
TARGET.RequestDisk <= MY.Disk)
This slot defines the following attributes:
Cpus = 12
Disk = 11642268
Memory = 32130
Job 63.0 has the following attributes:
TARGET.RequestCpus = 2
TARGET.RequestDisk = 4250000
TARGET.RequestMemory = 4096
The Requirements expression for this slot reduces to these conditions:
Clusters
Step Matched Condition
----- -------- ---------
[1] 1 WithinResourceLimits
slot1@xxxxxxxxxx: Run analysis summary of 1 jobs.
1 (100.00 %) match both slot and job requirements.
1 match the requirements of this slot.
1 have job requirements that match this slot.
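For reference, the WithinResourceLimits check reported above can be reproduced by hand. The following is only a minimal Python sketch that plugs in the attribute values printed in this analysis; the dictionary names and the helper function are illustrative assumptions and are not part of HTCondor.

```python
# Slot (MY.*) and job (TARGET.*) attributes copied from the analysis output above.
slot = {"Cpus": 12, "Disk": 11642268, "Memory": 32130}
job = {"RequestCpus": 2, "RequestDisk": 4250000, "RequestMemory": 4096}


def within_resource_limits(my, target):
    """Hypothetical helper mirroring the slot's WithinResourceLimits expression."""
    return (
        my["Cpus"] > 0 and target["RequestCpus"] <= my["Cpus"]
        and my["Memory"] > 0 and target["RequestMemory"] <= my["Memory"]
        and my["Disk"] > 0 and target["RequestDisk"] <= my["Disk"]
    )


start = True  # the analysis reports START as "true" for this slot
slot_accepts_job = start and within_resource_limits(slot, job)
print("slot requirements satisfied:", slot_accepts_job)
```

With these values every comparison holds, which agrees with the "1 match the requirements of this slot" line in the summary.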
$ condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 32130 0+01:39:45
Total Owner Claimed Unclaimed Matched Preempting Drain Backfill BkIdle
X86_64/LINUX 1 0 0 1 0 0 0 0 0
Total 1 0 0 1 0 0 0 0 0
On Tue, 2023-12-05 at 08:37 -0600, Tim Theisen via HTCondor-users wrote:
Hello VB,
Perhaps you could try the reverse analysis to see whether something is preventing the job from starting.
condor_q -better-analyze 62.0 -reverse -machine ep1ext.sel
Also, what does "condor_status" produce?
...Tim
On 12/5/23 07:39, Valerio Bellizzomi wrote:
Hi Tim,
I finally got slot1 to match my job, but for some unknown reason the job still remains in the idle state:
$ condor_q -better-analyze
-- Schedd: t450.sel : <10.10.0.47:9618?...
The Requirements expression for job 62.000 is
((Machine == "ep1ext.sel")) && (TARGET.Arch == "X86_64") && (TARGET.HasVM is true) && (TARGET.VM_Type == MY.JobVMType) && (TARGET.VM_AvailNum > 0) &&
(TARGET.Disk >= RequestDisk) && (TARGET.TotalMemory >= MY.JobVMMemory) && (TARGET.VM_Memory >= MY.JobVMMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)
Job 62.000 defines the following attributes:
DiskUsage = 4250000
JobVMMemory = 4096
JobVMType = "kvm"
RequestCpus = 2
RequestDisk = DiskUsage
slot1@xxxxxxxxxx has the following attributes:
TARGET.Arch = "X86_64"
TARGET.Cpus = 12
TARGET.Disk = 11642268
TARGET.HasFileTransfer = true
TARGET.HasVM = true
TARGET.Machine = "ep1ext.sel"
TARGET.TotalMemory = 32130
TARGET.VM_AvailNum = 4
TARGET.VM_Memory = 30000
TARGET.VM_Type = "kvm"
The Requirements expression for job 62.000 reduces to these conditions:
Slots
Step Matched Condition
----- -------- ---------
[0] 1 Machine == "ep1ext.sel"
[1] 1 TARGET.Arch == "X86_64"
[3] 1 TARGET.HasVM is true
[5] 1 TARGET.VM_Type == MY.JobVMType
[7] 1 TARGET.VM_AvailNum > 0
[9] 1 TARGET.Disk >= RequestDisk
[11] 1 TARGET.TotalMemory >= MY.JobVMMemory
[13] 1 TARGET.VM_Memory >= MY.JobVMMemory
[15] 1 TARGET.Cpus >= RequestCpus
[17] 1 TARGET.HasFileTransfer
062.000: Run analysis summary ignoring user priority. Of 1 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match and are already running your jobs
0 match but are serving other users
1 are able to run your job
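As a cross-check, the job-side conditions listed above can be evaluated one by one against the slot attributes. This is a rough Python sketch, not HTCondor code: the dictionaries simply copy the values printed in the analysis, and the clause list is a hand translation of the job's Requirements expression.

```python
# Job attributes (MY.*) copied from the analysis output above.
job = {"DiskUsage": 4250000, "JobVMMemory": 4096, "JobVMType": "kvm", "RequestCpus": 2}
job["RequestDisk"] = job["DiskUsage"]  # the job ad sets RequestDisk = DiskUsage

# Slot attributes (TARGET.*) copied from the analysis output above.
slot = {
    "Arch": "X86_64", "Cpus": 12, "Disk": 11642268, "HasFileTransfer": True,
    "HasVM": True, "Machine": "ep1ext.sel", "TotalMemory": 32130,
    "VM_AvailNum": 4, "VM_Memory": 30000, "VM_Type": "kvm",
}

# Hand-translated clauses of the job's Requirements expression (illustrative only).
clauses = [
    ('Machine == "ep1ext.sel"', lambda s, j: s["Machine"] == "ep1ext.sel"),
    ('TARGET.Arch == "X86_64"', lambda s, j: s["Arch"] == "X86_64"),
    ("TARGET.HasVM is true", lambda s, j: s["HasVM"] is True),
    ("TARGET.VM_Type == MY.JobVMType", lambda s, j: s["VM_Type"] == j["JobVMType"]),
    ("TARGET.VM_AvailNum > 0", lambda s, j: s["VM_AvailNum"] > 0),
    ("TARGET.Disk >= RequestDisk", lambda s, j: s["Disk"] >= j["RequestDisk"]),
    ("TARGET.TotalMemory >= MY.JobVMMemory", lambda s, j: s["TotalMemory"] >= j["JobVMMemory"]),
    ("TARGET.VM_Memory >= MY.JobVMMemory", lambda s, j: s["VM_Memory"] >= j["JobVMMemory"]),
    ("TARGET.Cpus >= RequestCpus", lambda s, j: s["Cpus"] >= j["RequestCpus"]),
    ("TARGET.HasFileTransfer", lambda s, j: bool(s["HasFileTransfer"])),
]

results = [(label, check(slot, job)) for label, check in clauses]
for label, ok in results:
    print(("pass" if ok else "FAIL"), label)
print("all job requirements satisfied:", all(ok for _, ok in results))
```

Every clause passes with these values, consistent with the "1 are able to run your job" line, so the requirements alone do not explain why the job stays idle.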
$ condor_q
-- Schedd: t450.sel : <10.10.0.47:9618?... @ 12/05/23 14:28:16
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
sel ID: 62 12/5 14:27 _ _ 1 1 62.0
Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for sel: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
--
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/