Hi,
The analysis of a long-waiting job:
-- Schedd: taai-007.nikhef.nl : <145.107.7.246:9618?...
The Requirements expression for job 214860.000 is
(Machine != "wn-pijl-002.nikhef.nl") && (Machine != "wn-lot-001.nikhef.nl")
Job 214860.000 defines the following attributes:
The Requirements expression for job 214860.000 reduces to these conditions:
         Slots
Step    Matched  Condition
-----  --------  ---------
[0]         867  Machine != "wn-pijl-002.nikhef.nl"
[1]         857  Machine != "wn-lot-001.nikhef.nl"
[2]         851  [0] && [1]
No successful match recorded.
Last failed match: Thu Jul 11 13:12:52 2024
Reason for last match failure: no match found
214860.000: Run analysis summary ignoring user priority. Of 86 machines,
      2 are rejected by your job's requirements
     31 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
     53 are able to run your job
The job is asking for 64 cores. There are 57 machines with 64 cores; two of them are rejected by [0] and [1], and two more are draining, so 53 are "theoretically" able to run my job, if it were not for all the jobs of other users already running on those nodes. There ARE, however, single-core slots available on all 53 of those nodes. It's as if the analysis compares RequestCpus against TotalCpus, but then does not compare RequestCpus against Cpus for each actual slot (we have partitionable slots). It is also strange that Cpus is not mentioned anywhere in the analysis.
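
To make the distinction concrete, here is a minimal Python sketch of the two comparisons I have in mind. It is a toy model, not HTCondor's actual matchmaking or analysis code. The class, the function names and the node names (node-a, node-b, node-c) are hypothetical; only the attribute names TotalCpus, Cpus and RequestCpus come from the slot and job ads.

```python
from dataclasses import dataclass


@dataclass
class PartitionableSlot:
    """Toy stand-in for a partitionable slot ad (hypothetical, for illustration)."""
    machine: str
    total_cpus: int  # plays the role of TotalCpus: cores the node has in total
    free_cpus: int   # plays the role of Cpus: cores still unclaimed in the slot


def fits_total(slot: PartitionableSlot, request_cpus: int) -> bool:
    # Capacity check: could this node ever run the job if it were empty?
    return request_cpus <= slot.total_cpus


def fits_free(slot: PartitionableSlot, request_cpus: int) -> bool:
    # Availability check: can this node run the job right now?
    return request_cpus <= slot.free_cpus


# Made-up example pool: one busy 64-core node, one idle 64-core node,
# and one idle 32-core node.
slots = [
    PartitionableSlot("node-a", total_cpus=64, free_cpus=1),
    PartitionableSlot("node-b", total_cpus=64, free_cpus=64),
    PartitionableSlot("node-c", total_cpus=32, free_cpus=32),
]

request_cpus = 64  # what the job asks for (RequestCpus)

by_total = [s.machine for s in slots if fits_total(s, request_cpus)]
by_free = [s.machine for s in slots if fits_free(s, request_cpus)]

print("able to run by total capacity:", by_total)
print("able to run by free cores now:", by_free)
```

In this toy pool, node-a counts as "able to run" under the capacity check even though only one core is free, which is the situation I believe I am seeing on the 53 nodes above.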
Is this a feature, a bug, or a misconfiguration on our part?
JT