
Re: [HTCondor-users] condor_q -analyze and -better-analyze : what am I missing?



Thanks Christoph, 

That version of the analysis, I don't understand at all.

What is reported for TARGET.RequestCpus is the same as the slot Cpus:

$ condor_q -allusers -better-analyze 214860.0 -reverse | grep TARGET.RequestCpus | sort | uniq -c
    838 [1]           0  ifThenElse(TARGET._cp_orig_RequestCpus isnt undefined,TARGET.RequestCpus <= MY.Cpus,MY.ConsumptionCpus <= MY.Cpus)
      1 [1]           1  ifThenElse(TARGET._cp_orig_RequestCpus isnt undefined,TARGET.RequestCpus <= MY.Cpus,MY.ConsumptionCpus <= MY.Cpus)
     11 [2]           0  TARGET.RequestCpus <= MY.Cpus
    857     (ifThenElse(TARGET._cp_orig_RequestCpus isnt undefined,TARGET.RequestCpus <= MY.Cpus,MY.ConsumptionCpus <= MY.Cpus) &&
    695     TARGET.RequestCpus = 1
      2     TARGET.RequestCpus = 16
     12     TARGET.RequestCpus = 32
     72     TARGET.RequestCpus = 50
     87     TARGET.RequestCpus = 64
     11       TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 &&

It also still reports 1 slot can run that job, although that slot certainly cannot as the true RequestCpus is 64, whereas that slot only has 1 Cpu.
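
To make the puzzle concrete, here is a toy Python model of the slot-side clause shown in the output above. It is only a sketch, not HTCondor code: the function name `cpus_clause` and the `consumption_cpus` value are made up for illustration, and real ClassAd handling of undefined values is richer than a simple `None` check.

```python
# Toy model of the slot-side clause quoted above; not HTCondor code.
from typing import Optional


def cpus_clause(request_cpus: int, slot_cpus: int,
                cp_orig_request_cpus: Optional[int],
                consumption_cpus: int) -> bool:
    """Mimic ifThenElse(TARGET._cp_orig_RequestCpus isnt undefined,
    TARGET.RequestCpus <= MY.Cpus, MY.ConsumptionCpus <= MY.Cpus)."""
    if cp_orig_request_cpus is not None:
        return request_cpus <= slot_cpus
    return consumption_cpus <= slot_cpus


# With the job's real request (64), a 1-Cpu slot fails the clause ...
true_request = cpus_clause(64, 1, cp_orig_request_cpus=64, consumption_cpus=64)
# ... but if RequestCpus is shown as equal to the slot's Cpus (1), it passes.
reported_request = cpus_clause(1, 1, cp_orig_request_cpus=64, consumption_cpus=64)
print(true_request, reported_request)
```

Under these assumptions, the clause can only come out true for a 1-Cpu slot if the value used for TARGET.RequestCpus is the slot's own Cpus rather than the job's 64.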

JT


On 11 Jul 2024, at 13:30, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:

Hi Jeff,

have you tried

condor_q -better-analyze 214860.000 -reverse -machine <one of the 53 hosts> ?  The host needs to be FQDN for some reason or slot@FQDN ...


Best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Jeff Templon" <templon@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 11 July 2024 13:19:26
Subject: [HTCondor-users] condor_q -analyze and -better-analyze : what am I missing?

Hi,
The analysis of a long-waiting job:

-- Schedd: taai-007.nikhef.nl : <145.107.7.246:9618?...
The Requirements expression for job 214860.000 is

    (Machine != "wn-pijl-002.nikhef.nl") && (Machine != "wn-lot-001.nikhef.nl")

Job 214860.000 defines the following attributes:


The Requirements expression for job 214860.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]         867  Machine != "wn-pijl-002.nikhef.nl"
[1]         857  Machine != "wn-lot-001.nikhef.nl"
[2]         851  [0] && [1]

No successful match recorded.
Last failed match: Thu Jul 11 13:12:52 2024

Reason for last match failure: no match found

214860.000:  Run analysis summary ignoring user priority.  Of 86 machines,
      2 are rejected by your job's requirements
     31 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
     53 are able to run your job

The job is asking for 64 cores. There are 57 machines with 64 cores; two of them are rejected by [0] and [1], and two more are draining, so 53 are "theoretically" able to run my job, if it were not for all the jobs from other users already running on those nodes. There ARE, however, single-core slots available on all 53 of those nodes. It's as if the analysis compares RequestCpus against TotalCpus, but then does not compare RequestCpus against Cpus for each actual slot (we have partitionable slots). It is also strange that Cpus is not mentioned anywhere in the analysis.
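
The two comparisons described above can be sketched in Python. This is a toy model of the suspicion only, not a statement of how condor_q -analyze actually works; the `Slot` class, the helper functions, and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    machine: str
    total_cpus: int  # cores on the whole machine (TotalCpus)
    cpus: int        # cores currently free in this slot (Cpus)


def matches_by_total(slots, request_cpus):
    """Machines that match when RequestCpus is compared with TotalCpus."""
    return sorted({s.machine for s in slots if request_cpus <= s.total_cpus})


def matches_by_free(slots, request_cpus):
    """Machines that match when RequestCpus is compared with the slot's Cpus."""
    return sorted({s.machine for s in slots if request_cpus <= s.cpus})


# Hypothetical pool: node-a is busy with only 1 core left, node-b is idle,
# node-c is idle but too small for a 64-core job.
slots = [
    Slot("node-a", total_cpus=64, cpus=1),
    Slot("node-b", total_cpus=64, cpus=64),
    Slot("node-c", total_cpus=32, cpus=32),
]

by_total = matches_by_total(slots, 64)
by_free = matches_by_free(slots, 64)
print(by_total, by_free)
```

In this model a busy 64-core node counts as "able to run your job" under the TotalCpus comparison even though its remaining slot has a single core, which is the pattern the summary seems to show.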

Is this a feature, a bug, or a misconfiguration on our part?

JT


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/