[HTCondor-users] problems with submission to Scheduler universe
- Date: Mon, 3 Jun 2024 18:25:36 +0200
- From: Stefano Belforte <stefano.belforte@xxxxxxx>
- Subject: [HTCondor-users] problems with submission to Scheduler universe
Dear experts,
can you explain why a JDL [1] containing these three lines stays idle forever?
Universe = scheduler
requirements = true
RequestCpus = 2
Inspecting the job requirements with condor_q -l, I find
Requirements = (true) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") &&
               (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) &&
               (TARGET.Cpus >= RequestCpus)
The same submission runs within a few minutes if I put
RequestCpus = 1
and in that case the Requirements expression does not have the final
term && (TARGET.Cpus >= RequestCpus)
My scheduler is a 16-CPU machine, so regardless of whether the
requirement makes sense or not, I'd expect the job to run, no?
The documentation at
https://htcondor.readthedocs.io/en/latest/man-pages/condor_submit.html says
<quote>
For scheduler and local universe jobs, the requirements expression is
evaluated against the Scheduler ClassAd which represents the
condor_schedd daemon running on the access point, rather than a remote
machine.
</quote>
But I did not find a way to inspect those TARGET.* ads:
condor_status -sched -con 'machine=="vocms059.cern.ch"' -af Cpus Memory ...
simply returns "undefined", and the same if I prefix the attributes with TARGET.
Likewise, "condor_q -af requirements" returns undefined.
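In case it helps to reproduce, here is a quick sketch of how one can list
what the Scheduler ad does advertise via the Python bindings (querying the
collector directly; the hostname is just my access point):

    import htcondor

    coll = htcondor.Collector()
    for ad in coll.query(htcondor.AdTypes.Schedd,
                         constraint='Machine == "vocms059.cern.ch"'):
        print(sorted(ad.keys()))                  # attributes that exist
        print(ad.get("Cpus"), ad.get("Memory"))   # None if not advertised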
This is currently breaking CMS CRAB submission as we move to the current
schedd.submit(submitObject,...) binding (we can go into the details of how
it was working "before" if you care, but IMHO it is not relevant).
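For reference, the failing path is essentially this (a simplified sketch of
the binding call; the real submitObject is built elsewhere in CRAB, this just
mirrors the JDL in [1] below):

    import htcondor

    schedd = htcondor.Schedd()
    submitObject = htcondor.Submit({
        "universe": "scheduler",
        "executable": "sleep.sh",
        "arguments": "1",
        "requirements": "true",
        "request_memory": "2000",
        "request_cpus": "2",
    })
    result = schedd.submit(submitObject)
    print(result.cluster())  # the job then sits idle, same as with condor_submit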
I believe I can find a workaround by changing code in various places,
but if the above could be made to work, that would be the easiest.
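For example, one workaround sketch, building on the snippet above (assuming
submitObject is a plain dict of submit commands before it is wrapped in
htcondor.Submit, with the key spelling we use when building it):

    # Drop RequestCpus for scheduler-universe jobs so the final
    # (TARGET.Cpus >= RequestCpus) clause never gets appended.
    if submitObject.get("universe", "").lower() == "scheduler":
        submitObject.pop("request_cpus", None)
    schedd.submit(htcondor.Submit(submitObject))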
Thanks
Stefano
[1] full JDL:
Universe = scheduler
Executable = sleep.sh
Arguments = 1
Log = sleep.PC.log
Output = sleep.out.$(Cluster).$(Process)
Error = sleep.err.$(Cluster).$(Process)
requirements = true
should_transfer_files = YES
RequestMemory = 2000
RequestCpus = 2
when_to_transfer_output = ON_EXIT
Queue 1