We are running Condor version 7.6.5 (a bit outdated, I understand; an upgrade is scheduled for some time soon).
I am conducting a workshop and want to restrict use of a group of nodes managed by a dedicated scheduler. So far I have failed. These machines are in constant use, and I want to keep out the users who have mastered continuous submission of long-running jobs. For now I have peacefully shut down Condor so that the nodes will free up soon, but I need this restriction to work in order to keep those users out for 3 days.
ParallelSchedulingGroup = "PSGROUP"
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxx"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler, ParallelSchedulingGroup
RANK = (Owner == "testme000") + (Owner == "testme001")
START = (Scheduler =?= $(DedicatedScheduler)) && $(RANK)
I have the above machine ClassAd attributes specified on each of the machines managed by the dedicated scheduler.
Even though it has been 21 hours since I submitted a job to run on these nodes, they are not accepting it. In fact, putting the above attributes in place has left the machine status at Owner:Idle for the last 21 hours. I am not logged into the machine remotely, so I don't understand what is messing up its state.
When I check the machine requirements with condor_status, this is what I find:
$ condor_status -l node1 | grep Requirements
Requirements = ( START ) && ( IsValidCheckpointPlatform )
Also some other details:
$ condor_status -l node1 | grep -i owner
State = "Owner"
Start = ( Scheduler =?= "DedicatedScheduler@xxxxxxxxxxxxxxxxxx" ) && ( Owner == "testme000") + ( Owner == "testme001" )
TotalTimeOwnerIdle = 77565
Rank = ( Owner == "testme000") + ( Owner == "testme001" )
slot1_State = "Owner"
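Comparing the Start value in this output with my START line, it looks as though $(DedicatedScheduler) and $(RANK) are pasted in as plain text, so the RANK text arrives without an outer pair of parentheses around it. Below is a minimal Python sketch of that substitution as I understand it. The expand helper and the config dictionary are my own stand-ins for illustration, not Condor's actual parser, and the sketch only reproduces the text of the expression; it does not evaluate it.

```python
import re

# Hypothetical stand-in for Condor's config macro expansion:
# each $(NAME) is replaced by the raw text of NAME's value.
# This is NOT Condor's real parser, only an illustration.
config = {
    "DedicatedScheduler": '"DedicatedScheduler@xxxxxxxxxxxxxxxxxx"',
    "RANK": '(Owner == "testme000") + (Owner == "testme001")',
    "START": '(Scheduler =?= $(DedicatedScheduler)) && $(RANK)',
}

MACRO = re.compile(r"\$\((\w+)\)")


def expand(value, table):
    """Substitute $(NAME) references with their raw text until none remain."""
    while True:
        expanded = MACRO.sub(lambda m: table[m.group(1)], value)
        if expanded == value:
            return expanded
        value = expanded


start_expanded = expand(config["START"], config)
print(start_expanded)
```

Apart from whitespace, the printed text matches the Start attribute reported by condor_status, so how the && and the + group with each other is left entirely to the ClassAd operator precedence rules.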
I am sure there is more to it. Any help is greatly appreciated.
Best,
Prem