Re: [HTCondor-users] Dynamic Slots in Parallel Universe
- Date: Mon, 12 Mar 2018 11:11:28 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Dynamic Slots in Parallel Universe
On 3/9/2018 11:40 AM, Larne Pekowsky wrote:
Hi Todd,
I'm resurrecting this thread because I think we're still seeing related
problems. One of our users has a parallel universe job that has been
idle for almost a day. The StartLogs on the available nodes seem to
indicate that the nodes are held for a while and then released without
ever having enough nodes to start the job.
[snip]
Any suggestions? If you need any additional information please let me know.
Cheers,
- Larne
Hi Larne,
Looks like your schedd is indeed running with Greg's v8.7.7 code patch here
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6517
so it should be working for you...
Does your condor_config on your central manager include
ALLOW_PSLOT_PREEMPTION = True
?
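For reference, a minimal central-manager fragment might look like the sketch below. The file path in the comment is a hypothetical example (any config file the condor_negotiator reads will do). After editing, running condor_reconfig on the central manager and then `condor_config_val ALLOW_PSLOT_PREEMPTION` there should confirm the value the daemons see.

```
# Central manager only; hypothetical location, e.g.
# /etc/condor/config.d/50-pslot-preemption.conf
# Lets the negotiator preempt one or more dynamic slots so that their
# parent partitionable slot frees up enough resources for a match.
ALLOW_PSLOT_PREEMPTION = True
```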
And does the condor_config on all your execute nodes have a RANK expression
that prefers your dedicated scheduler submit machine (e.g. like the
example at http://tinyurl.com/yaolvshk)?
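As a rough sketch, the execute-node side is usually modeled on the dedicated-resource example in the HTCondor manual. It is an assumption that the link above points to something equivalent, and "submit.example.edu" is a placeholder host name, not Syracuse's actual submit machine.

```
# Execute nodes: advertise which dedicated scheduler owns these slots.
# "submit.example.edu" is a placeholder; use the full host name of the
# schedd that runs the parallel universe jobs.
DedicatedScheduler = "DedicatedScheduler@submit.example.edu"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# Rank jobs from the dedicated scheduler above any other work, so the
# dedicated scheduler can claim these slots for parallel universe jobs.
RANK = Scheduler =?= $(DedicatedScheduler)
```

Running `condor_config_val RANK` on an execute node is a quick way to check what the startd actually picked up.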
If the answer to both of the above questions is yes, then the next step
is probably for Greg to ask you more questions to get to the bottom of
this... After the above patch, Greg observed parallel universe jobs
working here at UW with partitionable slots, so I imagine he will need to
figure out what is different at Syracuse...
Thanks
Todd