Hi Steve, Steven Timm wrote:
How can I put a single node in a condor pool into a 'drainoff' state, that is, let any jobs currently running on the node finish, but don't accept new jobs?

It should be:

  condor_off -peaceful

In theory that will shut down the machines once all the running jobs leave. In practice I find that if one job takes an incredibly long time to run, new jobs keep getting assigned to the machine and a peaceful point to shut down is never reached. That's with 6.8.6 (yeah, Condor guys, I know: why don't I tell you about these things? Sometimes it just slips my mind... :) ).

In practice I've found two gotchas with this approach:

(1) You have to execute condor_off -peaceful individually for each startd in the pool. If you just do a global condor_off -peaceful it will kill the schedds and negotiators well before the startds go off, and you won't have the desired result (the jobs will all finish but Condor will never know about it). They need a feature added to automatically do the startds first and then the schedds and collector/negotiators.
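For concreteness, draining a single execute node looks something like this (the hostname is a placeholder, and depending on your version the flag may be spelled -startd or -subsystem startd):

  condor_off -peaceful -startd node01.example.com

That tells only that machine's startd to stop accepting new jobs and to shut down once the jobs it is already running have finished.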
I think the subject was a bit misleading. I really meant 'node drainoff', not 'pool drainoff'. I'm confusing the condor and dCache concepts of a 'pool'.
(2) If you execute condor_off -peaceful for a lot of nodes in rapid succession it will send the collector into a dance of death from which it can take hours to extract itself, and condor_status will time out in the meantime. Supposedly that will be fixed in Condor 7.0.2.

The other two features I've wanted for a long time are (1) an instruction to tell a schedd to start all its existing jobs but not accept any more new ones, and (2) an instruction to let existing jobs on a schedd complete but not start any more new ones. (Yes, I know the latter could be accomplished with condor_hold -constraint ...)
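For what it's worth, a sketch of that condor_hold workaround (the constraint is just one way to write it; JobStatus == 1 is the Idle state, so this holds everything that is queued but not yet running):

  condor_hold -constraint 'JobStatus == 1'

Running jobs are left alone and finish normally; the held jobs simply never start until someone frees them again with condor_release.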
(2) is precisely the feature I was trying to use, except on a startd, not a schedd. If it's not currently possible with 7.0.0, then I'll just have to continue the tedious practice of watching for specific nodes to become idle, then shutting condor off. Otherwise I could just shut condor off while jobs are running, but I don't like to kill jobs that have been running for several hours.
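(For reference, my current manual routine amounts to roughly the following, with the hostname as a placeholder:

  condor_status node01.example.com    # wait until all slots show Unclaimed/Idle
  condor_off node01.example.com       # then shut Condor down on that node

which works, but doesn't scale and needs babysitting.)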
--Mike
I thought I could do this by setting 'START=False' in the node-specific condor_config.local, followed by 'condor_reconfig -subsystem startd' on the node, but that doesn't seem to have worked. The node is still starting new jobs.

Hmm... try:

  condor_reconfig -startd -full

But my gut feeling is that START = False is going to immediately vacate the running jobs.

- Ian