Steven Timm wrote:
The other two features I've wanted for a long time are (1) an instruction to tell a schedd to start all its existing jobs but not accept any more new ones. Also (2) an instruction to let existing jobs on a schedd complete but not start any more new ones. (Yes, I know the latter could be accomplished with condor_hold -constraint ...)
In Condor 7.1.1, condor_off -peaceful -schedd will cause the schedd to stop starting new jobs and shut down after all currently running jobs finish.
A good start. Now is there a way to coordinate the -peaceful of the startd with the -peaceful of the schedd?
In 7.1.1, if you send a -peaceful shutdown to both startds and schedds, all existing running jobs should finish and then the schedds and startds should shut down, so I think there is no problem there. The lack of coordination is with the collector. If you send a -peaceful shutdown to everything (including the collector), then the collector will exit immediately, before the startds and schedds have finished. The lack of a collector _might_ not interfere with the wrapping up and shutting down of the rest of the pool, but I'm not 100% sure, and lacking a collector sure won't make it easy to see what is going on.
I believe the answer to your request (1) above is to set MAX_JOBS_SUBMITTED=0. Or at least so says my new (not yet publicly announced) How-to: http://nmi.cs.wisc.edu/node/1466
We'll try this whenever we get around to testing Condor 7.1.
FYI: MAX_JOBS_SUBMITTED has been lurking in obscurity for a long time. It should work in whatever version of Condor you are using. The general statement in the How-to that the advice is known to work in Condor 7.0 is just the standard thing I put in all the new How-to docs because I was in too much of a hurry to investigate how far back each feature goes.
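
For reference, a minimal sketch of the MAX_JOBS_SUBMITTED setting discussed above. The file name and the reconfig step are assumptions, not something stated in this thread, and whether the schedd picks up the change on a reconfig rather than a restart should be checked for your version:

```
# Local Condor configuration on the submit machine
# (assumed location: the file named by LOCAL_CONFIG_FILE)

# Refuse new job submissions to this schedd. Jobs already in the
# queue are expected to keep starting and running as usual.
MAX_JOBS_SUBMITTED = 0
```

After editing the file, running condor_reconfig on the submit machine asks the daemons to reread their configuration. Removing the line (or restoring its previous value) and reconfiguring again should reopen the schedd for submissions.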
--Dan