On Tue, 5 Oct 2010, Carsten Aulbert wrote:
Hi, I've found an email thread from 2005 discussing this: https://www-auth.cs.wisc.edu/lists/condor-users/2005-February/msg00373.shtml Is this possible nowadays? I have a long-running DAGMan here which currently runs up to 500 jobs at once, but the file servers can probably handle 1000 or beyond, and I would like to increase this number without restarting the 120k-node DAGMan.
There's not really a "clean" way to do this, but depending on how maxjobs was specified, there are some things you can do.
If your maxjobs limit is specified in a per-DAG config file, you could edit the config file, and then do condor_hold and condor_release on the DAG job itself. That would cause DAGMan to restart and go into recovery mode, but in the meantime, any node jobs already in the queue would continue running.
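For example, assuming your DAGMan job is cluster 1234 and the DAG was submitted with a per-DAG config file named my.dag.config (both names here are just placeholders for your setup), something like this should do it:

  # Raise the node-job limit in the per-DAG config file; a later
  # definition overrides an earlier one, so appending is enough:
  echo 'DAGMAN_MAX_JOBS_SUBMITTED = 1000' >> my.dag.config

  # Bounce the DAGMan job itself; on release it restarts in recovery
  # mode and re-reads the config, while queued node jobs keep running:
  condor_hold 1234
  condor_release 1234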
If maxjobs is specified in the command-line arguments, I guess you could do something like condor_hold, then condor_qedit to change the arguments, and condor_release.
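Something along these lines, again with cluster 1234 as a stand-in; the Arguments string below is purely illustrative -- copy the real one from condor_q and change only the -MaxJobs value:

  # Look at the current arguments of the DAGMan job:
  condor_q -long 1234.0 | grep Arguments

  condor_hold 1234
  # Re-set the whole Arguments attribute with the new -MaxJobs value;
  # everything else must match what condor_q printed above:
  condor_qedit 1234.0 Arguments '"-f -l . -Debug 3 -Lockfile my.dag.lock -Dag my.dag -MaxJobs 1000"'
  condor_release 1234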
Finally, you can do condor_rm of the DAGMan job, and re-submit it with a different maxjobs setting to run the resulting rescue DAG; but that will remove all running node jobs, so you'll waste some work.
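For instance (1234 and my.dag are placeholders again; recent versions automatically pick up the most recent rescue DAG when you re-submit the original DAG file):

  condor_rm 1234

  # Re-submit with the higher limit; the rescue DAG means nodes that
  # already completed won't be re-run, but work in flight is lost:
  condor_submit_dag -maxjobs 1000 my.dag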
Kent Wenger
Condor Team