Oops! I typed that wrong. What you actually want to change is:
JOB_START_COUNT = 2
I said ‘JOB_START_INTERVAL’ -- that’s wrong. I think faster than I can type sometimes...
- Ian
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
That’s it! That’s the key: the jobs run very quickly (I’m guessing in the range of a few minutes, right?).
In that case condor can’t spawn shadows fast enough. The shadow spawn rate on the schedds is throttled to prevent overloading the machine by starting many, many processes at the same time. There are two variables that control the spawn rate. You’ll only want to change JOB_START_COUNT.
Put this in the condor_config file used by all your schedds:
## Start more than one job at a time
JOB_START_INTERVAL = 2
Once that’s deployed in all your condor_config files issue:
condor_reconfig -all
From your central negotiator to reconfigure all of them.
You can up that number until the Claimed+Idle machines disappear, but keep a careful eye on CPU usage on your schedd machines. It can spike if too many shadow processes are spawned at once.
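For reference, here is a rough sketch of how the throttle settings might look in a schedd's condor_config once the corrected variable name is used. The value 2 for JOB_START_COUNT comes from this thread. The second throttle variable is not named in the message; JOB_START_DELAY is assumed here, and it is shown commented out with an illustrative value only, since the advice above is to leave it alone.

```
## Start more than one job at a time (value from this thread;
## raise it gradually while watching schedd CPU usage)
JOB_START_COUNT = 2

## Assumed name of the second throttle variable: the pause, in
## seconds, between each batch of job starts. Illustrative value,
## left commented out because only JOB_START_COUNT needs changing.
# JOB_START_DELAY = 2
```

After editing, the same condor_reconfig -all step applies.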
- Ian
0 jobs; 0 idle, 0 running, 0 held on any machine that I try. I think by the time I SSH to a node that's running a job it's already finished, hence the empty queue. The jobs run very quickly