HTCondor Project List Archives




Re: [Condor-devel] Bug: condor_release times out releasing clusters with more than 1500 processes in them



On 7/7/06, Peter Keller <psilord@xxxxxxxxxxx> wrote:
> Hello,
>
> On Thu, Jul 06, 2006 at 03:40:09PM -0400, Ian Chesal wrote:
> > We saw this problem on our live system over the weekend, so I had our co-op
> > do a detailed analysis of it on our test system, and it's very
> > reproducible. The condor_release call consistently times out on clusters
> > with more than 1500 processes in them. Thankfully it doesn't
> > half-release the cluster. But it still means that if you've submitted a
> > cluster on hold with more than 1500 processes in it, you're never going
> > to get it to run. Is this a known issue?
>
> I don't think that's ever been reported. I'll see if I can reproduce the error
> and let you know what I find in a bit.

Note that, as a workaround, you can run condor_release with the -constraint
option and release the cluster in chunks of 1000 jobs at a time.
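For illustration, here is a minimal Python sketch of that chunked workaround. It assumes the held cluster's jobs have contiguous ProcId values 0 through N-1 (as they do for a single submission) and relies only on `condor_release -constraint` plus the standard `ClusterId`/`ProcId` job attributes. The helper names `chunk_constraints` and `release_in_chunks` are hypothetical, and by default the script does a dry run that just returns the commands it would execute.

```python
import subprocess
from typing import List


def chunk_constraints(cluster_id: int, num_procs: int, chunk_size: int = 1000) -> List[str]:
    """Build one ClassAd constraint per block of at most chunk_size ProcIds.

    Assumes ProcIds in the cluster run contiguously from 0 to num_procs - 1.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    constraints = []
    for start in range(0, num_procs, chunk_size):
        end = min(start + chunk_size, num_procs)
        constraints.append(
            f"ClusterId == {cluster_id} && ProcId >= {start} && ProcId < {end}"
        )
    return constraints


def release_in_chunks(cluster_id: int, num_procs: int,
                      chunk_size: int = 1000, dry_run: bool = True) -> List[List[str]]:
    """Return (and optionally run) one condor_release command per chunk."""
    cmds = [["condor_release", "-constraint", c]
            for c in chunk_constraints(cluster_id, num_procs, chunk_size)]
    if not dry_run:
        for cmd in cmds:
            # Argument list, so no shell quoting is needed for the constraint.
            # check=True stops at the first failing chunk; earlier chunks stay released.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical cluster 1234 with 2500 held jobs -> three release commands.
    for cmd in release_in_chunks(1234, 2500):
        print(cmd)
```

Unlike a single condor_release on the whole cluster, a failure partway through this loop leaves the earlier chunks released, so re-running it should only need the chunks that remain held.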

Matt