Re: [HTCondor-devel] Schedd removes jobs when hold requested


Date: Thu, 02 Jul 2015 14:09:51 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Schedd removes jobs when hold requested
On 7/2/2015 1:55 PM, Ben Cotton wrote:
> Hi guys,
>
> I'm pretty sure Ian Chesal brought this up with you a time or two
> before, and I think I may have, too, but we saw an incident today
> where CycleServer made a SOAP call to a scheduler to hold some jobs
> and the scheduler removed them instead. It's the first time anyone has
> told us about this in a while, so I'd hoped it had magically been
> fixed.
>
> Unfortunately, I don't have any logs (they'd already rotated), but if
> it happens again, I'll try to pass them along.


Cool.

> Any ideas?


By "the scheduler removed them", do you mean the jobs ended up in condor_history in the "X" (removed) state, or that they disappeared from the queue and perhaps ended up in condor_history in the "C" (completed) state? I'm wondering whether there is a race condition here.
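One quick way to answer that from saved job history is to tally the final JobStatus of the affected jobs: code 3 is what condor_history shows as "X" (removed) and code 4 as "C" (completed). A minimal sketch, assuming the ads have already been exported as plain Python dicts (how they get exported is left open) and using a hypothetical helper name, `classify_history`:

```python
from collections import Counter
from typing import Iterable, Mapping

# HTCondor JobStatus codes and the letter condor_history displays for each.
JOB_STATUS_LETTER = {
    1: "I",  # Idle
    2: "R",  # Running
    3: "X",  # Removed
    4: "C",  # Completed
    5: "H",  # Held
}


def classify_history(ads: Iterable[Mapping[str, object]]) -> Counter:
    """Count job ads by the status letter condor_history would show.

    Hypothetical helper: each ad is a dict with at least a 'JobStatus' key.
    Codes not listed above are counted under '?'.
    """
    counts: Counter = Counter()
    for ad in ads:
        code = ad.get("JobStatus")
        counts[JOB_STATUS_LETTER.get(code, "?")] += 1
    return counts
```

Mostly "X" would point at a real removal; mostly "C" would suggest the jobs simply finished around the time the hold request arrived.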

Perhaps policy expressions are involved, such as system_periodic_remove in condor_config or periodic_remove in the job ad?
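If the jobs did end in "X", the removal reason stored in each ad may show whether a policy expression fired. A rough sketch, assuming each removed ad carries a `RemoveReason` string and that policy-triggered removals mention the expression name (`PeriodicRemove` or `SYSTEM_PERIODIC_REMOVE`) somewhere in it. The exact wording may differ between HTCondor versions, so the substring match is an assumption, and `removed_by_policy` is a hypothetical helper:

```python
from typing import Iterable, List, Mapping, Tuple

# Substrings assumed (not guaranteed) to appear in RemoveReason when a
# policy expression, rather than a user or tool, removed the job.
POLICY_MARKERS = ("PeriodicRemove", "SYSTEM_PERIODIC_REMOVE")

REMOVED = 3  # JobStatus code for a removed job ("X" in condor_history)


def removed_by_policy(
    ads: Iterable[Mapping[str, object]],
) -> List[Tuple[object, object, str]]:
    """Return (ClusterId, ProcId, RemoveReason) for removed jobs whose
    RemoveReason mentions one of the assumed policy markers."""
    hits = []
    for ad in ads:
        if ad.get("JobStatus") != REMOVED:
            continue  # only removed jobs are of interest here
        reason = str(ad.get("RemoveReason", ""))
        # Case-insensitive match, since the config knob name is case-insensitive.
        if any(marker.lower() in reason.lower() for marker in POLICY_MARKERS):
            hits.append((ad.get("ClusterId"), ad.get("ProcId"), reason))
    return hits
```

An empty result for jobs that are known to be in "X" would make a policy expression less likely and the hold/remove mix-up more likely.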

Todd

--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685