On 7/2/2015 1:55 PM, Ben Cotton wrote:
> Hi guys,
> I'm pretty sure Ian Chesal brought this up with you a time or two
> before, and I think I may have, too, but we saw an incident today
> where CycleServer made a SOAP call to a scheduler to hold some jobs
> and the scheduler removed them instead. It's the first time anyone has
> told us about this in a while, so I'd hoped it had magically been
> fixed.
> Unfortunately, I don't have any logs (they'd already rotated), but if
> it happens again, I'll try to pass them along.
Cool.
> Any ideas?
By "the scheduler removed them", do you mean they ended up in
condor_history in the "X" (removed) state, or do you mean they
disappeared from the queue and thus perhaps ended up in condor_history
in the "C" state? Wondering if there is a race condition here.
Perhaps there are policy expressions involved, such as
SYSTEM_PERIODIC_REMOVE in condor_config or periodic_remove in the job ad?
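
To make the "X" versus "C" distinction concrete, here is a minimal sketch that maps the integer JobStatus attribute of a job ad to the one-letter state shown in the queue and history listings. It assumes you have already dumped job IDs and JobStatus values to text, for example with something like condor_history -af ClusterId ProcId JobStatus and then joined the IDs as Cluster.Proc. The classify helper and the sample lines are hypothetical and are not output from the incident.

```python
# JobStatus integer codes from the HTCondor job ClassAd, paired with the
# one-letter state shown in queue/history listings.
JOB_STATUS = {
    1: ("I", "Idle"),
    2: ("R", "Running"),
    3: ("X", "Removed"),
    4: ("C", "Completed"),
    5: ("H", "Held"),
    6: (">", "Transferring Output"),
    7: ("S", "Suspended"),
}


def classify(history_lines):
    """Turn 'Cluster.Proc JobStatus' lines into (job_id, letter, name) tuples."""
    results = []
    for line in history_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        job_id, status = parts[0], int(parts[1])
        letter, name = JOB_STATUS.get(status, ("?", "Unknown"))
        results.append((job_id, letter, name))
    return results


# Hypothetical sample input: one removed job and one completed job.
sample = ["1234.0 3", "1234.1 4"]
for job_id, letter, name in classify(sample):
    print(job_id, letter, name)
```

If the affected jobs show up as "X", checking RemoveReason in their history ads may indicate whether a policy expression or an explicit remove request was responsible.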
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>   University of Wisconsin-Madison
Center for High Throughput Computing     Department of Computer Sciences
HTCondor Technical Lead                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                    Madison, WI 53706-1685