
Re: [Condor-users] "job disconnected" after being deleted



The -forcex is in some middleware we use here. I think it was added to keep the code simple (run -forcex on the job and move on, because the job is gone), but it clearly causes other problems. We should only need -forcex when the worker node is actually down, and the middleware should be fixed to handle that case properly.
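Here is a minimal sketch of what that middleware fix might look like. It does not come from the actual middleware. It follows the order Ian recommends below: run a plain condor_rm, wait for the job to leave the queue, and escalate to `condor_rm -forcex` only if the job is still stuck in the removed ("X") state, for example because the worker node is down. The function names, the polling parameters, and the use of `condor_q -format "%d\n" JobStatus <id>` to read the job's status (printing nothing once the job has left the queue) are assumptions. Check them against your HTCondor version.

```python
import subprocess
import time
from typing import Callable, List, Optional

# HTCondor JobStatus code for a job that has been removed but is still in
# the queue (shown as "X" by condor_q).
REMOVED = 3


def run_cmd(args: List[str]) -> str:
    """Run a command and return its stdout; raises on a non-zero exit."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def query_status(job_id: str, run: Callable[[List[str]], str] = run_cmd) -> Optional[int]:
    """Return the job's JobStatus, or None if it is no longer in the queue.

    Assumption: condor_q -format prints nothing for a job that has left the queue.
    """
    out = run(["condor_q", "-format", "%d\n", "JobStatus", job_id]).strip()
    return int(out) if out else None


def remove_job(job_id: str,
               run: Callable[[List[str]], str] = run_cmd,
               status: Optional[Callable[[str], Optional[int]]] = None,
               polls: int = 30,
               interval: float = 10.0) -> bool:
    """Remove a job, escalating to -forcex only as a last resort.

    Returns True only if condor_rm -forcex had to be issued.
    """
    if status is None:
        status = lambda jid: query_status(jid, run)
    # Step 1: a normal removal lets the execute side of the job shut down cleanly.
    run(["condor_rm", job_id])
    for _ in range(polls):
        if status(job_id) is None:
            return False  # the job left the queue on its own
        time.sleep(interval)
    # Step 2: the job is still sitting in the "X" state (e.g. the worker node
    # is down), so force its local removal from the submit side.
    if status(job_id) == REMOVED:
        run(["condor_rm", "-forcex", job_id])
        return True
    return False
```

The `run` and `status` callables are injectable so the escalation logic can be exercised without a live pool. In production, the defaults shell out to the real commands.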

Thanks,

joe

Ian Chesal wrote:


On Thursday, May 26, 2011 at 11:18 AM, Joe Boyd wrote:

Why does the behavior below happen, and how can I fix it? This is all cut from the job.log for this DAG job. You can see that it says the job is being removed, but then it is still trying to contact it. Is this because of the "-forcex"?
Likely. -forcex removes the job from the scheduler machine without waiting to ensure the execute-node side of the job has shut down properly. The -forcex option should always be a last resort (and I'd say run it *only* after you've tried condor_rm without the option against the job(s)). Why are you using it in this case? Do you have issues with a straight condor_rm call?

Regards,
- Ian

--
Ian Chesal
ichesal@xxxxxxxxxxxxxxxxxx
http://www.cyclecomputing.com/


------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/