On 5/28/07, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
So this error is now proving to be quite a problem for one large cluster in my system. Basically all the jobs in this cluster are causing this assertion in the shadow code when they start to run. Can someone with condor_shadow code access give me an idea of what might be causing this assert to get triggered?
Nothing terribly obvious to me. It is asserting while trying to send the remote syscall to begin execution. If you could pull out the 94673.23-related log (it is mixed in with the 94812.0 logs in the snippet you provided), that would help; I would guess that it is linked to the errors listed above, though. A "request to run REFUSED" message is normally not a good sign. It can mean that shadows are not getting started fast enough, so the startd times out the request.

Have you tried upgrading the submit host to the latest 6.8? "Fixed a bug in the condor_shadow on Windows where it would fail to correctly perform the PASSWORD authentication method." Just a stab in the dark.

Matt