On Wednesday, May 18, 2011 at 10:44 PM, Derrick Karimi wrote:
> I am trying to add fault tolerance to my condor pool. I am attempting to retry jobs up to 5 times if they return a non-zero ExitCode, using requirements in the submission file:
>
> == 0 || (ExitCode != 0 && JobRunCount >= 5)
>
> This is working on Windows 7 machines, but not on my XP machines. Condor believes the return code of the failing jobs is always zero on the XP machines.

A better way to say this would be: the command "C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat" has an ERRORLEVEL of 0 on Windows XP and an ERRORLEVEL of 1 on Windows 7 (Windows return code speak). It isn't that Condor is always returning zero; it's that cmd.exe is always returning 0 on XP and Condor is just echoing this back to you.

> I have attached snippets from two StarterLogs, one on a Win7 slot, and one on an XP slot. In each case I have logged onto the machine a job was running on and simulated a failure in the same way. I have confirmed in my application logs and the job stdout log that the .bat file referenced as the command in the submit file is returning a non-zero error code. I think I am returning the error code from the .bat file in the "right" way.
>
> I am using Condor 7.2.5. Does anyone know if this was a bug that was fixed?

Doubtful, since there isn't likely a Condor bug here. This looks like a fundamental difference between cmd.exe on Windows XP and Windows 7. IIRC cmd.exe on XP was limited to returning errorlevel codes of 0 or 1. See:

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true

There's a note on that page that says:

"If a command completes an operation successfully, it returns an exit code of zero (0) or no exit code."

You can try some tests to convince yourself of this.
If I have test.bat:

    @echo off
    exit /b %1

I can call it from a cmd prompt and see that it works:

    C:\tmp>.\test.bat 0
    C:\tmp>echo %ERRORLEVEL%
    0
    C:\tmp>.\test.bat 1
    C:\tmp>echo %ERRORLEVEL%
    1
    C:\tmp>.\test.bat 2
    C:\tmp>echo %ERRORLEVEL%
    2

So that works, but now call it the same way Condor has to call it:

    C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 0
    C:\tmp>echo %ERRORLEVEL%
    0
    C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 1
    C:\tmp>echo %ERRORLEVEL%
    0
    C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 2
    C:\tmp>echo %ERRORLEVEL%
    0

Repeat on Windows 7 to see if cmd.exe has gotten better at echoing the error level of the last command it runs.

> It seems pretty critical, so perhaps there is some other explanation for the behavior I am seeing. I need some help. Is this the kind of grief I should expect when working with .bat files?

More the kind of grief you should expect from working with Windows XP. Error levels aren't echoed by cmd.exe.

Regards,
- Ian
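
The point of the experiment above is that a parent process only sees the exit code of the process it launched directly. If an intermediate layer drops the child's status, the parent sees 0 no matter what the job did. The following is a minimal, portable Python sketch of that idea, not Condor's code and not a reproduction of cmd.exe itself. The names `failing_job`, `swallowing_wrapper`, and `forwarding_wrapper` are hypothetical stand-ins introduced only for illustration.

```python
import subprocess
import sys


def exit_code_of(argv):
    """Run argv and return the exit code that the parent process observes."""
    return subprocess.run(argv).returncode


# Hypothetical stand-in for the failing job: a child that exits with status 2.
failing_job = [sys.executable, "-c", "import sys; sys.exit(2)"]

# Hypothetical wrapper that runs the job but discards its status and exits 0.
# This mimics the behaviour the thread attributes to "cmd.exe /Q /C" on XP.
swallowing_wrapper = [
    sys.executable, "-c",
    "import subprocess, sys; subprocess.run(sys.argv[1:]); sys.exit(0)",
] + failing_job

# Hypothetical wrapper that forwards the job's status to its own caller.
forwarding_wrapper = [
    sys.executable, "-c",
    "import subprocess, sys; sys.exit(subprocess.run(sys.argv[1:]).returncode)",
] + failing_job

direct = exit_code_of(failing_job)            # job launched directly
swallowed = exit_code_of(swallowing_wrapper)  # status lost in the wrapper
forwarded = exit_code_of(forwarding_wrapper)  # status passed through

print(direct, swallowed, forwarded)  # prints "2 0 2"
```

In the swallowed case the caller has no way to recover the original status, which is why a retry policy keyed on ExitCode never fires when the wrapper reports 0.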