This looks like the result of some sloppy cleanup code that runs when the starter exits under certain error conditions. The starter lost its network connection to the shadow daemon on the submit machine during file transfer. It was then told to exit, which precipitated jumbled cleanup code that included trying to tell the shadow it was exiting (over a dead connection). I haven't traced the exact cause of the crash, but the code involved has several problems that need to be fixed.
The startd should recover from the crash without any assistance. If these crashes are only happening rarely, then I wouldn't worry about them. We will work on a fix, though.
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project