The schedd submits jobs to a large grid glidein pool, and the error happens only when a job is matched to a node at one specific site. That site has IPv6-only compute nodes, while our schedd machine does not support IPv6. We are not 100% sure that the issue is the IP version, but that seems consistent with the exception (socket protocol != object protocol). Is this exception expected in such a case? And should the schedd crash?
The schedd should not crash if one of its jobs is (improperly) matched to a slot with which the schedd cannot communicate. That being said, CCBClient::ReverseConnectCallback should only be called after the reverse connection /succeeds/, so something strange is going on.
Could you send me the schedd log from before the stack trace? (If you can reproduce this easily, it'd be great to get the log with SCHEDD_DEBUG set to "D_NETWORK D_FULLDEBUG".) Thanks. - ToddM
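For reference, the debug setting requested above might look like the following in the schedd host's HTCondor configuration. This is a minimal sketch: which local configuration file it goes in is an assumption and depends on the installation.

```
# Assumed to be placed in a local config file on the schedd machine.
# Raises schedd logging to include network-level and full debug messages.
SCHEDD_DEBUG = D_NETWORK D_FULLDEBUG
```

After editing, running condor_reconfig on the schedd host should make the daemon pick up the new setting, and condor_config_val SCHEDD_LOG prints the path of the log file to send.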