Re: [HTCondor-users] Schedd dies with an exception when communicating with IPv6 startd



The schedd submits jobs to a large grid glidein pool, and the error happens only when a job is matched to a node in a specific site. The site has IPv6-only compute nodes, while our schedd machine does not support IPv6. We are not 100% sure that the issue is with the IP version, but that seems consistent with the exception (socket protocol != object protocol).

Is this exception expected in such a case? And should the schedd crash?

The schedd should not crash if one of its jobs is (improperly) matched to a slot with which the schedd cannot communicate. That being said, CCBClient::ReverseConnectCallback should only be called after the reverse connection /succeeds/, so something strange is going on.

Could you send me the schedd log from before the stack trace? (If you can reproduce this easily, it'd be great to get the log with SCHEDD_DEBUG set to "D_NETWORK D_FULLDEBUG".) Thanks.
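
For reference, a minimal sketch of that setting, assuming it goes into the local HTCondor configuration file on the schedd machine (the exact file location depends on the installation):

```
# Assumed placement: local configuration file on the schedd host.
# Adds network-level and full debug messages to the schedd log.
SCHEDD_DEBUG = D_NETWORK D_FULLDEBUG
```

Running condor_reconfig on the schedd machine afterwards should make the daemon pick up the new debug level without a restart.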

- ToddM