Hi Dan,
Yes, gadget is the scheduler and the log was produced by that machine.
I looked through the negotiator's log for traces of this
communication problem and found this:
5/24 14:52:47 ---------- Started Negotiation Cycle ----------
5/24 14:52:47 Phase 1: Obtaining ads from collector ...
...
5/24 14:52:47 Negotiating with szabolcs@xxxxxxxxxxxxxxxxxxx at <192.168.0.50:3661>
5/24 14:52:47 0 seconds so far
5/24 14:52:47 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <192.168.0.50:3661>.
5/24 14:52:47 Failed to get reply from schedd
5/24 14:52:47 Error: Ignoring schedd for this cycle
5/24 14:52:47 ---------- Finished Negotiation Cycle ----------
I guess that if the negotiator can negotiate with the computer at IP
192.168.0.50, then it must have connected to it somehow.
So what might cause the problem while it waits for the reply?
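
For what it's worth, errno values in the 10000 range are Winsock codes, and
10054 is WSAECONNRESET (connection reset by the peer). Assuming the negotiator
runs on Windows, that would mean the TCP connection was established and then
reset by the other side while the negotiator was reading the reply. Below is a
small sketch for pulling such codes out of a log and naming them. The table is
a hand-picked subset, and the function name is made up for illustration; it is
not part of Condor.

```python
import re

# Hand-picked subset of Winsock error codes (Windows only).
WINSOCK_ERRORS = {
    10053: "WSAECONNABORTED: connection aborted by software on the local host",
    10054: "WSAECONNRESET: connection reset by the remote peer",
    10060: "WSAETIMEDOUT: connection attempt timed out",
    10061: "WSAECONNREFUSED: connection actively refused",
}

# Matches the "recv() returned -1, errno = 10054" part of the log line above.
RECV_FAILURE = re.compile(r"recv\(\) returned (-?\d+), errno = (\d+)")


def explain_recv_failures(log_text):
    """Return (errno, description) pairs for every recv() failure in log_text."""
    results = []
    for match in RECV_FAILURE.finditer(log_text):
        code = int(match.group(2))
        results.append((code, WINSOCK_ERRORS.get(code, "unknown (not in this table)")))
    return results


sample = "5/24 14:52:47 condor_read(): recv() returned -1, errno = 10054, assuming"
for code, meaning in explain_recv_failures(sample):
    print(code, meaning)
```
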
Cheers,
Szabolcs
Is gadget.digicpictures.local the name of the host that this SchedLog
was produced on? If so, then this sounds to me like the schedd trying
to directly claim its "local" startd, because it hasn't successfully
communicated with the negotiator for a long time. How long is
controlled by SCHEDD_ASSUME_NEGOTIATOR_GONE, which defaults to 1200 seconds.
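
As a sketch, the timeout can be set explicitly in the Condor configuration on
the schedd host. The value below only restates the default mentioned above;
which configuration file it belongs in depends on the installation and is an
assumption here.

```
# Seconds the schedd waits without successful contact from the negotiator
# before it tries to claim its local startd directly (default: 1200).
SCHEDD_ASSUME_NEGOTIATOR_GONE = 1200
```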
--Dan
_______________________________________________
Condor-users mailing list