Hi Dan,
It all looks sensible. So on the submit node running 7.4.2:
$ condor_config_val ALLOW_NEGOTIATOR
tempo--escience.grid.private.cam.ac.uk
$ condor_config_val ALLOW_NEGOTIATOR_SCHEDD
tempo--escience.grid.private.cam.ac.uk,
These are exactly the same values I get for HOSTALLOW_NEGOTIATOR
and HOSTALLOW_NEGOTIATOR_SCHEDD on the successful submit node
running 7.2.5.
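For a mechanical comparison of the two values, here is a minimal Python sketch that splits each comma-separated list and ignores empty entries, such as the one left by the trailing comma in the second value as shown above. The helper names `parse_allow_list` and `same_hosts` are hypothetical, and this is not a claim about how Condor itself parses these settings.

```python
def parse_allow_list(value):
    """Split a comma-separated host list, dropping blanks and surrounding whitespace."""
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def same_hosts(a, b):
    """Return True when both lists name the same set of hosts."""
    return set(parse_allow_list(a)) == set(parse_allow_list(b))


# Values copied from the condor_config_val output above.
allow_negotiator = "tempo--escience.grid.private.cam.ac.uk"
allow_negotiator_schedd = "tempo--escience.grid.private.cam.ac.uk,"

print(same_hosts(allow_negotiator, allow_negotiator_schedd))
```

Under this (assumed) parsing the two settings name the same single host; whether the daemons treat the trailing comma the same way is a separate question.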
From nslookup we see that this Negotiator is the very host that gets
mentioned in the 7.4.2 submit host's SchedLog as being the unknown
Negotiator:
###
$ nslookup tempo--escience.grid.private.cam.ac.uk
Server: 131.111.8.42
Address: 131.111.8.42#53
Name: tempo--escience.grid.private.cam.ac.uk
Address: 172.24.116.1
###
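The same check can be scripted. The sketch below, in Python, pulls the address out of the "Unknown negotiator" SchedLog line and tests whether the configured hostname resolves to it. The function names are hypothetical, the regular expression assumes the log wording quoted in this thread, and the resolver is injectable so the example can run without DNS access.

```python
import re
import socket


def unknown_negotiator_ip(log_line):
    """Extract the IPv4 address from an 'Unknown negotiator (...)' log line, or None."""
    match = re.search(r"Unknown negotiator \(([\d.]+)\)", log_line)
    return match.group(1) if match else None


def resolves_to(hostname, ip, resolver=socket.gethostbyname_ex):
    """Return True if ip is among the addresses the resolver gives for hostname."""
    _name, _aliases, addresses = resolver(hostname)
    return ip in addresses


line = "04/13 16:33:49 (pid:19045) Unknown negotiator (172.24.116.1). Aborting negotiation."
ip = unknown_negotiator_ip(line)


# Stub resolver standing in for DNS; it returns the nslookup answer shown above.
def stub_resolver(host):
    return (host, [], ["172.24.116.1"])


print(resolves_to("tempo--escience.grid.private.cam.ac.uk", ip, resolver=stub_resolver))
```

Dropping the `resolver` argument makes the function use the system resolver on the submit host, which may or may not agree with the nslookup answer.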
Any ideas? In the meantime I'll keep digging in case the penny drops.
Cheers,
Mark
P.S. I'm running on 32-bit Debian Lenny machines, using the
dynamically-linked x86 debian50 build from the Wisconsin repository.
On 13/04/2010 18:38, Dan Bradley wrote:
Mark,
Check the configuration of ALLOW_NEGOTIATOR and
ALLOW_NEGOTIATOR_SCHEDD in the configuration of the submit machine.
Let me know if it still doesn't make sense.
--Dan
Mark Calleja wrote:
Hi,
I'm testing out v7.4.2 of Condor and have run into a job
submission problem. Firstly, I should say that the pool is an
upgraded 7.2.5 pool, with the same condor_config file but with
HOSTALLOW/HOSTDENY entries changed to ALLOW/DENY, as recommended
in the release notes for 7.4.0. A simple test job fails to run
after submission, even though "condor_q -better" shows that there
are available resources. The NegotiatorLog contains this relevant
snippet:
04/13 16:33:49 Negotiating with xxxx@xxxxxxxxxxxxx at <172.24.116.7:9682>
04/13 16:33:49 0 seconds so far
04/13 16:33:49 condor_read() failed: recv() returned -1, errno = 104 Connection reset by peer, reading 5 bytes from schedd xxxx@xxxxxxxxxxxxxx
04/13 16:33:49 IO: Failed to read packet header
04/13 16:33:49 Failed to get reply from schedd
The corresponding entry in the submitter's SchedLog reads:
04/13 16:33:36 (pid:19045) Sent ad to central manager for xxxx@xxxxxxxxxxxxx
04/13 16:33:36 (pid:19045) Sent ad to 1 collectors for xxxx@xxxxxxxxxxxxx
04/13 16:33:49 (pid:19045) Unknown negotiator (172.24.116.1). Aborting negotiation.
As can be surmised from the above, the submit host has IP address
172.24.116.7 and the central manager has 172.24.116.1. It looks
like the Schedd doesn't trust the Negotiator, right? By
comparison, when I submit the same job from a machine still
running 7.2.5 to the same central manager, then the job runs just
fine. That is:
submit host (7.4.2) -> central manager (7.4.2): Fails
submit host (7.2.5) -> central manager (7.4.2): Succeeds
Is there some new/extra configuration that needs to be carried
out on a submit host running 7.4 compared to that on 7.2?
Cheers,
Mark
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to
condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/