On Monday, 15 August 2011 at 10:59 AM, Dan Bradley wrote:
Ian,
I believe the settings you mentioned will achieve what you
are trying to do. In 7.6, it should also be sufficient to
do this:
WANT_UDP_COMMAND_SOCKET = False
UPDATE_COLLECTOR_WITH_TCP = True
COLLECTOR_MAX_FILE_DESCRIPTORS = 3000
In 7.6, daemons that do not have a UDP port advertise this
fact in their address information. Therefore, it is not
necessary to fiddle with protocol knobs such as
SCHEDD_SEND_VACATE_VIA_TCP, because the client
automatically switches to TCP when it sees that the server
lacks a UDP port.
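As a rough illustration of that client-side switch, here is a minimal Python sketch. It is not Condor's actual code. It assumes the daemon address is a string of the form `<host:port?params>` and that a missing UDP port is signalled by a `noUDP` entry in the parameter part; the helper names `parse_address` and `choose_transport` are hypothetical.

```python
def parse_address(addr: str):
    """Split '<host:port?a=b&flag>' into (host, port, set of parameter names).

    The address layout and the parameter syntax are assumptions made
    for this example.
    """
    inner = addr.strip().lstrip("<").rstrip(">")
    hostport, _, query = inner.partition("?")
    host, _, port = hostport.rpartition(":")
    # Keep only the parameter names; values (if any) are ignored here.
    names = {p.split("=", 1)[0] for p in query.split("&") if p}
    return host, int(port), names


def choose_transport(addr: str, prefer_udp: bool = True) -> str:
    """Return 'tcp' if the server advertises no UDP port, else the preference."""
    _, _, names = parse_address(addr)
    if "noUDP" in names:  # assumed marker for "no UDP command socket"
        return "tcp"
    return "udp" if prefer_udp else "tcp"


if __name__ == "__main__":
    print(choose_transport("<10.78.194.211:40724>"))
    print(choose_transport("<10.78.194.211:40724?noUDP>"))
```

The point of the sketch is only that the decision lives in the client and is driven by what the server advertises, which is why per-protocol knobs become unnecessary.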
Thanks, Dan! That's sufficient incentive for me to make sure
everything in my pool is 7.6.x. Right now I'm running a 7.4.3
scheduler and central manager (CM), but 7.6.1 on the execute
node.
For what it's worth, this is all to debug an issue I'm seeing
on a Windows Server 2008 machine with a high core count. It has
40 physical cores, but Condor only seems able to use 12 slots
on the box before it starts failing to accept claims from the
shadows, with:
08/10/11 18:29:50 Received TCP command 444 (ACTIVATE_CLAIM) from unauthenticated@unmapped <10.78.194.211:40724>, access level DAEMON
08/10/11 18:29:50 Calling HandleReq <command_activate_claim> (0)
08/10/11 18:29:50 slot25: Got activate_claim request from shadow (<10.78.194.211:40724>)
08/10/11 18:30:06 condor_write(): Socket closed when trying to write 13 bytes to <10.78.194.211:40724>, fd is 1356
08/10/11 18:30:06 Buf::write(): condor_write() failed
08/10/11 18:30:06 slot25: Can't send eom to shadow.
08/10/11 18:30:06 Return from HandleReq <command_activate_claim> (handler: 15.615s, sec: 0.016s)
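The last entry shows the handler running for 15.615 s, and by then the socket to the shadow had been closed. To pull slow handlers like this out of a larger log, something along these lines might help. It is a small Python sketch written only against the `Return from HandleReq` entry shown above; the function name `slow_handlers` and the 5-second default threshold are arbitrary choices for illustration, not anything Condor provides.

```python
import re

# Matches entries such as:
#   Return from HandleReq <command_activate_claim> (handler: 15.615s, sec: 0.016s)
HANDLER_RE = re.compile(
    r"Return from HandleReq\s+<(?P<name>\w+)>\s+"
    r"\(handler:\s*(?P<handler>[\d.]+)s,\s*sec:\s*(?P<sec>[\d.]+)s\)"
)


def slow_handlers(log_text: str, threshold_s: float = 5.0):
    """Return (handler name, handler seconds, security seconds) for slow calls."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    hits = []
    for m in HANDLER_RE.finditer(flat):
        handler_s = float(m.group("handler"))
        if handler_s >= threshold_s:
            hits.append((m.group("name"), handler_s, float(m.group("sec"))))
    return hits


if __name__ == "__main__":
    sample = (
        "08/10/11 18:30:06 Return from HandleReq\n"
        "<command_activate_claim> (handler: 15.615s, sec: 0.016s)\n"
    )
    for name, handler_s, sec_s in slow_handlers(sample):
        print(name, handler_s, sec_s)
```

Collapsing whitespace first is a deliberate shortcut so that entries wrapped by a mail client are still found; on a very large log it would be cheaper to process it line by line.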