Rob,
After monitoring Condor UDP traffic for a while, I've found an
interesting problem: sometimes clients will start misbehaving and
send only part of the data needed for update_master_ad /
update_startd_ad commands.
Has anyone seen this? Any ideas on what's causing it?
Yes, and my first thought is: is this Windows?
<snip>
Both clients run Windows XP with Condor version 6.8.6. I've noticed this
behavior on both, sometimes on one, sometimes on the other. The problem
goes away after a condor_restart, but eventually recurs. The
client logfiles don't show anything interesting.
Any ideas on how to debug / fix this would be welcome.
This is a problem with UDP under Windows: it considers a packet "sent"
when the sendto() call is made, not when the packet has actually hit
the wire. So if sendto() is called too rapidly (e.g. when collector
update packets are split), you can lose the previous UDP packet if it
hasn't really been sent yet.
What we did was add "D_NETWORK" to the MASTER_DEBUG and STARTD_DEBUG
flags in the config file. The added delay of logging the UDP packets
seems to be enough to keep this from happening.
You can alternatively use "UPDATE_COLLECTOR_WITH_TCP = True" and avoid
UDP entirely.
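
For reference, the two workarounds above might look roughly like this in
the client's Condor config file. This is only a sketch: it assumes no
other debug flags are already set for the master and startd (if there
are, D_NETWORK should be added alongside them rather than replacing
them), and you would normally pick one option, not both.

```
# Option 1: log network activity for the master and startd.
# The logging overhead is what slows down back-to-back sendto() calls.
# Assumption: no other flags are currently set on these two macros.
MASTER_DEBUG = D_NETWORK
STARTD_DEBUG = D_NETWORK

# Option 2: send collector updates over TCP instead of UDP.
UPDATE_COLLECTOR_WITH_TCP = True
```

The daemons have to re-read their configuration before either change
takes effect, and the extra D_NETWORK output will make the master and
startd logs grow faster than usual.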