
[Condor-users] Windows 7 silently drops UDP when no ARP entry is present



Hello list,

We're having a problem with Windows 7 nodes. I believe this is the cause:

http://support.microsoft.com/kb/233401

It seems that condor_master on the client tries to send an update to the collector. The update is large enough to be split into two IP fragments, but only the last fragment is actually sent; the first is dropped because there is no ARP entry for the collector yet. Subsequent updates go out complete until the ARP cache entry for the collector times out. At least, that is what I think is going on. :)
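To make the fragment/ARP interaction concrete, here is a minimal sketch. The 2000-byte update size, the 1500-byte MTU and the helper names are assumptions for illustration, not values measured in our pool, and the single-slot pending queue is only my reading of KB 233401, not a description of the actual Windows code.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def fragment_sizes(udp_payload: int, mtu: int = 1500) -> list[int]:
    """Return the IP payload size of each IPv4 fragment for one UDP datagram."""
    total = udp_payload + UDP_HEADER          # the UDP header rides in the first fragment
    per_frag = (mtu - IPV4_HEADER) // 8 * 8   # fragment offsets count in 8-byte units
    sizes = []
    while total > 0:
        chunk = min(per_frag, total)
        sizes.append(chunk)
        total -= chunk
    return sizes


def send_while_unresolved(fragments: list[str]) -> list[str]:
    """Model (assumed) of a neighbour cache that holds only one pending packet
    per unresolved destination: each new packet replaces the queued one."""
    pending = None
    for frag in fragments:
        pending = frag  # the previously queued fragment is silently discarded
    # The ARP reply arrives; only the surviving packet is transmitted.
    return [pending] if pending is not None else []
```

With these numbers, `fragment_sizes(2000)` gives two fragments of 1480 and 528 bytes, and `send_while_unresolved(["frag1", "frag2"])` lets only `"frag2"` through, which matches the symptom: the collector receives a trailing fragment it can never reassemble.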

I have left a ping -t running in a cmd window to keep the ARP cache entry from going stale, and every update since then has gone through completely. I can't raise the ARP cache timeout, so I am thinking of lowering master_update_interval so that the updates themselves act as the ping. However, Windows 7 uses a very short timeout, somewhere between 15 and 45 seconds:

http://support.microsoft.com/kb/949589/en-us?fr=1
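Instead of leaving ping -t open in a cmd window, a small keep-alive could do the same job. This is a sketch under assumptions: the collector is on the same subnet (otherwise the ARP entry that matters is the gateway's), the function name is hypothetical, and port 9 (discard) is an arbitrary choice. The packet only has to leave the machine, so it does not matter whether the collector listens on that port.

```python
import socket
import time
from typing import Optional


def keep_neighbour_fresh(host: str, port: int = 9, interval: float = 10.0,
                         count: Optional[int] = None) -> int:
    """Send a 1-byte UDP datagram to host:port every `interval` seconds.

    Periodic traffic to the host keeps its ARP entry from aging out, which
    is what the ping -t workaround relies on. Runs forever when count is
    None; otherwise stops after `count` packets. Returns packets sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while count is None or sent < count:
            # A 1-byte payload never fragments, so even if the ARP entry has
            # expired, the one packet the stack (presumably) queues during
            # resolution is this whole datagram.
            sock.sendto(b"\0", (host, port))
            sent += 1
            if count is None or sent < count:
                # Stay below the 15 s lower bound of the randomized timeout.
                time.sleep(interval)
    return sent
```

Usage would be something like `keep_neighbour_fresh("collector.example.org")`, where the host name is a placeholder. Compared with lowering master_update_interval, this keeps the collector's workload unchanged.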

An update interval of less than 15 seconds seems very short, though. We have about 10,000 nodes in the pool. If we switched to TCP updates instead, we would need a very large number of sockets and file descriptors on the collector, and I don't know whether that would even work. The manual used to recommend against TCP updates but seems less insistent now. Have TCP updates been improved lately?
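A rough estimate of the load this would put on the collector, as a sketch. It assumes one condor_master ad per node, updates spread evenly over time, and 300 s as the current interval (my assumption about the default master_update_interval); startd and other daemon ads are ignored.

```python
def master_update_rate(nodes: int, interval_s: float) -> float:
    """Average master ad updates per second arriving at the collector,
    assuming one condor_master per node and evenly spread updates."""
    return nodes / interval_s


POOL_SIZE = 10_000  # approximate number of nodes in our pool

for interval in (300, 15, 10):
    rate = master_update_rate(POOL_SIZE, interval)
    print(f"{interval:>4} s -> {rate:7.1f} updates/s")
```

That works out to roughly 33, 667 and 1000 updates per second. With TCP, if each node kept its connection to the collector open, the collector would also hold on the order of 10,000 sockets.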

Thanks!

Rob de Graaf
Erasmus MC