Hello list,
We're having a problem with Windows 7 nodes. I believe this is the cause:
http://support.microsoft.com/kb/233401
It seems the condor_master on the client tries to send an update to the
collector. This update consists of two IP fragments, but only the last
fragment is actually sent. The first fragment is dropped because there
isn't an ARP entry for the collector yet. Subsequent updates are sent in
full until the ARP cache entry for the collector times out. At least,
that is what I think is going on. :)
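To check that a two-fragment update is plausible, here is a minimal
Python sketch of how many IPv4 fragments a UDP datagram needs. It
assumes a standard 1500-byte Ethernet MTU and a 20-byte IPv4 header
without options. The function name is my own, not anything from Condor,
and I have not measured the actual size of the update.

```python
import math

def ipv4_fragment_count(udp_payload_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Estimate how many IPv4 fragments one UDP datagram is split into.

    Assumes no IP options. Every fragment except the last must carry a
    multiple of 8 bytes, so the per-fragment capacity is rounded down.
    """
    per_fragment = (mtu - ip_header) // 8 * 8   # 1480 bytes for MTU 1500
    total = udp_payload_bytes + udp_header      # UDP header travels in fragment 1
    return max(1, math.ceil(total / per_fragment))
```

Under those assumptions, any update payload from 1473 to 2952 bytes
ends up as exactly two fragments.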
I have left a ping -t running from a cmd window to prevent the ARP cache
entry from going stale, and all updates since have gone through
completely. I can't raise the ARP cache timeout, so I am thinking of
lowering master_update_interval to act as a ping, but Windows 7 has a
very short ARP cache timeout, between 15 and 45 seconds:
http://support.microsoft.com/kb/949589/en-us?fr=1
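Instead of ping -t in a cmd window, something like the following could
do the same job as a small script. This is only a sketch that I have not
tried on the Windows 7 nodes. The function names are my own, and I am
assuming that any small, unfragmented outbound packet to the collector
is enough to keep the ARP entry alive.

```python
import socket
import time

def send_keepalive(host, port, payload=b"\x00"):
    """Send one small UDP datagram; returns the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))

def keep_arp_warm(host, port, interval_s, count):
    """Send `count` keepalives, sleeping `interval_s` seconds between them.

    Returns the total number of bytes sent.
    """
    sent = 0
    for i in range(count):
        if i:
            time.sleep(interval_s)  # no sleep before the first datagram
        sent += send_keepalive(host, port)
    return sent
```

For example, keep_arp_warm("collector.example.org", 9, 10, 6) would send
six one-byte datagrams ten seconds apart. Both the host name and port 9
(discard) are placeholders here.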
An update interval under 15 seconds seems very short, since we have
about 10,000 nodes in the pool. The alternative would be TCP updates,
but then we'd need a very large number of sockets and file descriptors
on the collector. I don't even know if that would work. The manual used
to recommend against them but seems less insistent now. Have TCP
updates been improved lately?
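For a rough idea of the load, here is a back-of-the-envelope calculation
in Python. It assumes one master update per node per interval, spread
evenly over time, and ignores updates from other daemons. The intervals
in the loop are only illustrative values.

```python
def collector_updates_per_second(nodes, update_interval_s):
    """Average number of updates per second arriving at the collector."""
    if update_interval_s <= 0:
        raise ValueError("update interval must be positive")
    return nodes / update_interval_s

if __name__ == "__main__":
    # Illustrative intervals only; 10,000 is our approximate pool size.
    for interval in (300, 45, 15, 10):
        rate = collector_updates_per_second(10_000, interval)
        print(f"interval {interval:>3}s -> {rate:7.1f} updates/s")
```

At a 15 second interval that works out to roughly 667 updates per second
for the whole pool.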
Thanks!
Rob de Graaf
Erasmus MC