Hi. Our Condor home directory is exported over NFS to all members of
our cluster. This is great for centralized configuration.
However, it seems that even a brief NFS outage (one to two minutes or
less) is enough to kill all running jobs. They do restart once NFS
comes back.
We use NFS over UDP so that clients can withstand server reboots, and
we mount with the options "hard" and "intr" so that jobs should simply
hang until the server comes back. Instead of waiting, however, Condor
kills the jobs. Is there a configurable timeout I should have set? If
not, how else can I make Condor resilient to such NFS outages?
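Before digging into Condor itself, it may be worth confirming that the
options actually in effect on each node match what we think we set. The
sketch below is a hypothetical helper (find_mount and missing_options
are illustrative names, not part of Condor) that reads /proc/mounts on
Linux, finds the mount holding a given path, and reports which required
options are absent. It assumes the /proc/mounts layout of device, mount
point, filesystem type, then comma-separated options. It checks only
"hard" and "proto=udp", because on newer Linux kernels "intr" is
deprecated and may not be listed at all. The path /home/condor is a
placeholder.

```python
import os


def find_mount(path, mounts_text):
    """Return (mountpoint, fstype, options) for the mount that contains path.

    mounts_text is assumed to be in /proc/mounts format: one mount per line,
    with device, mount point, filesystem type and comma-separated options
    as the first four whitespace-separated fields.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype, opts = fields[1], fields[2], fields[3]
        # /proc/mounts escapes spaces and tabs in paths as octal sequences
        mountpoint = mountpoint.replace("\\040", " ").replace("\\011", "\t")
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # The longest matching mount point is the one that holds path;
            # on ties the later entry wins, since it shadows the earlier one.
            if best is None or len(mountpoint) >= len(best[0]):
                best = (mountpoint, fstype, set(opts.split(",")))
    return best


def missing_options(path, required, mounts_text):
    """Return the sorted list of required options absent from path's mount."""
    found = find_mount(path, mounts_text)
    if found is None:
        return None
    return sorted(set(required) - found[2])


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    # "/home/condor" is a hypothetical path; substitute the real Condor home
    print(missing_options("/home/condor", ["hard", "proto=udp"], mounts))
```

An empty list means the mount carries every option checked; any names
printed are options that did not take effect on that node.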