On Thu September 28 2006 2:38 pm, Steven Timm wrote:
> I put condor 6.8.1 on my first few test nodes and submitted the same
> test vanilla universe job that I always do for testing.
>
> [timm@fnpcg ~]$ condor_submit recon1_1.run
> Submitting job(s)
> WARNING: Log file /home/timm/recon1.log.47070.0 is on NFS.
> This could cause log file corruption and is _not_ recommended.
> .
> Logging submit event(s).
> 1 job(s) submitted to cluster 47070.
>
>
> The log file in question is indeed on nfs, but it has been on nfs
> throughout the whole life of my cluster and I don't see why we
> are just now getting warnings about this. There haven't been problems
> up until now.
This isn't a new problem, just a new warning about an old problem.
File locking on NFS is inherently unreliable. We've seen enough cases of
NFS-based job logs getting corrupted (by multiple processes updating the log
file concurrently) that we decided to add the warning. I suspect that the risk of such
corruption is reduced if all writers are on the same machine, possibly even
eliminated, but I don't know for certain. In particular, corrupted job logs
tend to make DAGMan very unhappy.
Ultimately, we'd like to implement a more advanced locking mechanism (using a
separate lock file), but we haven't had time to add this yet.
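For what it's worth, the separate-lock-file idea can be sketched roughly like
this. This is a hypothetical illustration, not Condor's actual code; the paths
and the function name are made up. The point is that the lock is taken on a
file on local disk, where flock() is reliable, rather than on the NFS file
itself:

```python
import fcntl
import os

# Hypothetical paths, purely for illustration.
LOCK_PATH = "/tmp/job_log.lock"   # lock file on LOCAL disk
LOG_PATH = "/tmp/job_events.log"  # stands in for the NFS-mounted job log

def append_event(text):
    # Take an exclusive lock on the separate, local lock file.
    # Locks on a local filesystem work; locks on the NFS file do not
    # reliably, which is what the warning is about.
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            with open(LOG_PATH, "a") as log:
                log.write(text + "\n")
                log.flush()
                os.fsync(log.fileno())
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)

append_event("001 (047070.000.000) Job executing")
```

Note this only serializes writers on the same machine, which matches the
caveat above: with writers on multiple hosts, a local lock file buys you
nothing.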
-Nick