Re: [Condor-users] New NFS warning with condor 6.8.1
- Date: Thu, 28 Sep 2006 14:54:39 -0500
- From: Nick LeRoy <nleroy@xxxxxxxxxxx>
On Thu September 28 2006 2:38 pm, Steven Timm wrote:
> I put condor 6.8.1 on my first few test nodes and submitted the same
> test vanilla universe job that I always do for testing.
>
> [timm@fnpcg ~]$ condor_submit recon1_1.run
> Submitting job(s)
> WARNING: Log file /home/timm/recon1.log.47070.0 is on NFS.
> This could cause log file corruption and is _not_ recommended.
> .
> Logging submit event(s).
> 1 job(s) submitted to cluster 47070.
>
>
> The log file in question is indeed on nfs, but it has been on nfs
> throughout the whole life of my cluster and I don't see why we
> are just now getting warnings about this. There haven't been problems
> up until now.
This isn't a new problem, just a new warning about an old problem.
File locking on NFS is inherently unreliable. We've seen enough cases of NFS-based
job logs getting corrupted (from multiple processes updating the log file) that
we decided to add the warning. I suspect the risk of such corruption is reduced,
possibly even eliminated, if all writers are on the same machine, but I don't
know for certain. In particular, corrupted job logs tend to make DAGMan very
unhappy.
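For illustration only, here is a minimal sketch of the kind of check that could produce a warning like the one condor_submit printed. It decides whether a path sits on an NFS mount by reading the Linux mount table. This is not Condor's code. The function names `fs_type_for` and `is_on_nfs` are hypothetical, and the sketch assumes Linux's /proc/mounts format. It does not resolve symlinks or decode escaped spaces (`\040`) in mount points.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount holding `path`, or None.

    Hypothetical helper. `mounts_text` is in Linux /proc/mounts format:
    "device mountpoint fstype options dump pass", one mount per line.
    The longest mount point that is a prefix of `path` wins.
    Symlinks are NOT resolved; the path is only made absolute.
    """
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        # Match whole path components only, so "/home" does not match "/homework".
        prefix = point if point.endswith("/") else point + "/"
        if (path == point or path.startswith(prefix)) and len(point) > len(best_point):
            best_point, best_type = point, fstype
    return best_type


def is_on_nfs(path, mounts_text=None):
    """True if `path` lives on an NFS mount (nfs, nfs4, ...). Linux only."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    fstype = fs_type_for(path, mounts_text)
    return fstype is not None and fstype.startswith("nfs")
```

With a mount table where /home is NFS-mounted, a log under /home/timm would be flagged, while one under /tmp on local disk would not.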
Ultimately, we'd like to implement a more advanced locking mechanism (using a
separate lock file), but we haven't had time to add this yet.
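A rough sketch of a separate-lock-file scheme follows. It assumes the classic hard-link technique for NFS-tolerant lock files, which is not necessarily what Condor will implement. The names `acquire_lock`, `release_lock`, `append_event` and the `.lock` suffix are hypothetical. Stale locks left behind by a crashed writer are not handled.

```python
import os
import socket
import time
import uuid


def acquire_lock(lock_path):
    """Try once to take `lock_path`; return True on success (hypothetical helper).

    Create a uniquely named file in the same directory and hard-link it to the
    lock name. link() can report failure over NFS even when the link was made,
    so the link count of the unique file (2 == linked) is treated as the answer.
    """
    directory = os.path.dirname(os.path.abspath(lock_path))
    unique = os.path.join(
        directory, ".%s.%s.%d" % (uuid.uuid4().hex, socket.gethostname(), os.getpid())
    )
    fd = os.open(unique, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        try:
            os.link(unique, lock_path)
        except OSError:
            pass  # Ignore the return; the link count below is authoritative.
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)  # The lock file itself (if ours) keeps existing.


def release_lock(lock_path):
    os.unlink(lock_path)


def append_event(log_path, text, poll_seconds=0.1):
    """Append `text` to the log while holding the hypothetical `<log>.lock` file."""
    lock_path = log_path + ".lock"
    while not acquire_lock(lock_path):
        time.sleep(poll_seconds)
    try:
        with open(log_path, "a") as log:
            log.write(text)
    finally:
        release_lock(lock_path)
```

The appeal of this design is that it relies only on link() and stat() in the shared directory, rather than on NFS's separate byte-range locking protocol.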
-Nick
--
<<< There is no spoon. >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy@xxxxxxxxxxx The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences