On Tue, 13 Jun 2006, lohit wrote:
> - Would it be possible for you to move the node job log files themselves
> off of NFS? That's probably the first thing to try.
> I tried this. I generated the sub file using the condor_submit_dag -no_submit
> command, and then edited the file to point all log and lock files to local disk
> on the node. Yet I am still seeing this error. Here is my edited run.dag.condor.sub file:
>
> # Filename: run.dag.condor.sub
> # Generated by condor_submit_dag run.dag
> universe = vanilla
> executable = /home/usr1/condor/bin/condor_dagman
> getenv = True
> output = run.dag.lib.out
> error = run.dag.lib.out
> log = run.dag.dagman.log
> remove_kill_sig = SIGUSR1
> arguments = -f -l . -Debug 3 -Lockfile /scratch/usr1/run.dag.lock -Dag run.dag -Rescue /scratch/usr1/run.dag.rescue -Condorlog /scratch/usr1/run.dag.dummy_log
> environment = _CONDOR_DAGMAN_LOG=run.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
> queue

Actually, to move all of the log files off of NFS, you need to edit
the submit files for each individual node, not just the submit file
for DAGMan itself.
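If there are many nodes, that per-node edit can be scripted. Below is a minimal Python sketch, not a Condor tool. It assumes each node's submit file is the third field of a JOB line in the DAG file, that relative submit-file paths are relative to the DAG file's directory (the DIR option is ignored), and that each submit file uses a single-line "log = ..." command. It points every such log into a local directory, keeps the file name, and saves the original as "<file>.orig". The script name relocate_logs.py and the /scratch/usr1 target are only examples; the target should be a disk local to the machine where DAGMan runs, since that is where DAGMan reads the node logs.

```python
import os
import re
import sys

# Matches a submit-file "log = <path>" line. Submit commands are
# case-insensitive; "log_xml" and similar are not matched because "="
# must follow "log" (after optional spaces).
LOG_RE = re.compile(r"^(\s*log\s*=\s*)(\S.*?)\s*$", re.IGNORECASE)


def node_submit_files(dag_lines):
    """Yield the submit file named on each JOB line of a DAG file.

    Assumes the form "JOB <name> <submit file> [...]"; comments and
    other keywords (PARENT, SCRIPT, RETRY, ...) are skipped.
    """
    for line in dag_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0].upper() == "JOB":
            yield fields[2]


def relocate_log(submit_text, local_dir):
    """Return submit_text with each "log =" path moved into local_dir,
    keeping the original file name."""
    out = []
    for line in submit_text.splitlines():
        m = LOG_RE.match(line)
        if m:
            name = os.path.basename(m.group(2))
            line = m.group(1) + os.path.join(local_dir, name)
        out.append(line)
    return "\n".join(out) + "\n"


def main(dag_path, local_dir):
    dag_dir = os.path.dirname(os.path.abspath(dag_path))
    with open(dag_path) as f:
        # dict.fromkeys drops duplicates (several nodes may share one
        # submit file) while keeping the DAG order.
        submit_files = list(dict.fromkeys(node_submit_files(f)))
    for sub in submit_files:
        # Assumption: relative paths are relative to the DAG file's directory.
        path = os.path.join(dag_dir, sub)
        with open(path) as f:
            text = f.read()
        with open(path + ".orig", "w") as f:  # keep a backup copy
            f.write(text)
        with open(path, "w") as f:
            f.write(relocate_log(text, local_dir))
        print("updated", path)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: relocate_logs.py <dag file> <local log dir>")
    main(sys.argv[1], sys.argv[2])
```

For example, "python relocate_logs.py run.dag /scratch/usr1" before submitting the DAG. Nodes that shared one log file on NFS still share one log file afterwards, just on local disk.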

Given the error message you got, I think the lock is failing when
DAGMan tries to read a node job's user log, so relocating DAGMan's
own log file doesn't help.
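To test the NFS-locking theory directly, one option is to try taking a lock in the suspect directory and in a local one. This Python sketch creates a scratch file in a directory, tries a non-blocking exclusive POSIX lock on it with fcntl.lockf, and reports the result. It is an assumption on our part that DAGMan's user-log locking fails the same way a plain fcntl lock does; a failure here (often "No locks available" when the NFS lock daemons are not running) suggests the mount does not support locking, but a success does not prove DAGMan's lock will work.

```python
import fcntl
import os
import sys
import tempfile


def can_lock(directory):
    """Try to take and release an exclusive fcntl lock on a scratch file
    created in `directory`. Returns (True, None) on success, or
    (False, error text) if the lock call fails."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, None
    except OSError as e:
        return False, str(e)
    finally:
        # Always remove the scratch file, whether or not locking worked.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    for d in sys.argv[1:]:
        ok, err = can_lock(d)
        print(f"{d}: " + ("lock OK" if ok else "lock FAILED: " + err))
```

Running it with the NFS directory holding the node logs and /scratch/usr1 as arguments would show whether locking behaves differently on the two.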
Kent Wenger
Condor Team
_______________________________________________
Condor-users mailing list
The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR