Re: [Condor-users] Condor Error No locks available
- Date: Tue, 13 Jun 2006 11:35:37 -0400
- From: lohit <lohit.vijayarenu@xxxxxxxxx>
- Subject: Re: [Condor-users] Condor Error No locks available
Hello Kent,
Thanks for the reply.
- What architecture/OS are you running on?
I am using Ubuntu LINUX (Linux cpu02 2.6.12-10-amd64-xeon #1 SMP Thu Dec 22 11:43:32 UTC 2005 x86_64 GNU/Linux)
- Is this problem repeatable or not?
Yes, it is repeatable, and I am not sure how to proceed.
- Would it be possible for you to move the node job log files themselves
off of NFS? That's probably the first thing to try.
I tried this. I generated the submit file with condor_submit_dag -no_submit, then edited it to point all log and lock files to local disk on the node. I am still seeing the error. Here is my edited run.dag.condor.sub file:
# Filename: run.dag.condor.sub
# Generated by condor_submit_dag run.dag
universe = vanilla
executable = /home/usr1/condor/bin/condor_dagman
getenv = True
output = run.dag.lib.out
error = run.dag.lib.out
log = run.dag.dagman.log
remove_kill_sig = SIGUSR1
arguments = -f -l . -Debug 3 -Lockfile /scratch/usr1/run.dag.lock -Dag run.dag -Rescue /scratch/usr1/run.dag.rescue -Condorlog /scratch/usr1/run.dag.dummy_log
environment = _CONDOR_DAGMAN_LOG=run.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
queue
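
Note that the `log = run.dag.dagman.log` line in the file above is still a relative path, and the node submit files (sh_loop1.cmd through sh_loop4.cmd) are not shown here, so their `log` entries may also still resolve to the NFS-mounted submit directory. A minimal sketch for listing the `log` entries of a set of submit files follows. It assumes each file uses the plain `log = <path>` form seen above; the helper name `find_log_paths` is hypothetical and not part of Condor.

```python
import re
import sys
from pathlib import Path

# Matches a submit-file line of the form "log = <path>" (case-insensitive).
LOG_LINE = re.compile(r"^\s*log\s*=\s*(\S.*?)\s*$", re.IGNORECASE)


def find_log_paths(submit_text):
    """Return the value of every `log = ...` line, skipping comment lines."""
    paths = []
    for line in submit_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # ignore commented-out entries
        match = LOG_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    # Print the log path(s) found in each submit file named on the command line.
    for name in sys.argv[1:]:
        for path in find_log_paths(Path(name).read_text()):
            print(f"{name}: log = {path}")
```

Saved under a name of your choosing (for example `find_logs.py`, a hypothetical name), it could be run as `python find_logs.py run.dag.condor.sub sh_loop1.cmd sh_loop2.cmd sh_loop3.cmd sh_loop4.cmd`. Any path it prints that is relative or sits under an NFS mount is a candidate for moving to local disk.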
-lohit
On 6/13/06, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
On Mon, 12 Jun 2006, lohit wrote:
> I am trying to submit jobs using a DAG file. To test the feature, I have 4
> jobs: 3 defined as PARENT and the fourth as CHILD.
>
> Job sh_loop1 sh_loop1.cmd
> Job sh_loop2 sh_loop2.cmd
> Job sh_loop3 sh_loop3.cmd
> Job sh_loop4 sh_loop4.cmd
> PARENT sh_loop1 sh_loop2 sh_loop3 CHILD sh_loop4
>
> If I submit this .dag file using condor_submit_dag, I see this error:
>
> 6/12 23:32:18 assigned Condor ID (22.0.0)
> 6/12 23:32:18 Just submitted 3 jobs this cycle...
> 6/12 23:32:18 FileLock::obtain(1) failed - errno 37 (No locks available)
> 6/12 23:32:18 ERROR "Assertion ERROR on (is_locked)" at line 916 in file
> user_log.C
>
> I searched previous threads about lock problems on NFS, so I have now
> defined $(LOCK) to be a local directory on the nodes.
> But I am still seeing this error, and the job assigned to CHILD is not being
> submitted.
>
> Am I missing something? Could anyone please explain what the problem is and
> how I could solve it?
A few questions:
- What architecture/OS are you running on?
- Is this problem repeatable or not?
- Would it be possible for you to move the node job log files themselves
off of NFS? That's probably the first thing to try.
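
On Linux, errno 37 is ENOLCK, which is what the "No locks available" message in the log reports. One way to check whether a given directory supports file locking at all is to try taking a lock on a scratch file inside it. The sketch below does that with Python's `fcntl.lockf`. It is only a probe under stated assumptions: it may not exercise exactly the same locking call as Condor's FileLock, and the function name `can_lock` is hypothetical.

```python
import errno
import fcntl
import os
import sys
import tempfile


def can_lock(directory):
    """Try to take and release an exclusive lock on a temp file in `directory`.

    Returns (True, None) on success, or (False, errno_name) if locking fails,
    e.g. (False, "ENOLCK") when the filesystem reports no locks available.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest.")
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking exclusive lock
            fcntl.lockf(fd, fcntl.LOCK_UN)                  # release it again
            return True, None
        except OSError as exc:
            return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
        os.unlink(path)  # always clean up the scratch file


if __name__ == "__main__":
    # Probe each directory named on the command line.
    for directory in sys.argv[1:]:
        ok, err = can_lock(directory)
        print(f"{directory}: {'locking works' if ok else 'locking failed: ' + err}")
```

Running it against both the submit directory and the local scratch directory (for example /scratch/usr1 from the submit file above) would show whether the two behave differently. A failure on the NFS directory only would be consistent with the log files there being the source of the error.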
Kent Wenger
Condor Team