Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] properly removing/stopping a dag and all its nodes?

Date: Tue, 5 Jul 2011 13:00:02 -0500 (CDT)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [Condor-users] properly removing/stopping a dag and all its nodes?

On Tue, 5 Jul 2011, Rowe, Thomas wrote:

I had assumed that issuing "condor_rm 267", where 267 is the cluster
of a condor_dagman.exe job, would cleanly terminate all outstanding
nodes of the DAG. Instead there are a bunch of jobs left according to
condor_q, and I have to use -forcex to remove them. Also, condor_status
indicates many "State: Claimed; Activity: Idle" slots. I have to
"condor_restart -all" to clean them up.


OK, setting "UWCS_CLAIM_WORKLIFE = 0" makes the cancelled nodes abandon
slots right away. But I get loads of nodes stuck in the 'X' state and
the corresponding condor_shadow processes never exit. I have to manually
kill the condor_shadow processes.

What am I doing wrong?

What happens if you manually run condor_rm on one of the node jobs as opposed to the DAGMan job itself? (That's basically the same thing that DAGMan does.) My guess at this point is that the problems have something to do with the jobs themselves, or the configuration of your pool, rather than the fact that they're managed by DAGMan.

Does your dagman.out file show any problems when DAGMan tried to remove the node jobs? (Look for the string "Error removing DAGMan jobs".)
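For a long dagman.out file, a small script can do the search suggested above. This is a minimal sketch, not a Condor tool: it only does a plain substring match on the string "Error removing DAGMan jobs". The default file name my.dag.dagman.out is hypothetical; pass the path to your actual DAGMan output file as the first argument.

```python
import os
import sys

# String suggested above as the sign of a failed node-job removal.
ERROR_MARKER = "Error removing DAGMan jobs"


def find_removal_errors(lines):
    """Return (line_number, text) pairs for every line containing the marker."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if ERROR_MARKER in line
    ]


def scan_file(path):
    """Scan a dagman.out file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_removal_errors(f)


def main(argv):
    # Hypothetical default name; supply the real dagman.out path instead.
    path = argv[1] if len(argv) > 1 else "my.dag.dagman.out"
    if not os.path.exists(path):
        print(f"{path}: file not found")
        return
    hits = scan_file(path)
    for n, text in hits:
        print(f"{path}:{n}: {text}")
    if not hits:
        print(f"{path}: no lines containing {ERROR_MARKER!r}")


if __name__ == "__main__":
    main(sys.argv)
```

If it reports nothing, the removal requests were presumably accepted, which points back toward the jobs or the pool configuration rather than DAGMan.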


Kent Wenger
Condor Team

