I had assumed that issuing "condor_rm 267", where 267 is the cluster
of a condor_dagman.exe job, would cleanly terminate all outstanding
nodes of the DAG. Instead, condor_q still shows a bunch of jobs left,
and I have to use "condor_rm -forcex" to remove them. Also,
condor_status shows many slots in "State: Claimed; Activity: Idle",
and I have to run "condor_restart -all" to clean them up.

OK, setting "UWCS_CLAIM_WORKLIFE = 0" makes the cancelled nodes abandon
slots right away. But I get loads of nodes stuck in the 'X' state and
the corresponding condor_shadow processes never exit. I have to manually
kill the condor_shadow processes.
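
For reference, the change amounts to roughly the fragment below. This is only a sketch: the file name is an assumption, and it relies on the stock condor_config defining CLAIM_WORKLIFE in terms of UWCS_CLAIM_WORKLIFE. If a local config does not do that, CLAIM_WORKLIFE would presumably have to be set directly instead.

```
# condor_config.local (assumed file name; use whatever local config the pool reads)

# Value taken from the setting described above: a worklife of 0 seconds
# means a claim is not reused for another job once the current one ends.
UWCS_CLAIM_WORKLIFE = 0

# Assumption: the stock config already contains a line like this one,
# which is what makes the macro above take effect.
# CLAIM_WORKLIFE = $(UWCS_CLAIM_WORKLIFE)
```
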
What am I doing wrong?