I found that a single command can hold or release all jobs in a DAG job: just add a ClusterId clause to the constraint argument:
condor_hold -constraint "ClusterId==178||DAGManJobId==178"
Here 178 is the ClusterId of the DAGMan job. The same constraint works for holding, releasing, and removing the DAG's jobs (condor_hold, condor_release, condor_rm).
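As a rough sketch of how a program could build that command line, here is a small Python helper. The function names dag_constraint and dag_command are made up for illustration; only the constraint expression and the three tool names come from this thread.

```python
from typing import List


def dag_constraint(dag_cluster_id: int) -> str:
    """Constraint matching the DAGMan job itself and all of its node jobs."""
    return f"ClusterId == {dag_cluster_id} || DAGManJobId == {dag_cluster_id}"


def dag_command(action: str, dag_cluster_id: int) -> List[str]:
    """Build (but do not run) the argument list for hold, release, or rm."""
    tools = {"hold": "condor_hold", "release": "condor_release", "rm": "condor_rm"}
    if action not in tools:
        raise ValueError(f"unknown action: {action!r}")
    return [tools[action], "-constraint", dag_constraint(dag_cluster_id)]
```

The returned list can be handed to subprocess.run() as is, which avoids any shell quoting problems with the || in the constraint.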
There is one side effect with condor_rm, though: the log file then contains two job-aborted events for each submitted job, which sent my program into an infinite loop. I think one event comes from my command and the other from the DAGMan job.
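One way a log-reading program could tolerate the repeated events is to ignore any abort it has already seen for the same job. The sketch below assumes the program has already parsed each log entry into a (kind, cluster, proc) tuple; that tuple shape and the name drop_repeated_aborts are hypothetical, and the log parsing itself is not shown.

```python
def drop_repeated_aborts(events):
    """Yield events in order, skipping repeated aborts for the same job."""
    aborted = set()  # (cluster, proc) pairs whose abort was already reported
    for kind, cluster, proc in events:
        if kind == "aborted":
            if (cluster, proc) in aborted:
                continue  # second abort for this job: ignore it
            aborted.add((cluster, proc))
        yield kind, cluster, proc
```

All other event kinds pass through unchanged, so the rest of the program sees exactly one abort per job.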
Thanks Kent.
On Wed, 21 Aug 2013, 钱晓明 wrote:
I know all DAG jobs are removed when I condor_rm the DAGMan job, but
that is not the case for hold/release.
How can I make all jobs be held/released along with the DAGMan job? I
think I should add something to my job submit file.
It's not too hard (assuming you don't have nested DAGs). You do two condor_hold commands -- one to hold the DAGMan job itself, and one to hold the node jobs.
Here's an example:
manta(222)% condor_q
-- Submitter: wenger@xxxxxxxxxxxxxxxxx : <128.105.14.228:51653> : manta.cs.wisc.edu
 ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
 318.0   wenger    8/21 11:42  0+00:00:29 R  0   1.7  condor_dagman
 320.0   wenger    8/21 11:43  0+00:00:03 R  10  0.0  job_dagman_node_pr
2 jobs; 0 completed, 0 removed, 0 idle, 2 running, 0 held, 0 suspended
manta(223)% condor_hold 318
All jobs in cluster 318 have been held
manta(224)% condor_hold -constraint "DAGManJobId==318"
All jobs matching constraint (DAGManJobId==318) have been held
manta(225)% condor_q
-- Submitter: wenger@xxxxxxxxxxxxxxxxx : <128.105.14.228:51653> : manta.cs.wisc.edu
 ID      OWNER     SUBMITTED     RUN_TIME ST PRI SIZE CMD
 318.0   wenger    8/21 11:42  0+00:00:42 H  0   1.7  condor_dagman
 320.0   wenger    8/21 11:43  0+00:00:30 H  10  0.0  job_dagman_node_pr
2 jobs; 0 completed, 0 removed, 0 idle, 0 running, 2 held, 0 suspended
manta(226)%
If you have sub-DAGs, you'll have to do the condor_hold with the constraint for each sub-DAG.
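As a rough illustration of the sub-DAG case, the Python sketch below lists the condor_hold commands for a whole tree of DAGs. It assumes you already know which DAGMan jobs are sub-DAGs of which; the sub_dags mapping and the name hold_commands are hypothetical, and discovering the mapping is not shown.

```python
from collections import deque
from typing import Dict, List


def hold_commands(top_dag_id: int, sub_dags: Dict[int, List[int]]) -> List[List[str]]:
    """List the condor_hold commands for a DAG and all of its sub-DAGs.

    sub_dags maps a DAGMan cluster ID to the cluster IDs of its sub-DAG
    DAGMan jobs (an assumed input for this sketch).
    """
    # First hold the top-level DAGMan job itself.
    commands = [["condor_hold", str(top_dag_id)]]
    queue = deque([top_dag_id])
    seen = set()
    while queue:
        dag_id = queue.popleft()
        if dag_id in seen:
            continue  # guard against a malformed mapping with repeats
        seen.add(dag_id)
        # Hold the node jobs of this DAG; sub-DAG DAGMan jobs are node jobs too.
        commands.append(["condor_hold", "-constraint", f"DAGManJobId=={dag_id}"])
        queue.extend(sub_dags.get(dag_id, []))
    return commands
```

For a DAG 318 with no sub-DAGs this yields the same two commands as in the example above.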
I'm thinking that we should create a command that does this automatically, including handling sub-DAGs...
Kent Wenger
CHTC Team