Hi, I am trying to submit a DAGMan job on Linux.
I have sixteen batches of jobs, and each batch in turn has 41 jobs. My requirement is that the batch2 jobs must not start until all batch1 jobs are done. Similarly, the batch3 jobs must not start until all batch2 jobs are done, and so on. I created the DAGMan job shown below. The problem is that the DAG randomly fails at batch3, batch4, etc. The reason is that some of the batch3 jobs need input files that are produced as output by some of the batch2 jobs, and Condor complains that the file is not found:

Read so far: Submitting job(s).............................ERROR: Can't open "/u/Senthil/DAGMan/MatlabJobs/immuneic4401.txt" with flags 00 (No such file or directory)

Judging by its timestamp, this file did not exist yet when the above error was reported. It was created afterwards. How can this happen? Doesn't Condor DAGMan wait until all the jobs of the parent are done before starting the child jobs, or does it only wait for the last job of the parent to complete before starting the child jobs?

Is it possible to do what I am trying to do with Condor DAGMan? Could you please let me know.

Thanks,
Senthil

JOB A Job_batch_1
JOB B Job_batch_2
JOB C Job_batch_3
JOB D Job_batch_4
JOB
JOB F Job_batch_6
JOB G Job_batch_7
JOB H Job_batch_8
JOB I Job_batch_9
JOB J Job_batch_10
JOB K Job_batch_11
JOB L Job_batch_12
JOB M Job_batch_13
JOB
JOB O Job_batch_15
JOB P Job_batch_16
PARENT A CHILD B
PARENT B CHILD C
PARENT C CHILD D
PARENT D CHILD E
PARENT E CHILD F
PARENT F CHILD G
PARENT G CHILD H
PARENT H CHILD I
PARENT I CHILD J
PARENT J CHILD K
PARENT K CHILD L
PARENT L CHILD M
PARENT M CHILD N
PARENT N CHILD O
PARENT O CHILD P
Retry A 10
Retry B 10
Retry C 10
Retry D 10
Retry E 10
Retry F 10
Retry G 10
Retry H 10
Retry I 10
Retry J 10
Retry K 10
Retry L 10
Retry M 10
Retry N 10
Retry O 10
Retry P 10
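The JOB, PARENT/CHILD and Retry entries follow a fixed pattern: one node per batch, each node the parent of the next, and 10 retries per node. A DAG file like this one could therefore be generated instead of typed by hand, which avoids gaps like the two incomplete JOB lines. Below is a minimal Python sketch under stated assumptions. The node names A to P and the retry count of 10 come from the post. The function name build_chain_dag is hypothetical. The submit file names for batches 5 and 14 (Job_batch_5, Job_batch_14) are assumed from the naming pattern, because those entries are incomplete in the listing.

```python
import string


# Hypothetical helper: builds a linear-chain DAG (A -> B -> ... -> P) in the
# same format as the DAG file in the post.
# Assumption: batch n uses the submit file "Job_batch_<n>", following the
# post's naming pattern (the entries for batches 5 and 14 are not visible).
def build_chain_dag(num_batches: int = 16, retries: int = 10) -> str:
    if not 1 <= num_batches <= 26:
        raise ValueError("single-letter node names support 1 to 26 batches")
    names = list(string.ascii_uppercase[:num_batches])  # A, B, ..., P

    lines = []
    # One JOB line per batch node.
    for i, name in enumerate(names, start=1):
        lines.append(f"JOB {name} Job_batch_{i}")
    # Each node is the parent of the next one in the chain.
    for parent, child in zip(names, names[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    # Retry every node up to `retries` times if it fails.
    for name in names:
        lines.append(f"Retry {name} {retries}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_chain_dag(), end="")
```

The printed text could be saved to a .dag file and submitted with condor_submit_dag as usual.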