From: SCHAPIRA Fernando
Sent: Sunday, January 27, 2019 20:19
To: 'Greg Thain' <gthain@xxxxxxxxxxx>; 'John M Knoeller' <johnkn@xxxxxxxxxxx>; 'Todd Tannenbaum' <tannenba@xxxxxxxxxxx>
Subject: HTCondor - Force a dag.rescue file? or other workaround
Hi Greg, Hi JK,
- When HTCondor adds all of the individual node jobs to the HTCondor DAGMan .dag file, it sometimes has a small hiccup that prevents it from properly recording that one or two of the node jobs were submitted successfully.
- All the individual node jobs finish, but because of the hiccup, DAGMan thinks some of them are not finished.
- As a result, DAGMan doesn't submit the next group of node jobs for our processing application, and our customers are wondering what to do.
When this happens, we manually copy the .dag file to make a "dag.rescue" file.
We then manually edit dag.rescue to mark which node jobs are done, and run condor_submit_dag on dag.rescue until processing can finish.
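The manual copy-and-edit step described above could be scripted so nobody has to paste DONE by hand. The following is a minimal sketch in Python, not an HTCondor feature. It assumes each node is declared on a single `JOB <name> <submit file> [options]` line and that the names of the finished nodes are already known and listed one per line in a text file. The script name, function names and file names are hypothetical. It only appends `DONE` to matching `JOB` lines and copies everything else in the .dag file through unchanged.

```python
"""Sketch: copy a DAGMan .dag file, appending DONE to the JOB lines of
nodes that are known to have finished (hypothetical helper, not part of HTCondor)."""
from __future__ import annotations

import sys
from pathlib import Path


def mark_done(dag_text: str, done_nodes: set[str]) -> str:
    """Return dag_text with ' DONE' appended to 'JOB <name> <submit file>' lines
    whose <name> is in done_nodes; all other lines are copied unchanged."""
    out = []
    for line in dag_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")  # line content without its line ending
        ending = line[len(body):]   # keep the original line ending
        tokens = body.split()
        is_job = len(tokens) >= 3 and tokens[0].upper() == "JOB"
        already_done = any(t.upper() == "DONE" for t in tokens[3:])
        if is_job and tokens[1] in done_nodes and not already_done:
            body = body.rstrip() + " DONE"
        out.append(body + ending)
    return "".join(out)


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print("usage: mark_done.py INPUT.dag FINISHED_NODES.txt OUTPUT.dag", file=sys.stderr)
        return
    dag_path, done_path, out_path = argv
    # One finished node name per line; blank lines are ignored.
    done = {name.strip() for name in Path(done_path).read_text().splitlines() if name.strip()}
    Path(out_path).write_text(mark_done(Path(dag_path).read_text(), done))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Hypothetical usage: `python mark_done.py job.dag finished_nodes.txt dag.rescue`, then run condor_submit_dag on the output file as before. Appending DONE to the existing JOB lines, rather than rebuilding the file, keeps the dependencies and any other keywords exactly as the application wrote them.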
My question is: do you know a way to make HTCondor generate its own dag.rescue file? Or is there a setting or workaround to avoid this behavior?
We've helped our customers with a case like this; the .dag file was HUGE, and it took a lot of copying and pasting of the "DONE" keyword to mark each individual job's status.
I'm just wondering if you know a way to do it automatically.
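The other manual part is working out which nodes actually finished. One possible approach is to scan the node jobs' HTCondor user log. The sketch below assumes that each submit event (code 000) contains a `DAG Node: <name>` line and that each terminate event (code 005) contains a `Normal termination (return value N)` line. Both assumptions should be checked against a real log from the application. It treats a node as finished only if its job exited normally with return value 0, and it ignores PRE/POST scripts and retries, which DAGMan itself takes into account. The function name `finished_nodes` is hypothetical.

```python
"""Sketch: list DAG node names whose job terminated normally with return
value 0, by scanning an HTCondor user log (hypothetical helper).

ASSUMED log format (check against a real node job log):
  000 (cluster.proc.subproc) ... Job submitted from host: ...
      DAG Node: <name>
  005 (cluster.proc.subproc) ... Job terminated.
      (1) Normal termination (return value N)
Each event ends with a line containing only '...'.
PRE/POST scripts and RETRY are ignored.
"""
from __future__ import annotations

import re
import sys

EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")
DAG_NODE = re.compile(r"DAG Node:\s*(\S+)")
NORMAL_EXIT = re.compile(r"Normal termination \(return value (-?\d+)\)")


def finished_nodes(log_text: str) -> set[str]:
    node_of_job: dict[tuple[int, int], str] = {}  # (cluster, proc) -> node name
    done: set[str] = set()
    event = None  # (event code, (cluster, proc)) of the event being read
    for line in log_text.splitlines():
        header = EVENT_HEADER.match(line)
        if header:
            event = (header.group(1), (int(header.group(2)), int(header.group(3))))
            continue
        if line.strip() == "...":
            event = None  # end of the current event
            continue
        if event is None:
            continue
        code, job = event
        if code == "000":
            match = DAG_NODE.search(line)
            if match:
                node_of_job[job] = match.group(1)
        elif code == "005":
            match = NORMAL_EXIT.search(line)
            if match and match.group(1) == "0" and job in node_of_job:
                done.add(node_of_job[job])
    return done


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: finished_nodes.py NODES.log", file=sys.stderr)
    else:
        with open(sys.argv[1]) as log_file:
            for name in sorted(finished_nodes(log_file.read())):
                print(name)
```

Hypothetical usage: `python finished_nodes.py nodes.log > finished_nodes.txt` prints one finished node name per line.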
Kind Regards,
Fernando M. Schapira
Senior Support Engineer
Pre-Sales and Commissioning Project Manager
Geospatial Content Solutions - GCS
*****************************************
Leica Geosystems AG
Heinrich-Wild-Strasse, 9435 Heerbrugg - Switzerland
Phone: +41 71 727 43 11, Fax: +41 71 727 43 01
e-mail: fernando.schapira@xxxxxxxxxxxxxxxxxxxx
*********www.leica-geosystems.com*********
HTCondor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/