--- --- --- --- --- --- --- --- --- --- --- --- ---
Miha Ahronovitz
Principal | Ahrono Associates
Blog: http://my-inner-voice.blogspot.com/
c: 408 422 2757
tw: @myinnervoice
--- --- --- --- --- --- --- --- --- --- --- --- ---
On Sat, 28 Jun 2014, Gabriel Mateescu wrote:

> If there is something that may need improvement in DAGMan,
> it is that I do not understand why, in case of failure, one has
> to restart the workflow rather than retry the failed jobs, possibly
> on different execution nodes.

You're not the first person to ask for that capability:

https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2831,4
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3403,4
I don't know exactly how #2831 will happen, but hopefully in the 8.3 series...
One question for #2831 is this: how does the user notify DAGMan that a particular failed node should be retried? (This assumes that the user has manually fixed whatever caused the node to fail. If you just want to retry nodes without any manual intervention, you can specify retries in the DAG, although getting the retry to land on a different machine than the previous try is tricky.)
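
For reference, automatic retries are requested in the DAG input file with one RETRY line per node. The following is only a minimal sketch in Python that builds such a file's text; the helper name dag_with_retries, the node names A and B, and the submit-file names a.sub and b.sub are made up for illustration and do not come from any real workflow. It does not address the harder problem of steering a retry to a different machine.

```python
# Sketch: build the text of a DAGMan input file in which every node is
# retried automatically. Node and submit-file names are hypothetical.

def dag_with_retries(nodes, retries=3):
    """Return DAG file text with one JOB line and one RETRY line per node.

    nodes:   list of (node_name, submit_file) pairs.
    retries: number of times DAGMan should retry a node that fails.
    """
    lines = []
    # Declare each node and the submit file that describes its job.
    for name, submit_file in nodes:
        lines.append(f"JOB {name} {submit_file}")
    # Ask DAGMan to retry each node up to `retries` times on failure.
    for name, _ in nodes:
        lines.append(f"RETRY {name} {retries}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a two-node example; redirect the output to a .dag file to use it.
    print(dag_with_retries([("A", "a.sub"), ("B", "b.sub")], retries=3), end="")
```

The generated text could be saved to a file and passed to condor_submit_dag. With these lines in place, DAGMan resubmits a failed node on its own instead of requiring the workflow to be restarted.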
Kent Wenger
CHTC Team
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/