Dear HTCondor experts,
We have been trying to use the configuration variable DAGMAN_RESET_RETRIES_UPON_RESCUE by setting it to True in a config file that we reference with the CONFIG command in our DAG file. However, it doesn't seem to make any difference to the retries. Our DAG has a simple RETRY Job <no. of retries> command, along with some PRE and POST scripts, and we want the retry count to be reset once all retries have been used, the failure persists, and a rescue DAG file is written. We also don't see the variable reported anywhere in the dagman.out output file.
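For reference, our setup looks roughly like the minimal sketch below. The file names (my.dag, dagman.config, job.sub, pre.sh, post.sh), the node name A and the retry count of 3 are placeholders for illustration, not our actual values.

```
# my.dag (hypothetical file, node and script names)
CONFIG dagman.config
JOB A job.sub
SCRIPT PRE A pre.sh
SCRIPT POST A post.sh
# placeholder retry count
RETRY A 3
```

```
# dagman.config (hypothetical file name), referenced by the CONFIG line above
DAGMAN_RESET_RETRIES_UPON_RESCUE = True
```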
Given this problem, I have the following questions:
A.) Is there a conclusive way to confirm that our config file is being read and that the variable is set to True, other than looking for the variable in the output file? Alternatively, is there another way to set this variable?
B.) The manual says: "If the Rescue DAG file is generated before all retries of a node are completed, then the Rescue DAG file will also contain RETRY entries."
More generally, it would be very helpful to understand how these configuration variables are meant to be set.
Any insight will be highly appreciated.
Thanks!
Cheers,
Vijay Chakravarty