Hi Stefano,
Running in recovery would do that. Also, depending on the debug level of DAGMan, set via DAGMAN_VERBOSITY, the configuration options may not be printed to the debug log. To see them, you need DAGMAN_VERBOSITY >= 2. Separately, I have been wanting to make DAGMan's debug levels similar to HTCondor's own (i.e., have D_CONFIG, D_PLACEMENT, D_SCRIPT, etc.). Perhaps this would be a good excuse to implement that.
-Cole Bollig
From: Stefano Belforte <stefano.belforte@xxxxxxx>
Sent: Wednesday, October 22, 2025 11:09 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: stefano.belforte@xxxxxxx <stefano.belforte@xxxxxxx>; Cole Bollig <cabollig@xxxxxxxx>
Subject: Re: [HTCondor-users] Queries regarding reset retries in rescue dag

Thanks a lot, Cole.

Yeah, I work with Vijay on this, as you may have suspected. We still haven't been able to get firm evidence that the DAGMan config file was read, but after we removed `-DoRecover` from the `condor_dagman` arguments, the retry count appears to be reset and DAGMan does what we expect it to do.

It looks like at some point in the distant past the CRAB developers decided to switch DAGMan from Rescue to Recovery mode:
https://urldefense.com/v3/__https://github.com/dmwm/CRABServer/commit/c812d1c1a7c5fc1e5d7a5ef9f27c247fde2c7a4f*diff-cc7fafd6621a3816cc74145abaa7220e550bf8933933ab306af23467af7119c4__;Iw!!Mak6IKo!KzuYk3y0z0fj7bPw9iMx3-SrMJWxbosr4aBi0ajs9Eaj1fOlh3si05YVbj7A8tEy2LTbepmw5uLARhYwZnEUigKJtWc8$

We are now trying to switch to Rescue mode instead since, as discussed, we want to remove the code which hacks DAGMan logs and status files. I think we need to go a bit further along this path before we understand how to use it. Then we can maybe have a discussion about whether our strategy makes sense for our goal.

IIUC, DAGMan will still use recovery mode in case of incidents like schedd restarts, machine reboots, etc. That's fine.

Stefano