Thank you for your comments. It looks like I will have to spend a lot
more time investigating this, because it is not clear what happened.
Most of the jobs did complete, but something went wrong in the
communication between the jobs and condor_dagman.exe. I do not yet
understand how that communication works, and I did not see any errors
in the DAGMan log. Basically, DAGMan went into recovery mode and could
never exit the recovery loop. When it entered recovery mode, it
generated this file: dprintf_failure.DAGMAN. If I deleted the file, it
was generated again on the next recovery attempt.
When I released the condor_dagman job, no recovery file was generated.
I then tried to rerun the DAG, and the following happened:
- dprintf_failure.DAGMAN was generated again
- the condor_dagman job went idle
- no DAG jobs were submitted
- condor_dagman.exe would not exit unless I forced it to