Re: [HTCondor-users] proposed change in DAGMan
On Wed, Jun 15, 2016 at 2:08 PM, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
> The proposed change is that, if DAGMan is "stuck" because all queued node
> jobs are on hold (and there are no ready jobs, running PRE/POST scripts,
> etc.), DAGMan will consider this a failure and abort the DAG (which results
> in all queued node jobs being removed, and a rescue DAG being generated).
I'm curious as to the motivation for this. If I understand the
proposal correctly, this leaves workflows with a single node at some
level (e.g. diamond DAGs) vulnerable to instant-kaboom if that one
node's job goes on hold. Sure, the user can just submit the rescue
DAG, but that doesn't help if the submission happens through some
intermediary (which is a common use case for some of our customers).
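To make the concern concrete, here is a rough Python sketch of the "stuck" condition as I read it from the proposal. The DagState class, its field names, and is_stuck are hypothetical names made up for illustration; this is not DAGMan's actual data model or code, and the "etc." in the proposal may cover conditions not modeled here.

```python
from dataclasses import dataclass


@dataclass
class DagState:
    """Hypothetical snapshot of a DAG's progress (not DAGMan's real model)."""
    queued_jobs: int      # node jobs currently in the queue
    held_jobs: int        # how many of those queued jobs are on hold
    ready_nodes: int      # nodes ready to be submitted
    running_scripts: int  # PRE/POST scripts currently running


def is_stuck(state: DagState) -> bool:
    """True when every queued node job is held and nothing else can progress."""
    return (
        state.queued_jobs > 0
        and state.held_jobs == state.queued_jobs
        and state.ready_nodes == 0
        and state.running_scripts == 0
    )


# Diamond DAG (A -> B and C -> D): A is the only node at the top level,
# so a single held job already satisfies the condition.
top_held = DagState(queued_jobs=1, held_jobs=1, ready_nodes=0, running_scripts=0)

# Middle level: B is held but C is still queued and not held.
middle_one_held = DagState(queued_jobs=2, held_jobs=1, ready_nodes=0, running_scripts=0)

print(is_stuck(top_held))
print(is_stuck(middle_one_held))
```

Under this reading, any level with exactly one node turns a single hold into an abort of the whole DAG, while wider levels only trip the condition once every queued job is held.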
I think this functionality would be a good addition, but why opt-out
instead of opt-in?
Thanks,
BC
--
Ben Cotton
Cycle Computing
Better Answers. Faster.
http://www.cyclecomputing.com
twitter: @cyclecomputing