[HTCondor-users] proposed change in DAGMan
- Date: Wed, 15 Jun 2016 13:08:33 -0500 (CDT)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
- Subject: [HTCondor-users] proposed change in DAGMan
We are proposing a change in how DAGMan handles node jobs that
are on hold, and before implementing it, we would like feedback from
the HTCondor user community.
Right now, DAGMan will wait indefinitely for jobs that are on hold, even
if *all* of the node jobs for the DAG are on hold and, therefore, no
progress is being made.
The proposed change is that, if DAGMan is "stuck" because all queued node
jobs are on hold (and there are no ready jobs, running PRE/POST scripts,
etc.), DAGMan will consider this a failure and abort the DAG (which
results in all queued node jobs being removed, and a rescue DAG being
generated).
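
The "stuck" condition described above can be sketched roughly as follows. This is only an illustration of the proposed rule, not DAGMan's actual code: the DagState class, its field names, and the abort_when_stuck switch are all hypothetical, and the sketch covers only the conditions named explicitly above (ready jobs and running PRE/POST scripts), not whatever else falls under "etc."

```python
from dataclasses import dataclass


@dataclass
class DagState:
    """Hypothetical snapshot of a DAG; these fields are illustrative, not DAGMan internals."""
    queued_jobs: int               # node jobs currently in the queue (idle, running, or held)
    held_jobs: int                 # queued node jobs that are on hold
    ready_nodes: int               # nodes that are ready to be submitted
    running_scripts: int           # PRE/POST scripts currently running
    abort_when_stuck: bool = True  # hypothetical stand-in for the opt-out setting


def is_stuck(state: DagState) -> bool:
    """True when every queued node job is on hold and nothing else can make progress."""
    return (
        state.queued_jobs > 0
        and state.held_jobs == state.queued_jobs
        and state.ready_nodes == 0
        and state.running_scripts == 0
    )


def should_abort(state: DagState) -> bool:
    """Abort only if the DAG is stuck and the user has not opted out."""
    return state.abort_when_stuck and is_stuck(state)
```

For example, a DAG with three queued jobs, all on hold, and no ready nodes or running scripts would be treated as stuck; the same DAG with only two of the three jobs on hold would not.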
Users would be able to opt out of the new behavior via a configuration
setting.
Please let us know what you think of this proposal...
Kent
--
R. Kent Wenger (wenger@xxxxxxxxxxx, 608-262-6627,
http://www.cs.wisc.edu/~wenger/)
Computer Sciences Department
University of Wisconsin-Madison