Peter F. Couvares wrote:
Michael S. Root wrote:
It is frequently the case for us that a single user is running
multiple DAGman jobs. The behavior we get is that all jobs from a
user's dags get run concurrently (within the user's resource limits),
such that the dags all finish at roughly the same time. It is
sometimes the case that a dag with just one job left will sit in the
queue for hours waiting for others of the same user's dags (with more
unfinished jobs) to 'catch up'.
Yes -- all other things (like requirements, rank, priority, etc.) being
equal, the condor_schedd will run jobs in FIFO order. So when multiple
DAGs are running in parallel, and releasing their respective jobs into
the queue as they become ready, the "first" DAG's most recent job won't
necessarily be first in the queue.
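You can see the interleaving for yourself with something along these
lines (assuming a condor_q new enough to take -constraint and -format;
DAGManJobId is only set on jobs that were submitted by a DAGman):

    # List cluster id, DAGman parent id, and job priority for each of the
    # user's queued jobs that belong to some DAG.  The parent ids will
    # usually show how the different DAGs' jobs are interleaved in the queue.
    condor_q <user> -constraint 'DAGManJobId =!= UNDEFINED' \
        -format "%d " ClusterId -format "%d " DAGManJobId -format "%d\n" JobPrio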
What we would like is to have all the jobs from the first dag
submitted to finish first, then the second, etc...
Makes sense. If I understand correctly, they don't "have to" finish
first in order to be correct -- but it's easier for you to follow and
keep track of everything if they do.
Yes, precisely, although in some cases this problem can interfere with
getting work done at the proper time. Mostly, though, it is just a
frustration thing.
Since the dags are not necessarily related in terms of what they're
processing and usually aren't submitted at the same time, it doesn't
make sense to have one dag depend on another.
Strictly speaking, the different DAGs still do not "depend on" one
another in the scenario you describe -- their jobs just aren't sorted in
the queue in the order you'd like. If they do depend on each other,
these dependencies should be represented in another DAG (i.e., a
higher-level DAG containing your existing DAGs).
Right again.
I thought about setting the machine RANK expression to "( -1 *
DAGManJobId )", so that the lowest-numbered DAGman jobs would be
preferred. I haven't tried it yet, though, because I'm not sure
whether this expression would apply before or after the machine has
been matched to a user.
Interesting -- I wouldn't have thought of this, and although it should
work, it will interfere with condor's attempts to manage user
priorities. The machine RANK is evaluated by the negotiator before a
match is made, in order to select the best match.
Would a job with low user-priority and a low-numbered DAGManJobID get
priority over another job with a higher user-priority, but a
higher-numbered DAGManJobID?
Yes, exactly -- because Condor respects a resource owner's wishes above
all, including pool-wide user priority. If a resource owner says their
machine prefers job X, Condor will not override that just to satisfy its
attempts at fair-share. This is why I wouldn't use this approach,
because it's not exactly what you want.
I thought as much.
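For the record, what I had in mind on the machine side was just
something like this in the condor_config -- a sketch only, since jobs
that weren't submitted by a DAGman have no DAGManJobId at all and would
evaluate to UNDEFINED, which a real config would have to guard against:

    # Prefer jobs whose parent DAGman has the lowest cluster id.
    # TARGET is the candidate job ad; DAGManJobId is only defined for
    # DAGman-submitted jobs, so everything else needs special handling.
    RANK = (0 - TARGET.DAGManJobId)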
Even better would be if there were a way to look at the job priority
of DAGman itself and have sub-jobs get chosen based on that. I have
noticed that changing a DAGman job's priority doesn't have any effect
on its children.
You're right, this would be a nice feature to have as an option -- and
would solve your problem. I'll see if I can implement it sometime soon
(i.e., before Condor 6.8.0). Keep an eye on the release notes...
I was thinking that the ideal solution would be to have DAGman manage
its children's job priorities based on its own. Newly submitted jobs
should inherit the parent DAGman's priority, and if the DAGman's job
priority is changed while running, it should modify all of its children.
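In the meantime, the closest thing I can do by hand seems to be
condor_qedit, keyed on the parent DAGman's cluster id -- something like
the following (1234 is just a placeholder id). It only touches jobs
already in the queue, so it would have to be re-run as new nodes get
submitted:

    # Raise the job priority of every queued node job belonging to the
    # DAGman whose cluster id is 1234.  Nodes DAGman submits later won't
    # inherit this, so the command has to be repeated.
    condor_qedit -constraint 'DAGManJobId == 1234' JobPrio 10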
While I'm thinking about DAGman, is a depth-first/breadth-first option
slated for 6.8.0? I think this has been mentioned before, but I don't
remember what the outcome was.
Thanks for your help.
-Mike