Mailing List Archives
Re: [Condor-users] Dagman & Job Priorities
- Date: Tue, 8 Feb 2005 13:59:39 -0600
- From: "Peter F. Couvares" <pfc@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Dagman & Job Priorities
Michael S. Root wrote:
> It is frequently the case for us that a single user is running
> multiple DAGman jobs. The behavior we get is that all jobs from a
> user's dags get run concurrently (within the user's resource limits),
> such that the dags all finish at roughly the same time. It is
> sometimes the case that a dag with just one job left will sit in the
> queue for hours waiting for the same user's other dags (with more
> unfinished jobs) to 'catch up'.
Yes -- all other things (like requirements, rank, priority, etc.) being
equal, the condor_schedd will run jobs in FIFO order. So when multiple
DAGs are running in parallel and releasing their respective jobs into
the queue as they become ready, the "first" DAG's most recent job won't
necessarily be first in the queue: its position reflects when DAGMan
submitted that job, not when the DAG itself was submitted.
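To make the ordering concrete, here is a toy Python model (not Condor code) of the idle queue at one instant. It assumes requirements, rank and priority are all equal, so only queue submission time matters; the DAG names, node names and timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IdleJob:
    submitted_at: int  # when DAGMan put this node's job into the queue
    dag: str           # the DAG that released it
    node: str          # node name within that DAG


def fifo_order(idle_jobs):
    """Order idle jobs by queue submission time only (FIFO), ignoring
    requirements, rank and priority, which this toy model assumes equal."""
    return sorted(idle_jobs, key=lambda job: job.submitted_at)


# Hypothetical snapshot: DAG "A" was submitted long before DAG "B", but its
# final node only became ready (and was queued by DAGMan) at t=9.
idle = [
    IdleJob(submitted_at=9, dag="A", node="a_final"),
    IdleJob(submitted_at=3, dag="B", node="b1"),
    IdleJob(submitted_at=4, dag="B", node="b2"),
    IdleJob(submitted_at=6, dag="B", node="b3"),
]

for job in fifo_order(idle):
    print(job.submitted_at, job.dag, job.node)
```

Even though DAG "A" is older, its last job waits behind all three of "B"'s jobs.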
> What we would like is to have all the jobs from the first dag
> submitted to finish first, then the second, etc...
Makes sense. If I understand correctly, they don't "have to" finish
first in order to be correct -- but it's easier for you to follow and
keep track of everything if they do.
> Since the dags are not necessarily related in terms of what they're
> processing and usually aren't submitted at the same time, it doesn't
> make sense to have one dag depend on another.
Strictly speaking, the different DAGs still do not "depend on" one
another in the scenario you describe -- their jobs just aren't sorted
in the queue in the order you'd like. If they do depend on each other,
these dependencies should be represented in another DAG (i.e., a
higher-level DAG containing your existing DAGs).
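If the DAGs really do depend on one another, a higher-level DAG can be generated mechanically. The Python sketch below writes a top-level DAG that runs inner DAGs one after another using DAGMan's `JOB` and `PARENT ... CHILD` lines. It assumes each inner DAG has first been prepared with `condor_submit_dag -no_submit <file>`, which writes `<file>.condor.sub` for the outer DAG to treat as an ordinary job; the node names (`inner1`, ...) and file names are hypothetical.

```python
def chain_dags(inner_dag_files):
    """Return the text of a top-level DAG that runs the given inner DAGs
    strictly one after another.

    Assumption: each inner DAG was prepared with
    `condor_submit_dag -no_submit <file>`, producing `<file>.condor.sub`.
    """
    lines = []
    names = []
    for i, dag_file in enumerate(inner_dag_files, start=1):
        name = f"inner{i}"  # hypothetical node naming scheme
        names.append(name)
        lines.append(f"JOB {name} {dag_file}.condor.sub")
    # Serialize: each inner DAG becomes the parent of the next one.
    for parent, child in zip(names, names[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inner DAG files.
    print(chain_dags(["render.dag", "encode.dag"]), end="")
```

As noted above, this only makes sense for genuine dependencies; for unrelated DAGs it would needlessly serialize work that could run in parallel.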
> I thought about setting the machine RANK expression to "( -1 *
> DAGManJobId )", thus the lowest numbered DAGman jobs would be
> preferred. I haven't tried it yet, though, because I'm not sure if
> this expression would apply before or after the machine has been
> matched to a user.
Interesting -- I wouldn't have thought of this, and although it should
work, it will interfere with Condor's attempts to manage user
priorities. The machine RANK is evaluated by the negotiator before a
match is made, in order to select the best match.
> Would a job with low user-priority and a low-numbered DAGManJobID get
> priority over another job with a higher user-priority, but a
> higher-numbered DAGManJobID?
Yes, exactly -- because Condor respects a resource owner's wishes above
all, including pool-wide user priority. If a resource owner says their
machine prefers job X, Condor will not override that just to satisfy
its attempts at fair-share. This is why I wouldn't use this approach:
it's not exactly what you want.
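A deliberately simplified Python model of that conflict (the real negotiator's fair-share cycle is more involved) treats the machine's RANK as the primary key and user priority only as a tie-breaker. Following Condor's convention, a numerically lower user priority value is better; the users and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    owner: str
    user_prio: float    # Condor-style: a numerically LOWER value is BETTER
    dagman_job_id: int


def choose(candidates):
    """Toy model: the machine's RANK decides first; user priority only
    breaks ties between equally ranked jobs."""
    return max(
        candidates,
        key=lambda c: (
            -1 * c.dagman_job_id,  # machine RANK = -DAGManJobId, higher wins
            -c.user_prio,          # tie-break: better (lower) user priority
        ),
    )


alice = Candidate("alice", user_prio=0.5, dagman_job_id=900)  # good priority, newer DAG
bob = Candidate("bob", user_prio=50.0, dagman_job_id=12)      # poor priority, older DAG
print(choose([alice, bob]).owner)
```

Here bob's job is chosen despite his much worse user priority, which is the fair-share interference described above.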
> Even better would be if there were a way to look at the job priority
> of DAGman itself and have sub-jobs get chosen based on that. I have
> noticed that changing a DAGman job's priority doesn't have any effect
> on its children.
You're right, this would be a nice feature to have as an option -- and
would solve your problem. I'll see if I can implement it sometime soon
(i.e., before Condor 6.8.0). Keep an eye on the release notes...
> It wouldn't be hard to write a script to change the priority of all a
> DAG's children, but it would have to be run repeatedly each time
> DAGman submits more jobs into the queue (we often run with a -maxjobs
> limit).
Right -- this is ugly and kludgey and wrong, but could serve as a
temporary solution, if it's important enough to you.
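A rough Python sketch of that stopgap, assuming the standard `condor_q` (`-constraint`, `-format`) and `condor_prio -p` command-line tools are on the PATH. It copies the DAGMan job's own `JobPrio` onto every queued job whose `DAGManJobId` matches, so it would have to be re-run periodically (for example from cron) to catch jobs DAGMan submits later. The script and function names are hypothetical, and it defaults to a dry run that only prints the `condor_prio` commands.

```python
import subprocess
import sys


def query_cmd(dagman_cluster):
    """condor_q command listing cluster.proc IDs of one DAG's queued jobs."""
    return ["condor_q",
            "-constraint", f"DAGManJobId == {dagman_cluster}",
            "-format", "%d.", "ClusterId",
            "-format", "%d\n", "ProcId"]


def dagman_prio_cmd(dagman_cluster):
    """condor_q command printing the DAGMan job's own JobPrio."""
    return ["condor_q", str(dagman_cluster), "-format", "%d\n", "JobPrio"]


def parse_ids(condor_q_output):
    """Split condor_q -format output such as "12.0\n13.0\n" into job IDs."""
    return condor_q_output.split()


def prio_cmds(job_ids, priority):
    """One condor_prio invocation per child job."""
    return [["condor_prio", "-p", str(priority), job_id] for job_id in job_ids]


def run(cmd):
    """Run a command and return its standard output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def sync_children(dagman_cluster, dry_run=True):
    """Copy the DAGMan job's priority onto its currently queued children."""
    priority = int(run(dagman_prio_cmd(dagman_cluster)).split()[0])
    for cmd in prio_cmds(parse_ids(run(query_cmd(dagman_cluster))), priority):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): sync_dag_prio.py <dagman_cluster_id>
    sync_children(int(sys.argv[1]), dry_run=False)
```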
-Peter
--
Peter Couvares
Condor Project Research
pfc@xxxxxxxxxxx
(608) 265-8936

University of Wisconsin-Madison
Department of Computer Sciences
1210 W. Dayton St. Rm #4241
Madison, WI 53706-1685