Re: [Condor-users] Dagman & Job Priorities


Date: Tue, 8 Feb 2005 13:59:39 -0600
From: "Peter F. Couvares" <pfc@xxxxxxxxxxx>
Subject: Re: [Condor-users] Dagman & Job Priorities
Michael S. Root wrote:
It is frequently the case for us that a single user is running multiple DAGMan jobs. The behavior we get is that all jobs from a user's dags get run concurrently (within the user's resource limits), such that the dags all finish at roughly the same time. It is sometimes the case that a dag with just one job left will sit in the queue for hours waiting for the same user's other dags (with more unfinished jobs) to 'catch up'.

Yes -- all other things (like requirements, rank, priority, etc.) being equal, the condor_schedd will run jobs in FIFO order. So when multiple DAGs are running in parallel, and releasing their respective jobs into the queue as they become ready, the "first" DAG's most recent job won't necessarily be first in the queue.
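
To make the ordering concrete, here is a toy model of it in Python. It is only a sketch and not Condor code: the Job tuple, its field names, and the ID numbers are made up for illustration, and it assumes a single user whose jobs are ordered by job priority first and submission order second.

```python
from typing import NamedTuple

class Job(NamedTuple):
    cluster_id: int    # hypothetical: stands in for submission order
    dag_id: int        # hypothetical: stands in for DAGManJobId
    job_prio: int = 0  # hypothetical: stands in for the job's priority

def fifo_order(jobs):
    """Higher job priority first; ties broken by submission order (FIFO)."""
    return sorted(jobs, key=lambda j: (-j.job_prio, j.cluster_id))

# Two DAGs (100 and 105) releasing jobs into the same queue.
queue = [
    Job(cluster_id=110, dag_id=100),
    Job(cluster_id=111, dag_id=105),
    Job(cluster_id=112, dag_id=105),
    Job(cluster_id=113, dag_id=100),  # last job of the first DAG, released late
]

# With equal priorities, the first DAG's late job sorts behind the second DAG's jobs.
plain = [j.cluster_id for j in fifo_order(queue)]

# Raising that one job's priority moves it to the front.
boosted_queue = queue[:3] + [queue[3]._replace(job_prio=5)]
boosted = [j.cluster_id for j in fifo_order(boosted_queue)]
```

In this model, giving a DAG's jobs a higher job priority is the only thing that changes their position relative to the same user's other jobs.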


What we would like is to have all the jobs from the first dag submitted to finish first, then the second, etc...

Makes sense. If I understand correctly, they don't "have to" finish first in order to be correct -- but it's easier for you to follow and keep track of everything if they do.


Since the dags are not necessarily related in terms of what they're processing and usually aren't submitted at the same time, it doesn't make sense to have one dag depend on another.

Strictly speaking, the different DAGs still do not "depend on" one another in the scenario you describe -- their jobs just aren't sorted in the queue in the order you'd like. If they do depend on each other, these dependencies should be represented in another DAG (i.e., a higher-level DAG containing your existing DAGs).


I thought about setting the machine RANK expression to "( -1 * DAGManJobId )", thus the lowest numbered DAGman jobs would be preferred. I haven't tried it yet, though, because I'm not sure if this expression would apply before or after the machine has been matched to a user.

Interesting -- I wouldn't have thought of this. Although it should work, it will interfere with Condor's attempts to manage user priorities. The machine RANK is evaluated by the negotiator before a match is made, in order to select the best match.


Would a job with low user-priority and a low-numbered DAGManJobID get priority over another job with a higher user-priority, but a higher-numbered DAGManJobID?

Yes, exactly -- because Condor respects a resource owner's wishes above all, including pool-wide user priority. If a resource owner says their machine prefers job X, Condor will not override that just to satisfy its attempts at fair-share. That is why I wouldn't use this approach: it's not exactly what you want.


Even better would be if there were a way to look at the job priority of DAGMan itself and have sub-jobs get chosen based on that. I have noticed that changing a DAGMan job's priority doesn't have any effect on its children.

You're right, this would be a nice feature to have as an option -- and would solve your problem. I'll see if I can implement it sometime soon (i.e., before Condor 6.8.0). Keep an eye on the release notes...


It wouldn't be hard to write a script to change the priority of all of a DAG's children, but it would have to be run repeatedly each time DAGMan submits more jobs into the queue (we often run with a -maxjobs limit).

Right -- this is ugly and kludgy and wrong, but could serve as a temporary solution, if it's important enough to you.
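
For what it's worth, such a stopgap might look roughly like the Python sketch below. It assumes that the child jobs carry a DAGManJobId attribute holding the cluster ID of their DAGMan job, and that condor_q (with -constraint and -format) and condor_prio (with -p) are on the PATH. The function names are invented for this example, and it has not been run against a real pool.

```python
import subprocess

def query_command(dag_id):
    """condor_q invocation printing 'cluster.proc' for each child of one DAG."""
    return ["condor_q", "-constraint", f"DAGManJobId == {int(dag_id)}",
            "-format", "%d.", "ClusterId",
            "-format", "%d\\n", "ProcId"]

def parse_job_ids(output):
    """Split condor_q output into a list of 'cluster.proc' strings."""
    return [line.strip() for line in output.splitlines() if line.strip()]

def prio_commands(job_ids, priority):
    """One condor_prio invocation per job, setting an absolute priority."""
    return [["condor_prio", "-p", str(int(priority)), jid] for jid in job_ids]

def reprioritize(dag_id, priority, run=subprocess.run):
    """Set the priority of every job currently queued under the given DAG.

    'run' is injectable so the logic can be exercised without Condor installed.
    """
    result = run(query_command(dag_id), capture_output=True, text=True, check=True)
    job_ids = parse_job_ids(result.stdout)
    for cmd in prio_commands(job_ids, priority):
        run(cmd, check=True)
    return job_ids
```

Calling reprioritize(1234, 10) would touch only the jobs that are in the queue at that moment (1234 being a placeholder DAGMan cluster ID), so it would have to be rerun periodically, for example from cron, to catch jobs DAGMan submits later.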


-Peter

--
Peter Couvares                        University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
pfc@xxxxxxxxxxx                       1210 W. Dayton St. Rm #4241
(608) 265-8936                        Madison, WI 53706-1685

