Re: [Condor-users] Dagman & Job Priorities


Date: Thu, 10 Feb 2005 09:25:01 -0800
From: "Michael S. Root" <mike@xxxxxxxxxxxxxx>
Subject: Re: [Condor-users] Dagman & Job Priorities
Peter F. Couvares wrote:
Michael S. Root wrote:

It is frequently the case for us that a single user is running multiple DAGman jobs. The behavior we get is that all jobs from a user's dags get run concurrently (within the user's resource limits), such that the dags all finish at roughly the same time. It is sometimes the case that a dag with just one job left will sit in the queue for hours waiting for the same user's other dags (with more unfinished jobs) to 'catch up'.


Yes -- all other things (like requirements, rank, priority, etc.) being equal, the condor_schedd will run jobs in FIFO order. So when multiple DAGs are running in parallel, and releasing their respective jobs into the queue as they become ready, the "first" DAG's most recent job won't necessarily be first in the queue.
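
To illustrate the FIFO effect described above, here is a toy Python model. It is only a sketch and not Condor code: the single execution slot, the job names, and the run_fifo helper are all assumptions made for the example. DAG A is submitted first as a chain a1 -> a2, and DAG B adds three independent jobs before a1 finishes.

```python
from collections import deque

def run_fifo(initial, released_on_finish):
    """Run jobs one at a time in queue order; return the completion order.

    initial: job names already in the queue, oldest first.
    released_on_finish: maps a job to the child jobs its DAG releases
    once that job completes (children join the back of the queue).
    """
    queue = deque(initial)
    finished = []
    while queue:
        job = queue.popleft()          # oldest queued job runs next (FIFO)
        finished.append(job)
        queue.extend(released_on_finish.get(job, []))
    return finished

# DAG A (submitted first) is a chain a1 -> a2; DAG B has three independent jobs.
order = run_fifo(["a1", "b1", "b2", "b3"], {"a1": ["a2"]})
print(order)
```

In this model a2 is released only after a1 completes, so it lands behind all of DAG B's jobs even though DAG A was submitted first.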

What we would like is to have all the jobs from the first dag submitted to finish first, then the second, etc...


Makes sense. If I understand correctly, they don't "have to" finish first in order to be correct -- but it's easier for you to follow and keep track of everything if they do.

Yes, precisely, although in some cases this problem can interfere with getting work done at the proper time. Mostly, though, it is just a frustration thing.


Since the dags are not necessarily related in terms of what they're processing and usually aren't submitted at the same time, it doesn't make sense to have one dag depend on another.


Strictly speaking, the different DAGs still do not "depend on" one another in the scenario you describe -- their jobs just aren't sorted in the queue in the order you'd like. If they do depend on each other, these dependencies should be represented in another DAG (i.e., a higher-level DAG containing your existing DAGs).

Right again.

I thought about setting the machine RANK expression to "( -1 * DAGManJobId )", thus the lowest numbered DAGman jobs would be preferred. I haven't tried it yet, though, because I'm not sure if this expression would apply before or after the machine has been matched to a user.


Interesting -- I wouldn't have thought of this, and although it should work, it will interfere with condor's attempts to manage user priorities. The machine RANK is evaluated by the negotiator before a match is made, in order to select the best match.

Would a job with low user-priority and a low-numbered DAGManJobID get priority over another job with a higher user-priority, but a higher-numbered DAGManJobID?


Yes, exactly -- because Condor respects a resource owner's wishes above all, including pool-wide user priority. If a resource owner says their machine prefers job X, Condor will not override that just to satisfy its attempts at fair-share. This is why I wouldn't use this approach, because it's not exactly what you want.

I thought as much.

Even better would be if there were a way to look at the job priority of DAGman itself and have sub-jobs get chosen based on that. I have noticed that changing a DAGman job's priority doesn't have any effect on its children.


You're right, this would be a nice feature to have as an option -- and would solve your problem. I'll see if I can implement it sometime soon (i.e., before Condor 6.8.0). Keep an eye on the release notes...


I was thinking that the ideal solution would be to have DAGman manage its children's job priorities based on its own. Newly submitted jobs should inherit the parent DAGman's priority, and if the DAGman's job priority is changed while running, it should modify all of its children.
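
A rough sketch of that behavior in Python, purely as a toy model. ToyDag, submit_child, and set_priority are hypothetical names invented for this example; they are not part of DAGman or any Condor API.

```python
class ToyDag:
    """Hypothetical stand-in for a DAGman job; not Condor's real data model."""

    def __init__(self, priority=0):
        self.priority = priority
        self.children = {}  # child job name -> job priority

    def submit_child(self, name):
        # A newly submitted node job inherits the DAG's current priority.
        self.children[name] = self.priority

    def set_priority(self, priority):
        # A priority change on the DAG is pushed down to every existing child.
        self.priority = priority
        for name in self.children:
            self.children[name] = priority

dag = ToyDag(priority=5)
dag.submit_child("node_a")   # inherits 5
dag.set_priority(10)         # node_a is updated to 10
dag.submit_child("node_b")   # inherits 10
print(dag.children)
```

Both rules from the paragraph above appear here: inheritance at submit time, and propagation when the parent's priority changes later.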

While I'm thinking about DAGman, is a depth-first/breadth-first option slated for 6.8.0? I think this has been mentioned before, but I don't remember what the outcome was.

Thanks for your help.

-Mike

