Mailing List Archives
Re: [Condor-users] Jobs License Management
- Date: Tue, 14 Oct 2008 21:58:01 -0500
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] Jobs License Management
Ian Chesal wrote:
So many questions. I've been waiting for this for...a while! :)
I figured as much. There is a startling number of possibilities for this
feature. I almost sent you a preview version to play with.
You can use Concurrency Limits from 7.1.3 [1].
Any idea when we'll see fuller documentation for this feature? I'm most
interested in what happens if a job asks for two limited resources and
another job only asks for one. How are race conditions handled to
minimize blocking? Is there a paper maybe?
The documentation should be in the released manual very shortly. There
may be a paper of some form about the feature in the (not near) future.
Limits do not directly consider job priority, nor are they gathered over
time to satisfy a job.
The basic use/configuration of the feature is truly quite simple.
You specify the set of limits you want associated with a job via the
concurrency_limits parameter in a submit file. The limits are specified
as a list, e.g. concurrency_limits = a,b,b,c - signifying a job that
needs 1 A, 2 Bs and 1 C to run. Limits are case insensitive.
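As a rough illustration of how such a list can be read, here is a minimal Python sketch (not Condor's own code) that counts the units of each limit a job requests. The helper name parse_concurrency_limits is hypothetical; only the comma-separated, case-insensitive list format comes from the description above.

```python
from collections import Counter

def parse_concurrency_limits(value):
    """Count requested units per limit name (hypothetical helper, not a Condor API)."""
    # Limits are case insensitive, so normalise names before counting.
    names = [name.strip().lower() for name in value.split(",") if name.strip()]
    return Counter(names)

requested = parse_concurrency_limits("a,b,b,c")
print(dict(requested))  # {'a': 1, 'b': 2, 'c': 1}
```

Repeating a name is what raises its count, so "b,b" asks for two units of b.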
You configure limits within the Negotiator's configuration file with
X_LIMIT = #, where X is the name of a limit and # is the max you want to
allow at one time. For all limits that do not explicitly have a _LIMIT
configuration, there is CONCURRENCY_LIMIT_DEFAULT = # to specify their
maximum. The default's default is large, meaning a job requesting a
limit that is not configured will not be burdened by the request. The
default also makes it possible to limit jobs generally, such as
placing a cap on the number of jobs any one user or set of users may
have running at one time.
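The lookup order described here (a specific X_LIMIT first, then CONCURRENCY_LIMIT_DEFAULT, then a large built-in value) can be sketched in Python. This is an illustrative model, not the Negotiator's implementation: max_for_limit is a hypothetical helper, the config is modelled as a plain dict with upper-case keys, and ASSUMED_BUILTIN_DEFAULT is a stand-in because the actual built-in value is not stated in this thread.

```python
# Stand-in for "large"; the real built-in default is not given in this thread.
ASSUMED_BUILTIN_DEFAULT = 10**9

def max_for_limit(name, config):
    """Return the maximum that applies to a limit name (hypothetical helper)."""
    key = name.upper() + "_LIMIT"
    if key in config:
        # An explicit X_LIMIT entry always wins.
        return config[key]
    # Otherwise fall back to the configured default, then to the large stand-in.
    return config.get("CONCURRENCY_LIMIT_DEFAULT", ASSUMED_BUILTIN_DEFAULT)

config = {"MYLICENSE_LIMIT": 300, "CONCURRENCY_LIMIT_DEFAULT": 50}
print(max_for_limit("mylicense", config))  # 300
print(max_for_limit("other", config))      # 50
```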
The current usage of limits is indirectly accessible via condor_userprio
-long.
Assign some number of licenses for use by Condor jobs, say
300. In your Negotiator's configuration add: MYLICENSE_LIMIT = 300
Now in each job that needs the license add:
concurrency_limits = MYLICENSE
Condor does not check out the licenses from FlexLM; it just
tries to keep the number of jobs that /will/ check out
licenses under control.
Presumably I can write a cron job (startd cron?) on my negotiator that
can update resource counts based on external factors -- is it sufficient
to do a reconfig to have the negotiator see the updated values? What
happens if I decrease a limit and there are more jobs running than the
resources I say I have? Do things preempt? Or does Condor just stop
starting jobs that request this resource?
A reconfig is enough to alter configured maximums, i.e. X_LIMIT = 1 to
X_LIMIT = 100. There are some clever tricks you could play to alter the
apparent usage of a limit.
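A periodic update of this kind might look like the Python sketch below. It is only a sketch under stated assumptions: render_limit and apply_limit are hypothetical helpers, the config file path is a placeholder for a dedicated file that your Negotiator's configuration already includes, and how you obtain the new count (for example from your license server) is left out entirely. The only Condor piece used is the condor_reconfig command.

```python
import subprocess

def render_limit(name, count):
    """Build one config line such as 'MYLICENSE_LIMIT = 250' (hypothetical helper)."""
    return f"{name.upper()}_LIMIT = {int(count)}\n"

def apply_limit(name, count, config_path, reconfig_cmd=("condor_reconfig",)):
    """Write the new maximum, then ask the daemons to re-read their configuration.

    Assumption: config_path is a dedicated file holding only this limit and is
    already pulled in by the Negotiator's configuration; it is overwritten here.
    """
    with open(config_path, "w") as handle:
        handle.write(render_limit(name, count))
    # check=True raises CalledProcessError if the reconfig command fails.
    subprocess.run(list(reconfig_cmd), check=True)
```

From a cron job this could be called as apply_limit("MYLICENSE", 250, "/path/to/dedicated/limits.config"), where the path is purely illustrative.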
Condor will not actively preempt or otherwise stop jobs when a limit is
exceeded, such as if you lower it. When a limit is reached or exceeded,
no new jobs requiring the limit are matched. They will be rejected with
a reason specifying that a limit they requested was not available - the
specific limit is not reported.
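The behaviour above can be modelled in a few lines of Python. This is an illustrative model rather than the Negotiator's code: can_match is a hypothetical function, and the exact comparison (refuse a job if granting its request would push usage past the maximum) is an assumption consistent with, but not spelled out in, the description.

```python
from collections import Counter

def can_match(job_limits, usage, maximums, default_max):
    """Return True if every limit the job requests still has room (model only)."""
    requested = Counter(n.strip().lower() for n in job_limits.split(",") if n.strip())
    for name, count in requested.items():
        in_use = usage.get(name, 0)
        limit_max = maximums.get(name, default_max)
        # Assumption: refuse the job if granting it would exceed the maximum.
        if in_use + count > limit_max:
            return False
    return True

maximums = {"mylicense": 300}
usage = {"mylicense": 320}  # limit lowered while 320 jobs were already running
print(can_match("MYLICENSE", usage, maximums, default_max=10**9))  # False
print(can_match("other", usage, maximums, default_max=10**9))      # True
```

Nothing in the model touches the 320 running jobs; only new matches are refused until usage drops back under the maximum.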
Especially since you are sharing licenses between batch jobs
and interactive users, you should set up your jobs to notice if
they failed because they did not check out a license. This
configuration will be specific to your application, but the
document you already mentioned has a good example [2]. If
your program exits with code 52 when it fails to check out a
license, you'd add this to your job:
on_exit_remove = (ExitBySignal == TRUE) || (ExitCode != 52)
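To make the effect of that expression concrete, here is a small Python rendering of it. The function is only a truth-table model of the expression, not something Condor runs; the exit code 52 is the one from the example above.

```python
LICENSE_FAILURE_EXIT_CODE = 52  # exit code used in the example above

def on_exit_remove(exit_by_signal, exit_code):
    """Python model of (ExitBySignal == TRUE) || (ExitCode != 52)."""
    return exit_by_signal or exit_code != LICENSE_FAILURE_EXIT_CODE

print(on_exit_remove(False, 0))   # True: normal exit, job leaves the queue
print(on_exit_remove(False, 52))  # False: license failure, job stays queued to run again
print(on_exit_remove(True, 52))   # True: killed by a signal, job leaves the queue
```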
Optionally, if your license resource supports queuing, you can have your
batch jobs wait for a license instead of dying. Depending on how long
things run for and how expensive your licenses are, this can be a good
option. For example: if licenses cost far more than compute hardware,
it's better to hold the hardware and queue via the FlexLM manager for
the license, maximizing license use, than to return to Condor's queue
and undergo another negotiation cycle.
- Ian
Very good point.
Best,
matt