
Re: [Condor-users] Limit Number of Parallel Jobs



On Monday, July 11, 2011 at 11:19 AM, Sassy Natan wrote:

Hi Group,

Is there any way to limit the number of parallel jobs for a specific job?
I know about CONCURRENCY_LIMIT and QUOTA_GROUPS, but I'm interested
in giving my users a way to define this dynamically.

For Example:

1. Say I have 80 slots in my pool.
2. User A sends one big submission that takes all the available slots
(a total of 200 jobs).
3. Each job runs for around 5 hours before it finishes.

4. Now User B wants to run two simple jobs.
5. The queue is full with the 200 jobs submitted by User A.
6. User B will have to wait until 121 jobs have finished and one
slot becomes free.

This is of course assuming that User A and User B have the same priority.
It is understood that if User B has a higher priority, User B's job will
start as soon as one of User A's jobs finishes (after around 5 hours).

But what I would like to do is spare User B from waiting until
one of User A's jobs finishes.

It can be done with RANK, but that is tied to a machine definition and
also involves preemption, which is not good for me.

In the DAG option there is a MAX_JOB attribute, but this requires
converting a simple job into DAG syntax. It is possible, but maybe
there is a better way .... :-)

These are the options you have for limiting concurrency in Condor:

1. Use group quotas (http://www.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html#25780). If you want quotas to be strictly adhered to, such that a group can't go over their assigned quota, make sure GROUP_AUTOREGROUP_<groupname> is False (or GROUP_ACCEPT_SURPLUS = False to apply this to all groups).

2. Use a counter and concurrency limits (http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#38387)

3. Use a DAG and set MAX_JOB on the DAG (the -maxjobs option to condor_submit_dag).
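
As a rough sketch of option 2, assuming a limit named "bigrun" and a cap of 60 running jobs (both invented here for illustration), the negotiator configuration and the submit description file could look like this:

```
# In the central manager's condor_config (read by the negotiator).
# "BIGRUN" and the value 60 are placeholders; pick your own name and cap.
BIGRUN_LIMIT = 60

# Optional: cap applied to any limit name that has no explicit <NAME>_LIMIT.
CONCURRENCY_LIMIT_DEFAULT = 40

# In the user's submit description file.
# Every job carrying this line counts against the "bigrun" limit.
concurrency_limits = bigrun
```

The user decides which jobs reference the limit, but the cap itself lives in the negotiator's configuration, so changing the number means editing the configuration and reconfiguring the negotiator.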

Of those three options, group quotas are the most dynamic and, with the introduction of hierarchies in 7.5.x, also the most flexible in terms of the policies you can create.
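
For illustration, a minimal strict-quota setup might look like the following. The group names, the user name and the 60/20 split of the 80 slots from the question are assumptions, not a recommendation:

```
# In the central manager's condor_config (read by the negotiator).
# group_a and group_b are hypothetical group names.
GROUP_NAMES = group_a, group_b

# Static quotas, in slots; 60 + 20 covers the 80-slot pool from the question.
GROUP_QUOTA_group_a = 60
GROUP_QUOTA_group_b = 20

# Keep every group strictly inside its quota.
GROUP_ACCEPT_SURPLUS = False
GROUP_AUTOREGROUP = False

# In each user's submit description file: declare the group the job bills to.
# "userB" is a placeholder user name.
+AccountingGroup = "group_b.userB"
```

With these settings group_a can never hold more than 60 slots, so group_b always has room, at the cost of leaving slots idle when group_b has nothing to run.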

In your specific case you may wish to look into quotas *plus* preemption. If you give group A a quota of 50% and group B a quota of 50% but set GROUP_ACCEPT_SURPLUS=True, you can use a preemption expression to allow group B to preempt running jobs from group A when group A is beyond their quota. Something like:

NEGOTIATOR_CONSIDER_PREEMPTION=True
# Preempt short running jobs first
PREEMPTION_RANK = 100000 - $(ActivityTimer)
# Preempt a group's running jobs if they're over quota and the job being negotiated comes
# from a group that is under quota.
PREEMPTION_REQUIREMENTS = ((SubmitterGroupQuota =!= UNDEFINED && (SubmitterGroupResourcesInUse < SubmitterGroupQuota)) && \
    ((RemoteGroupQuota =?= UNDEFINED) || \
    (RemoteGroupResourcesInUse > RemoteGroupQuota)))

That's the rough idea. You'll likely need to tune it for your case. I haven't tested it; I just wrote it out quickly for this answer, so caveat utilitor.

This would allow A to go beyond their quota when the system is otherwise empty, but take that surplus back once B has jobs to run.
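
The quota half of that setup could be sketched as below. The group names are hypothetical, and like the expressions above this is untested:

```
# In the central manager's condor_config (read by the negotiator).
# group_a and group_b are hypothetical group names.
GROUP_NAMES = group_a, group_b

# Dynamic quotas are fractions of the pool: 50% each.
GROUP_QUOTA_DYNAMIC_group_a = 0.5
GROUP_QUOTA_DYNAMIC_group_b = 0.5

# Let a group use slots beyond its quota when nobody else wants them.
GROUP_ACCEPT_SURPLUS = True
```

Jobs still need +AccountingGroup set in their submit description files so the negotiator knows which group each one belongs to.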

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing