Re: [HTCondor-users] Partitionable Slots
- Date: Thu, 19 Jun 2014 15:37:19 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Partitionable Slots
On 6/19/2014 11:53 AM, Douglas Thain wrote:
> Howdy -
>
> We have been using partitionable slots to run multi-core jobs for the
> last few months. We are set up to have a single partitionable slot
> and no static slots, divided by CPU. Our users are submitting a mix
> of jobs, using request_cpus to select the size of slot desired.
>
> When initially turned on, it works. Slots get created for the exact
> size of each job, so that, for example, a two-core job is matched to a
> two-core slot. However, after a while, jobs begin to be matched in
> slots that are too big. For example, we see lots of one-cpu jobs
> running on 4-cpu slots.
>
> How do we fix this so that jobs only run in slots of the appropriate size?
>
> (Based on some previous discussions, we set CLAIM_WORKLIFE=0, so as to
> force claims to expire at the end of each job, thus causing them to be
> returned to the parent partitionable slot. But, that doesn't seem to
> be happening.)
>
> The relevant configuration is:
>
> NUM_SLOTS = 1
> NUM_SLOTS_TYPE_1 = 1
> SLOT_TYPE_1 = cpus=100%
> SLOT_TYPE_1_PARTITIONABLE = true
> CLAIM_WORKLIFE = 0
>
> Any suggestions?

Hi Doug -
I tried the above config on my Windows 7 laptop using the v8.2.0 release
candidate and everything seemed to work as expected, i.e. with
CLAIM_WORKLIFE = 0 the claim expired at the end of the job and thus the
dynamic slot was returned to the parent partitionable slot. When I
commented out the CLAIM_WORKLIFE=0 line, then the dynamic slot was reused.
Besides the CLAIM_WORKLIFE = 0 trick (that is a condor_startd knob, btw,
so it needs to be set on the execute machines; maybe you only set it on
your central manager?), you could also put the following into your job's
requirements expression:
requirements = DynamicSlot =!= True || Cpus =?= RequestCpus
The above will allow the job to match any static or partitionable slot,
but if the slot is a dynamic slot, the job will only re-use it if the slot
has the same number of CPUs as the job requests. I like this better than
the CLAIM_WORKLIFE workaround because you still get the advantages of
reusing claims.
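
For illustration, a submit description file using this expression might look
like the sketch below. The executable name and the request_cpus value are
placeholders, not taken from Doug's setup, and the parentheses are added only
for readability.

```
# Sketch of a submit description file; my_job.sh is a hypothetical executable.
universe     = vanilla
executable   = my_job.sh
request_cpus = 2

# Match any static or partitionable slot, but reuse a dynamic slot only
# when its Cpus equals this job's RequestCpus.
requirements = (DynamicSlot =!= True) || (Cpus =?= RequestCpus)

queue
```

With request_cpus = 2, a leftover 4-CPU dynamic slot fails both clauses
(DynamicSlot =!= True is False, and 4 =?= 2 is False), so the job would not
be placed there and would have to match the partitionable slot instead.
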
Of course, you could inject the above into all jobs via
APPEND_REQUIREMENTS in the config file, or you could opt to put the
above constraint into the startd START expression to enforce this "exact
CPU fit" policy on the startd side.
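
A rough sketch of those two alternatives in configuration form follows. It
assumes START already has a value earlier in the configuration (so that
$(START) expands to something), and the TARGET. prefix is added here only to
make explicit that RequestCpus comes from the job ad. You would pick one of
the two options, not necessarily both.

```
# Option 1 (submit machines): have condor_submit append the constraint
# to the requirements of every submitted job.
APPEND_REQUIREMENTS = (DynamicSlot =!= True) || (Cpus =?= RequestCpus)

# Option 2 (execute machines): enforce the exact-fit policy in the startd.
# Assumes START is already defined earlier in the config.
START = ($(START)) && ((DynamicSlot =!= True) || (Cpus =?= TARGET.RequestCpus))
```
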
Hope the above helps,
Todd
> Doug
>
> P.S. We have a neat little display that shows slot size based on CPU:
> http://condor.cse.nd.edu/condor_matrix.cgi
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
HTCondor Technical Lead 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685