[Condor-users] Condor configuration for Multi-CPU machines
Dear all,
I have just put together a small cluster of dual-core, dual-CPU machines,
all running WinXP (x64), and wanted to share the configuration I have for
them and get feedback on it. I have tried to set them up so that each
machine can service jobs that require 1, 2 or 4 CPUs. This should allow a
job that requires 2 CPUs (half the machine's resources) to co-exist
alongside two 1-CPU jobs. There is also provision for a job that requires
4 CPUs (or 2 CPUs) to start as soon as 1 CPU is free, which then prevents
further jobs from being placed on the machine. Without this, 4-CPU jobs
would be blocked: the scheduler keeps filling the machines with 1-CPU
jobs, so a machine is never completely free for a 4-CPU job to claim.
So here it is, followed by a few questions for the more experienced.
(These are just the bits I've changed because of the multi-CPU nature of
the computers.)
--------------------------------------------
NUMBER_OF_CLAIMED_CPUS = \
( \
(1*(VM1_State =?= "Claimed")) + \
(1*(VM2_State =?= "Claimed")) + \
(1*(VM3_State =?= "Claimed")) + \
(1*(VM4_State =?= "Claimed")) + \
(2*(VM5_State =?= "Claimed")) + \
(2*(VM6_State =?= "Claimed")) + \
(4*(VM7_State =?= "Claimed")) \
)
MAINTAIN_CLAIM = \
( \
((VirtualMachineID == 1)&&(VM1_State =?= "Claimed")) || \
((VirtualMachineID == 2)&&(VM2_State =?= "Claimed")) || \
((VirtualMachineID == 3)&&(VM3_State =?= "Claimed")) || \
((VirtualMachineID == 4)&&(VM4_State =?= "Claimed")) || \
((VirtualMachineID == 5)&&(VM5_State =?= "Claimed")) || \
((VirtualMachineID == 6)&&(VM6_State =?= "Claimed")) || \
((VirtualMachineID == 7)&&(VM7_State =?= "Claimed")) \
)
# To claim a multi-CPU VM you must specify CPUS in the job description.
# This macro returns a match for the 1-CPU VMs if the job does not
# define CPUS. It also prevents jobs that don't specify their CPU
# requirement from claiming the 4-CPU VM and restricting the machine
# to just one job.
JOB_CPUS_MATCHES_VM_CPUS = \
( \
((CPUS =?= TARGET.CPUS) == TRUE) \
|| ((CPUS == 1) && (TARGET.CPUS =?= UNDEFINED)) \
)
# The START expression is evaluated by each VM.
# 4 is the total number of CPUs on each machine.
START = \
( \
(4 > $(NUMBER_OF_CLAIMED_CPUS)) \
&& $(JOB_CPUS_MATCHES_VM_CPUS) \
) || $(MAINTAIN_CLAIM)
# These are dedicated machines
IsOwner = False
STARTD_VM_EXPRS = State, Activity, ImageSize, EnteredCurrentActivity
# The machine has 4 CPUs and 2 GB of RAM, so these values are 3 times as
# much because we advertise the machine in three different ways.
MEMORY = 6144
NUM_CPUS = 12
# The VMs are defined in this order, so VMs 1-4 have 1 CPU, VMs 5-6 have
# 2 CPUs and VM 7 has 4 CPUs.
VIRTUAL_MACHINE_TYPE_1 = cpus=1, ram=512
VIRTUAL_MACHINE_TYPE_2 = cpus=2, ram=1024
VIRTUAL_MACHINE_TYPE_3 = cpus=4, ram=2048
NUM_VIRTUAL_MACHINES_TYPE_1 = 4
NUM_VIRTUAL_MACHINES_TYPE_2 = 2
NUM_VIRTUAL_MACHINES_TYPE_3 = 1
# These are dedicated machines
VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 0
VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 0
--------------------------------------------
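As a quick check of the MEMORY and NUM_CPUS totals, here is a small Python
sketch. It is not part of the configuration, and the names vm_types,
total_cpus and total_ram_mb are made up for illustration. It simply sums
what the seven VMs advertise.

```python
# Illustrative only: each tuple is (number of VMs, CPUs per VM, RAM in MB per VM),
# copied from the VIRTUAL_MACHINE_TYPE_n and NUM_VIRTUAL_MACHINES_TYPE_n lines.
vm_types = [
    (4, 1, 512),   # type 1: four 1-CPU VMs
    (2, 2, 1024),  # type 2: two 2-CPU VMs
    (1, 4, 2048),  # type 3: one 4-CPU VM
]

# Sum the advertised resources across all VM types.
total_cpus = sum(count * cpus for count, cpus, _ in vm_types)
total_ram_mb = sum(count * ram for count, _, ram in vm_types)

print(total_cpus, total_ram_mb)  # prints: 12 6144
```

Each VM type adds up to the whole physical machine (4 CPUs, 2048 MB), which
is why both totals are three times the real hardware.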
The above configuration has just gone into use on our cluster and is
working reasonably well. My only concern is the MAINTAIN_CLAIM macro,
which appears to be necessary. When a machine accepts a job that takes the
number of claimed CPUs to 4 or above, the START expression becomes false.
This caused a job to be dropped, something I could only prevent with the
MAINTAIN_CLAIM macro. If anyone can enlighten me as to why this happens,
that would be great.
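To make the arithmetic behind this concrete, here is a toy Python model of
the three macros. It is only a sketch: the function names and the
maintain_claim switch are invented for illustration, it mirrors the
expressions as written in the configuration, and it does not reproduce how
the startd really evaluates START or explain why the job was dropped.

```python
TOTAL_CPUS = 4
# VM id -> CPUs advertised by that VM (VMs 1-4: 1 CPU, VMs 5-6: 2 CPUs, VM 7: 4 CPUs)
VM_CPUS = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 4}


def claimed_cpus(claimed):
    """Mirror NUMBER_OF_CLAIMED_CPUS: sum the CPUs of every claimed VM."""
    return sum(VM_CPUS[vm] for vm in claimed)


def job_matches(vm, job_cpus):
    """Mirror JOB_CPUS_MATCHES_VM_CPUS; job_cpus=None means the job sets no CPUS."""
    if job_cpus is None:
        return VM_CPUS[vm] == 1
    return VM_CPUS[vm] == job_cpus


def start(vm, job_cpus, claimed, maintain_claim=True):
    """Mirror START as seen by one VM; maintain_claim=False drops MAINTAIN_CLAIM."""
    admit = TOTAL_CPUS > claimed_cpus(claimed) and job_matches(vm, job_cpus)
    keep = maintain_claim and vm in claimed
    return admit or keep


# Three 1-CPU jobs are running, so one CPU is free: a 4-CPU job may still start.
print(start(7, 4, claimed={1, 2, 3}))                        # True

# Once VM 7 is claimed, 7 CPUs count as claimed and no unclaimed VM will start a job.
print(start(4, 1, claimed={1, 2, 3, 7}))                     # False

# Without MAINTAIN_CLAIM, START is also false on VMs that are already claimed.
print(start(1, 1, claimed={1, 2, 3, 7}, maintain_claim=False))  # False
print(start(1, 1, claimed={1, 2, 3, 7}))                     # True
```

In this model the first clause of START is false for every VM as soon as
4 or more CPUs are claimed, including the VMs already running jobs, so
MAINTAIN_CLAIM is the only term keeping START true for them.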
I have also looked at getting the different VMs to run jobs at different
priorities, particularly the 4-CPU VM, since any job on this VM requires
the whole machine. My desire would be for those jobs to run at a lower
priority, allowing any jobs on the other VMs to finish more rapidly and
then leaving the 4-CPU VM in full control of the machine, as it wants.
Suspending the 4-CPU job is unhelpful, though, as there are still CPU
cycles that it can use whilst it waits for the whole machine to be freed.
So what do you all think? Are there any mistakes that I have not spotted
that will cause problems? Or is there a better way of doing this type of
thing?
I hope this helps someone,
Peter
PS. Thank you, Condor team; this software is very helpful.
Dr Peter Myerscough-Jackopson - Engineer
MULTIPLE ACCESS COMMUNICATIONS LIMITED
Delta House, The University of Southampton Science Park,
Southampton, SO16 7NS, United Kingdom.
Tel: +44 (0)23 8076 7808 Fax: +44 (0)23 8076 0602
Web: http://www.macltd.com/
Email: peter.myerscough-jackopson@xxxxxxxxxx