[HTCondor-users] parallel universe : some questions about nodes allocation and preemption
- Date: Thu, 22 Feb 2018 16:56:19 +0100
- From: Christophe DIARRA <diarra@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] parallel universe : some questions about nodes allocation and preemption
Hello,
I am doing some tests with the parallel universe on a small test pool
(4 Worker Nodes). I am using static slots on the WNs. Each WN has 8 cores.
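For completeness, the slot layout on each 8-core WN is roughly the
standard static one-core-per-slot configuration (the knob values below
are only an illustration of that layout):

    # one single-core static slot per core on each 8-core WN
    SLOT_TYPE_1      = cpus=1
    NUM_SLOTS_TYPE_1 = 8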
I am not very optimistic about the feasibility of what I am asking for,
but before giving up I would like to have confirmation from the experts.
Below are my questions.
1) For an MPI job, is it possible to force the number of nodes and also
balance the slot allocation between these nodes? For example, if I
submit a 16-core MPI job (machine_count=16), is it possible to tell
HTCondor to allocate only 2 WNs with 8 cores each? With Torque/Maui we
do it with "#PBS -l nodes=2:ppn=8". We plan to migrate our parallel
cluster from Torque/Maui to HTCondor. The migration is already done for
the single/multicore cluster.
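For reference, my test submit description is roughly the following (the
openmpiscript wrapper and my_mpi_program names are just placeholders
for our MPI setup):

    # minimal parallel-universe test job (executable/arguments are placeholders)
    universe                = parallel
    executable              = openmpiscript
    arguments               = my_mpi_program
    transfer_input_files    = my_mpi_program
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    machine_count           = 16
    request_cpus            = 1
    output                  = node_$(NODE).out
    error                   = node_$(NODE).err
    log                     = mpi.log
    queue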
In the NEGOTIATOR_PRE_JOB_RANK expression I use a ranking based on a
WN_ID (the WN's IP address converted to an integer) to get depth-first
allocation. This works fine. But suppose now that I submit a 10-core
MPI job while all the slots are idle: I will get 8 cores Claimed on one
WN and 2 cores on the next WN, based on the WN_ID ranking. I would
prefer to have 5 cores allocated on each WN (balanced allocation) and
to avoid the other combinations.
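The relevant part of my configuration is roughly the following (the
WN_ID value is illustrative and differs on every WN; the rank
expression is simplified here):

    # on each WN: advertise a per-machine integer derived from its IP address
    WN_ID = 167772161          # e.g. 10.0.0.1 as an integer (example value)
    STARTD_ATTRS = $(STARTD_ATTRS) WN_ID

    # on the negotiator: prefer slots with the highest WN_ID first, so all
    # slots of one WN are filled before moving to the next one (depth-first)
    NEGOTIATOR_PRE_JOB_RANK = WN_ID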
2) In my tests, one user (puser) submits parallel jobs. Another user
(vuser) submits vanilla single-core jobs. puser has higher priority
than vuser. My PREEMPTION_REQUIREMENTS allows the preemption of vuser's
jobs. It works, but the problem is the following: suppose that 32 vuser
jobs are already running; if puser submits a 2-core MPI job, all 32 of
vuser's jobs will be preempted and put back in the queue. Is it
possible to configure HTCondor to preempt only the required number of
vanilla jobs? In my example, I would like to have only 2 vanilla jobs
preempted instead of 32. What I have observed is the following: at each
negotiation cycle HTCondor preempts n slots (when possible) if the MPI
job needs n slots in total and the already preempted slots have not yet
finished retiring/vacating. In the end there may be n+n+n+... slots
preempted, and the MPI job will use only n of them while the others
stay 'Claimed Idle'.
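My preemption policy is essentially a standard priority-based
expression, along these lines (the exact threshold of 1.2 is only an
illustration of what I have configured):

    # let a user with better (numerically smaller) priority preempt the
    # jobs of a worse-priority user already running on a slot
    PREEMPTION_REQUIREMENTS = ( RemoteUserPrio > TARGET.SubmitterUserPrio * 1.2 )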
3) In relation with 2).
When the MPI job starts, the preempted but unused slots remain
'Claimed Idle' for ~10 minutes before becoming 'Unclaimed Idle' or
'Claimed Busy'. Setting 'UNUSED_CLAIM_TIMEOUT = 120' on the scheduler
has no effect. Is there an explanation for that?
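For completeness, the setting in the scheduler's configuration is
simply the following (the value is in seconds):

    # on the submit machine (schedd); value in seconds
    UNUSED_CLAIM_TIMEOUT = 120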
Thanks in advance for your help,
Christophe.
--
Christophe DIARRA
Institut de Physique Nucleaire
15 Rue Georges Clemenceau
S2I/D2I - Bat 100A - Piece A108
F91406 ORSAY Cedex
Tel: +33 (0)1 69 15 65 60 / +33 (0)6 31 26 23 69
Fax: +33 (0)1 69 15 64 70 / E-mail: diarra@xxxxxxxxxxxxx