Hi all,

we are stuck trying to get preemption going while guaranteeing some minimal run times. For this, we define on the startd:

    MinRunTimeHours = 1
    STARTD_ATTRS = MinRunTimeHours

(we will have several classes of machines where we set this to 1, 5, 10 or 20 hours).

On the negotiator, we set:

    JobExceedsMinRunTime = $(ActivationTimer) > ( MinRunTimeHours * 60)
    NewUserBetterPrio = RemoteUserPrio > SubmitterUserPrio * 1.2
    PREEMPTION_REQUIREMENTS = ($(JobExceedsMinRunTime)) && ($(NewUserBetterPrio))

For debugging it looks a bit longer, but that does not really add much else to it [1].

During a negotiation cycle, PREEMPTION_REQUIREMENTS does evaluate to true, and as we do not set rank to anything other than 0, we would expect the idle job to preempt the running job.

We currently have pslot preemption enabled, as all nodes feature a single large partitionable slot:

    ALLOW_PSLOT_PREEMPTION = True
    MAXJOBRETIREMENTTIME = 600
    NEGOTIATOR_DEBUG = D_FULLDEBUG
    NEGOTIATOR_CONSIDER_EARLY_PREEMPTION = True

(the same happens with False for the last setting).

For testing we submit two job clusters, each of which fully fills a target node. To make things easier, both clusters compete for the very same machine, in our case "a3305", via

    Requirements = (Machine == "a3305.atlas.local")

in the job submit file.
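As a sanity check on the two sub-expressions, here is a minimal Python sketch that redoes the arithmetic by hand, using the numbers from the negotiator debug output quoted in this mail. It is plain Python, not ClassAd evaluation, and the function and parameter names (activation_timer, preemption_requirements, seconds_per_unit) are made up for illustration only.

```python
# Values copied from the negotiator debug output in this mail.
JOB_START = 1584347460           # JobStart
NOW = 1584377093                 # time()
MIN_RUN_TIME_HOURS = 1           # MinRunTimeHours advertised by the startd
REMOTE_USER_PRIO = 353331.0      # RemoteUserPrio
SUBMITTER_USER_PRIO = 230.536    # SubmitterUserPrio


def activation_timer(job_start, now):
    """Mimic ifThenElse(JobStart isnt undefined, time() - JobStart, 0)."""
    return now - job_start if job_start is not None else 0


def preemption_requirements(job_start, now, min_hours, remote_prio,
                            submitter_prio, seconds_per_unit=60):
    """Hand evaluation of JobExceedsMinRunTime && NewUserBetterPrio.

    seconds_per_unit=60 mirrors the factor in our config; it is a
    hypothetical knob added here only to compare against 3600.
    """
    job_exceeds_min_run_time = (
        activation_timer(job_start, now) > min_hours * seconds_per_unit
    )
    new_user_better_prio = remote_prio > submitter_prio * 1.2
    return job_exceeds_min_run_time and new_user_better_prio


elapsed = activation_timer(JOB_START, NOW)
print(elapsed)  # 29633, matching the debug output

# Factor 60, as in our configuration.
print(preemption_requirements(JOB_START, NOW, MIN_RUN_TIME_HOURS,
                              REMOTE_USER_PRIO, SUBMITTER_USER_PRIO))

# Factor 3600, i.e. treating MinRunTimeHours as real hours.
print(preemption_requirements(JOB_START, NOW, MIN_RUN_TIME_HOURS,
                              REMOTE_USER_PRIO, SUBMITTER_USER_PRIO,
                              seconds_per_unit=3600))
```

Both calls give True for these numbers, which agrees with the TRUE the negotiator logs. One thing the arithmetic makes visible: time() - JobStart is in seconds, so with the factor 60 our threshold for MinRunTimeHours = 1 is 60 seconds rather than one hour. A factor of 3600 would presumably be what we want, although for this test it does not change the result.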

Debug output for a preemption match looks like this:

    03/16/20 16:44:53 Classad debug: 1584347460 --> 1584347460
    03/16/20 16:44:53 Classad debug: [0.00906ms] JobStart --> 1584347460
    03/16/20 16:44:53 Classad debug: time() --> 1584377093
    03/16/20 16:44:53 Classad debug: 1584347460 --> 1584347460
    03/16/20 16:44:53 Classad debug: [0.00596ms] JobStart --> 1584347460
    03/16/20 16:44:53 Classad debug: [0.03695ms] ifThenElse(JobStart isnt undefined,(time() - JobStart),0) --> 29633
    03/16/20 16:44:53 Classad debug: 1 --> 1
    03/16/20 16:44:53 Classad debug: [0.00691ms] MinRunTimeHours --> 1
    03/16/20 16:44:53 Classad debug: [0.00095ms] RemoteUserPrio --> 353331
    03/16/20 16:44:53 Classad debug: [0.00095ms] SubmitterUserPrio --> 230.536
    03/16/20 16:44:53 Classad debug: "a3305.atlas.local" --> a3305.atlas.local
    03/16/20 16:44:53 Classad debug: [0.00691ms] Machine --> a3305.atlas.local
    03/16/20 16:44:53 Classad debug: [0.00095ms] MY --> CLASSAD
    03/16/20 16:44:53 Classad debug: [0.00095ms] "user.a@xxxxxxxxxxx" --> user.a@xxxxxxxxxxx
    03/16/20 16:44:53 Classad debug: [0.01502ms] MY.AccountingGroup --> user.a@xxxxxxxxxxx
    03/16/20 16:44:53 Classad debug: .RIGHT --> CLASSAD
    03/16/20 16:44:53 Classad debug: [0.00691ms] TARGET --> CLASSAD
    03/16/20 16:44:53 Classad debug: "user.b" --> user.b
    03/16/20 16:44:53 Classad debug: [0.02098ms] TARGET.AccountingGroup --> user.b
    03/16/20 16:44:53 Classad debug: MY --> CLASSAD
    03/16/20 16:44:53 Classad debug: 0.0 --> 0
    03/16/20 16:44:53 Classad debug: [0.01311ms] MY.rank --> 0
    03/16/20 16:44:53 Classad debug: [0.15998ms] ifThenElse(JobStart isnt undefined,(time() - JobStart),0) > (MinRunTimeHours * 60) && RemoteUserPrio > SubmitterUserPrio * 1.200000000000000E+00 && Machine isnt undefined && MY.AccountingGroup isnt undefined && TARGET.AccountingGroup isnt undefined && (MY.rank isnt undefined || TARGET.rank isnt undefined) --> TRUE

But after doing this for every running job, the negotiator ends with (lines from a later cycle):

    03/16/20 16:55:35 Send END_NEGOTIATE to remote schedd
    03/16/20 16:55:35 Submitter user.b@xxxxxxxxxxx got all it wants; removing it.
    03/16/20 16:55:35 resources used by user.b@xxxxxxxxxxx are 0.000000

Does anyone have an idea what we are doing wrong here?

Cheers, and thanks a lot in advance for any hint!

Carsten

[1]
    PREEMPTION_REQUIREMENTS = debug( $(JobExceedsMinRunTime) && $(NewUserBetterPrio) && Machine =!= UNDEFINED && MY.AccountingGroup =!= UNDEFINED && TARGET.AccountingGroup =!= UNDEFINED && (MY.rank =!= UNDEFINED || TARGET.rank =!= UNDEFINED))

-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185