Mailing List Archives
[HTCondor-users] how to submit a job to specific WNs
- Date: Thu, 10 May 2018 16:07:08 +0000
- From: Catalin Condurache - UKRI STFC <catalin.condurache@xxxxxxxxxx>
- Subject: [HTCondor-users] how to submit a job to specific WNs
Hello,
I have been trying to schedule some jobs (owned by a certain VO) that are submitted to our batch farm (ARC-CE + HTCondor) onto specific WNs. However, following the recipe at https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor, I have achieved only partial results, and I wonder whether I could get some help from the list.
On the ARC-CE arc-ce03:
#> cat /etc/arc.conf
...
[queue/ska]
name="ska"
homogeneity="True"
comment="SKA queue"
defaultmemory="3000"
nodememory="16384"
MainMemorySize=16384
OSFamily="linux"
OSName="ScientificSL"
OSVersion="7.3"
opsys="ScientificSL"
opsys="7.3"
opsys="Carbon"
nodecpu="Intel Xeon E5440 @ 2.83GHz"
condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)"
authorizedvo="skatelescope.eu"
...
I have also configured four WNs as follows:
[root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")
[root@lcg2195 config.d]# cat 49-catalin
RANK=1.0
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")
[root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")
[root@lcg1716 config.d]# cat 99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
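The WN configs above mix `==` and `=?=` when testing NordugridQueue. As background, a simplified Python model of ClassAd three-valued logic may help show how each form treats a job ad that lacks the attribute. This is a toy sketch, not the real ClassAd library; the helper names (`lookup`, `eq`, `meta_eq`, `and_`) are hypothetical, and it ignores ClassAd ERROR values and numeric type promotion.

```python
from typing import Union


class _Undefined:
    """Stand-in for the ClassAd UNDEFINED value (hypothetical helper)."""

    def __repr__(self) -> str:
        return "UNDEFINED"


UNDEFINED = _Undefined()
Value = Union[bool, int, float, str, _Undefined]


def lookup(ad: dict, attr: str) -> Value:
    # ClassAd attribute names are case-insensitive; a missing attribute is UNDEFINED.
    for key, val in ad.items():
        if key.lower() == attr.lower():
            return val
    return UNDEFINED


def eq(a: Value, b: Value) -> Value:
    # '==': UNDEFINED if either side is UNDEFINED; strings compare case-insensitively.
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b


def meta_eq(a: Value, b: Value) -> bool:
    # '=?=': never UNDEFINED; strings compare case-sensitively.
    # Simplification: this sketch treats values of different types as not identical.
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def and_(a: Value, b: Value) -> Value:
    # ClassAd '&&': FALSE wins over UNDEFINED, which wins over TRUE.
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


# A job ad with no NordugridQueue attribute at all.
no_queue_job: dict = {}
for label, op in (("==", eq), ("=?=", meta_eq)):
    print(label, op(lookup(no_queue_job, "NordugridQueue"), "ska"))
```

In both cases START does not evaluate to TRUE, and a startd only starts a job when START is TRUE, so for a job without NordugridQueue the two spellings end up refusing it. They differ in string casing and in how UNDEFINED propagates through larger expressions.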
My test job, which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk ./list_of_rpms.xrsl', is:
-bash-4.1$ cat ./list_of_rpms.xrsl
&(executable="query_rpm.sh")
(stdout="test.out")
(stderr="test.err")
(jobname="ARC-HTCondor test")
(count=2)
(memory=1500)
(queue="ska")
The results are not as expected: the jobs are submitted successfully, but they are scheduled on random nodes.
However, a few things do behave as expected, e.g.:
[root@lcg1716 config.d]# condor_who
[root@lcg1716 config.d]#
[root@lcg2170 config.d]# condor_who
[root@lcg2170 config.d]#
[root@lcg2195 config.d]# condor_who
[root@lcg2195 config.d]#
[root@lcg2197 ~]# condor_who
OWNER CLIENT SLOT JOB RUNTIME
tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0 8+02:32:36
alicesgm@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0 8+06:46:55
Also:
[root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
Name OpSys Arch State Activity LoadAv Me
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Owner Idle 0.060 17
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Owner Idle 0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.540 21
slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy 0.000
slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy 0.000
Machines Owner Claimed Unclaimed Matched Preempting Drain
X86_64/LINUX 6 2 2 2 0 0 0
Total 6 2 2 2 0 0 0
From the output above, I believe the four WNs are correctly advertising themselves as available for 'ska' jobs (SkaRes == True).
What I apparently cannot control yet is the matchmaking: the negotiator does not match the job requirements against the advertised resources.
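One way to frame the symptom: HTCondor matchmaking is two-sided, so a machine's START expression only limits which jobs that machine accepts; it does not stop a job from matching a different machine. The toy Python model below (not the htcondor bindings) illustrates this. The four WN names come from the post, while "other-wn1"/"other-wn2" and the simplified START and Requirements lambdas are hypothetical. It also assumes, rather than confirms, that the goal is for the SkaRes condition to appear in each job's own Requirements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Ad = Dict[str, object]


@dataclass
class Machine:
    name: str
    ad: Ad
    # START expression, evaluated as start(machine_ad, job_ad)
    start: Callable[[Ad, Ad], bool]


@dataclass
class Job:
    ad: Ad
    # Requirements expression, evaluated as requirements(job_ad, machine_ad)
    requirements: Callable[[Ad, Ad], bool]


def matches(job: Job, machine: Machine) -> bool:
    """Two-sided match: the job must accept the machine AND the machine must accept the job."""
    return job.requirements(job.ad, machine.ad) and machine.start(machine.ad, job.ad)


def candidates(job: Job, machines: List[Machine]) -> List[str]:
    return [m.name for m in machines if matches(job, m)]


def ska_start(machine_ad: Ad, job_ad: Ad) -> bool:
    # Simplified from the WN configs: only accept jobs from the "ska" queue.
    return job_ad.get("NordugridQueue") == "ska"


def open_start(machine_ad: Ad, job_ad: Ad) -> bool:
    # A generic WN with no extra START restriction (hypothetical).
    return True


machines = [Machine(n, {"OpSys": "LINUX", "SkaRes": True}, ska_start)
            for n in ("lcg1716", "lcg2170", "lcg2195", "lcg2197")]
machines += [Machine(n, {"OpSys": "LINUX"}, open_start)
             for n in ("other-wn1", "other-wn2")]  # hypothetical generic nodes

ska_job_ad: Ad = {"NordugridQueue": "ska"}

# Requirements that do not mention SkaRes: the generic nodes also accept the job.
loose = Job(ska_job_ad, lambda j, m: m.get("OpSys") == "LINUX")

# Requirements that also demand SkaRes =?= True: only the SkaRes nodes remain.
strict = Job(ska_job_ad,
             lambda j, m: m.get("OpSys") == "LINUX" and m.get("SkaRes") is True)

print(candidates(loose, machines))
print(candidates(strict, machines))
```

In this model, the START settings on the SkaRes nodes keep non-ska jobs off those nodes, but only the job-side Requirements can keep ska jobs off every other node.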
So my question is: what am I missing, and where? It could be something on the ARC-CE in /etc/condor/config.d/, but I do not know what to add there.
One more detail: our batch farm (ARC-CE + HTCondor) runs a Docker container for each job slot; I am not sure whether that is relevant to this problem.
Many thanks for any help,
Catalin Condurache
RAL Tier-1