Re: [HTCondor-users] how to submit a job to specific WNs
- Date: Thu, 10 May 2018 19:56:32 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] how to submit a job to specific WNs
Can you run
condor_q <jobid> -af:jr Requirements
where <jobid> is the job id of one of your jobs, and then send me the output?
I would like to see what the Requirements expression for the job is once it gets to the Schedd.
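For example, with a hypothetical job id 1234.0 (substitute one of your own):
condor_q 1234.0 -af:jr Requirements
The j option tags each line with the job id and the r option prints the attribute raw (unevaluated), which is what I want to look at.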
It would be safer if your job requirements were specified using the TARGET prefix, like this:
condor_requirements="(TARGET.Opsys == "linux") && (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"
If you don't use TARGET, and your job has the Opsys, OpSysMajorVer or SkaRes attributes, then an attribute reference without TARGET will resolve against the attribute in the job ad instead of the attribute in the Startd ad.
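A quick way to check whether your job ads already define any of those attributes (again with a hypothetical job id):
condor_q 1234.0 -af Opsys OpSysMajorVer SkaRes
Anything there that prints something other than undefined is what the unprefixed references in condor_requirements would resolve against.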
Also, this statement:
START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
should be using == rather than =?=, because we want START to be undefined when there is no job to compare it to. When you use =?=, START becomes false in the absence of a job, which makes the Startd go into the Owner state.
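Something along these lines (just a sketch; I've left NODE_IS_HEALTHY as =?= on the assumption that it is a machine-side attribute, where the operator choice doesn't matter):
START = (NODE_IS_HEALTHY =?= True) && (Owner == "catalin" || Owner == "jpk" || X509UserProxyVOName == "skatelescope.eu") && (NordugridQueue == "ska")
With ==, comparing against an attribute that isn't present yields undefined rather than false, so an idle Startd stays out of the Owner state.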
-tj
-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Catalin Condurache - UKRI STFC
Sent: Thursday, May 10, 2018 11:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] how to submit a job to specific WNs
Hello,
I have been trying to schedule some jobs (owned by a certain VO) submitted to the batch farm (ARC-CE + HTCondor) onto specific WNs; however, I have achieved only partial results (following the recipe at https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor), and I wonder whether I could get some help from the list.
So, on the ARC-CE arc-ce03:
#> cat /etc/arc.conf
...
[queue/ska]
name="ska"
homogeneity="True"
comment="SKA queue"
defaultmemory="3000"
nodememory="16384"
MainMemorySize=16384
OSFamily="linux"
OSName="ScientificSL"
OSVersion="7.3"
opsys="ScientificSL"
opsys="7.3"
opsys="Carbon"
nodecpu="Intel Xeon E5440 @ 2.83GHz"
condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)"
authorizedvo="skatelescope.eu"
...
Also, I have configured 4 WNs as follows:
[root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")
[root@lcg2195 config.d]# cat 49-catalin
RANK=1.0
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")
[root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = $(START) && (NordugridQueue == "ska")
[root@lcg1716 config.d]# cat 99-catalin
SkaRes = True
STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
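(In case it is useful, I believe what each startd actually picked up can be checked with something like
condor_config_val -name lcg1716 -startd START SkaRes
using the full machine name for -name.)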
My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk ./list_of_rpms.xrsl') is:
-bash-4.1$ cat ./list_of_rpms.xrsl
&(executable="query_rpm.sh")
(stdout="test.out")
(stderr="test.err")
(jobname="ARC-HTCondor test")
(count=2)
(memory=1500)
(queue="ska")
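(My understanding is that the ARC-CE translates queue="ska" into a NordugridQueue attribute on the HTCondor job ad, which is what the START expressions above test; if so, something like
condor_q -constraint 'NordugridQueue == "ska"' -af ClusterId ProcId NordugridQueue
on the CE should confirm the attribute is being set on my jobs.)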
The results are not as expected: the jobs are getting submitted, but they end up scheduled on random nodes.
However, a few things are as expected, i.e.:
[root@lcg1716 config.d]# condor_who
[root@lcg1716 config.d]#
[root@lcg2170 config.d]# condor_who
[root@lcg2170 config.d]#
[root@lcg2195 config.d]# condor_who
[root@lcg2195 config.d]#
[root@lcg2197 ~]# condor_who
OWNER CLIENT SLOT JOB RUNTIME
tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0 8+02:32:36
alicesgm@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0 8+06:46:55
Also
[root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)'
Name OpSys Arch State Activity LoadAv Me
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Owner Idle 0.060 17
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Owner Idle 0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 21
slot1@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.540 21
slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy 0.000
slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed Busy 0.000
Machines Owner Claimed Unclaimed Matched Preempting Drain
X86_64/LINUX 6 2 2 2 0 0 0
Total 6 2 2 2 0 0 0
From the above output, I believe the 4 WNs are correctly advertising themselves as available for 'ska' jobs (SkaRes == True).
What I apparently cannot control yet is that the Negotiator does not match the job Requirements against the advertised resources.
So my question is: what am I missing, and where? (It could be on the ARC-CE in /etc/condor/config.d/, but I do not know what to add there.)
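(I assume something like 'condor_q -better-analyze <jobid>' run on the CE would show why the job Requirements are not matching the advertised slots, but perhaps I am missing something more basic.)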
Also, as a detail, our batch farm (ARC-CE + HTCondor) runs each job slot in a Docker container; I am not sure whether this is relevant here or not.
Many thanks for any help,
Catalin Condurache
RAL Tier-1
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/