Re: [HTCondor-users] how to submit a job to specific WNs
- Date: Fri, 11 May 2018 14:46:28 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] how to submit a job to specific WNs
So it looks like none of the statements in your condor_requirements are in the job's Requirements expression.
I think you are correct that the DefaultDocker transform has replaced the statements
from your condor_requirements with its own.
Do you control the job transform configuration? I think we could modify that transform so that it preserves your original requirements while adding the statements needed to make it a Docker job - but unless we can modify that transform, your condor_requirements isn't going to have any effect.
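A rough, untested sketch of what such a modified transform might look like, assuming job transforms accept the same copy_ and set_ commands that Job Router routes use: copy the submitted Requirements into a separate attribute first, then AND that attribute into the Docker requirements. The attribute name OriginalRequirements is made up for illustration, and the "..." lines stand for the parts of the existing transform that are not shown in this thread.

```
# OriginalRequirements is a hypothetical attribute name, not a built-in one.
# copy_ commands are applied before set_ commands, so the copy holds the
# Requirements expression the job was submitted with.
JOB_TRANSFORM_DefaultDocker @=end
[
Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios";
...
copy_Requirements = "OriginalRequirements";
set_Requirements = ( OriginalRequirements ) && ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" );
...
]
@end
```

After reconfiguring the Schedd and submitting a new test job, running condor_q <jobid> -af:jr Requirements and checking the OriginalRequirements attribute of the job should show whether the clauses from condor_requirements survived.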
-tj
-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Catalin Condurache - UKRI STFC
Sent: Friday, May 11, 2018 8:25 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] how to submit a job to specific WNs
Hi John,
I ran your command before making any other changes and got:
24379845.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
Then I added 'TARGET.' to condor_requirements in /etc/arc.conf, but the output is still essentially the same:
24379911.0 ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas" )
All of the above clauses are set in /etc/condor/config.d/67job-transform-docker.config on the ARC node:
# Convert job to Docker universe
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES), DefaultDocker
JOB_TRANSFORM_DefaultDocker @=end
[
Requirements = JobUniverse == 5 && DockerImage =?= undefined && Owner =!= "nagios";
...
set_Requirements = ( TARGET.HasDocker ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.Cpus >= RequestCpus ) && ( TARGET.HasFileTransfer ) && ( x509UserProxyVOName =?= "atlas" && NumJobStarts == 0 || x509UserProxyVOName =!= "atlas");
...
]
@end
AFAICT the condor_requirements from arc.conf are not being passed to Condor on the ARC node.
Also, I would rather not add more specific requirements to the 'set_Requirements = ...' line.
Regards,
Catalin
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of John M Knoeller
> Sent: 10 May 2018 20:57
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] how to submit a job to specific WNs
>
> can you run
>
> condor_q <jobid> -af:jr Requirements
>
> where <jobid> is the job id of one of your jobs, and then send me the output?
> I would like to see what the Requirements expression for the job is once it gets
> to the Schedd.
>
> It would be safer if your job requirements were specified using the TARGET
> prefix, like this:
>
> condor_requirements="(TARGET.Opsys == "linux") && (TARGET.OpSysMajorVer == 7) && (TARGET.SkaRes == True)"
>
> If you don't use TARGET, and your job has the Opsys, OpSysMajorVer or SkaRes
> attributes, then an attribute reference without TARGET will resolve against the
> attribute in the job ad instead of the attribute in the Startd ad.
>
> Also, this statement:
>
> START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
>
> should be using == rather than =?=, because we want START to be undefined
> when there is no job to compare it to. When you use =?=, START becomes false
> in the absence of a job, which makes the Startd go into the Owner state.
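>
> As an untested sketch of that change, the statement could be written as shown
> here. It assumes NODE_IS_HEALTHY is a machine attribute, so it keeps =?=,
> while the comparisons against job attributes (Owner, X509UserProxyVOName,
> NordugridQueue) switch to == and evaluate to undefined when there is no job ad.
>
> ```
> # Job-attribute comparisons use == so START stays undefined without a job.
> # NODE_IS_HEALTHY is assumed to be a machine attribute and keeps =?=.
> START = (NODE_IS_HEALTHY =?= True) && (Owner == "catalin" || Owner == "jpk" || X509UserProxyVOName == "skatelescope.eu") && (NordugridQueue == "ska")
> ```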
>
> -tj
>
>
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Catalin Condurache - UKRI STFC
> Sent: Thursday, May 10, 2018 11:07 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] how to submit a job to specific WNs
>
> Hello,
>
> I have been trying to schedule some jobs (owned by a certain VO) submitted to
> the batch farm (ARC-CE + HTCondor) to specific WNs, however I achieved only
> partial results (following a recipe at
> https://www.gridpp.ac.uk/wiki/Enable_Queues_on_ARC_HTCondor), and I
> wonder whether I could get some help from the list.
>
> So on ARC-CE arc-ce03
>
> #> cat /etc/arc.conf
>
> ...
> [queue/ska]
> name="ska"
> homogeneity="True"
> comment="SKA queue"
> defaultmemory="3000"
> nodememory="16384"
> MainMemorySize=16384
> OSFamily="linux"
> OSName="ScientificSL"
> OSVersion="7.3"
> opsys="ScientificSL"
> opsys="7.3"
> opsys="Carbon"
> nodecpu="Intel Xeon E5440 @ 2.83GHz"
> condor_requirements="(Opsys == "linux") && (OpSysMajorVer == 7) && (SkaRes == True)"
> authorizedvo="skatelescope.eu"
> ...
>
>
> Also I have configured 4 WNs as:
>
> [root@lcg2170 config.d]# cat /etc/condor/config.d/99-catalin
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = $(START) && (NordugridQueue =?= "ska") && (X509UserProxyVOName =?= "skatelescope.eu")
>
>
> [root@lcg2195 config.d]# cat 49-catalin
> RANK=1.0
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = $(START) && (NordugridQueue == "ska")
>
>
> [root@lcg2197 ~]# cat /etc/condor/config.d/99-catalin
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = $(START) && (NordugridQueue == "ska")
>
>
> [root@lcg1716 config.d]# cat 99-catalin
> SkaRes = True
> STARTD_ATTRS = $(STARTD_ATTRS), SkaRes
> START = (NODE_IS_HEALTHY =?= True) && (Owner =?= "catalin" || Owner =?= "jpk" || X509UserProxyVOName =?= "skatelescope.eu") && (NordugridQueue =?= "ska")
>
>
> My test job (which I submit with 'arcsub -c arc-ce03.gridpp.rl.ac.uk
> ./list_of_rpms.xrsl') is:
>
> -bash-4.1$ cat ./list_of_rpms.xrsl
>
> &(executable="query_rpm.sh")
> (stdout="test.out")
> (stderr="test.err")
> (jobname="ARC-HTCondor test")
> (count=2)
> (memory=1500)
> (queue="ska")
>
>
>
> The results are not as expected: the jobs are submitted, but they are
> scheduled on random nodes.
> However, a few things are as expected, i.e.
>
> [root@lcg1716 config.d]# condor_who
> [root@lcg1716 config.d]#
>
> [root@lcg2170 config.d]# condor_who
> [root@lcg2170 config.d]#
>
> [root@lcg2195 config.d]# condor_who
> [root@lcg2195 config.d]#
>
> [root@lcg2197 ~]# condor_who
>
> OWNER                     CLIENT                   SLOT JOB        RUNTIME
> tna62a001@xxxxxxxxxxxxxxx arc-ce03.gridpp.rl.ac.uk 1_17 24276544.0 8+02:32:36
> alicesgm@xxxxxxxxxxxxxxx  arc-ce03.gridpp.rl.ac.uk 1_18 24266128.0 8+06:46:55
>
>
>
> Also
>
> [root@arc-ce03 config.d]# condor_status -constraint '(Opsys == "linux") &&
> (OpSysMajorVer == 7) && (SkaRes == True)'
> Name                             OpSys Arch   State     Activity LoadAv Me
>
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX X86_64 Owner     Idle     0.060  17
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX X86_64 Owner     Idle     0.000  21
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX X86_64 Unclaimed Idle     0.000  21
> slot1@xxxxxxxxxxxxxxxxxxxxxxx    LINUX X86_64 Unclaimed Idle     0.540  21
> slot1_17@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed   Busy     0.000
> slot1_18@xxxxxxxxxxxxxxxxxxxxxxx LINUX X86_64 Claimed   Busy     0.000
>
>              Machines Owner Claimed Unclaimed Matched Preempting Drain
>
> X86_64/LINUX        6     2       2         2       0          0     0
>
>        Total        6     2       2         2       0          0     0
>
>
>
> From the output above, I believe the 4 WNs are correctly advertising themselves
> as available for 'ska' jobs (SkaRes == True).
>
> What I apparently cannot control yet is the matching: the Negotiator does not
> match the job requirements to the advertised resources.
>
> So my question is what am I missing and where (it could be on ARC-CE in
> /etc/condor/config.d/ but I do not know what to add there)
>
> Also, as a detail, our batch farm (ARC-CE + HTCondor) is running Docker
> containers for each job slot; I am not sure whether this is the problem here.
>
> Many thanks for any help,
> Catalin Condurache
> RAL Tier-1
>
>
>