[HTCondor-users] HTCondor-CE: Setting Default limits
- Date: Tue, 31 Aug 2021 18:08:38 +0200
- From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
- Subject: [HTCondor-users] HTCondor-CE: Setting Default limits
Hello,
I'm working on configuring HTCondor-CE 5.1 and have a few doubts about
how to properly set default job limits.
I'm following the examples from here:
https://htcondor.github.io/htcondor-ce/v5/configuration/writing-job-routes/
such as this one:
JOB_ROUTER_ROUTE_Condor_Pool @=jrt
UNIVERSE VANILLA
# Set the requested memory to 1 GB
default_maxMemory = 1000
@jrt
JOB_ROUTER_ROUTE_NAMES = Condor_Pool
Q1: Is it possible to set default_maxMemory to a value proportional
to the RequestCpus of the incoming job? That is, something like
default_maxMemory = $(RequestCpus:1) * 3000
Q2: I applied the following defaults:
JOB_ROUTER_ROUTE_t1_defaults @=jrt
  UNIVERSE VANILLA
  default_xcount = 4
  default_maxMemory = 4321
  default_maxWallTime = 61
@jrt
But I'm a bit confused by the overall results:
0) I submit a minimal test job:
[sdalpra@ui-htc htjobs]$ condor_submit -pool ce01t-htc.cr.cnaf.infn.it:9619 -remote ce01t-htc.cr.cnaf.infn.it ce_testp308.sub
Submitting job(s).
1 job(s) submitted to cluster 610.
1) The job is routed
[root@ce01t-htc ~]# condor_ce_q 610. -af routedtojobid
8428.0
2) I check the ClassAd attributes of the routed job
[root@ce01t-htc ~]# condor_q 8428.0 -af:jln jobstatus CpusProvisioned xcount requestcpus OriginalCpus remote_NodeNumber remote_SMPGranularity BatchRuntime OriginalMemory remote_OriginalMemory OriginalCpus remote_NodeNumber remote_SMPGranularity
ID = 8428.0
 jobstatus = 2
 CpusProvisioned = 1
 xcount = undefined
 requestcpus = 1
 OriginalCpus = 4
 remote_NodeNumber = 4
 remote_SMPGranularity = 4
 BatchRuntime = 3660
 OriginalMemory = 4321
 remote_OriginalMemory = 4321
 OriginalCpus = 4
 remote_NodeNumber = 4
 remote_SMPGranularity = 4
So this is where I'm puzzled:
- I would expect to see xcount = 4, but it is undefined instead.
- The running job reports CpusProvisioned = 1, which makes me think
that remote_NodeNumber = 4, remote_SMPGranularity = 4 and
OriginalCpus = 4 are somehow ignored.
- BatchRuntime is there, with the value I expected (61 * 60 = 3660),
but I'm not sure what it means.
The HTCondor manual says: << For batch grid universe jobs, a limit in
seconds on the job's execution time, enforced by the remote batch
system. >> Who is "remote" in this context? Does that mean that
condor-ce would stop the running routed job after 61 minutes?
Moreover, this is a vanilla universe job on both the CE and the
batch side:
[root@ce01t-htc ~]# condor_ce_q 610. -l | grep -i univer
JobUniverse = 5
[root@ce01t-htc ~]# condor_q -l 8428.0 | grep -i univer
JobUniverse = 5
Remote_JobUniverse = 5
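To make the comparison explicit, here is a minimal Python sketch that puts the route defaults next to the attributes reported by condor_q. The values are copied from the output above. The "expected" mapping only encodes my own assumptions about what the defaults should produce (for example, that CpusProvisioned should follow default_xcount, and that default_maxWallTime is in minutes while BatchRuntime is in seconds); it is not documented HTCondor-CE behaviour, and the function name compare is just illustrative.

```python
# Route defaults from JOB_ROUTER_ROUTE_t1_defaults (values from this message).
route_defaults = {
    "default_xcount": 4,
    "default_maxMemory": 4321,
    "default_maxWallTime": 61,  # assumed to be minutes
}

# Attributes reported by condor_q for routed job 8428.0.
# None stands for "undefined".
routed_job = {
    "xcount": None,
    "CpusProvisioned": 1,
    "OriginalCpus": 4,
    "remote_NodeNumber": 4,
    "remote_SMPGranularity": 4,
    "OriginalMemory": 4321,
    "remote_OriginalMemory": 4321,
    "BatchRuntime": 3660,
}


def compare(defaults, job):
    """Return (attribute, expected, observed, matches) rows.

    The expected values are assumptions about the routing result,
    not a statement of how HTCondor-CE actually maps the defaults.
    """
    cpus = defaults["default_xcount"]
    memory = defaults["default_maxMemory"]
    expected = {
        "xcount": cpus,
        "CpusProvisioned": cpus,
        "OriginalCpus": cpus,
        "remote_NodeNumber": cpus,
        "remote_SMPGranularity": cpus,
        "OriginalMemory": memory,
        "remote_OriginalMemory": memory,
        "BatchRuntime": defaults["default_maxWallTime"] * 60,  # minutes to seconds
    }
    return [
        (name, want, job.get(name), job.get(name) == want)
        for name, want in expected.items()
    ]


rows = compare(route_defaults, routed_job)
for name, want, got, ok in rows:
    print(name, want, got, "ok" if ok else "MISMATCH")

# Attributes that do not match what I assumed.
mismatches = [name for name, _, _, ok in rows if not ok]
```

With the values above, the only attributes that differ from what I assumed are xcount and CpusProvisioned, which are exactly the two points I am asking about.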
Thanks for any comments,
Stefano