Subject: Re: [HTCondor-users] how to use rank with memory
On 6/24/2021 8:42 PM, Myunggi Yi wrote:
Dear users,
I have installed HTCondor 9.0.1
I want the jobs to run on the machines with more memory.
I submitted a job with the following script, but the job
always goes to node07, which has less memory.
How can I achieve my goal?
Hi,
A couple of thoughts:
1. I assume the node with more memory was available (unclaimed) at
the time you submitted your job? Keep in mind that a Rank expression in
your submit file only sorts among resources that are currently
available (unless you tell HTCondor to allow preemption of jobs
based on rank - you can do this, but in practice very few sites
would want to). Rank is really only helpful on pools that are
lightly loaded. On busy pools where most of the nodes are busy
doing something all of the time, Rank becomes pretty useless without
preemption, because at any given moment there may only be one or two
free slots to pick from. In these scenarios, you really want to use
Requirements instead of Rank.
2. The administrator of your central manager gets to take the first stab
at ranking slots before the job does. Basically, it is a
multi-level sort where matching slots are first sorted by config
knob NEGOTIATOR_PRE_JOB_RANK, then sub-sorted (i.e. ties from the
first sort are broken) by the job's Rank expression, and then
sub-sorted by config knob NEGOTIATOR_POST_JOB_RANK (details in the
Manual at https://tinyurl.com/yzcr5ocq). I suggest you log in to your
central manager and enter the command:
condor_config_val -v NEGOTIATOR_PRE_JOB_RANK
On my machine, this returns the following:
NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory
# at: <Default>
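Note that the last two terms subtract Cpus and Memory, in
an attempt to do depth-first allocation of nodes. Because this
sort runs before the job's Rank is consulted, a slot with less
memory scores higher here and wins unless there is an exact tie.
On your pool, perhaps you want to get rid of this behavior by
setting the following in the config of your central manager: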
NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED))
and then doing a condor_reconfig as usual.
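
To see why the default pre-job rank can override a memory-based job Rank, here is a rough Python sketch of the first two levels of the sort described above. It is only an illustration, not how the negotiator is implemented. The slot name "bigmem", the Cpus and Memory values, and the job's Rank = Memory are all assumptions made up for the example (the original submit file is not shown), and NEGOTIATOR_POST_JOB_RANK is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    cpus: int
    memory_mb: int
    startd_rank: float = 0.0  # stands in for My.Rank (the slot's own Rank of the job)
    claimed: bool = False     # False plays the role of RemoteOwner =?= UNDEFINED


def default_pre_job_rank(slot):
    # Mirrors the default expression quoted above.
    return (10000000 * slot.startd_rank
            + 1000000 * int(not slot.claimed)
            - 100000 * slot.cpus
            - slot.memory_mb)


def trimmed_pre_job_rank(slot):
    # Same expression with the Cpus and Memory terms removed.
    return 10000000 * slot.startd_rank + 1000000 * int(not slot.claimed)


def job_rank(slot):
    # Assumed job Rank for this example: prefer more memory (Rank = Memory).
    return slot.memory_mb


def pick_slot(slots, pre_job_rank):
    # Higher is better at each level; the job's Rank only breaks ties
    # left over from the pre-job rank.
    return max(slots, key=lambda s: (pre_job_rank(s), job_rank(s)))


# Hypothetical pool: two unclaimed slots that differ only in memory.
SLOTS = [
    Slot("node07", cpus=8, memory_mb=32000),
    Slot("bigmem", cpus=8, memory_mb=128000),
]

print(pick_slot(SLOTS, default_pre_job_rank).name)  # node07
print(pick_slot(SLOTS, trimmed_pre_job_rank).name)  # bigmem
```

With the default expression the two slots never tie, so the job's Rank is never reached; once the Cpus and Memory terms are removed, both unclaimed slots tie on the pre-job rank and the job's Rank decides.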