On 7/16/2024 3:57 PM, Vikrant Aggarwal wrote:
> Hello Experts,
> I have a requirement to limit the number of user jobs on
> worker nodes. Let's say we have 10 worker nodes with dynamic
> slots, I want to ensure that user1 can run max 2 jobs on each
> node not more than that.

Is this a policy coming from the pool administrator, and thus should
be a policy of your execution points (startd)?
Or do you want this to be a policy coming from the user submitting a
job, i.e. when "user1" submits a job, they want to ensure no more than
2 of their jobs run on each node?
One way to do this is via the ClassAd "countMatches()" function.
See
  https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html#predefined-functions
This combined with the fact that the partitionable slot contains an
attribute "ChildRemoteUser" which is a list of users using each
dynamic slot lets you do what you want.
It is a little tricky / messy, but here is an example submit file
that requires matching a node where fewer than 2 jobs from that
user are already running:
  executable = foo.exe
  requirements = \
      PartitionableSlot && \
      countMatches(x==User,eval(strcat("{ [x=\"",join("\";], [x=\"",ChildRemoteUser),"\"] }"))) < 2

  queue
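
To unpack what that requirements expression is doing, here is a rough Python sketch of the same logic. This is not HTCondor code, and the function names are made up purely for illustration. The idea: strcat() and join() assemble the text of a list of small ClassAds, one per entry in ChildRemoteUser, each holding a single attribute x; eval() turns that text into an actual list; and countMatches() counts the entries where x equals the job's User. The sketch assumes ChildRemoteUser is a plain list of strings in the same form as the job's User attribute.

```python
def build_context_text(child_remote_user):
    """Mimic strcat(...) + join(...): build the text that eval() would
    turn into a list of one-attribute ClassAds (hypothetical helper)."""
    separator = '";], [x="'
    return '{ [x="' + separator.join(child_remote_user) + '"] }'


def count_user_jobs(user, child_remote_user):
    """Mimic countMatches(x == User, <list of ads>): count the dynamic
    slots whose user equals the submitting user."""
    return sum(1 for x in child_remote_user if x == user)


def slot_matches(user, child_remote_user, limit=2):
    """Mimic the trailing '< 2' test: match only if the user has fewer
    than `limit` jobs already on this node."""
    return count_user_jobs(user, child_remote_user) < limit


if __name__ == "__main__":
    users_on_node = ["user1@example.org", "user2@example.org"]
    print(build_context_text(users_on_node))
    print(slot_matches("user1@example.org", users_on_node))
```

With one user1 job on the node the slot still matches; once a second user1 entry shows up in the list, the count reaches 2 and the match fails.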
You could use the requirements expression from the above job submit
file as the START expression in your condor_config file if you
wanted the policy enforced at the EP.
The good news is you can do an awful lot with the power of ClassAds,
the bad news is it can get convoluted.
Hope this helps,
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685