On 7/16/2024 3:57 PM, Vikrant Aggarwal wrote:
Hello Experts,
I have a requirement to limit the number of user jobs on worker nodes. Let's say we have 10 worker nodes with dynamic slots, I want to ensure that user1 can run max 2 jobs on each node not more than that.
Is this a policy coming from the pool administrator, and thus should be a policy of your execution points (startd)?
Or do you want this to be a policy coming from the user submitting a job, i.e. when "user1" submits a job, they want to ensure no more than 2 jobs on each node?
One way to do this is via the ClassAd "countMatches()" function. See
   https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html#predefined-functions
This combined with the fact that the partitionable slot contains an attribute "ChildRemoteUser" which is a list of users using each dynamic slot lets you do what you want.
It is a little tricky and messy, but here is an example submit file that requires matching a node where fewer than 2 jobs from that user are already running:
 executable = foo.exe
 requirements = \
         PartitionableSlot && \
         countMatches(x==User,eval(strcat("{ [x=\"",join("\";], [x=\"",ChildRemoteUser),"\"] }"))) < 2
 queue
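
To make the expression above easier to read, here is a rough Python sketch of what it appears to compute. This is plain Python, not an HTCondor API: the function names and the example user names are hypothetical, and the only behavior assumed from ClassAds is that "==" compares strings case-insensitively. The first function mirrors the strcat()/join() step, which turns the ChildRemoteUser list into the text of a list of small ads of the form [x="..."]; the second mirrors countMatches(x==User, ...), which counts how many of those ads have x equal to the job's User.

```python
def build_list_expr(child_remote_user):
    """Mimic strcat("{ [x=\"", join("\";], [x=\"", ChildRemoteUser), "\"] }").

    Returns the text that eval() would then parse into a list of ads.
    """
    joined = '";], [x="'.join(child_remote_user)
    return '{ [x="' + joined + '"] }'


def count_user_jobs(user, child_remote_user):
    """Mimic countMatches(x==User, <list of [x=...] ads>).

    casefold() stands in for the case-insensitive ClassAd "==" on strings.
    """
    return sum(1 for x in child_remote_user if x.casefold() == user.casefold())


def slot_matches(user, child_remote_user, limit=2):
    """True when fewer than `limit` dynamic slots on the node belong to `user`."""
    return count_user_jobs(user, child_remote_user) < limit


if __name__ == "__main__":
    # Hypothetical node with two dynamic slots, one per user.
    users = ["user1@example.org", "user2@example.org"]
    print(build_list_expr(users))
    print(slot_matches("user1@example.org", users))
```

With one job from user1 already on the node the check passes; once two of the entries in the list belong to user1, the count reaches 2 and the node no longer matches.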
You could use the requirements expression from the above job submit file as the START expression in your condor_config file if you wanted the policy enforced at the EP.
The good news is you can do an awful lot with the power of ClassAds; the bad news is it can get convoluted.
Hope this helps,
Todd
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>
University of Wisconsin-Madison
Center for High Throughput Computing
Department of Computer Sciences
1210 W. Dayton St. Rm #4257
Madison, WI 53706-1685
Phone: (608) 263-7132
Calendar: https://tinyurl.com/yd55mtgd