
Re: [HTCondor-users] Limit the number of user jobs a node



On 7/16/2024 3:57 PM, Vikrant Aggarwal wrote:
Hello Experts,

I have a requirement to limit the number of user jobs on worker nodes. Let's say we have 10 worker nodes with dynamic slots; I want to ensure that user1 can run at most 2 jobs on each node.

Is this a policy coming from the pool administrator, and thus something that should be enforced by your execution points (startd)?

Or do you want this to be a policy coming from the user submitting a job, i.e., when "user1" submits a job, they want to ensure no more than 2 of their jobs run on each node?

One way to do this is via the ClassAd function "countMatches()". See
   https://htcondor.readthedocs.io/en/latest/classads/classad-mechanism.html#predefined-functions
This, combined with the fact that the partitionable slot contains an attribute "ChildRemoteUser", which is a list of the users using each dynamic slot, lets you do what you want.

It is a little tricky / messy, but here is an example submit file that requires matching a node where fewer than 2 jobs from that user are already running:

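 # Match only partitionable slots whose dynamic slots currently run
 # fewer than 2 jobs belonging to the submitting user (the job's User attribute).
 executable = foo.exe
 requirements = \
         PartitionableSlot && \
         countMatches(x==User,eval(strcat("{ [x=\"",join("\";], [x=\"",ChildRemoteUser),"\"] }"))) < 2

 queue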
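
Since the expression above is dense, here is a rough plain-Python sketch of what it computes, step by step. It does not use any HTCondor library; the function names and the example user names are made up for illustration, and it only imitates the join/strcat/eval/countMatches steps under the assumption that ChildRemoteUser is a list of user strings.

```python
# Plain-Python imitation of the ClassAd expression; no HTCondor code involved.
# All function names and user names here are hypothetical.

def build_classad_list_text(child_remote_user):
    """Imitate strcat("{ [x=\"", join("\";], [x=\"", ChildRemoteUser), "\"] }").

    The result is the text of a list of small ClassAds, one per dynamic slot,
    each with a single attribute x holding that slot's user.
    """
    return '{ [x="' + '";], [x="'.join(child_remote_user) + '"] }'


def count_user_jobs(user, child_remote_user):
    """Imitate countMatches(x==User, <the evaluated list>).

    ClassAd '==' on strings ignores case, so compare lowercased values.
    """
    return sum(1 for x in child_remote_user if x.lower() == user.lower())


def slot_matches(user, child_remote_user, limit=2):
    """True when fewer than `limit` dynamic slots on the node belong to `user`."""
    return count_user_jobs(user, child_remote_user) < limit


# Hypothetical node with three dynamic slots already claimed.
users_on_node = ["user1@example.org", "user2@example.org", "user1@example.org"]

print(build_classad_list_text(users_on_node))
print(slot_matches("user1@example.org", users_on_node))  # user1 already has 2 jobs
print(slot_matches("user2@example.org", users_on_node))  # user2 has only 1 job
```

The detour through a list of one-attribute ClassAds is there because countMatches() evaluates its expression against each ClassAd in a list, while ChildRemoteUser is a list of plain strings.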

You could use the requirements expression from the above job submit file as the START expression in your condor_config file if you wanted the policy enforced at the EP.
The good news is that you can do an awful lot with the power of ClassAds; the bad news is that it can get convoluted.

Hope this helps,
Todd

-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685