Re: [HTCondor-users] Possible to have submit-implemented per-machine job limits?
- Date: Mon, 12 Sep 2016 13:21:34 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Possible to have submit-implemented per-machine job limits?
On 9/12/2016 12:31 PM, Michael V Pelletier wrote:
> Hi folks,
>
> We have a situation where a certain type of job has an adjunct service
> process which can have only one instance on a given machine, since it
> uses a static port number to provide its service to the job. It can't
> easily be reworked, since it's designed to operate that way in a
> production environment. This means that one physical machine can run
> only one instance of that job.
Maybe you could run multiple instances of this adjunct service on one
physical host by using a job universe that virtualizes the network
environment (e.g. the docker universe or the vm universe)?
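One way to sketch that suggestion, assuming a pool whose execute nodes have the docker universe enabled (the image name, executable path, and port below are all hypothetical):

```
# Hypothetical docker-universe submit file. Each container gets its own
# network namespace under Docker's default bridge networking, so the
# adjunct service's static port no longer collides between instances
# running on the same physical host.
universe     = docker
docker_image = myservice-app:latest        # hypothetical image name
executable   = /usr/local/bin/run_job      # hypothetical path
arguments    = --service-port 7777         # hypothetical static port
output       = job.$(Process).out
error        = job.$(Process).err
log          = job.log
queue
```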
> I know I can set up a machine resource in the configuration for this
> purpose, assigning one "myservice" resource to each machine, and this
> would allow the job to specify "request_myservice = 1" and thus limit
> it to one job per machine.
>
> What I'm wondering is whether it's possible to use something in the
> job's requirements expression alone to accomplish this, rather than a
> server-side config customization. I'm using partitionable slots - I
> suspect that fact may make this a tricky problem to solve without
> startd configuration changes, because the partitionable slot would
> probably need information about what the dynamic slots are doing.
Doing what you want via setting up a custom machine resource (e.g.
request_port777 = 1) is exactly what I'd suggest; scenarios like the
above are why custom machine resources exist, since this really is a
custom machine resource. For instance, what if two different users each
have an app that requires the same static port?
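For reference, the server-side setup is small. A sketch, using the documented MACHINE_RESOURCE_<tag> configuration pattern with a tag name ("MyService") chosen purely for illustration:

```
# On each execute node (condor_config.local): advertise one unit of a
# custom "MyService" machine resource.
MACHINE_RESOURCE_MyService = 1

# In the job's submit file: claim that one unit, so at most one such
# job can run on a machine at a time.
#   request_myservice = 1
```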
But given that you cannot configure the execute nodes, perhaps your job
requirements could look at the ChildRemoteUser attribute in the
partitionable slot? This attribute is a ClassAd list of all the owners
of the dynamic slots on the machine. You could probably leverage this so
that only one job submitted by you runs on each machine...
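A submit-side sketch of that idea (the user name is hypothetical, and this assumes the requirements expression is evaluated against the partitionable slot's ad, where ChildRemoteUser is defined):

```
# Refuse to match a partitionable slot that already has one of our
# dynamic slots. member() tests membership in a ClassAd list; the
# "is undefined" guard keeps the expression from evaluating to
# UNDEFINED on machines with no child slots (or on static slots
# lacking the attribute entirely).
requirements = (ChildRemoteUser is undefined) || \
               (member("mvp@example.com", ChildRemoteUser) =!= true)
```

One caveat: ChildRemoteUser only updates once a claim is activated, so several of your idle jobs could still match the same machine within a single negotiation cycle.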
regards,
Todd
> One similar thing I've done in the past was to steer jobs which could
> share a license checkout onto the same machine: I made a "condor_q"
> query from a script and turned it into a rank expression to favor
> machines already running that user's licensed jobs. But that requires,
> needless to say, a submit wrapper script, which I'd like to avoid.
>
> I've also used SubmitterUserResourcesInUse, but that applies to the
> entire pool rather than to a single machine.
>
> Maybe there's some sort of trick in the new 8.4 submit syntax that
> could be applied here?
>
> Thanks for any suggestions you can offer!
>
> -Michael Pelletier
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>
HTCondor Technical Lead
Center for High Throughput Computing
Department of Computer Sciences, University of Wisconsin-Madison
1210 W. Dayton St. Rm #4257, Madison, WI 53706-1685
Phone: (608) 263-7132