[Condor-users] Complex license handling
- Date: Wed, 30 Aug 2006 17:58:05 +0200
- From: "François Bachmann" <f.bachmann@xxxxxxxxx>
- Subject: [Condor-users] Complex license handling
We have a truckload of PVM jobs we'd like to submit to our Condor cluster. Each job checks with a license server to determine whether it can run at a given time (there is a finite number n of licenses).
Only some of our machines are suitable to be PVM master nodes, so we have a custom machine attribute for this (PVM_Master = True).
Nothing dramatic so far, but we want to make sure two jobs don't get sent to the same machine at the same time (to avoid PVM I/O hell), so we introduced a LoadAvg-type constraint ("submit it to PVM_Master machines with LoadAvg < 0.3 only"). Not the most elegant, but it worked for a while...
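As a rough sketch of the setup described above (not the actual files; the placement of the lines and the exact expression are assumptions for illustration), the machine attribute and the LoadAvg constraint might look something like this:

```
# Local Condor config on each machine suitable as a PVM master (sketch).
# Defines the custom attribute and advertises it in the machine ClassAd.
# STARTD_EXPRS is the knob name in Condor 6.x; newer releases call it STARTD_ATTRS.
PVM_Master = True
STARTD_EXPRS = $(STARTD_EXPRS), PVM_Master

# Submit description file for the PVM job (sketch, other commands omitted).
# Match only PVM master machines whose current load average is below 0.3.
# =?= is the ClassAd "is identical to" operator, so machines that do not
# define PVM_Master evaluate to False instead of Undefined.
requirements = (PVM_Master =?= True) && (LoadAvg < 0.3)
```

Since LoadAvg is only a snapshot of the machine's load at the time the ClassAd was last updated, a constraint like this cannot tell an idle machine apart from one whose job is momentarily quiet.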
The trouble now is that a job's load varies greatly and can drop to almost zero for minutes at a time (between calculation iterations, when only file I/O is happening). We'd like to make sure that the Condor negotiator doesn't start thinking "there's a PVM_Master machine with a low LoadAvg, let's give it a new PVM job". How can we achieve that?
I've looked into:
* alternative ways of calculating LoadAvg (over longer periods of time - not sure this can be done within Condor)
* using group quotas (something like a PVM group with n machines for our n licenses)
* using Master/Worker (not sure how)
* wrapping the job into a DAGMan (a bit clumsy, IMHO)
Any wisdom from the crowd? I'd be happy to provide more info if needed.
Thanks in advance
François