Thanks for the info. I might check out some of the options you propose. We've found that in large-facility queue systems, such as those used in visual effects, globally monitored resources are very important. In simple settings a user can control their own impact on the queue, but in large facilities that impact becomes much more difficult to predict. In fact, I'd say it's not even the size of the facility that matters but rather the throughput of the facility vs. the number of users: the higher the ratio, the more difficult it is to manually manage global resources. Many places simply assign each artist a limited number of CPUs to run on, but this seems to me to circumvent automation of the queue. Alfred from Pixar, although not the best queue software, has a 'ping' system which allows one to run any command before a job is started. It is very easy to implement and very effective for global management.

In a future release the plan appears to be that local disk space is to be handled by the starter enforcing constraints and killing the job if it violates them. Remote disk space is the purview of the quota system of the filesystem...
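To make the 'ping' idea mentioned above concrete, here is a rough sketch of a pre-job check: run an arbitrary command first, and only start the job if that command exits with status 0. This is not Alfred's or Condor's actual interface. The function names `ping_ok` and `start_if_clear` and the timeout value are made up for illustration.

```python
import subprocess
import sys


def ping_ok(ping_cmd, timeout=30):
    """Return True when the pre-job check command exits with status 0.

    ping_cmd is any command (as an argument list) that tests a global
    resource, e.g. a script that checks license or fileserver load.
    A check that cannot be run, or that hangs, counts as a failure.
    """
    try:
        result = subprocess.run(ping_cmd, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def start_if_clear(ping_cmd, job_cmd):
    """Run job_cmd only if ping_cmd succeeds.

    Returns the job's exit status, or None when the job was held back
    because the check failed (hypothetical convention for this sketch).
    """
    if not ping_ok(ping_cmd):
        return None
    return subprocess.run(job_cmd).returncode


if __name__ == "__main__":
    # Stand-in commands: a check that always passes and a trivial job.
    check = [sys.executable, "-c", "raise SystemExit(0)"]
    job = [sys.executable, "-c", "print('job running')"]
    print(start_if_clear(check, job))
```

In a real queue the held job would presumably be rescheduled and the check retried later rather than dropped. That policy is left out here.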
There seems to be code relating to sophisticated network management built into Condor but not enabled yet - I don't know if this is something not ready for prime time...
Condor is very much about the individual users having a reasonable
awareness of the impact of their jobs on the wider world and
throttling as they see fit.
Thanks, j