On 5/11/2013 1:10 PM, Russ Poyner wrote:
We have a pool running 7.4.4 with dynamic/partitionable slots. A user wants to run jobs with request_cpus = 8, and I'm concerned that our machines will be too fragmented to accept the jobs. Clearly the correct solution is to upgrade to a current condor version with condor_defrag. That's scheduled for June, after the looming research deadline. Meanwhile, I wonder if there is a way to simulate defrag-type behavior, by hand if needed, on a 7.4.4 pool.
Perhaps something as simple as doing a

  condor_restart -peaceful [hostname]

on some number of execute machines periodically? IIRC, even in v7.4.x, telling an execute node to restart with the peaceful option will make that node refuse new jobs while allowing currently running jobs to complete; only once all jobs have completed will the startd exit (at which point the master will restart everything).
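A rough sketch of the "periodically, on some number of machines" part, assuming condor_restart -peaceful behaves on 7.4.4 as recalled above (not verified here). The script name drain_batch.sh and the DRY_RUN switch are made up for illustration; it only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# drain_batch.sh (hypothetical name): peacefully restart a rotating batch of
# execute nodes so they stop accepting new jobs until running jobs finish.
# Assumes `condor_restart -peaceful` behaves as described for 7.4.x (unverified).
set -o pipefail

# pick_batch N ROUND: read hostnames (one per line) on stdin and print the
# N hosts for rotation ROUND (0, 1, 2, ...), wrapping around the list.
pick_batch() {
  local n=$1 round=$2 i start total
  local -a hosts
  mapfile -t hosts
  total=${#hosts[@]}
  if (( total == 0 )); then
    return 0
  fi
  start=$(( (round * n) % total ))
  for (( i = 0; i < n && i < total; i++ )); do
    printf '%s\n' "${hosts[(start + i) % total]}"
  done
}

# drain: read hostnames on stdin and peacefully restart each one.
# Dry run by default; set DRY_RUN=0 to actually issue the commands.
drain() {
  local host
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "would run: condor_restart -peaceful $host"
    else
      condor_restart -peaceful "$host"
    fi
  done
}

# Usage: [DRY_RUN=0] ./drain_batch.sh N ROUND
# Machine appears once per slot, hence sort -u to get one line per host.
if [ "$#" -eq 2 ]; then
  condor_status -format '%s\n' Machine | sort -u | pick_batch "$1" "$2" | drain
fi
```

For example, running

  DRY_RUN=0 ./drain_batch.sh 4 "$(( $(date +%s) / 86400 ))"

once a day would drain a different four machines each day. Rotating through a sorted host list, rather than picking at random, avoids draining the same machines over and over while others stay fragmented.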
Todd