I am running on the OSG. I have a limited allocation on Comet, which I use to run glideins, each of which provides 24 slots on which only my jobs run. I set the parameter MINS_UNTIL_RETIREMENT to 15 minutes less than the time the glidein runs. I had been using 10, as that is a more efficient number given the relatively short run time of my jobs. For the third time, I am seeing my glideins die because they are not assigned jobs within 15 minutes of startup.
Is there a way to signal the scheduler that there are idle cores available which are reserved for a particular user? Or is there a reasonable way to implement it? And is there some other parameter that I’m missing that will keep my glideins alive until they are assigned jobs? Best - Don