
Re: [HTCondor-users] hold/released jobs remain on idle state



On Feb 21, 2020, at 7:40 AM, Alejandro Acuña <alejandro.acunia@xxxxxxxxxxxxxxxx> wrote:

Hi friends.
Is there a command to force ClassAd validation, and with it a transition to the running state, for jobs that are sitting idle?
 
We are running several jobs on the CERN Batch Service. Yesterday many of these jobs completed successfully, but others were put on hold because we had exceeded our disk quota. We noticed this and freed up space so the held jobs could continue. We then ran condor_release on the held jobs, but they have stayed in the idle state... forever. Why?
Detail: our submit file has a "JobFlavour = Workday" line. Could this be the cause?
 
I think that running condor_release on a held job restarts the process rather than continuing it, and that the cores it is assigned are completely different from those of the first submission. Is this correct?

Once your held jobs return to Idle status after a condor_release, Condor will attempt to find machines to run them, along with all other idle jobs. Unless the jobs have explicit self-checkpointing logic, they will restart execution from the beginning once they are matched to a machine.
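For illustration only, here is a minimal Python sketch of what "explicit self-checkpointing logic" can look like inside a job. It assumes a hypothetical checkpoint file named checkpoint.json and hypothetical helpers (load_state, save_state, run); none of these are part of HTCondor. The program records its progress after every step and, when started again, resumes from the saved step instead of step 0. This only helps after a release if the checkpoint file is actually present on the machine where the job restarts; arranging that is a matter of HTCondor's file-transfer and self-checkpointing submit settings (see the self-checkpointing section of the HTCondor manual), which this sketch does not cover.

```python
import json
import os
import tempfile

# Hypothetical checkpoint file name; HTCondor does not require any particular name.
CHECKPOINT = "checkpoint.json"


def load_state(workdir):
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    path = os.path.join(workdir, CHECKPOINT)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_state(workdir, state):
    """Write the checkpoint atomically: write a temp file, then rename it over the old one."""
    fd, tmp = tempfile.mkstemp(dir=workdir)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, os.path.join(workdir, CHECKPOINT))


def run(workdir, n_steps, stop_after=None):
    """Do n_steps units of work, resuming from any existing checkpoint.

    stop_after simulates the job being interrupted (for example, put on hold)
    after that many steps in this invocation; it exists only for demonstration.
    Returns the final result, or None if the run was interrupted.
    """
    state = load_state(workdir)
    done_now = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_now == stop_after:
            return None  # interrupted; progress so far is already on disk
        step = state["next_step"]
        state["total"] += step * step  # stand-in for the real work
        state["next_step"] = step + 1
        save_state(workdir, state)  # checkpoint after every completed step
        done_now += 1
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run(d, 10, stop_after=4))  # first attempt is interrupted after 4 steps
        print(run(d, 10))                # second attempt resumes at step 4, not step 0
```

Checkpointing after every step keeps the lost work small at the cost of extra I/O; a real job would usually checkpoint less often. Without logic like this, a released job simply starts over from the beginning.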

condor_q -analyze is a good tool to figure out why a job isn't being matched to an execute machine.

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project