On Jul 22, 2005, at 5:26 AM, Andreas Vetter wrote:
When an execute machine kills a job for running too long, the schedd doesn't consider the job complete. It assumes that the execute machine wasn't willing to let the job run long enough, so it looks for another machine that will let the job run to completion.

The job ad in the schedd controls when a job leaves the queue. If you want your jobs to leave the queue when they run longer than 12 hours, you need to set periodic_remove in the job ads. If you want the jobs to stay in the queue but not get rerun, you need to modify the startd's requirements so that it won't run jobs that previously ran for more than 12 hours.

+----------------------------------+---------------------------------+
| Jaime Frey                       | Public Split on Whether         |
| jfrey@xxxxxxxxxxx                | Bush Is a Divider               |
| http://www.cs.wisc.edu/~jfrey/   | -- CNN Scrolling Banner         |
+----------------------------------+---------------------------------+
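
For illustration, the periodic_remove suggestion above might look roughly like the following in a submit description file. This is a sketch and not part of the original message. It assumes the usual job ad attributes JobStatus (2 meaning "running") and EnteredCurrentStatus, and uses CurrentTime for the current time. Depending on the Condor version, the current-time expression may need to be written differently, so check the manual for your release before relying on it.

```
# Hypothetical submit description file fragment (an assumption, not from the post).
# 43200 seconds = 12 hours. JobStatus == 2 means the job is currently running.
# Remove the job once it has been in the running state for more than 12 hours.
periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 43200)
```

Note that this expression measures the time spent in the current running state, not the total run time accumulated across earlier attempts.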