Hello,
I am using the parameter estimation software PEST to run multiple models
(jobs). PEST uses YAMR and Panther, although I struggle to make sense of
how everything works together.
The parameters are drawn from a probability distribution. Some
parameter combinations (jobs) can take 12+ hours to run, and from previous
experience I know the results of those runs will be worthless to me. I can usually
tell which jobs will be useless within the first hour. I would like to
remove these jobs after about 1 hour to free up cores for other runs, but
have them returned as failed jobs rather than resubmitted to the pool.
For example, if I type "condor_rm 1.10" it will
remove that job, but the model with those parameters will just be
resubmitted to another node and start over. However, if the job truly fails,
a job with those parameters will not be resubmitted.
Is there a way to remove a job and have condor return a failed status,
rather than have the same parameters run under a different job name?