Hello,

After an extensive web search, I cannot seem to find an answer to a simple question: how do I forbid HTCondor from restarting my jobs?

I have a type of job that I used to run as independent jobs, and HTCondor always allowed them to finish. I have since changed the process to run those jobs as a DAG, which should in theory be more efficient. The DAG consists of many (hundreds of) independent graphs, i.e. there are no parent/child links between them. Now HTCondor does not allow the jobs to finish. It keeps restarting them after about an hour of running, before they can complete: NumJobStarts keeps incrementing, and the run time of a job as seen in Linux top keeps being reset to zero.

How can I tell HTCondor that it is forbidden to restart jobs, and that all jobs should be allowed to finish no matter how long they take? And what could be the reason the jobs started restarting periodically once they were run as part of a DAG? I am the administrator of my HTCondor cluster, so I am sure that neither the HTCondor configuration parameters nor the individual job submit files were changed.

Thank you very much for your help,
Siarhei.