Hi Gagan,

In /var/log/condor on the execution point, the StarterLog for the slot that the job ran on might have more information on what happened, particularly if it had something to do with condor.

If the error is not condor related, one thing you can try when submitting the job, particularly if you have reason to expect it to fail, is to set stream_output and stream_error in your submit file to stream the job's stdout and stderr back to the access point. You will probably want to limit this to only a couple of jobs, as it can stress the network and disk of the access point:

https://htcondor.readthedocs.io/en/latest/man-pages/condor_submit.html#stream_error

Jason

On Fri, Feb 28, 2025 at 8:49 AM gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx> wrote:

Hi Guys,

At times, jobs running on exec nodes crash with a Signal 9 error. But this is a generic message and we don't know exactly what went wrong with the jobs.

Is there any setting in condor which can be tweaked to provide detailed information about what exactly happened?

Thanks,
Gagan
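
For reference, the streaming suggestion in the reply above could look roughly like the submit file below. This is only a sketch: the executable name (job.sh) and the output, error, and log file names are hypothetical placeholders. Only stream_output and stream_error come from the reply itself.

```
# Hypothetical job script; substitute your own executable.
executable    = job.sh

# Hypothetical file names for the job's stdout, stderr, and event log.
output        = job.$(Cluster).$(Process).out
error         = job.$(Cluster).$(Process).err
log           = job.$(Cluster).log

# Stream stdout/stderr back to the access point while the job runs,
# so output written before a crash is not lost.
# Enable this for only a couple of jobs: it adds network and disk
# load on the access point.
stream_output = True
stream_error  = True

queue 1
```

With streaming enabled, whatever the job printed before it was killed should already be in the .out and .err files on the access point, which can then be compared against the StarterLog on the execution point.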