On 12/30/24 15:29, Thomas Madureira wrote:
Hi All,
We're having a difficult time finding a way to prevent what appears to be an infinite retry loop when a condor_shadow process runs OOM.
e.g. Here we created a simple test script that will allocate memory > requested memory
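A minimal Python sketch of a test script of this kind (not the poster's actual script; the function name `allocate_memory`, the chunk size, and the 4096-byte page step are illustrative assumptions):

```python
def allocate_memory(total_mb, chunk_mb=10):
    """Allocate roughly total_mb MiB in chunk_mb pieces and return the chunks.

    Hypothetical helper for illustration only.
    """
    chunks = []
    allocated = 0
    while allocated < total_mb:
        size_mb = min(chunk_mb, total_mb - allocated)
        buf = bytearray(size_mb * 1024 * 1024)
        # Write one byte every 4096 bytes (assumed page size) so the
        # memory is actually touched rather than only reserved.
        touched = (len(buf) + 4095) // 4096
        buf[::4096] = b"\x01" * touched
        chunks.append(buf)  # keep a reference so nothing is freed
        allocated += size_mb
    return chunks
```

Calling, say, `allocate_memory(2048)` from a job whose submit file sets `request_memory` to a smaller value would be one way to exceed the request.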
The exception is viewed in logs,
007 (3738904.000.000) 2024-12-27 17:09:28 Shadow exception!
    Error from slot1_1@xxxxxxxxxxxxxxxxxxxxxxx: Worker node is out of memory
Hi Thomas:
There have been several fixes in this area in 23.0.19, but what do you want to happen in this case? Should the job be put on hold, so that the user must intervene before trying again?
-greg