Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] MAX_SHADOW_EXCEPTIONS

Date: Mon, 6 Jan 2025 11:24:31 -0600
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] MAX_SHADOW_EXCEPTIONS

On 12/30/24 15:29, Thomas Madureira wrote:

Hi All,
We're having a difficult time finding a way to prevent what appears to be an infinite retry loop when a condor_shadow process runs OOM.

e.g.

Here we created a simple test script that allocates more memory than the job requests.
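The script itself is not included in the archived message. A minimal sketch of such a test, assuming Python, might look like the following. The helper name `allocate_mb` and the 4 KiB page size are assumptions made for illustration, not details from the original post.

```python
def allocate_mb(total_mb, chunk_mb=16, page_size=4096):
    """Allocate about total_mb MiB in chunk_mb pieces and return the chunks.

    Hypothetical helper: the caller must keep the returned list alive,
    otherwise the memory is released again.
    """
    chunks = []
    allocated = 0
    while allocated < total_mb:
        size = min(chunk_mb, total_mb - allocated)
        buf = bytearray(size * 1024 * 1024)
        # Write one byte per (assumed) 4 KiB page so the memory is
        # actually touched, not merely reserved.
        for offset in range(0, len(buf), page_size):
            buf[offset] = 1
        chunks.append(buf)
        allocated += size
    return chunks
```

A job that calls, for example, `allocate_mb(2048)` and then sleeps, while its submit file asks for a much smaller `request_memory`, should exceed its memory request on the worker node.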

The exception shows up in the logs:

007 (3738904.000.000) 2024-12-27 17:09:28 Shadow exception!
Error from slot1_1@xxxxxxxxxxxxxxxxxxxxxxx: Worker node is out of memory

Hi Thomas:

There have been several fixes in this area in 23.0.19, but what do you want to happen in this case? Should the job be put on hold, so that the user must intervene before it is tried again?

-greg

Follow-Ups:
- Re: [HTCondor-users] MAX_SHADOW_EXCEPTIONS
  - From: Thomas Madureira

