Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Capturing the signal from worker nodes when job breaches memory

Date: Fri, 6 Oct 2023 17:17:12 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Capturing the signal from worker nodes when job breaches memory

On 10/5/23 10:03, Vikrant Aggarwal wrote:

Hello Experts,
We want to capture the signal to copy some logs before the scratch directory disappears after the job goes into hold status because of memory breach but we are unsuccessful to do it. Do we have any way to achieve this? We thought it was probably a job wrapper which is doing exec to run actual condor jobs not allowing us to capture the signal but that's not the case.

The Linux out-of-memory killer uses signal 9 (SIGKILL), which is uncatchable. You could write a startd policy which evicts jobs when their MemoryUsage is some percentage of the total, and if the job has


when_to_transfer_output = ON_EXIT_OR_EVICT

then the scratch directory would get copied back to the spool on the AP (access point).
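
As a rough sketch of the two pieces described above, and not a tested policy: the startd side could look something like the fragment below. The macro name MEMORY_NEARLY_EXCEEDED and the 90% threshold are assumptions chosen for illustration, and "the total" is taken here to mean the slot's Memory attribute.

```
# Execute-node configuration (illustrative sketch, not a tested policy).
# MEMORY_NEARLY_EXCEEDED is a hypothetical macro name; 0.9 is an assumed threshold.
# MemoryUsage comes from the job ad and Memory from the slot ad, both in MB.
MEMORY_NEARLY_EXCEEDED = (MemoryUsage =!= UNDEFINED && MemoryUsage > 0.9 * Memory)

# Evict the job once it crosses the threshold, keeping any existing PREEMPT policy.
PREEMPT = ($(PREEMPT)) || ($(MEMORY_NEARLY_EXCEEDED))
```

On the submit side, the job would need file transfer enabled along with the setting mentioned above:

```
# Submit description file (illustrative sketch).
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
```

Two caveats apply to a sketch like this. MemoryUsage is only refreshed periodically, so a job whose memory grows quickly may still reach the out-of-memory killer before the policy is evaluated. The threshold also has to sit below whatever hard limit triggers the kill, otherwise the eviction never gets a chance to run.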

-greg

Follow-Ups:
- Re: [HTCondor-users] Capturing the signal from worker nodes when job breaches memory
  - From: Vikrant Aggarwal
- Re: [HTCondor-users] Capturing the signal from worker nodes when job breaches memory
  - From: Weatherby,Gerard

References:
- [HTCondor-users] Capturing the signal from worker nodes when job breaches memory
  - From: Vikrant Aggarwal
