Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] return code of jupyter notebook jobs

Date: Fri, 27 Mar 2020 11:05:13 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] return code of jupyter notebook jobs

On 3/27/2020 5:25 AM, Beyer, Christoph wrote:

Hi,

As we have been running Jupyter notebooks in condor slots in production for a while now, we need to set up some monitoring around this.

One of the bigger obstacles to coming up with something decent is that JupyterHub uses condor_rm to end the notebook once it is no longer needed. This results in a condor_history entry with jobstatus == 3, which is considered a failed job (which in this case it is not). The other option is that the notebook job runs into the time limit and gets removed by the periodic_remove_expression, which is presumably a bit more flexible to tweak.

I like the idea of having an option for condor_rm to influence the subsequent history-job-state.

I think your idea, whereby condor_rm can influence subsequent history-job-state, is on target. Please note that condor_rm takes a "-reason <string>" argument, which allows you to set the RemoveReason job attribute at the time of removal. This RemoveReason attribute will also be in the history. The Python API also supports setting a removal reason at the time of job removal.
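
As a rough illustration of how monitoring could use this, here is a minimal sketch in plain Python. It assumes the hub removes notebooks with an agreed marker string passed to -reason, and that history records are available as dict-like objects carrying JobStatus and RemoveReason. The marker text and the helper names build_condor_rm and classify are hypothetical, not part of HTCondor; the status codes 3 (removed) and 4 (completed) are the standard JobStatus values.

```python
# JobStatus codes as recorded in the job ClassAd / history.
REMOVED = 3
COMPLETED = 4

# Hypothetical marker text; any string agreed between hub and monitoring works.
HUB_REASON = "jupyterhub: session ended normally"


def build_condor_rm(job_id, reason=HUB_REASON):
    """Return the argument list for removing one job with an explicit reason."""
    return ["condor_rm", "-reason", reason, job_id]


def classify(ad):
    """Classify one history record (a dict-like ClassAd) for monitoring."""
    status = ad.get("JobStatus")
    if status == COMPLETED:
        return "completed"
    if status == REMOVED:
        # Substring match, in case the stored reason carries extra text.
        if HUB_REASON in str(ad.get("RemoveReason", "")):
            return "ended_by_hub"
        return "removed_other"
    return "other"


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample_history = [
        {"ClusterId": 1, "JobStatus": REMOVED, "RemoveReason": HUB_REASON},
        {"ClusterId": 2, "JobStatus": REMOVED, "RemoveReason": "time limit"},
        {"ClusterId": 3, "JobStatus": COMPLETED},
    ]
    for ad in sample_history:
        print(ad["ClusterId"], classify(ad))
```

The list returned by build_condor_rm can be handed to subprocess.run on a submit host. With this in place, only records classified as removed_other would need to be treated as faults.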


Does this help?

best regards,
Todd

References:
- [HTCondor-users] return code of jupyter notebook jobs
  - From: Beyer, Christoph

