
Re: [HTCondor-users] Recover the job status in the PostScript




On 7/17/25 8:15 AM, Andres Ayala wrote:

And I have found an issue with the "PostScript" (or maybe it is because I have been using it incorrectly from the beginning).

I am running the jobs in the docker universe, and I have a PostScript to perform some actions after the container has finished.
In this script I need to know the exit status of the container (signal or exit code), and until now I was able to get that information directly in the working directory with code like:


Hi Andres:

As you've discovered, the ToE ads have been removed in HTCondor 24. There is, however, an easier way to get the exit status (including whether the job exited because of a signal) in the DAG post script: when you declare the post script for the node, if its command-line arguments include the string $RETURN, DAGMan expands that to the return value of the job, e.g.

SCRIPT POST A post.sh job_status $RETURN

When the DAG node "A" completes, the script named "post.sh" will run with the arguments job_status 0 if the job exited with 0, or job_status -9 if it was killed by signal 9.
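
As a rough illustration, a post script written in Python could decode that argument as sketched below. The file name post.py and the helper interpret_return are hypothetical, and the sketch assumes only the convention described above: a non-negative value is the exit code, and a negative value is the negated signal number. DAGMan may also use other special negative values, which this sketch does not handle.

```python
#!/usr/bin/env python3
# Hypothetical post script, declared in the DAG file as:
#   SCRIPT POST A post.py job_status $RETURN
import sys


def interpret_return(value):
    """Split the expanded $RETURN string into (kind, number).

    Assumption: a negative value is the negated signal number,
    anything else is the job's exit code.
    """
    code = int(value)
    if code < 0:
        return ("signal", -code)
    return ("exit_code", code)


def main(argv):
    if len(argv) != 3 or argv[1] != "job_status":
        print("usage: post.py job_status <RETURN>", file=sys.stderr)
        return 2
    kind, number = interpret_return(argv[2])
    print(f"job ended by {kind} {number}")
    # The post script's own exit status decides whether the node
    # counts as successful, so report failure for anything but exit code 0.
    return 0 if (kind, number) == ("exit_code", 0) else 1


# Do nothing when started without arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With this in place, a job killed by signal 9 would result in the call "post.py job_status -9", and the script would report a signal rather than an exit code.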

Let us know if this solves the problem; it should be easier than parsing a ClassAd file.

We do have the European HTCondor workshop coming up this fall in Prague -- I know that researchers from EUMETSAT have attended in the past. Perhaps you'd be able to attend this year, and we can talk more in person?

-greg






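import os

import classad2 as classad

# _CONDOR_JOB_AD holds the path to the job ad file
job_ad = os.environ.get('_CONDOR_JOB_AD')
with open(job_ad, 'r') as ft:
    ca = classad.parseOne(ft)

# The ToE sub-ad says whether the job ended by signal or by exit code
toe = ca['ToE']
if toe['ExitBySignal']:
    signal = toe['ExitSignal']
else:
    code = toe['ExitCode']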


Now ToE is no longer in the job ads. On the submitter side this is not an issue, as I can find the same information in the TerminatedNormally, ReturnValue and TerminatedBySignal attributes.
The problem is in the PostScript, where the _CONDOR_JOB_AD file now only contains:
     ExitBySignal = false
     ExitStatus = 0

This is the case regardless of how the docker container has finished. It looks like the file is created before the job starts and is not updated after the job ends.

Is the .job.ad file no longer going to be updated after the job execution?
Is there a better way to recover the exit status of the container (or of the job, in the case of the vanilla universe) in the PostScript?

Thanks!
Andrés Ayala

EUMETSAT Data Processing System Engineer

