
[HTCondor-users] Job marked as completed however output files not transferred back



Hello Experts,

We are facing a strange situation where a job is marked as completed (C) in the condor job history, but its output files were not transferred back from the worker to the submit node. Usually in this case the job should go into a held state.

TransferOutput = "job.93.result,job.93.dir,coverage.93.zip"
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT"


The worker nodes run in the cloud on spot VMs, which trigger the following script when the node shuts down:

#!/bin/bash
/usr/sbin/condor_off -daemon master

Cloud logs show that the VM (virtual machine) was preempted at approximately the same time the job was marked as completed.

Hypothesis:

This is an edge case where the job finished its computation, but the worker node was preempted during the output transfer phase, and because of the ON_EXIT setting the job was still marked as completed.

Could the above assessment be right?

If the suggestion is to use ON_EXIT_OR_EVICT, we can't use it, as it may fill up the spool very quickly at our scale.

Is there any other suggestion to ensure that a job is not marked as completed with missing output?
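
As a stopgap, a check on the submit side could at least detect the affected jobs after the fact. Below is a minimal sketch, assuming the outputs are expected in the job's initial working directory; the function name missing_outputs and the directory argument are hypothetical and not part of HTCondor:

```python
import os


def missing_outputs(transfer_output, job_dir):
    """Return the TransferOutput entries that are absent from job_dir.

    transfer_output: the comma-separated value of the TransferOutput
                     attribute, e.g. "job.93.result,job.93.dir,coverage.93.zip"
    job_dir:         directory on the submit node where the outputs should
                     land (assumed here to be the job's working directory)
    """
    # Split the attribute value into individual names, ignoring stray spaces.
    names = [n.strip() for n in transfer_output.split(",") if n.strip()]
    # Keep only the names that do not exist as a file or directory.
    return [n for n in names if not os.path.exists(os.path.join(job_dir, n))]
```

A non-empty return value for a job that shows up as completed in the history would flag it for resubmission. This only detects the problem; it does not prevent the job from being marked as completed.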


Thanks & Regards,
Vikrant Aggarwal