Hi Vikrant,
just a quick hack, but maybe you could use a postcmd script to check/debug
the status of the output file transfer?
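A minimal sketch of such a check, assuming it is wired in through the `+PostCmd` submit command and shipped to the worker via `transfer_input_files` (please verify both against the manual for your HTCondor version). `check_outputs` is a hypothetical helper, not an HTCondor tool. Note that a postcmd runs on the worker after the executable finishes and before output transfer, so it can only confirm that the files were produced, not that the transfer itself succeeded.

```shell
#!/bin/bash
# Hypothetical post-job output check (illustrative sketch, not part of HTCondor).
# Reports every expected output that is missing (or, for regular files, empty)
# in the current directory and returns non-zero if any are missing.
check_outputs() {
  local missing=0 f
  for f in "$@"; do
    if [ -d "$f" ]; then
      # Directory outputs (e.g. job.93.dir): only require that they exist.
      continue
    elif [ ! -s "$f" ]; then
      # Regular files must exist and be non-empty.
      echo "missing or empty output: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example invocation with the outputs from Vikrant's job; uncomment when
# using this file as the postcmd script.
# check_outputs job.93.result job.93.dir coverage.93.zip || exit 1
```

Logging the stderr line from the postcmd would at least tell you whether the files existed on the worker right before the preemption.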
Cheers,
Thomas
On 16/02/2023 08.00, Vikrant Aggarwal wrote:
> Hello Experts,
>
> We are facing a strange situation where a job is marked as completed
> (C) in the condor job history, but it has not transferred its output
> files back from the worker node to the submit node. Normally, in this
> case the job should go into the held state.
>
> TransferOutput = "job.93.result,job.93.dir,coverage.93.zip"
> ShouldTransferFiles = "YES"
> WhenToTransferOutput = "ON_EXIT"
>
> The worker nodes run in the cloud as spot VMs, which trigger the
> following script when the node shuts down:
>
> #!/bin/bash
> /usr/sbin/condor_off -daemon master
>
>
> Cloud logs show that the VM (virtual machine) was preempted at
> approximately the same time the job was marked as completed.
> Hypothesis:
>
> An edge case where the job finished its computation, but the worker
> node was preempted during the output transfer phase, and because of the
> ON_EXIT condition the job was marked as completed.
>
> Could the above assessment be right?
>
> If the suggestion is to use ON_EXIT_OR_EVICT, we can't use it, as it
> may fill up the spool very quickly at our scale.
>
> Are there any other suggestions to ensure that a job is not marked as
> completed with missing output?
>
>
> Thanks & Regards,
> Vikrant Aggarwal
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/