the issue does not appear, and the job disappears from the queue
after a successful transfer of the output to the submit node.
For some submissions, I tried to access the worker node on which the
job was still running (while the output had already been transferred
to the remote storage) via condor_ssh_to_job.
The behavior was not consistent; for some attempts I got the
message:
I found the following error message appearing at a constant rate in
the ShadowLog of the job, well after the output had been retrieved on
the remote storage:
ERROR "Error from slot1_1@gridka-2ed723aef7@c01-011-108.gridka.de:
Repeated attempts to transfer output failed for unknown reasons"
at line 585 in file
/tmp/__build/build-3k7WTP/BUILD/condor-23.5.0/src/condor_shadow.V6.1/pseudo_ops.cpp
This message points to the following exception:
l584: //lame: at the time of this writing, EXCEPT does not want const:
l585: EXCEPT("%s", critical_error);
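In case it helps to quantify the "constant rate", here is a minimal
sketch of how the spacing between these errors could be measured from
the ShadowLog. It assumes the default HTCondor log prefix
"MM/DD/YY HH:MM:SS" at the start of each line; the function name and
the matched substring are only illustrative choices, not part of any
HTCondor tool.

```python
from datetime import datetime

# Assumption: default HTCondor log timestamp prefix, e.g. "05/14/24 10:00:00"
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"
STAMP_LENGTH = 17
# Substring of the error reported above (illustrative choice)
NEEDLE = "Repeated attempts to transfer output failed"


def error_intervals(lines):
    """Return the seconds between consecutive log lines containing NEEDLE."""
    stamps = []
    for line in lines:
        if NEEDLE not in line:
            continue
        try:
            stamps.append(datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT))
        except ValueError:
            # Continuation line or a non-default timestamp format: skip it
            continue
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

It can be fed an open file object for the ShadowLog (its location is
site-dependent; `condor_config_val SHADOW_LOG` should report it).
Roughly equal intervals would confirm that the shadow is retrying on a
fixed schedule.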
Could you please give me a hint about where to look more closely
in order to solve this issue?
Thanks a lot in advance for your help!