I'm having some trouble when running multiple jobs on HTCondor. My only guess is that, at some random moment, a transferred file gets corrupted during the transfer.
What I have:
Several .tar.gz files (data files). Let's say one of those files is pack.tar.gz.
Several tests (1, 2, 3, ... 12) that use pack.tar.gz.
pack.tar.gz is a valid file (it can be uncompressed on the submission node).
Of the 12 tests, 11 work. In one test (a random one), I get the following error:
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The test procedure is the same for every test (only some parameters in the later steps change).
The only thing that I can imagine is that the file transfer at some point fails (maybe a network issue?).
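To test this guess, a check like the following could run at the start of each job, before tar is called. It is only a sketch: the function names are my own (hypothetical), and it uses just the Python standard library, nothing HTCondor-specific. It reads the gzip stream to the end (roughly what `gzip -t` does) and computes a SHA-256 that can be compared with the value obtained on the submission node.

```python
import gzip
import hashlib
import zlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip stream can be decompressed to its end.

    A truncated file raises EOFError; a damaged one raises OSError
    (including gzip.BadGzipFile) or zlib.error.
    """
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass  # discard the data, we only care that reading succeeds
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

If `gzip_is_complete("pack.tar.gz")` returns False on the execute node, or `sha256_of("pack.tar.gz")` differs from the value on the submission node, the file that arrived is not the file that was sent, which would point to the transfer rather than to the tests themselves.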
Is there a way to solve this problem?
Thanks
Roberto