On Wed, Mar 27, 2013 at 10:39 AM, Max Fischer
<mfischer@xxxxxxxxxxxxxxxxxxxx> wrote:
Hi,
I have a problem with how HTCondor's transfer_output_files interacts with the hold mechanism for failing jobs. Our jobs often produce very large temporary files, so explicitly listing the actual output files is a must for us. However, when a job fails before creating an output file, the STARTER cannot transfer the "requested" file and the job is put on hold. [1] I realize this is documented behavior. [2]
As we have a multi-backend job manager wrapped around Condor, this is not exactly optimal for us: it masks an error in the job itself as an error of Condor. We could catch the error, but this requires separate handling for jobs with transfer_output_files and those without it. It would be much easier if we could have Condor treat the job as Completed regardless of whether all files exist, or mark individual files as optional. Is there any way to do this?
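For illustration, a possible job-side workaround, as a minimal sketch only: run the payload through a small wrapper that creates empty placeholders for any listed output file the job did not produce, so the transfer has something to send, and then passes on the payload's own exit code. The function name run_with_placeholders and its arguments are hypothetical and not part of HTCondor.

```python
import subprocess
from pathlib import Path


def run_with_placeholders(command, expected_outputs):
    """Run the payload, then make sure every expected output file exists.

    command          -- argument list for the real job (hypothetical payload)
    expected_outputs -- the file names listed in transfer_output_files
    Returns the payload's exit code unchanged, so a failed job still
    reports its own failure.
    """
    exit_code = subprocess.call(command)
    for name in expected_outputs:
        path = Path(name)
        if not path.exists():
            # Zero-length placeholder: lets the output transfer succeed.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
    return exit_code
```

With this approach, consumers of the output have to treat a zero-length file as "not produced". In particular, an empty cmssw.log.gz is not a valid gzip archive.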
Cheers,
Max
[1]
8928.89 mfischer 3/27 14:49 Error from [WORKERNODE]: STARTER at [WORKERNODE] failed to send file(s) to <[SCHEDD]:9615>: error reading from /home/cmsusr189/home_cream_084391945/glide_p21124/execute/dir_20515/cmssw.log.gz: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <[WORKERNODE]:52719>
[2]
http://research.cs.wisc.edu/htcondor/manual/current/2_5Submitting_Job.html#SECTION00354200000000000000
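To illustrate the "catch the error" option: a job manager could inspect the hold message and recognise the missing-output case. This is a rough sketch that assumes the message always has the wording shown in [1]. That wording is not a guaranteed format, and the names MISSING_OUTPUT and missing_output_file are hypothetical.

```python
import re

# Pattern modelled on the message in [1]; the exact wording is an assumption.
MISSING_OUTPUT = re.compile(
    r"error reading from (?P<path>\S+): \(errno 2\) No such file or directory"
)


def missing_output_file(hold_reason):
    """Return the path of the output file the STARTER could not read, or None."""
    match = MISSING_OUTPUT.search(hold_reason)
    return match.group("path") if match else None
```

A non-None result would then be reported as a failure of the job itself rather than of Condor.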