[HTCondor-users] rarely - file not transferred
- Date: Wed, 20 Feb 2013 14:27:29 -0600
- From: brad.32@xxxxxxxxxxx
- Subject: [HTCondor-users] rarely - file not transferred
An announcement for HTCondor 7.9.4 just came out, which got me
thinking about upgrading from 7.8.4 to perhaps fix an issue.
My use of HTCondor results in each job producing a text file with
summary information and an optional data file. The text file is used
as a trigger to look for the associated data file, because the data
file is not always produced. A submission might contain thousands of
jobs, each producing these outputs. Occasionally (4 times in ~4
months) at least one of those text files was not transferred at job
completion, and the program that manages the HTCondor submits (a task
manager on the submit machine) waited for a text file that never
showed up. After the first time this happened, matching data
files were transferred to the submit machine. Every time this has
happened, condor_status eventually showed that all jobs had completed,
yet the task manager kept waiting. The first time, a number of text
files were missing (I didn't count; roughly 10-20), and I don't know
if any data files were missing.
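To make the symptom concrete, here is a minimal sketch of how the task
manager could list which summary files never arrived once all jobs
report complete, instead of waiting indefinitely. The function name,
the output directory layout, and the job_<id>.txt naming scheme are
all assumptions for illustration, not how my task manager actually
names things.

```python
from pathlib import Path


def find_missing_summaries(output_dir, job_ids, name_template="job_{job_id}.txt"):
    """Return the job ids whose summary text file is absent from output_dir.

    name_template is a hypothetical naming scheme; adjust it to match
    whatever file names the jobs really produce.
    """
    out = Path(output_dir)
    return [
        job_id
        for job_id in job_ids
        if not (out / name_template.format(job_id=job_id)).is_file()
    ]
```

Calling this after the queue has drained would give a list of jobs to
resubmit or investigate, rather than a task manager that hangs.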
Because the data output file is optional, if it fails to transfer, the
task manager would not know. It didn't occur to me that this could be
a problem. A workaround would be to put a flag in the text file
indicating that a data file was produced, so the task manager knows to
look for it, but what bothers me is that not all the files are making
it back to the task manager.
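The flag workaround could look roughly like the sketch below. It
assumes the job writes a line such as "data_file_produced: yes" into
the summary text file; that key name, the accepted values, and the
status strings are hypothetical choices for illustration only.

```python
from pathlib import Path

# Hypothetical flag key that the job would write into its summary file.
FLAG_KEY = "data_file_produced"


def summary_expects_data(summary_path):
    """Return True if the summary file says a data file was produced."""
    for line in Path(summary_path).read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == FLAG_KEY:
            return value.strip().lower() in ("yes", "true", "1")
    # No flag line found: treat as "no data file expected".
    return False


def check_job_outputs(summary_path, data_path):
    """Classify one job's outputs as seen on the submit machine."""
    if not Path(summary_path).is_file():
        return "summary_missing"
    if not summary_expects_data(summary_path):
        return "no_data_expected"
    return "ok" if Path(data_path).is_file() else "data_missing"
```

This only detects a data file that failed to come back; it does nothing
for the case where the summary text file itself is never transferred.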
Is this a network reliability issue? Is this something that other
users have seen? Has it been addressed in a patch since version 7.8.4?