Re: [Condor-users] Automatically detecting job completion and file transfer
- Date: Wed, 12 Apr 2006 15:07:06 +0100
- From: "Jon Blower" <jdb@xxxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Automatically detecting job completion and file transfer
> Your submit file above isn't using file transfer - are you using NFS?
> NFS caching can cause the weird ordering you describe.
I think I am (I haven't set up the Condor pool myself; I'm using one at my
institution), so it sounds likely that this is the issue, thanks. Can I
force Condor to bypass NFS and transfer all the files (including stdout and
stderr) to the submit host before marking the job complete?
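In the meantime, as a workaround for the NFS caching, I'm thinking of polling
until each output file has a nonzero length that is unchanged across two
polls, rather than trusting the first length() reading after the "005" event.
A rough sketch (waitForStableLength is just a name I made up, and the timeout
and poll interval are arbitrary):

```java
import java.io.File;

public class OutputWaiter {
    // Wait until f has a nonzero length that is unchanged across two
    // consecutive polls, to guard against NFS caching making a freshly
    // transferred file look empty. Returns false on timeout.
    public static boolean waitForStableLength(File f, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        long last = -1;
        while (System.currentTimeMillis() < deadline) {
            long len = f.length();   // 0 if the file does not exist yet
            if (len > 0 && len == last) {
                return true;         // nonzero and stable: assume transfer done
            }
            last = len;
            Thread.sleep(pollMs);
        }
        return false;
    }
}
```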
I understand that I can use "transfer_output_files = file1 file2" but does
this also work for stdout and stderr?
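From my reading of the manual, I'm guessing that adding something like the
following to the submit file would enable Condor's own file transfer (these
directive names are from memory, so treat this as a sketch rather than a
tested recipe):

    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT

My understanding is that with file transfer enabled, the files named by
"output" and "error" are written back by Condor itself, so they wouldn't need
to be listed in transfer_output_files - but I'd appreciate confirmation.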
Thanks, Jon
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
> Sent: 12 April 2006 14:55
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Automatically detecting job completion and
> file transfer
>
> On Wed, Apr 12, 2006 at 11:03:49AM +0100, Jon Blower wrote:
> > Dear all,
> >
> > This question has probably been asked before but I haven't been able
> > to find an answer on Google or the mailing list archives. I'm writing
> > a Java program that submits jobs to a Condor pool. The Java program
> > runs on a submit host and generates job description files that look
> > like this:
> >
> > executable = /home/jon/bin/helloworld
> > universe = vanilla
> > input = stdin
> > output = stdout
> > error = stderr
> > log = condor.log
> > initialdir = /some/directory
> > Queue
> >
> > I submit the job by calling condor_submit from Java's Runtime.exec()
> > method. This bit works fine.
> >
> > My problem is detecting categorically when the job has completed
> > *and* the output files (stdout and stderr) have been transferred back
> > to the submit host. My first stab at the Java program detects the
> > status of the job ("submitted", "running", "complete") by parsing the
> > log file that is produced. It also gets the exit code of the
> > executable from this log file.
> >
> > To detect job completion, my program looks for the "005" event ("Job
> > terminated") in the log file. However, it seems that this event is
> > sent to the log file *before* the contents of the stdout and stderr
> > files are transferred to the submit host. If I check the length of
> > the stdout and stderr files on the submit host (using the length()
> > method of java.io.File) they both report zero immediately after the
> > "005" event is detected in the log file. If I wait a few seconds, the
> > length() method reports the correct length, indicating that these
> > files (or at least their contents) are transferred a few seconds
> > after the "005" event.
> >
>
> Your submit file above isn't using file transfer - are you using NFS?
> NFS caching can cause the weird ordering you describe.
>
> -Erik
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>