HTCondor Project List Archives




Re: [Condor-devel] [condor-fw] [nwp@xxxxxxxxxxxxxxxxxxxxxx: errorlog]



On Wed, May 16, 2012 at 01:06:47PM -0500, John (TJ) Knoeller wrote:
> Is it possible that the file size is 0, because the file has been
> written to but not (yet) closed?
> -tj

I wondered about that, but the job completed at 1800 last night, and I
mailed my query at 1200 today, so I rejected that in the belief a shadow
would not hang out for 18 hours trying to write a file.
> 
> 
> On 5/16/2012 12:29 PM, Todd Tannenbaum wrote:
> >On 5/16/2012 12:09 PM, Nathan Panike wrote:
> >>>Couple thoughts:
> >>>
> >>>1. fix your job?  what is creating the file somarun.1968334.0.xml?
> >>>is there any way for your job to create this file, fail to write
> >>>anything into it, and still exit with status 0 ?  me thinks there
> >>>is, and that is what is happening.
> >>
> >>The job produces output on every run when I run it by hand. Your
> >>hypothesis runs into the line above:
> >>
> >>    2923130  -  Total Bytes Sent By Job
> >>
> >>This indicates that the shadow believes output was sent back, and yet
> >>the file is empty.
> >>
> >
> >True, my hypothesis was formulated in an absence of information /
> >background... for all I know your job is producing 200 output
> >files and the Total Bytes Sent is reflecting files other than the one
> >empty file.
> >
> >Another thought --  iirc, you submitted your job to transfer on
> >exit or evict... is your job truly prepared to resume with
> >half-filled output files?  Maybe you want your job to only
> >transfer on exit....
> >
> >Yet another (depressing) thought... maybe we have a serious
> >regression in the Condor file transfer code.  :(
> >
> >Todd
> >
> >p.s. we probably should've had this discussion on
> >condor-developer, nothing really UW FW team specific about it...
> >
> >>>
> >>>or
> >>>
> >>>2. If (1) is difficult, consider submitting with DAGMan and have the
> >>>post script validate the output files.
> >>>
> >>That is probably what I will have to do.
> >
> >
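
For the DAGMan suggestion quoted above, a POST script along the following
lines might do the validation. This is only a sketch under assumptions: the
script name check_outputs.py and the helper find_bad_outputs are made up for
illustration, and "valid" here means nothing more than "exists and is not
zero-length". DAGMan treats a non-zero exit status from a POST script as a
failure of the node, which is what the script relies on.

```python
#!/usr/bin/env python3
"""Hypothetical DAGMan POST script: fail the node if an output file is missing or empty."""
import os
import sys


def find_bad_outputs(paths):
    """Return (path, reason) pairs for files that are missing or zero-length."""
    bad = []
    for path in paths:
        if not os.path.isfile(path):
            bad.append((path, "missing"))
        elif os.path.getsize(path) == 0:
            bad.append((path, "empty"))
    return bad


def main(argv):
    """Report every bad file on stderr; return 1 if any were found, else 0."""
    bad = find_bad_outputs(argv[1:])
    for path, reason in bad:
        print(f"{path}: {reason}", file=sys.stderr)
    return 1 if bad else 0


# Only act when file names were passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

In the DAG file this could be attached with a line such as
"SCRIPT POST A check_outputs.py somarun.1968334.0.xml", where the node name A
is a placeholder. A stricter check (for example, parsing the XML) would catch
truncated files that a size test lets through.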