Mailing List Archives
Re: [Condor-users] Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp
- Date: Fri, 04 Jun 2010 07:06:17 -0400
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp
On 06/03/2010 12:27 PM, Gabriele Foerstner wrote:
> We changed our condor version from 6.8.6 to 7.4.1 and now our condor
> jobs get very often the following error messages written in the log file:
>
> 007 (016.022.000) 06/03 16:05:17 Shadow exception!
> Error from slot3@xxxxxxxxxxxxxxx: Assertion ERROR on (result)
> 0 - Run Bytes Sent By Job
>
> The message may even show up several times for the same process.
>
> The ShadowLog shows the following Error:
> 06/03 16:26:17 (16.39) (18909): ERROR "Error from slot4@xxxxxxxxxxxxxxx:
> Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp
>
> The processes get rescheduled, and as they may be rescheduled
> several times, the execution time of a job cluster more than doubles.
>
> Version 6.8.6 was running on RHEL4 (64-bit AMD hosts), and 7.4.1 is
> running on RHEL5 (64-bit AMD hosts).
> I also tested version 7.4.2, but it didn't help.
>
> We run only vanilla jobs.
> Most of our jobs are very I/O intensive, so we use NFS to avoid
> network traffic (USE_NFS = True).
>
>
> Thanks for your help
> Gabriele
Not helpful wrt your error, just a note on USE_NFS because I see so many people referencing it -
According to the manual, it is primarily for the Standard Universe,
http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#15625
That said, a quick spin through the code suggests that USE_NFS is used by chirp in the Vanilla Universe and probably the Parallel Universe. The manual could probably use some updating, and the param could use a more specific name.
Best,
matt
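The original report notes that the assertion can show up several times for the same process. To gauge how often each job hits it, a minimal Python sketch can count matching ShadowLog entries per (cluster, proc) job ID. It assumes each entry sits on one line in the format of the quoted excerpt (timestamp, then `(cluster.proc)`, then the shadow PID); the function name `count_assertions` and the `ShadowLog` path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Assumed line format, taken from the quoted ShadowLog excerpt:
# 06/03 16:26:17 (16.39) (18909): ERROR "Error from slot4@...: Assertion ERROR on (result)" ...
LINE_RE = re.compile(
    r"^\d\d/\d\d \d\d:\d\d:\d\d "      # date and time
    r"\((\d+)\.(\d+)\) "                # (cluster.proc) job ID
    r"\(\d+\): ERROR "                  # shadow PID, then the ERROR keyword
    r".*Assertion ERROR on \(result\)"  # the assertion text from the report
)


def count_assertions(lines):
    """Return a Counter mapping (cluster, proc) to the number of assertion errors seen."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Example usage (hypothetical path):
#   with open("ShadowLog") as log:
#       for job, n in count_assertions(log).most_common():
#           print(job, n)
```

Jobs with counts above 1 are the ones being rescheduled repeatedly, which is what stretches the cluster's total run time.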