
Re: [Condor-users] 6.7.12 gridmanager crash



On Oct 6, 2005, at 12:55 PM, Rod Walker wrote:

> Hi,
> With 6.7.12 I'm seeing frequent crashes of the gridmanager.
>
> 10/6 09:52:48 [27842] ERROR "BaseJob::DoneWithJob called with unexpected
> state IDLE (1)
> " at line 269 in file basejob.C
>
> This is for gt2 jobs to LCG CEs. Some jobs have finished successfully and
> I haven't managed to pin the problem down yet. The symptom is so clear I
> thought I'd ask first.

I haven't seen this before. Can you set GRIDMANAGER_DEBUG = D_FULLDEBUG in your config file, provoke the crash, and send me the resulting gridmanager log file (off the list)?
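
As a rough sketch of that request (the file name condor_config.local and the follow-up commands in the comments are assumptions about a typical setup, not something stated in this thread), the setting could be added to the submit machine's local Condor config file:

```
# Assumed location: the local config file on the submit machine,
# often named condor_config.local.
# Turn on the most verbose gridmanager logging.
GRIDMANAGER_DEBUG = D_FULLDEBUG

# After saving, running `condor_reconfig` asks the daemons to reread the config.
# `condor_config_val GRIDMANAGER_LOG` should print where the log is written.
```

A gridmanager that is already running may need to be restarted before it picks up the new debug level.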


> Quill works very nicely, which is the reason for upgrading.
> Is there some way to roll back just the gridmanager for now? When I tried
> replacing the gridmanager and gahpserver binaries with older ones I got
> HoldReason = "GlobusResource is not set in the job ad"

The gridmanager and schedd have to be the same version (in particular, they must both be pre- or post-6.7.11).


+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+