Re: [Condor-users] IDLE then RUN then IDLE for nothing
- Date: Fri, 25 Jun 2004 10:46:10 -0500
- From: Nick LeRoy <nleroy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] IDLE then RUN then IDLE for nothing
On Fri June 25 2004 7:11 am, Jérôme Jaglale wrote:
> Hello,
>
> When I submit some condor jobs, they begin to run (ST=R) for a few
> seconds. Then they return to idle (ST=I) without any results. I
> examined the logs:
>
>
> The job log on the submit machine :
> 000 (001.000.000) 06/25 11:59:41 Job submitted from host:
> <192.168.1.1:54151>
> ...
> 007 (001.000.000) 06/25 11:59:58 Shadow exception!
> Can no longer talk to condor_starter on execute machine (192.168.1.23)
> 0 - Run Bytes Sent By Job
> 0 - Run Bytes Received By Job
>
>
> The StartLog on the execute machine :
> 6/25 11:59:19 Starter pid 25840 exited with status 4
>
>
> The StarterLog.vm2 on the execute machine :
> 6/25 11:59:13 ******************************************************
> 6/25 11:59:13 ** condor_starter (CONDOR_STARTER) STARTING UP
> 6/25 11:59:13 ** $CondorVersion: 6.6.5 May 3 2004 $
> 6/25 11:59:13 ** $CondorPlatform: PPC-DARWIN-6_8 $
> 6/25 11:59:13 ** PID = 25840
> 6/25 11:59:13 ******************************************************
> 6/25 11:59:13 Using config file:
> /Users/condor/Programmes/condor-6.6.5/etc/condor_config
> 6/25 11:59:13 Using local config files:
> /Users/condor/Programmes/condor-6.6.5/local.cluster13/
> condor_config.local
> 6/25 11:59:13 DaemonCore: Command Socket at <192.168.1.23:55008>
> 6/25 11:59:13 Setting resource limits not implemented!
> 6/25 11:59:13 Starter communicating with condor_shadow
> <192.168.1.1:54937>
> 6/25 11:59:13 Submitting machine is "(null)"
> 6/25 11:59:13 ERROR "Assertion ERROR on (shadow->name())" at line 984
> in file jic_shadow.C
> 6/25 11:59:13 ShutdownFast all jobs.
Notice the 'Submitting machine is "(null)"'? That, to me, is the smoking gun.
I can't offer an explanation as to what's causing it, but it's certainly
where I'd start digging. Almost certainly something is messed up in your
schedd or its configuration.
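
To see how often this symptom shows up, the starter logs can be scanned for that line. The following is only a minimal sketch: the function name `find_null_submitter` is hypothetical, and the sample text is copied from the StarterLog excerpt quoted above rather than read from a real log file.

```python
import re

# Matches the symptom from the quoted StarterLog: a "(null)" submitting machine.
NULL_SUBMITTER = re.compile(r'Submitting machine is "\(null\)"')


def find_null_submitter(log_text):
    """Return the log lines that report a "(null)" submitting machine."""
    return [line.strip() for line in log_text.splitlines()
            if NULL_SUBMITTER.search(line)]


# Sample lines taken from the log excerpt in this message.
sample = '''6/25 11:59:13 Starter communicating with condor_shadow
6/25 11:59:13 Submitting machine is "(null)"
6/25 11:59:13 ShutdownFast all jobs.'''

for hit in find_null_submitter(sample):
    print(hit)
```

To run it against a real log, read the file's contents into a string and pass that in place of `sample`.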
> It's strange; I don't understand what is really happening. Could you help?
> The computers are running with Mac OS X.
Just a thought: are the host name and IP properly set up? It seems that we've
had a number of issues with /etc/hosts on OS X not being set up properly, or
something like that.
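
One quick way to check this on the submit and execute machines is to ask the system resolver what the local host name maps to. This is a rough sketch using only Python's standard `socket` module; the function name `check_host_setup` and the injectable `resolver` parameter are hypothetical, and flagging a loopback address as suspicious is an assumption about what might go wrong here, not a confirmed diagnosis.

```python
import socket


def check_host_setup(hostname=None, resolver=socket.gethostbyname):
    """Report what the given (or local) host name resolves to.

    `resolver` is injectable so the check can be exercised without a network.
    """
    name = hostname or socket.gethostname()
    try:
        ip = resolver(name)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return {"hostname": name, "ip": None,
                "problem": "does not resolve: %s" % exc}
    problem = None
    if ip.startswith("127."):
        # Assumption: a loopback mapping is worth a second look, nothing more.
        problem = "resolves to a loopback address"
    return {"hostname": name, "ip": ip, "problem": problem}


if __name__ == "__main__":
    print(check_host_setup())
```

If `problem` is not `None` on either machine, the /etc/hosts entries and the host name settings are the first things to compare.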
> Thanks,
> Jérôme
Hope this helps
-Nick
--
<<< The answer is out there, Neo. >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy@xxxxxxxxxxx The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences