Re: [Condor-users] logEvictEvent with unknown reason (108)



Dan,

Thanks for the help.  I'm getting messages like the ones below, which
seem to indicate that the startd disagrees with the negotiator about the
capabilities of the machines.  Is that the right reading?

- dave

_________________________________________________

2/16 03:42:51 Error: can't find resource with capability
(<128.83.120.216:32772>#1436503429)
2/16 03:42:59 DaemonCore: Command received via TCP from host
<128.83.144.225:39738>
2/16 03:42:59 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling
handler (command_activate_claim)
2/16 03:42:59 vm2: Got activate_claim request from shadow
(<128.83.144.225:39738>)
2/16 03:42:59 vm2: Job Requirements check failed!
2/16 03:42:59 DaemonCore: Command received via TCP from host
<128.83.144.225:39739>
2/16 03:42:59 DaemonCore: received command 404
(DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
2/16 03:42:59 vm2: Called deactivate_claim_forcibly()
2/16 03:42:59 DaemonCore: Command received via UDP from host
<128.83.144.225:59716>
2/16 03:42:59 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler (command_handler)
2/16 03:42:59 vm2: State change: received RELEASE_CLAIM command
2/16 03:42:59 vm2: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
2/16 03:42:59 vm2: State change: No preempting claim, returning to owner
2/16 03:42:59 vm2: Changing state and activity: Preempting/Vacating ->
Owner/Idle
2/16 03:42:59 vm2: State change: IS_OWNER is false
2/16 03:42:59 vm2: Changing state: Owner -> Unclaimed

__________________________________
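
In case it helps anyone chasing the same thing, here is a rough sketch of
how the refusals could be pulled out of a log like the excerpt above. It
assumes the timestamp and "vmN: message" layout shown in that excerpt; the
names join_wrapped and failed_requirements are made up for illustration and
are not part of Condor.

```python
import re

# Timestamp prefix as it appears in the excerpt above, e.g. "2/16 03:42:59 ".
STAMP = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")

def join_wrapped(lines):
    """Rejoin log entries that were wrapped onto a following line."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = STAMP.match(line)
        if m:
            # A new entry starts with a timestamp.
            entries.append([m.group(1), m.group(2)])
        elif entries and line.strip():
            # No timestamp: treat as the continuation of the previous entry.
            entries[-1][1] += " " + line.strip()
    return [(ts, msg) for ts, msg in entries]

def failed_requirements(lines):
    """Return (timestamp, slot) pairs where the Requirements check failed."""
    hits = []
    for ts, msg in join_wrapped(lines):
        slot, sep, rest = msg.partition(": ")
        if sep and rest.startswith("Job Requirements check failed"):
            hits.append((ts, slot))
    return hits

sample = """\
2/16 03:42:59 vm2: Got activate_claim request from shadow
(<128.83.144.225:39738>)
2/16 03:42:59 vm2: Job Requirements check failed!
2/16 03:42:59 vm2: Called deactivate_claim_forcibly()
""".splitlines()

print(failed_requirements(sample))
```

Counting the hits per slot over a full log would show whether the refusals
are confined to a few machines or spread across the pool.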


On Tue, 2005-02-15 at 07:54 -0600, Dan Bradley wrote:
> David,
> 
> The example below shows that the startd is refusing to "activate" a 
> resource claim.  Are most of the evictions that you see like that?  Look 
> in StartLog on the worker node to see the specific reason for the refusal.
> 
> --Dan
> 
> David A. Kotz wrote:
> 
> >I'm getting a *lot* of evictions like the one below, where the reason is
> >listed as unknown.  How might I diagnose the cause of the problem?
> >
> >
> >
> >2/14 16:47:49 ******************************************************
> >2/14 16:47:49 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> >2/14 16:47:49 ** /lusr/opt/condor-6.6.6/sbin/condor_shadow
> >2/14 16:47:49 ** $CondorVersion: 6.6.6 Jul 26 2004 $
> >2/14 16:47:49 ** $CondorPlatform: I386-LINUX_RH72 $
> >2/14 16:47:49 ** PID = 27611
> >2/14 16:47:49 ******************************************************
> >2/14 16:47:49 Using config file: /lusr/condor/etc/condor_config
> >2/14 16:47:49 Using local config
> >files: /lusr/condor/etc/LINUX-INTEL/local/mast
> >2/14 16:47:49 DaemonCore: Command Socket at <128.x.x.x:58663>
> >2/14 16:47:50 Initializing a VANILLA shadow
> >2/14 16:47:50 (15.46) (27611): Request to run on <128.x.x.y:32772> was
> >REFUSED
> >2/14 16:47:50 (15.46) (27611): Job 15.46 is being evicted
> >2/14 16:47:50 (15.46) (27611): logEvictEvent with unknown reason (108),
> >aborting
> >2/14 16:47:50 (15.46) (27611): **** condor_shadow (condor_SHADOW)
> >EXITING WITH STATUS 108
> >
> >  
> >
> 
-- 
David A. Kotz <dkotz@xxxxxxxxxxxxx>