[Condor-users] ERROR starting jobs: Jobs get evicted for unknown reason (108)
- Date: Tue, 15 Aug 2006 17:29:34 +0200 (CEST)
- From: Thomas Bretz <tbretz@xxxxxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] ERROR starting jobs: Jobs get evicted for unknown reason (108)
Hi,
many of our submitted jobs are randomly evicted immediately after they
start. I have no idea what is going on: one log file says "Unknown
Reason", and none of the other log files contain warnings or errors.
In its current state, Condor (6.8.0 on SUSE Linux 9 and 10) is completely
unusable for us, because some jobs need 20-30 negotiation cycles before
they actually start running. I also tried enabling more verbose logging,
but that output does not give any hint about why the jobs are evicted
either.
Any help is welcome,
Thomas
-------------------------------------
Part of the Setup:
WANT_SUSPEND = False
WANT_VACATE = False
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
CLAIM_WORKLIFE = 0
MaxJobRetirementTime = 0
KILL = False
NEGOTIATOR_PRE_JOB_RANK = 0
NEGOTIATOR_POST_JOB_RANK = 0
PREEMPTION_REQUIREMENTS = False
PREEMPTION_RANK = 0
ShadowLog:
8/15 16:46:33 ******************************************************
8/15 16:46:33 ** condor_shadow (CONDOR_SHADOW) STARTING UP
8/15 16:46:33 ** /home/condor/condor-6.8.0/sbin/condor_shadow
8/15 16:46:33 ** $CondorVersion: 6.8.0 Jul 19 2006 $
8/15 16:46:33 ** $CondorPlatform: X86_64-LINUX_RHEL3 $
8/15 16:46:33 ** PID = 27459
8/15 16:46:33 ** Log last touched 8/15 16:46:31
8/15 16:46:33 ******************************************************
8/15 16:46:33 Using config source: /home/condor/condor_config
8/15 16:46:33 Using local config sources:
8/15 16:46:33 /home/condor/hosts/dc08/condor_config.local
8/15 16:46:33 DaemonCore: Command Socket at <132.187.*.*:56626>
8/15 16:46:33 Initializing a VANILLA shadow for job 3105.0
8/15 16:46:33 (3105.0) (27459): Request to run on <132.187.*.*:58903> was REFUSED
8/15 16:46:33 (3105.0) (27459): Job 3105.0 is being evicted
8/15 16:46:33 (3105.0) (27459): logEvictEvent with unknown reason (108), aborting
8/15 16:46:33 (3105.0) (27459): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
NegotiatorLog:
8/15 16:43:28 Request 03105.00000:
8/15 16:43:28 Matched 3105.0 tbretz@xxxxxxxxxxxxxxxxxxxxxx <132.187.47.28:52515> preempting none <132.187.47.22:58903> vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxx
8/15 16:43:28 Successfully matched with vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxx
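
To see whether the refusals cluster on particular execute machines, the
"was REFUSED" lines in the ShadowLog could be tallied per job and per
startd address. The following is only a minimal Python sketch: it assumes
each message sits on a single physical line in the real log file (the
excerpt above is wrapped by the mail client), and the names REFUSED and
count_refusals are made up for illustration, not part of Condor.

```python
import re
from collections import Counter

# Matches the refusal line in the form shown in the ShadowLog excerpt,
# assuming the whole message is on one line in the actual log file.
REFUSED = re.compile(
    r"^(?P<ts>\d+/\d+ \d+:\d+:\d+) \((?P<job>\d+\.\d+)\) \((?P<pid>\d+)\): "
    r"Request to run on (?P<startd><[^>]+>)\s+was REFUSED"
)


def count_refusals(lines):
    """Count refused activation requests per (job id, startd address)."""
    counts = Counter()
    for line in lines:
        m = REFUSED.match(line.strip())
        if m:
            counts[(m.group("job"), m.group("startd"))] += 1
    return counts


# Sample lines copied from the ShadowLog excerpt in this message.
sample = [
    "8/15 16:46:33 Initializing a VANILLA shadow for job 3105.0",
    "8/15 16:46:33 (3105.0) (27459): Request to run on <132.187.*.*:58903> was REFUSED",
    "8/15 16:46:33 (3105.0) (27459): Job 3105.0 is being evicted",
]
result = count_refusals(sample)
for (job, startd), n in result.items():
    print(f"job {job} refused {n} time(s) by {startd}")
```

To run it against a real log, the list could be replaced by an open file
handle, e.g. count_refusals(open(path_to_shadow_log)), where
path_to_shadow_log is whatever location the local configuration uses.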