Re: [Condor-users] newbie (Job always in Idle)


Date: Mon, 14 Feb 2005 18:03:43 -0600
From: Alan De Smet <adesmet@xxxxxxxxxxx>
Subject: Re: [Condor-users] newbie (Job always in Idle)
toby sebastian <toby@xxxxxxxxxxxx> wrote:
> I am facing a problem. I have configured one Red Hat Linux 9.0
> machine as Central manager, Submission Host, Execution Host. I
> have submitted jobs to the queue. But i am finding the status
> of every job as ' Idle ' only.

It's possible for a Condor job to take a few minutes to start up,
but assuming that it's still sitting in idle after that, there are
a number of things to try.

The best summary of debugging I know of is here:
http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/adesmet_admin_tutorial/#DebuggingJobs

Common cases:

- The job is actually starting, but is immediately failing (say,
  because the executable isn't readable).  The user log,
  ShadowLog, and StarterLog will provide details.

- The machine isn't available to run the job.  Does
  "condor_status" report the machine as "Owner"?  If so, Condor
  considers the machine to be in use by its owner and won't start
  jobs on it.  Quick fix: configure the machine to always run
  jobs:
 
	START=TRUE
	SUSPEND=FALSE
	CONTINUE=TRUE
	PREEMPT=FALSE
	KILL=FALSE

  Then run condor_restart so the new settings take effect.

- The job isn't allowed to run.  "condor_q -analyze" is the first
  test; it reports why the job isn't being matched to a machine.

- A weird, relatively rare bug: Condor will occasionally decide
  that a pool with only one machine has no available machines.
  Crude workaround: set NUM_CPUS=2 in the configuration, then run
  condor_restart.  If you're actually hitting this one, please
  let us know.
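
For convenience, the checks above could be run in one pass with
something like the following.  This is only a rough sketch: the
function name check_idle_job is made up for illustration, and it
assumes the Condor command-line tools are on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): runs the checks from the
# list above in order. Assumes the Condor tools are on PATH.
check_idle_job() {
    # Bail out early if any of the tools we rely on is missing.
    for tool in condor_status condor_q condor_config_val; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            return 1
        fi
    done

    # Is the machine reported as "Owner"?
    condor_status

    # Why does Condor think the job can't run?
    condor_q -analyze

    # Directory that normally holds ShadowLog and StarterLog.
    condor_config_val LOG
}
```

Call check_idle_job on the submit machine and read the output from
top to bottom; the last line printed is the log directory to look
in next.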

-- 
Alan De Smet                              Condor Project Research
adesmet@xxxxxxxxxxx                 http://www.condorproject.org/
