toby sebastian <toby@xxxxxxxxxxxx> wrote:
> I am facing a problem. I have configured one Red Hat Linux 9.0
> machine as Central manager, Submission Host, Execution Host. I
> have submitted jobs to the queue. But I am finding the status
> of every job as 'Idle' only.
It can take a few minutes for a Condor job to start up. If it is
still sitting in the Idle state after that, there are a number of
things to try.
The best summary of debugging I know of is here:
http://www.cs.wisc.edu/condor/CondorWeek2004/presentations/adesmet_admin_tutorial/#DebuggingJobs
Common cases:
- The job is actually starting, but is immediately failing (say,
  because the executable isn't readable). The user log,
  ShadowLog, and StarterLog will provide details.
- The machine isn't available to run the job. Does
  "condor_status" report the machine's state as "Owner"? If so,
  Condor considers the machine to be in use by its owner and
  won't start jobs on it. Quick fix: configure the machine to
  always run jobs:
    START=TRUE
    SUSPEND=FALSE
    CONTINUE=TRUE
    PREEMPT=FALSE
    KILL=FALSE
  Then run condor_restart.
- The job isn't allowed to run on the machine (for example, its
  requirements don't match). "condor_q -analyze" is the first
  test.
- A weird, relatively rare bug: Condor will occasionally decide
  that a pool with only one machine has no available machines.
  Crude workaround: set NUM_CPUS=2, then run condor_restart. If
  you're actually hitting this one, please let us know.
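On a one-machine pool, the first few checks above can be run in sequence from a shell. This is only a rough sketch, assuming the Condor tools are on PATH and that the daemon logs keep their default names (ShadowLog and StarterLog in the directory named by the LOG configuration macro). The user log is whatever file the job's submit description names with its "log" command, so the sketch doesn't try to read it.

```shell
#!/bin/sh
# Rough sketch: assumes the Condor tools are on PATH and the default
# ShadowLog/StarterLog names have not been overridden in the config.

# Why are the idle jobs not being matched to a machine?
condor_q -analyze

# Is the machine available? A State of "Owner" means Condor will
# not start jobs on it under the current policy.
condor_status

# Did the job start and fail right away? The shadow and starter
# logs live in the directory named by the LOG config macro.
LOGDIR=$(condor_config_val LOG)
tail -n 20 "$LOGDIR/ShadowLog"
tail -n 20 "$LOGDIR/StarterLog"
```

If ShadowLog shows the job starting and exiting repeatedly, the first case (immediate failure) is the likely one; if condor_status shows "Owner", the policy fix above applies.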
--
Alan De Smet Condor Project Research
adesmet@xxxxxxxxxxx http://www.condorproject.org/