
[HTCondor-users] Jobs do not execute, they sit idle in the queue indefinitely



Hi,

I'm attempting to configure a test Condor cluster.  I have 10 machines,
all running CentOS 6.4.
They are not configured with DNS records; they all have /etc/hosts files
that contain the relevant IP addresses for each node in the cluster.

I've configured the stable repo and used that to install the condor
software.
I then modified the /etc/condor/condor_config so that the subnet these
machines reside on was enabled for write access.

A quick test showed everything was working, and jobs would execute as
expected.
However, this was with the following condor_config.local entry on each
of the 10 nodes:

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

I am now attempting to configure one node as a gatekeeper:
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD

And the other 9 nodes as execution-only nodes:
DAEMON_LIST = MASTER, STARTD

After restarting the services, jobs no longer execute. They sit
idle in the queue indefinitely.

[root@node00 condor]# condor_q


-- Submitter: node00 : <10.11.114.220:44213> : node00
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   mfs             5/17 13:41   0+00:00:00 I  0   0.0  myprog Example.2.0

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

condor_q -analyze is not much help:

-- Submitter: node00 : <10.11.114.220:44213> : node00
---
002.000:  Request has not yet been considered by the matchmaker.

I did notice the following warning in the SchedLog:

SchedLog:05/17/13 13:41:21 (pid:9037) WARNING: forward resolution of localhost.localdomain doesn't match 10.11.114.220!
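
That warning appears to say that looking up the name
localhost.localdomain yields an address other than 10.11.114.220. Below
is a minimal Python sketch of checking which address a hosts file maps a
name to. The sample file contents and the helper names parse_hosts and
forward_matches are assumptions made up for illustration; they are not
taken from the actual nodes.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to the first IP that lists it."""
    mapping = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # The first entry wins, as a resolver reading the file top-down would pick it.
            mapping.setdefault(name, ip)
    return mapping


def forward_matches(hosts_text, hostname, expected_ip):
    """Return True if hostname maps to expected_ip in the given hosts-file text."""
    return parse_hosts(hosts_text).get(hostname) == expected_ip


# Hypothetical hosts file; only the 10.11.114.220 address comes from the log above.
SAMPLE = """\
127.0.0.1       localhost localhost.localdomain
10.11.114.220   node00
"""

print(forward_matches(SAMPLE, "node00", "10.11.114.220"))                 # True
print(forward_matches(SAMPLE, "localhost.localdomain", "10.11.114.220"))  # False
```

To check a real node, the same function could be given the contents of
/etc/hosts together with whatever name the machine reports for itself.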

I also found this entry, which makes no sense to me since the startd is
not set up to run on node00 in the local config:

SchedLog:05/17/13 13:56:21 (pid:9037) Can't find address for startd node00

The test job itself is from the tutorial here:
http://research.cs.wisc.edu/htcondor/tutorials/scotland-admin-tutorial-2003-10-23/scotland-admin-tutorial-2003-10-23.DEMO.html

Any assistance pointing me in the right direction is greatly appreciated.

Regards,
Dan Shea

-- 
Dan Shea - daniel_shea2@xxxxxxxxxxxxxxx
Senior Systems Administrator, West Quad Computing Group
Harvard Medical School
"Charlie was a chemist, But Charlie is no more. For what he thought was H2O, Was H2SO4."