[Condor-users] jobs stuck in queue
- Date: Fri, 19 Aug 2011 17:36:15 -0300
- From: Fabricio Cannini <fcannini@xxxxxxxxx>
- Subject: [Condor-users] jobs stuck in queue
Hello,
I have installed Condor 7.6.0 on a master plus two execute nodes, with the
following configuration:
*master:*
UID_DOMAIN = internal.domain
FILESYSTEM_DOMAIN = internal.domain
SEC_DEFAULT_NEGOTIATION = OPTIONAL
ALLOW_READ = $(FULL_HOSTNAME),@172.17.8.*
ALLOW_WRITE = $(FULL_HOSTNAME),@172.17.8.*
ALLOW_NEGOTIATOR = $(CONDOR_HOST)
ALLOW_CONFIG = $(CONDOR_HOST),$(FULL_HOSTNAME)
ENABLE_RUNTIME_CONFIG = True
ENABLE_PERSISTENT_CONFIG = True
PERSISTENT_CONFIG_DIR = /etc/condor/config.d
SETTABLE_ATTRS_CONFIG = *
USE_NFS = True
DEFAULT_DOMAIN_NAME = internal.domain
TRUST_UID_DOMAIN = True
DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR
SOFT_UID_DOMAIN = TRUE
START = TRUE
*nodes:*
CONDOR_HOST = master
UID_DOMAIN = internal.domain
FILESYSTEM_DOMAIN = internal.domain
SEC_DEFAULT_NEGOTIATION = OPTIONAL
ALLOW_READ = $(CONDOR_HOST),172.17.8.*
ALLOW_WRITE = $(CONDOR_HOST),172.17.8.*
ALLOW_NEGOTIATOR = $(CONDOR_HOST)
ALLOW_CONFIG = $(CONDOR_HOST),$(FULL_HOSTNAME)
ENABLE_RUNTIME_CONFIG = True
ENABLE_PERSISTENT_CONFIG = True
PERSISTENT_CONFIG_DIR = /etc/condor/config.d
SETTABLE_ATTRS_CONFIG = *
USE_NFS = True
DEFAULT_DOMAIN_NAME = internal.domain
ALLOW_DAEMON = *@$(CONDOR_HOST)
SOFT_UID_DOMAIN = TRUE
START = TRUE
TRUST_UID_DOMAIN = TRUE
STARTD_EXPRS=$(STARTD_EXPRS), DedicatedScheduler, ParallelSchedulingGroup
SCHEDD_NAME = $(CONDOR_HOST)
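Note that the node configuration above advertises DedicatedScheduler through STARTD_EXPRS but never defines it. As far as I understand the Condor manual, parallel universe jobs are matched only to machines that name a dedicated scheduler, so a definition along these lines may be needed on each execute node. This is a minimal sketch and not a verified fix: it assumes the schedd runs on master.internal.domain under the default schedd name, and the RANK line is optional.
###############################
# Assumption: the dedicated schedd lives on master.internal.domain
DedicatedScheduler = "DedicatedScheduler@master.internal.domain"
# Advertise the attribute in the machine ClassAd (already done above)
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
# Optional: prefer jobs coming from the dedicated scheduler
RANK = Scheduler =?= $(DedicatedScheduler)
###############################
The execute nodes would need a condor_reconfig (or a restart) before the new attribute shows up in their machine ClassAds.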
When I submit a simple job like this:
###############################
Error = err-$(cluster).log
Output = out-$(cluster).log
Log = log-$(cluster).log
cmd = /bin/cat
arguments = /proc/cpuinfo
Queue
###############################
It runs fine. But a slightly more complicated job like this:
===============================
universe = parallel
Error = err-$(cluster).log
Output = out-$(cluster).log
Log = log-$(cluster).log
executable = /usr/bin/mpirun
arguments = -np 8 -host node-01,node-02 /home/user/hw
machine_count = 2
Queue
===============================
The job stays in the idle state:
-- Submitter: master.internal.domain : <172.17.8.121:58829> : master.internal.domain
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  33.0   user            8/19 16:48   0+00:00:00 I  0   0.1  mpirun -np 8 -host
"/home/user/hw" is just a simple MPI hello world.
Any tips as to what may (not) be going on are very, very, veeeeery welcome.
TIA