[HTCondor-users] stuck submit jobs
Dear Condor administrators,
I submitted a job, but it is stuck in the idle state.
The machine runs Scientific Linux CERN 6.5 with Condor 8.2.3,
and the host has both a global IP address and a local IP address.
I suspect that the IP address or DNS configuration is wrong.
I have included the logs below and attached the configuration file.
Could you tell me how to fix it?
Thank you in advance.
Best regards,
---
MasterLog
10/07/14 15:30:43
******************************************************
10/07/14 15:30:43 ** condor_master (CONDOR_MASTER) STARTING UP
10/07/14 15:30:43 ** /usr/sbin/condor_master
10/07/14 15:30:43 ** SubsystemInfo: name=MASTER type=MASTER(2)
class=DAEMON(1)
10/07/14 15:30:43 ** Configuration: subsystem:MASTER local:<NONE>
class:DAEMON
10/07/14 15:30:43 ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID:
274619 $
10/07/14 15:30:43 ** $CondorPlatform: x86_64_RedHat6 $
10/07/14 15:30:43 ** PID = 2358
10/07/14 15:30:43 ** Log last touched 10/7 15:30:43
10/07/14 15:30:43
******************************************************
10/07/14 15:30:43 Using config source: /etc/condor/condor_config
10/07/14 15:30:43 Using local config sources:
10/07/14 15:30:43 /etc/condor/config.d/condor_config.local
10/07/14 15:30:43 /etc/condor/config.d/condor_config.local
10/07/14 15:30:43 config Macros = 65, Sorted = 65, StringBytes =
2025, TablesBytes = 2396
10/07/14 15:30:43 CLASSAD_CACHING is OFF
10/07/14 15:30:43 Daemon Log is logging: D_ALWAYS D_ERROR
10/07/14 15:30:43 DaemonCore: command socket at <192.168.12.1:41030>
10/07/14 15:30:43 DaemonCore: private command socket at
<192.168.12.1:41030>
10/07/14 15:30:43 Master restart (GRACEFUL) is watching
/usr/sbin/condor_master (mtime:1412124630)
10/07/14 15:30:43 Started DaemonCore process
"/usr/sbin/condor_collector", pid and pgroup = 8370
10/07/14 15:30:43 Waiting for /var/log/condor/.collector_address to
appear.
10/07/14 15:30:43 PERMISSION DENIED to unauthenticated@unmapped
from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), access level
DAEMON: reason: DAEMON authorization policy contains no matching ALLOW
entry for this request; identifiers used for this host:
192.168.12.1,bepp01,bepp01.bepp.rcapp.kyushu-u.ac.jp, hostname size = 2,
original ip address = 192.168.12.1
10/07/14 15:30:44 Found /var/log/condor/.collector_address.
10/07/14 15:30:44 Started DaemonCore process
"/usr/sbin/condor_negotiator", pid and pgroup = 8372
10/07/14 15:30:44 Started DaemonCore process
"/usr/sbin/condor_schedd", pid and pgroup = 8373
10/07/14 15:30:44 PERMISSION DENIED to unauthenticated@unmapped
from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), access level
DAEMON: reason: cached result for DAEMON; see first case for the full reason
10/07/14 15:30:44 PERMISSION DENIED to unauthenticated@unmapped
from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), access level
DAEMON: reason: cached result for DAEMON; see first case for the full reason
CollectorLog
10/07/14 15:45:43 Housekeeper: Done cleaning
10/07/14 15:45:44 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 49 (UPDATE_NEGOTIATOR_AD), access level
NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the
full reason
10/07/14 15:45:45 PERMISSION DENIED to unauthenticated@unmapped
from host 192.168.12.1 for command 10 (QUERY_STARTD_PVT_ADS), access
level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case
for the full reason
10/07/14 15:45:48 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 2 (UPDATE_MASTER_AD), access level
ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first
case for the full reason
10/07/14 15:45:51 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 1 (UPDATE_SCHEDD_AD), access level
ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first
case for the full reason
10/07/14 15:45:51 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 11 (UPDATE_SUBMITTOR_AD), access level
ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first
case for the full reason
10/07/14 15:46:06 DC_AUTHENTICATE: attempt to open invalid session
bepp01:2365:1412654380:10, failing; this session was requested by
<192.168.12.65:58496> with return address <192.168.12.1:37289>
10/07/14 15:46:06 attempt to connect to <192.168.12.1:37289>
failed: Connection refused (connect errno = 111).
10/07/14 15:46:06 Failed to send DC_INVALIDATE_KEY to daemon at
<192.168.12.1:37289>: SECMAN:2003:TCP connection to daemon at
<192.168.12.1:37289> failed.
10/07/14 15:46:07 DC_AUTHENTICATE: attempt to open invalid session
bepp01:2365:1412654379:5, failing; this session was requested by
<192.168.12.53:54534> with return address <192.168.12.1:47275>
10/07/14 15:46:07 attempt to connect to <192.168.12.1:47275>
failed: Connection refused (connect errno = 111).
10/07/14 15:46:07 Failed to send DC_INVALIDATE_KEY to daemon at
<192.168.12.1:47275>: SECMAN:2003:TCP connection to daemon at
<192.168.12.1:47275> failed.
10/07/14 15:46:07 DC_AUTHENTICATE: attempt to open invalid session
bepp01:2365:1412654380:12, failing; this session was requested by
<192.168.12.56:55342> with return address <192.168.12.1:46531>
10/07/14 15:46:07 attempt to connect to <192.168.12.1:46531>
failed: Connection refused (connect errno = 111).
10/07/14 15:46:07 Failed to send DC_INVALIDATE_KEY to daemon at
<192.168.12.1:46531>: SECMAN:2003:TCP connection to daemon at
<192.168.12.1:46531> failed.
10/07/14 15:46:08 DC_AUTHENTICATE: attempt to open invalid session
bepp01:2365:1412654380:7, failing; this session was requested by
<192.168.12.55:34093> with return address <192.168.12.1:48268>
10/07/14 15:46:08 attempt to connect to <192.168.12.1:48268>
failed: Connection refused (connect errno = 111).
10/07/14 15:46:08 Failed to send DC_INVALIDATE_KEY to daemon at
<192.168.12.1:48268>: SECMAN:2003:TCP connection to daemon at
<192.168.12.1:48268> failed.
10/07/14 15:46:09 DC_AUTHENTICATE: attempt to open invalid session
bepp01:2365:1412654379:3, failing; this session was requested by
<192.168.12.52:51590> with return address <192.168.12.1:44349>
10/07/14 15:46:09 attempt to connect to <192.168.12.1:44349>
failed: Connection refused (connect errno = 111).
10/07/14 15:46:09 Failed to send DC_INVALIDATE_KEY to daemon at
<192.168.12.1:44349>: SECMAN:2003:TCP connection to daemon at
<192.168.12.1:44349> failed.
NegotiatorLog
10/07/14 15:40:45 ---------- Started Negotiation Cycle ----------
10/07/14 15:40:45 Phase 1: Obtaining ads from collector ...
10/07/14 15:40:45 Getting startd private ads ...
10/07/14 15:40:45 Couldn't fetch ads: communication error
10/07/14 15:40:45 Aborting negotiation cycle
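The PERMISSION DENIED entries in the logs above are easier to read once they are grouped by command and access level. The following is only a minimal sketch for doing that; the function name `tally_denials` and the regular expression are illustrative assumptions, not part of HTCondor, and the sample text is copied from the CollectorLog excerpt. Because the mail client wrapped long log lines, the sketch joins them before matching.

```python
import re
from collections import Counter

# Matches the part of a denial entry that names the command and access level.
# The pattern is an assumption based on the log lines quoted in this message.
DENIED = re.compile(
    r"PERMISSION DENIED to .+? from host \S+ "
    r"for command \d+ \((?P<name>\w+)\), access level (?P<level>\w+)"
)

def tally_denials(log_text):
    """Count denied requests per (command name, access level)."""
    # Collapse every run of whitespace (including the wrapped line breaks)
    # into a single space so one entry becomes one searchable string.
    flat = " ".join(log_text.split())
    return Counter((m["name"], m["level"]) for m in DENIED.finditer(flat))

# Three entries copied from the CollectorLog excerpt in this message.
SAMPLE = """10/07/14 15:45:44 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 49 (UPDATE_NEGOTIATOR_AD), access level
NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the
full reason
10/07/14 15:45:51 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 1 (UPDATE_SCHEDD_AD), access level
ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first
case for the full reason
10/07/14 15:45:51 PERMISSION DENIED to unauthenticated user from
host 192.168.12.1 for command 11 (UPDATE_SUBMITTOR_AD), access level
ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first
case for the full reason"""

for (command, level), count in sorted(tally_denials(SAMPLE).items()):
    print(f"{level:20s} {command:24s} {count}")
```

Run against a full log file, a summary like this shows at a glance which authorization levels are rejecting requests from 192.168.12.1.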
SchedLog
10/07/14 15:30:43 (pid:8272) **** condor_schedd (condor_SCHEDD) pid
8272 EXITING WITH STATUS 0
10/07/14 15:30:44 (pid:8373) Setting maximum file descriptors to 4096.
10/07/14 15:30:44 (pid:8373)
******************************************************
10/07/14 15:30:44 (pid:8373) ** condor_schedd (CONDOR_SCHEDD)
STARTING UP
10/07/14 15:30:44 (pid:8373) ** /usr/sbin/condor_schedd
10/07/14 15:30:44 (pid:8373) ** SubsystemInfo: name=SCHEDD
type=SCHEDD(5) class=DAEMON(1)
10/07/14 15:30:44 (pid:8373) ** Configuration: subsystem:SCHEDD
local:<NONE> class:DAEMON
10/07/14 15:30:44 (pid:8373) ** $CondorVersion: 8.2.3 Sep 30 2014
BuildID: 274619 $
10/07/14 15:30:44 (pid:8373) ** $CondorPlatform: x86_64_RedHat6 $
10/07/14 15:30:44 (pid:8373) ** PID = 8373
10/07/14 15:30:44 (pid:8373) ** Log last touched 10/7 15:30:43
10/07/14 15:30:44 (pid:8373)
******************************************************
10/07/14 15:30:44 (pid:8373) Using config source:
/etc/condor/condor_config
10/07/14 15:30:44 (pid:8373) Using local config sources:
10/07/14 15:30:44 (pid:8373) /etc/condor/config.d/condor_config.local
10/07/14 15:30:44 (pid:8373) /etc/condor/config.d/condor_config.local
10/07/14 15:30:44 (pid:8373) config Macros = 66, Sorted = 66,
StringBytes = 2068, TablesBytes = 2432
10/07/14 15:30:44 (pid:8373) CLASSAD_CACHING is ENABLED
10/07/14 15:30:44 (pid:8373) Daemon Log is logging: D_ALWAYS D_ERROR
10/07/14 15:30:44 (pid:8373) DaemonCore: command socket at
<192.168.12.1:54168>
10/07/14 15:30:44 (pid:8373) DaemonCore: private command socket at
<192.168.12.1:54168>
10/07/14 15:30:44 (pid:8373) History file rotation is enabled.
10/07/14 15:30:44 (pid:8373) Maximum history file size is:
20971520 bytes
10/07/14 15:30:44 (pid:8373) Number of rotated history files is: 2
10/07/14 15:30:49 (pid:8373) TransferQueueManager stats: active
up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/07/14 15:30:49 (pid:8373) TransferQueueManager upload 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:30:49 (pid:8373) TransferQueueManager download 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:30:49 (pid:8373) Sent ad to central manager for
hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:30:49 (pid:8373) Sent ad to 1 collectors for
hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:35:50 (pid:8373) TransferQueueManager stats: active
up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/07/14 15:35:50 (pid:8373) TransferQueueManager upload 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:35:50 (pid:8373) TransferQueueManager download 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:35:50 (pid:8373) Sent ad to central manager for
hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:35:50 (pid:8373) Sent ad to 1 collectors for
hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:40:51 (pid:8373) TransferQueueManager stats: active
up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/07/14 15:40:51 (pid:8373) TransferQueueManager upload 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:40:51 (pid:8373) TransferQueueManager download 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:40:51 (pid:8373) Sent ad to central manager for
hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:40:51 (pid:8373) Sent ad to 1 collectors for
hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hiroshi Yamaguchi
######################################################################
##
## condor_config
##
## This is the global configuration file for condor. This is where
## you define where the local config file is. Any settings
## made here may potentially be overridden in the local configuration
## file. KEEP THAT IN MIND! To double-check that a variable is
## getting set from the configuration file that you expect, use
## condor_config_val -v <variable name>
##
## condor_config.annotated is a more detailed sample config file
##
## Unless otherwise specified, settings that are commented out show
## the defaults that are used if you don't define a value. Settings
## that are defined here MUST BE DEFINED since they have no default
## value.
##
######################################################################
CONDOR_ADMIN = root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#NETWORK_INTERFACE = 192.168.12.1
SEC_DEFAULT_AUTHENTICATION = NEVER
SEC_DEFAULT_NEGOTIATION = NEVER
CONDOR_HOST = bepp01.bepp.rcapp.kyushu-u.ac.jp
FULL_HOSTNAME = bepp01.bepp.rcapp.kyushu-u.ac.jp
RELEASE_DIR = /usr
LOCAL_DIR = /var
LOCAL_CONFIG_FILE = /etc/condor/config.d/condor_config.local
#REQUIRE_LOCAL_CONFIG_FILE = true
LOCAL_CONFIG_DIR = /etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$
use SECURITY : HOST_BASED
ALLOW_READ = bepp01.bepp.rcapp.kyushu-u.ac.jp
ALLOW_WRITE = hkt*.bepp.rcapp.kyushu-u.ac.jp
FLOCK_FROM =
FLOCK_TO =
#ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
#ALLOW_NEGOTIATOR = $(CONDOR_HOST), $(IP_ADDRESS)
#ALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(IP_ADDRESS)
#ALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE)
#ALLOW_WRITE_STARTD = $(ALLOW_WRITE)
#ALLOW_READ_COLLECTOR = $(ALLOW_READ)
#ALLOW_READ_STARTD = $(ALLOW_READ)
#HOSTALLOW_READ = $(ALLOW_READ)
#HOSTALLOW_WRITE = $(ALLOW_WRITE)
ALLOW_DAEMON = condor_pool@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/*, condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/$(IP_ADDRESS)
ALLOW_NEGOTIATOR = condor_pool@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/$(CONDOR_HOST)
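The MasterLog denial names the peer as unauthenticated@unmapped, while both ALLOW_DAEMON entries above are qualified with a user name. The sketch below is a deliberately simplified model of how such a list could be evaluated, meant only to illustrate one possible reading of the "no matching ALLOW entry" message. It is not HTCondor's actual matching code. The domain `pool.example` is a hypothetical stand-in for the address redacted by the archive, and `$(IP_ADDRESS)` is assumed to expand to 192.168.12.1, the address shown in the logs.

```python
from fnmatch import fnmatchcase

def entry_matches(entry, user, host_ids):
    """Simplified model of one ALLOW entry: 'user/host', 'user', or 'host'."""
    if "/" in entry:
        user_pat, host_pat = entry.split("/", 1)
    elif "@" in entry:
        user_pat, host_pat = entry, "*"      # user-only entry, any host
    else:
        user_pat, host_pat = "*", entry      # host-only entry, any user
    return fnmatchcase(user, user_pat) and any(
        fnmatchcase(h, host_pat) for h in host_ids
    )

def allowed(allow_list, user, host_ids):
    """True if any comma-separated entry matches the peer."""
    entries = [e.strip() for e in allow_list.split(",") if e.strip()]
    return any(entry_matches(e, user, host_ids) for e in entries)

# Peer identity and host identifiers as reported in the MasterLog denial.
PEER_USER = "unauthenticated@unmapped"
PEER_HOSTS = ["192.168.12.1", "bepp01", "bepp01.bepp.rcapp.kyushu-u.ac.jp"]

# Hypothetical reconstruction of ALLOW_DAEMON; "pool.example" is a placeholder.
ALLOW_DAEMON = "condor_pool@pool.example/*, condor@pool.example/192.168.12.1"

print(allowed(ALLOW_DAEMON, PEER_USER, PEER_HOSTS))
print(allowed("bepp01.bepp.rcapp.kyushu-u.ac.jp", PEER_USER, PEER_HOSTS))
```

In this model a user-qualified entry cannot match a peer that never authenticated, whereas a host-only entry can. Whether that is the actual cause here would need to be confirmed against the HTCondor security documentation.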
##--------------------------------------------------------------------
## Values set by the rpm patch script:
##--------------------------------------------------------------------
## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password
## Pathnames
RUN = $(LOCAL_DIR)/run/condor
LOG = $(LOCAL_DIR)/log/condor
LOCK = $(LOCAL_DIR)/lock/condor
SPOOL = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN = $(RELEASE_DIR)/bin
LIB = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE = $(RELEASE_DIR)/share/condor
PROCD_ADDRESS = $(RUN)/procd_pipe
NETWORK_INTERFACE = 192.168.12.1
CONDOR_HOST = $(FULL_HOSTNAME)
COLLECTOR_NAME = Personal Condor at $(FULL_HOSTNAME)
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
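Several names in this file are defined more than once (for example CONDOR_HOST), and many values refer to other macros through `$(NAME)`. The authoritative way to see the effective value is `condor_config_val -v <variable name>`, as the header comment of the file says. The sketch below is only a rough model of two behaviours, a later definition replacing an earlier one and `$(NAME)` references being expanded at lookup time. The helper names `parse_config` and `expand` are made up for illustration, and the sketch ignores `use` lines, line continuations, and every other feature of the real parser.

```python
import re

MACRO = re.compile(r"\$\((\w+)\)")

def parse_config(text):
    """Collect NAME = value pairs; a later definition replaces an earlier one."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment (e.g. "use ...").
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        table[name.strip()] = value.strip()
    return table

def expand(table, name, depth=0):
    """Expand $(NAME) references at lookup time; unknown names become empty."""
    if depth > 20:
        raise ValueError("macro recursion too deep: " + name)
    raw = table.get(name, "")
    return MACRO.sub(lambda m: expand(table, m.group(1), depth + 1), raw)

# A few lines copied from the configuration file in this message.
SAMPLE = """
#NETWORK_INTERFACE = 192.168.12.1
CONDOR_HOST = bepp01.bepp.rcapp.kyushu-u.ac.jp
FULL_HOSTNAME = bepp01.bepp.rcapp.kyushu-u.ac.jp
LOCAL_DIR = /var
LOG = $(LOCAL_DIR)/log/condor
NETWORK_INTERFACE = 192.168.12.1
CONDOR_HOST = $(FULL_HOSTNAME)
COLLECTOR_NAME = Personal Condor at $(FULL_HOSTNAME)
"""

table = parse_config(SAMPLE)
for key in ("LOG", "CONDOR_HOST", "NETWORK_INTERFACE", "COLLECTOR_NAME"):
    print(key, "=", expand(table, key))
```

Under these assumptions LOG resolves to a path under /var, which agrees with the /var/log/condor/.collector_address path that appears in the MasterLog.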