Re: [Condor-users] Can't submit jobs using 7.4.1 (windows) - more info
- Date: Tue, 19 Jan 2010 16:44:49 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: Re: [Condor-users] Can't submit jobs using 7.4.1 (windows) - more info
We have done some more testing, and remote submission, i.e.
condor_submit -remote remote_schedd_machine
works fine from Windows 7.2.4 and 7.4.1 to a remote Linux schedd running 7.2.3 or 7.4.1.
We also set up a temporary Windows central manager running 7.4.1. Submitting jobs from a 7.4.1 Windows submit machine in this pool gives exactly the same errors as originally listed below.
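For repeated testing, the remote submission above could be scripted. A minimal sketch, assuming condor_submit is on the PATH; the function names, the schedd name "remote_schedd_machine" and the submit file "test.sub" are placeholders for illustration only:

```python
import subprocess

def remote_submit_command(schedd_name, submit_file):
    """Build the argument list for a submission to a remote schedd."""
    return ["condor_submit", "-remote", schedd_name, submit_file]

def remote_submit(schedd_name, submit_file):
    """Run condor_submit against a remote schedd.

    Returns (exit code, combined stdout and stderr) so that runs from
    different client versions can be compared side by side.
    """
    result = subprocess.run(
        remote_submit_command(schedd_name, submit_file),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Calling remote_submit("remote_schedd_machine", "test.sub") from each client version would give one exit code and one block of output per run.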
Cheers
Greg
We have been doing some testing of job submission with Windows version 7.4.1.
We have previously had no problems with Linux central managers v7.2.3 and Windows clients v7.2.4.
Testing with PCs shows:
submit from 7.2.4 to 7.2.4  OK
submit from 7.2.4 to 7.4.1  OK
submit from 7.4.1 to 7.2.4  NOT OK
submit from 7.4.1 to 7.4.1  NOT OK
There appears to be a DNS hostname lookup failure with the 7.4.1 schedd (see log below).
We tried updating the Linux CM to 7.4.1 but it makes no difference.
See the 3 log extracts below. The config files on the Windows machines are identical.
Thanks for any insights/help.
Cheers
Greg
Excerpt from 7.4.1 schedd log that does NOT submit OK.
01/18 14:36:09 Locale: English_United States.1252
01/18 14:36:09 ******************************************************
01/18 14:36:09 ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
01/18 14:36:09 ** C:\PROGRA~1\condor\bin\condor_schedd.exe
01/18 14:36:09 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
01/18 14:36:09 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
01/18 14:36:09 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/18 14:36:09 ** $CondorPlatform: INTEL-WINNT50 $
01/18 14:36:09 ** PID = 8660
01/18 14:36:09 ** Log last touched 1/18 11:37:29
01/18 14:36:09 ******************************************************
01/18 14:36:09 Using config source: c:\PROGRA~1\condor\condor_config
01/18 14:36:09 Using local config sources:
01/18 14:36:09 C:\PROGRA~1\condor/condor_config.local
01/18 14:36:09 DaemonCore: Command Socket at <130.116.146.156:9175>
01/18 14:36:10 History file rotation is enabled.
01/18 14:36:10 Maximum history file size is: 100000000 bytes
01/18 14:36:10 Number of rotated history files is: 5
01/18 14:36:10 my_popen: CreateProcess failed
01/18 14:36:10 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.pvm, ignoring
01/18 14:36:10 my_popen: CreateProcess failed
01/18 14:36:10 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.std, ignoring
01/18 14:36:12 Calling Handler <DaemonCore::HandleReqSocketHandler> (4)
01/18 14:36:12 Received TCP command 479 (STORE_CRED) from <130.116.146.156:9137>, access level WRITE
01/18 14:36:12 Calling HandleReq <cred_access_handler> (0)
01/18 14:36:12 Return from HandleReq <cred_access_handler> (handler: 0.016s, sec: 0.000s)
01/18 14:36:12 Return from Handler <DaemonCore::HandleReqSocketHandler>
01/18 14:36:12 Calling Handler <DaemonCore::HandleReqSocketHandler> (4)
01/18 14:36:12 Received TCP command 1111 (QMGMT_CMD) from <130.116.146.156:9304>, access level READ
01/18 14:36:12 Calling HandleReq <handle_q> (0)
01/18 14:36:12 Return from HandleReq <handle_q> (handler: 0.094s, sec: 0.000s)
01/18 14:36:12 Return from Handler <DaemonCore::HandleReqSocketHandler>
01/18 14:36:12 Received UDP command 421 (RESCHEDULE) from <130.116.146.156:9630>, access level WRITE
01/18 14:36:12 Calling HandleReq <reschedule_negotiator> (0)
01/18 14:36:12 Return from HandleReq <reschedule_negotiator> (handler: 0.000s, sec: 0.000s)
01/18 14:36:15 Sent ad to central manager for hit023@xxxxxxxx
01/18 14:36:15 Sent ad to 1 collectors for hit023@xxxxxxxx
01/18 14:36:15 Failed to send RESCHEDULE to local negotiator:
01/18 14:36:46 Sent ad to central manager for hit023@xxxxxxxx
01/18 14:36:46 Sent ad to 1 collectors for hit023@xxxxxxxx
01/18 14:36:46 Failed to send RESCHEDULE to local negotiator:
01/18 14:37:10 Calling Handler <DaemonCore::HandleReqSocketHandler> (4)
01/18 14:37:10 Received TCP command 493 (NEGOTIATE_WITH_SIGATTRS) from <130.116.24.145:9926>, access level NEGOTIATOR
01/18 14:37:10 Calling HandleReq <doNegotiate> (0)
01/18 14:37:10 Negotiator hostname lookup failed!
01/18 14:37:10 Return from HandleReq <doNegotiate> (handler: 0.000s, sec: 0.000s)
01/18 14:37:10 Return from Handler <DaemonCore::HandleReqSocketHandler>
01/18 14:37:17 Increasing flock level for hit023 to 1.
01/18 14:37:17 Sent ad to central manager for hit023@xxxxxxxx
01/18 14:37:17 Sent ad to 1 collectors for hit023@xxxxxxxx
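The "Negotiator hostname lookup failed!" entry suggests the schedd could not map the negotiator's address to a host name. One way to check this outside Condor is a reverse lookup from the submit machine. A minimal sketch using Python's standard socket module; the function name and the injectable resolver argument are illustrative, and whether Condor performs an equivalent lookup internally is an assumption:

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the host name registered for ip, or None if the lookup fails.

    resolver defaults to the system resolver; it can be swapped out
    to exercise the failure path without touching DNS.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return None
    return hostname
```

If reverse_lookup("130.116.24.145") (the negotiator address in the log above) returns None on the submit machine, that would be consistent with the error, though it would not by itself explain why 7.2.4 is unaffected.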
Excerpt from 7.2.4 schedd log that does submit OK.
1/18 11:39:04 ******************************************************
1/18 11:39:04 ** condor_schedd.exe (CONDOR_SCHEDD) STARTING UP
1/18 11:39:04 ** C:\PROGRA~1\condor\bin\condor_schedd.exe
1/18 11:39:04 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
1/18 11:39:04 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
1/18 11:39:04 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
1/18 11:39:04 ** $CondorPlatform: INTEL-WINNT50 $
1/18 11:39:04 ** PID = 9256
1/18 11:39:04 ** Log last touched 1/15 15:20:45
1/18 11:39:04 ******************************************************
1/18 11:39:04 Using config source: c:\PROGRA~1\condor\condor_config
1/18 11:39:04 Using local config sources:
1/18 11:39:04 C:\PROGRA~1\condor/condor_config.local
1/18 11:39:04 DaemonCore: Command Socket at <130.116.146.156:9675>
1/18 11:39:04 History file rotation is enabled.
1/18 11:39:04 Maximum history file size is: 100000000 bytes
1/18 11:39:04 Number of rotated history files is: 5
1/18 11:39:05 my_popen: CreateProcess failed
1/18 11:39:05 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.pvm, ignoring
1/18 11:39:05 my_popen: CreateProcess failed
1/18 11:39:05 Failed to execute C:\PROGRA~1\condor/bin/condor_shadow.std, ignoring
1/18 11:39:29 Calling Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:29 Received TCP command 479 (STORE_CRED) from <130.116.146.156:9315>, access level WRITE
1/18 11:39:29 Calling HandleReq <cred_access_handler> (0)
1/18 11:39:29 Return from HandleReq <cred_access_handler> (handler: 0.109s, sec: 0.000s)
1/18 11:39:29 Return from Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:29 Calling Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:29 Received TCP command 1111 (QMGMT_CMD) from <130.116.146.156:9264>, access level READ
1/18 11:39:29 Calling HandleReq <handle_q> (0)
1/18 11:39:29 Return from HandleReq <handle_q> (handler: 0.078s, sec: 0.016s)
1/18 11:39:29 Return from Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:39:30 Received UDP command 421 (RESCHEDULE) from <130.116.146.156:9090>, access level WRITE
1/18 11:39:30 Calling HandleReq <reschedule_negotiator> (0)
1/18 11:39:30 Sent ad to central manager for hit023@xxxxxxxx
1/18 11:39:30 Sent ad to 1 collectors for hit023@xxxxxxxx
1/18 11:39:30 Called reschedule_negotiator()
1/18 11:39:30 Return from HandleReq <reschedule_negotiator> (handler: 0.016s, sec: 0.000s)
1/18 11:40:00 Sent ad to central manager for hit023@xxxxxxxx
1/18 11:40:00 Sent ad to 1 collectors for hit023@xxxxxxxx
1/18 11:40:00 Calling Handler <DaemonCore::HandleReqSocketHandler>
1/18 11:40:00 Received TCP command 493 (NEGOTIATE_WITH_SIGATTRS) from <130.116.24.145:9581>, access level NEGOTIATOR
1/18 11:40:00 Calling HandleReq <doNegotiate> (0)
1/18 11:40:00 Negotiating for owner: hit023@xxxxxxxx
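Since the two schedd excerpts are largely identical, it may help to list only the messages that the failing log has and the working log lacks. A rough sketch, assuming each log entry sits on a single line as in the original log files; the function names are illustrative, and digits are masked so that differing ports, PIDs and timings do not count as differences:

```python
import re

# Matches the "MM/DD HH:MM:SS " prefix used in the excerpts above.
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")

def messages(log_text):
    """Return the log messages in order, with timestamps removed."""
    result = []
    for line in log_text.splitlines():
        stripped = TIMESTAMP.sub("", line.strip())
        if stripped:
            result.append(stripped)
    return result

def _key(message):
    """Mask every run of digits so ports and timings compare equal."""
    return re.sub(r"\d+", "N", message)

def only_in_first(failing_log, working_log):
    """Messages present in failing_log with no counterpart in working_log."""
    seen = {_key(m) for m in messages(working_log)}
    return [m for m in messages(failing_log) if _key(m) not in seen]
```

Masking digits also hides the version numbers in the startup banner, which is acceptable here because the versions are already known.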
Excerpt from NegotiatorLog on central manager (linux 7.2.4 and 7.4.1 same errors)
1/18 11:36:42 ---------- Started Negotiation Cycle ----------
1/18 11:36:42 Phase 1: Obtaining ads from collector ...
1/18 11:36:42 Getting all public ads ...
1/18 11:36:43 Sorting 951 ads ...
1/18 11:36:43 Getting startd private ads ...
1/18 11:36:43 Got ads: 951 public and 466 private
1/18 11:36:43 Public ads include 2 submitter, 466 startd
1/18 11:36:43 Phase 2: Performing accounting ...
1/18 11:36:43 Phase 3: Sorting submitter ads by priority ...
1/18 11:36:43 Phase 4.1: Negotiating with schedds ...
1/18 11:36:43 Negotiating with hit023@xxxxxxxx at <130.116.146.156:9007>
1/18 11:36:43 0 seconds so far
1/18 11:36:43 attempt to connect to <130.116.146.156:9007> failed: Connection refused (connect errno = 111).
1/18 11:36:43 Failed to connect to hit023@xxxxxxxx (<130.116.146.156:9007>)
1/18 11:36:43 Error: Ignoring schedd for this cycle
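The negotiator reports "Connection refused" for port 9007, while the schedd excerpts above show command sockets on 9175 and 9675. The excerpts were taken at different times, so this may simply reflect a schedd restart, but it could be worth checking from the central manager which port actually accepts connections. A minimal sketch using Python's standard socket module; the function name and the timeout value are illustrative:

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Running can_connect("130.116.146.156", 9007) and then the same call with the port from the current "Command Socket at" line would show whether the negotiator is working from a stale address.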