Re: [Condor-users] Windows, Credd, and run_as_owner question
- Date: Wed, 5 Dec 2007 13:45:05 -0500
- From: "Valencia, Matthew C." <Matthew.Valencia@xxxxxxxxxx>
- Subject: Re: [Condor-users] Windows, Credd, and run_as_owner question
Hi,
I'm trying to set up a simple Condor (6.9.5) pool where:
Machine A is the Collector / Negotiator / Submit machine
Machine B is the Execute machine
So far, I've been able to run jobs successfully *except* when I set 'run_as_owner = true' in the submit file. When I do that, the jobs just sit in the queue (the output of condor_q -an follows):
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
005.004:  Run analysis summary.  Of 2 machines,
      2 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
WARNING:  Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer) && (HasWindowsRunAsOwner && (LocalCredd =?= "A.dom1.jhuapl.edu"))
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
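For reference, the last clause of the Requirements expression above uses the ClassAd `=?=` ("is identical to") operator, which yields FALSE rather than UNDEFINED when one side is undefined. The following is only a toy Python model of that behavior, not Condor code: the `UNDEFINED` sentinel, the dict-based machine ad, and both function names are assumptions made for illustration.

```python
# Toy model of one ClassAd clause. Nothing here is a Condor API:
# UNDEFINED, meta_equals and run_as_owner_clause are hypothetical names.
UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value


def meta_equals(left, right):
    """Model `=?=`: true only when both sides are identical, never UNDEFINED."""
    if left is UNDEFINED or right is UNDEFINED:
        return left is right
    return type(left) is type(right) and left == right


def run_as_owner_clause(machine_ad, credd_host):
    """Model (HasWindowsRunAsOwner && (LocalCredd =?= credd_host))."""
    has_run_as_owner = machine_ad.get("HasWindowsRunAsOwner", UNDEFINED)
    local_credd = machine_ad.get("LocalCredd", UNDEFINED)
    return has_run_as_owner is True and meta_equals(local_credd, credd_host)


# A machine ad that advertises HasWindowsRunAsOwner but no LocalCredd.
without_credd = {"HasWindowsRunAsOwner": True}
# The same ad with LocalCredd set to the host named in the job's requirements.
with_credd = {"HasWindowsRunAsOwner": True, "LocalCredd": "A.dom1.jhuapl.edu"}

print(run_as_owner_clause(without_credd, "A.dom1.jhuapl.edu"))  # False
print(run_as_owner_clause(with_credd, "A.dom1.jhuapl.edu"))     # True
```

Under this simplified model, a machine ad with no LocalCredd attribute can never satisfy the clause, regardless of its other attributes.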
The only difference between the cases with and without run_as_owner is the last two requirements (HasWindowsRunAsOwner and LocalCredd). I verified that the ClassAd for Machine B has HasWindowsRunAsOwner = TRUE, but LocalCredd doesn't appear to be defined. I thought it likely that I had messed something up in the configuration of credd, so I looked at its log file (ASDSUser is logged into machine A):
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12/5 13:04:51 ******************************************************
12/5 13:04:52 ** condor_credd.exe (CONDOR_CREDD) STARTING UP
12/5 13:04:52 ** C:\condor\bin\condor_credd.exe
12/5 13:04:53 ** $CondorVersion: 6.9.5 Nov 28 2007 $
12/5 13:04:53 ** $CondorPlatform: INTEL-WINNT50 $
12/5 13:04:53 ** PID = 1716
12/5 13:04:53 ** Log last touched time unavailable (No such file or directory)
12/5 13:04:53 ******************************************************
12/5 13:04:53 Using config source: C:\condor\condor_config
12/5 13:04:53 Using local config sources:
12/5 13:04:53    C:\condor/condor_config.local
12/5 13:04:53    C:\condor/condor_config.local.credd
12/5 13:04:53 DaemonCore: Command Socket at <128.244.140.226:9620>
12/5 13:04:53 main_init() called
12/5 13:04:53 Calling Handler <<128.244.140.226:9618>>
12/5 13:04:53 ZKM: setting default map to (null)
12/5 13:04:53 Return from Handler <<128.244.140.226:9618>>
12/5 13:04:54 ZKM: setting default map to (null)
12/5 13:05:16 Calling Handler <DaemonCore::HandleReqSocketHandler>
12/5 13:05:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <128.244.140.110:4207>.
12/5 13:05:16 IO: Failed to read packet header
12/5 13:05:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <128.244.140.110:4207>.
12/5 13:05:16 IO: Failed to read packet header
12/5 13:05:16 AUTHENTICATE: handshake failed!
12/5 13:05:16 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1002:Failure performing handshake|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
12/5 13:05:16 Return from Handler <DaemonCore::HandleReqSocketHandler>
12/5 13:06:34 Calling Handler <DaemonCore::HandleReqSocketHandler>
12/5 13:06:34 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:06:34 Calling HandleReq <store_cred_handler> (0)
12/5 13:06:34 Return from HandleReq <store_cred_handler>
12/5 13:06:34 Return from Handler <DaemonCore::HandleReqSocketHandler>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
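To make the failures in the credd log above easier to spot, here is a small Python sketch that pulls out the socket and authentication errors. It assumes each log entry sits on one line in the form "M/D HH:MM:SS message"; the function name, the regular expressions, and the sample text are illustrative, not part of Condor. The one hard-coded fact is that Winsock error 10054 is WSAECONNRESET (connection reset by peer).

```python
import re

# Winsock error code seen in the log; 10054 is WSAECONNRESET.
WINSOCK_ERRORS = {10054: "WSAECONNRESET (connection reset by peer)"}

ENTRY = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) (.*)$")   # timestamp + message
ERRNO = re.compile(r"errno = (\d+)")                   # socket error code
PEER = re.compile(r"<(\d+\.\d+\.\d+\.\d+):\d+>")       # <ip:port> of the peer


def find_failures(log_text):
    """Return (timestamp, description, peer_ip) for each failure entry found."""
    failures = []
    for line in log_text.splitlines():
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # skip blank or wrapped lines
        stamp, message = entry.groups()
        err = ERRNO.search(message)
        if err:
            code = int(err.group(1))
            peer = PEER.search(message)
            description = WINSOCK_ERRORS.get(code, f"errno {code}")
            failures.append((stamp, description, peer.group(1) if peer else None))
        elif "authenticate failed" in message:
            failures.append((stamp, "authentication failed", None))
    return failures


# Three entries copied from the credd log, one per line.
SAMPLE = """\
12/5 13:05:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <128.244.140.110:4207>.
12/5 13:05:16 IO: Failed to read packet header
12/5 13:05:16 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1002:Failure performing handshake|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
"""

for failure in find_failures(SAMPLE):
    print(failure)
```

On the sample, this reports a connection reset from 128.244.140.110 followed by the PASSWORD authentication failure; that address is the one the CollectorLog associates with B.dom1.jhuapl.edu.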
I googled a bit and thought that I may have forgotten to set the condor_pool password. So I tried that (condor_store_cred -c -n A.dom1.jhuapl.edu add, condor_store_cred -c -n B.dom1.jhuapl.edu add), but the same behavior occurred (although the condor_store_cred command did return with 'Operation succeeded.'). Here are the contents of my Machine A's MasterLog:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12/5 13:04:47 SetEnvironmentVariable failed, errno=203
12/5 13:04:47 ******************************************************
12/5 13:04:47 ** Condor (CONDOR_MASTER) STARTING UP
12/5 13:04:47 ** C:\condor\bin\condor_master.exe
12/5 13:04:47 ** $CondorVersion: 6.9.5 Nov 28 2007 $
12/5 13:04:47 ** $CondorPlatform: INTEL-WINNT50 $
12/5 13:04:47 ** PID = 772
12/5 13:04:47 ** Log last touched time unavailable (No such file or directory)
12/5 13:04:47 ******************************************************
12/5 13:04:47 Using config source: C:\condor\condor_config
12/5 13:04:47 Using local config sources:
12/5 13:04:47    C:\condor/condor_config.local
12/5 13:04:47    C:\condor/condor_config.local.credd
12/5 13:04:47 DaemonCore: Command Socket at <128.244.140.226:3975>
12/5 13:04:48 Started DaemonCore process "C:\condor/bin/condor_collector.exe", pid and pgroup = 3788
12/5 13:04:51 Started DaemonCore process "C:\condor/bin/condor_negotiator.exe", pid and pgroup = 2536
12/5 13:04:51 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2620
12/5 13:04:51 Started DaemonCore process "C:\condor/bin/condor_credd.exe", pid and pgroup = 1716
12/5 13:04:51 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:53 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:56 ZKM: setting default map to (null)
12/5 13:08:06 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:08:06 store_pool_cred: failed to receive all parameters
and my Machine A's CollectorLog (just in case that's important):
12/5 13:04:48 ******************************************************
12/5 13:04:48 ** condor_collector.exe (CONDOR_COLLECTOR) STARTING UP
12/5 13:04:48 ** C:\condor\bin\condor_collector.exe
12/5 13:04:48 ** $CondorVersion: 6.9.5 Nov 28 2007 $
12/5 13:04:48 ** $CondorPlatform: INTEL-WINNT50 $
12/5 13:04:48 ** PID = 3788
12/5 13:04:48 ** Log last touched time unavailable (No such file or directory)
12/5 13:04:48 ******************************************************
12/5 13:04:48 Using config source: C:\condor\condor_config
12/5 13:04:48 Using local config sources:
12/5 13:04:48    C:\condor/condor_config.local
12/5 13:04:48    C:\condor/condor_config.local.credd
12/5 13:04:48 DaemonCore: Command Socket at <128.244.140.226:9618>
12/5 13:04:48 In ViewServer::Init()
12/5 13:04:48 In CollectorDaemon::Init()
12/5 13:04:48 In ViewServer::Config()
12/5 13:04:48 In CollectorDaemon::Config()
12/5 13:04:48 enable: Creating stats hash table
12/5 13:04:49 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:04:51 ZKM: setting default map to (null)
12/5 13:04:52 MasterAd : Inserting ** "< B.dom1.jhuapl.edu >"
12/5 13:04:52 stats: Inserting new hashent for 'Master':'B.dom1.jhuapl.edu':'128.244.140.110'
12/5 13:04:53 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:53 creating new table for type CredD
12/5 13:04:53 CredD: Inserting ** "< A.dom1.jhuapl.edu >"
12/5 13:04:53 stats: Inserting new hashent for 'CredD':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 (Sending 2 ads in response to query)
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 Got QUERY_STARTD_PVT_ADS
12/5 13:04:54 (Sending 0 ads in response to query)
12/5 13:04:54 NegotiatorAd : Inserting ** "< A.dom1.jhuapl.edu >"
12/5 13:04:54 stats: Inserting new hashent for 'Negotiator':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:04:56 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:57 MasterAd : Inserting ** "< A.dom1.jhuapl.edu >"
12/5 13:04:57 stats: Inserting new hashent for 'Master':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:04:58 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:58 ScheddAd : Inserting ** "< A.dom1.jhuapl.edu , 128.244.140.226 >"
12/5 13:04:58 stats: Inserting new hashent for 'Schedd':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:05:03 DC_AUTHENTICATE: attempt to open invalid session A:2336:1196877304:8, failing.
12/5 13:05:04 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:04 WARNING: No master ad for < slot2@xxxxxxxxxxxxxxxxx >
12/5 13:05:04 StartdAd : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:04 stats: Inserting new hashent for 'Start':'slot2@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:04 StartdPvtAd : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:04 stats: Inserting new hashent for 'StartdPvt':'slot2@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:08 Got INVALIDATE_STARTD_ADS
12/5 13:05:08 **** Removing stale ad: "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:08 (Invalidated 1 ads)
12/5 13:05:08 **** Removing stale ad: "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:08 (Invalidated 1 ads)
12/5 13:05:08 Got INVALIDATE_STARTD_ADS
12/5 13:05:08 (Invalidated 0 ads)
12/5 13:05:08 (Invalidated 0 ads)
12/5 13:05:15 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:15 (Sending 1 ads in response to query)
12/5 13:05:18 DaemonCore: Can't receive command request from 128.244.140.226 (perhaps a timeout?)
12/5 13:05:19 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:35 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:35 WARNING: No master ad for < slot1@xxxxxxxxxxxxxxxxx >
12/5 13:05:35 StartdAd : Inserting ** "< slot1@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:35 stats: Inserting new hashent for 'Start':'slot1@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:35 StartdPvtAd : Inserting ** "< slot1@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:35 stats: Inserting new hashent for 'StartdPvt':'slot1@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:36 StartdAd : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:36 StartdPvtAd : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:06:15 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:06:15 Got QUERY_STARTD_ADS
12/5 13:06:15 (Sending 2 ads in response to query)
12/5 13:06:35 SubmittorAd : Inserting ** "< ASDSUser@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx , 128.244.140.226 >"
12/5 13:06:35 stats: Inserting new hashent for 'Submittor':'ASDSUser@xxxxxxxxxxxxxxx':'128.244.140.226'
12/5 13:06:35 ZKM: setting default map to SYSTEM@nt authority
12/5 13:06:35 (Sending 1 ads in response to query)
12/5 13:06:36 (Sending 8 ads in response to query)
12/5 13:06:36 Got QUERY_STARTD_PVT_ADS
12/5 13:06:36 (Sending 2 ads in response to query)
12/5 13:06:40 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:06:40 Got QUERY_STARTD_ADS
12/5 13:06:40 (Sending 2 ads in response to query)
12/5 13:06:40 (Sending 1 ads in response to query)
12/5 13:08:06 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:08:06 Got QUERY_MASTER_ADS
12/5 13:08:06 (Sending 1 ads in response to query)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I would be very thankful if anyone could give me some suggestions.
Thanks,
Matt