Hi,
I am trying to set up condor_credd on Windows XP. I have a central manager
machine (nes30700) and one submit/execute (i.e. slave) machine (nes15300).
The slave machine is configured to always run jobs:
=================================================================
> condor_status

Name           OpSys    Arch   State     Activity  LoadAv  Mem   ActvtyTime
vm1@NES30700.  WINNT51  INTEL  Owner     Idle       0.040  1023  0+00:05:15
vm2@NES30700.  WINNT51  INTEL  Owner     Idle       0.000  1023  0+00:05:16
nes15300.land  WINNT51  INTEL  Unclaimed Idle      -0.010  1022  0+00:09:55
=================================================================
To run jobs I had to use "condor_store_cred" to set my password. I did this
on both the central manager and the slave machine. (Is that correct?) Once
that was done, I could successfully run a test program using condor_submit.
I want to use a shared filesystem, so I tried to set up condor_credd. I did
the following:
1. copied the example file (etc/condor_config.local.credd) to
   condor_config.local in the Condor main directory on both the central
   manager and the slave machine;
2. added the following lines to the condor_config file (on both the central
   manager and the slave machine):

     STARTER_ALLOW_RUNAS_OWNER = True
     CREDD_HOST = nes30700.lands.resnet.qg
     CREDD_CACHE_LOCALLY = True
     SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
3. modified the condor_config file (on both the central manager and the
   slave machine):

     COLLECTOR_NAME = QCCCE_condor

   where "QCCCE_condor" is the name of my Condor pool;
4. started Condor on both the central manager and the slave machine (using
   "net start condor"). The condor_master, condor_collector, condor_credd,
   condor_negotiator, condor_schedd and condor_startd daemons started on
   both machines. I thought condor_negotiator and condor_collector were only
   supposed to run on the central manager, but they were running on both the
   central manager and the slave machine;
5. added "run_as_owner = true" to the job's submit description file.
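Since steps 1 to 3 must be repeated identically on both machines, a quick consistency check can rule out a typo on one side. The sketch below is hypothetical (the names `parse_config`, `compare` and `REQUIRED` are illustrative, not part of Condor) and assumes a simplified config syntax: plain `NAME = value` lines only, with no line continuations, `$(MACRO)` expansion or included files.

```python
# Hypothetical helper, not part of Condor: compares the credd-related
# settings from step 2 between two condor_config-style texts.
# Assumption: only plain "NAME = value" lines; continuations, macros
# and include files are NOT handled.

REQUIRED = (
    "STARTER_ALLOW_RUNAS_OWNER",
    "CREDD_HOST",
    "CREDD_CACHE_LOCALLY",
    "SEC_CLIENT_AUTHENTICATION_METHODS",
)


def parse_config(text: str) -> dict[str, str]:
    """Return {NAME: value} for simple 'NAME = value' lines (last one wins)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything without an '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Condor config names are case-insensitive, so normalise to upper case.
        settings[name.strip().upper()] = value.strip()
    return settings


def compare(manager_text: str, slave_text: str) -> list[str]:
    """Describe REQUIRED settings that are missing or differ between the texts."""
    a, b = parse_config(manager_text), parse_config(slave_text)
    problems = []
    for name in REQUIRED:
        if name not in a:
            problems.append(f"{name}: missing on central manager")
        if name not in b:
            problems.append(f"{name}: missing on slave")
        if name in a and name in b and a[name] != b[name]:
            problems.append(f"{name}: differs ({a[name]!r} vs {b[name]!r})")
    return problems


if __name__ == "__main__":
    manager = (
        "STARTER_ALLOW_RUNAS_OWNER = True\n"
        "CREDD_HOST = nes30700.lands.resnet.qg\n"
        "CREDD_CACHE_LOCALLY = True\n"
        "SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD\n"
    )
    # Simulate a slave config that lost one line.
    slave = manager.replace("CREDD_CACHE_LOCALLY = True\n", "")
    for problem in compare(manager, slave):
        print(problem)
```

In practice the two texts would be read from each machine's condor_config and condor_config.local; the parser is deliberately minimal, so a clean result only means the simple lines agree.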
When I submit a job, it appears in the queue but stays idle and never runs:
=================================================================
> condor_q

-- Submitter: NES30700.lands.resnet.qg : <131.242.63.124:1144> : NES30700.lands.resnet.qg
 ID      OWNER       SUBMITTED     RUN_TIME   ST PRI SIZE CMD
   6.0   jeffreysj   3/7  14:07   0+00:00:00  I  0   9.8  output_name.exe

1 jobs; 1 idle, 0 running, 0 held
=================================================================
Before I installed condor_credd, this same job ran immediately.
The credd log file contains an authentication error:
=================================================================
3/8 11:53:30 ******************************************************
3/8 11:53:30 ** condor_credd.exe (CONDOR_CREDD) STARTING UP
3/8 11:53:30 ** D:\condor\bin\condor_credd.exe
3/8 11:53:30 ** $CondorVersion: 6.9.1 Jan 8 2007 $
3/8 11:53:30 ** $CondorPlatform: INTEL-WINNT50 $
3/8 11:53:30 ** PID = 2180
3/8 11:53:30 ** Log last touched 3/8 11:34:43
3/8 11:53:30 ******************************************************
3/8 11:53:30 Using config source: D:\condor\condor_config
3/8 11:53:30 Using local config sources:
3/8 11:53:30    D:\condor/condor_config.local
3/8 11:53:30 DaemonCore: Command Socket at <131.242.63.124:9620>
3/8 11:53:30 main_init() called
3/8 11:53:30 Calling Timer handler 0 (dc_touch_log_file)
3/8 11:53:31 Return from Timer handler 0 (dc_touch_log_file)
3/8 11:53:31 Calling Timer handler 1 (check_session_cache)
3/8 11:53:31 Return from Timer handler 1 (check_session_cache)
3/8 11:53:31 Calling Timer handler 2 (handle_cookie_refresh)
3/8 11:53:31 Return from Timer handler 2 (handle_cookie_refresh)
3/8 11:53:31 Calling Timer handler 3 (self_monitor)
3/8 11:53:31 Return from Timer handler 3 (self_monitor)
3/8 11:53:31 Calling Timer handler 6 (update_collector)
3/8 11:53:31 Return from Timer handler 6 (update_collector)
3/8 11:53:31 Calling Timer handler 5 (DaemonCore::SendAliveToParent)
3/8 11:53:31 Return from Timer handler 5 (DaemonCore::SendAliveToParent)
3/8 11:53:31 Calling Handler <<131.242.63.124:9618>>
3/8 11:53:31 Return from Handler <<131.242.63.124:9618>>
3/8 11:54:31 Calling Timer handler 7 (dc_touch_log_file)
3/8 11:54:31 Return from Timer handler 7 (dc_touch_log_file)
3/8 11:55:31 Calling Timer handler 8 (dc_touch_log_file)
3/8 11:55:31 Return from Timer handler 8 (dc_touch_log_file)
3/8 11:56:12 Calling Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:56:12 getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxxxxx'
3/8 11:56:12 getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxxxxx'
3/8 11:56:32 AUTHENTICATE: no available authentication methods succeeded, failing!
3/8 11:56:32 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
3/8 11:56:32 Return from Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:56:32 Calling Timer handler 9 (dc_touch_log_file)
3/8 11:56:32 Return from Timer handler 9 (dc_touch_log_file)
3/8 11:57:13 Calling Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:57:13 Calling HandleReq <store_cred_handler> (0)
3/8 11:57:13 Return from HandleReq <store_cred_handler>
3/8 11:57:13 Return from Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:57:31 Calling Timer handler 3 (self_monitor)
3/8 11:57:31 Return from Timer handler 3 (self_monitor)
3/8 11:57:32 Calling Timer handler 11 (dc_touch_log_file)
3/8 11:57:32 Return from Timer handler 11 (dc_touch_log_file)
=================================================================
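In a long credd log the relevant lines are easy to miss among the timer-handler noise. The hypothetical filter below (`auth_problems` is an illustrative name, not a Condor tool) keeps only lines matching the two failure messages seen in the log above. It assumes those strings are the ones of interest; other failure messages would not be caught.

```python
import re

# Assumption: these substrings come from the messages in the log above
# ("AUTHENTICATE" also matches "DC_AUTHENTICATE"); other failure
# messages Condor may emit are not covered.
AUTH_PATTERN = re.compile(r"AUTHENTICATE|Could not locate credential")


def auth_problems(log_text: str) -> list[str]:
    """Return the log lines that mention authentication or credential lookup."""
    return [line for line in log_text.splitlines() if AUTH_PATTERN.search(line)]


if __name__ == "__main__":
    sample = (
        "3/8 11:53:31 Calling Timer handler 3 (self_monitor)\n"
        "3/8 11:56:12 getStoredCredential(): Could not locate credential "
        "for user 'condor_pool@example'\n"
        "3/8 11:56:32 AUTHENTICATE: no available authentication methods "
        "succeeded, failing!\n"
    )
    for line in auth_problems(sample):
        print(line)
```

The user name in the sample is a placeholder; the real log redacts it.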
Does anyone know what the problem could be?
cheers
steve