Here are the machines I'm setting up:
1) Mac (Intel OS X) - as Condor central server
2) Parallels VM running Windows within the Mac as an execute machine
3) Separate Windows desktop
4) After everything else works: EC2 Windows machines - I suppose running as a cluster that attaches as a flock (perhaps with Cycle Computing)
I have tried (for days):
* playing with various configurations of condor_config & condor_config.local on both machines.
* taking down firewalls on both sides.
* reading manuals, googling, etc.
* running condor_store_cred with various settings on both sides.
STATUS:
So far I have Condor up and running on the Mac as an execute, submit, and manage installation. I successfully ran a test job. The Windows execute node is up, but I can't test it until I get credd security working properly (I think that's the problem). I can see the Windows and Mac slots from both sides (see below).
When I submit a job from the Mac that has Windows requirements, it doesn't run. Presently, condor_q -analyze says "not yet been considered by the matchmaker" and "match but reject the job for unknown reasons." Under a previously attempted configuration it said "reject your job because of their own requirements"; the Windows slot would go to 'Matched', but the job would stay Idle and the logs would suggest a security issue.
I can't even condor_rm the Idle jobs on the Mac side. I'm guessing that their being matched to Windows ceded control of them:
------
jimi:~ root# condor_q
-- Submitter: jimi.westell.com : <169.254.177.117:49371> : jimi.westell.com
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
11.0 Jason 8/17 22:10 0+01:46:05 I 0 0.0 sample-job 60
13.0 Jason 8/18 01:12 0+01:24:43 I 0 0.0 sample-job 60
14.0 Jason 8/18 01:24 0+00:02:49 I 0 0.0 sample-job 60
15.0 Jason 8/18 01:53 0+00:00:00 I 0 0.0 sample-job 60
4 jobs; 4 idle, 0 running, 0 held
jimi:~ root# condor_rm 11.0
AUTHENTICATE:1003:Failed to authenticate with any method
No result found for job 11.0
------
CONFIGURATIONS:
-------- condor_config.local on MAC:
--------
CREDD_HOST = 10.211.55.10
STARTER_ALLOW_RUNAS_OWNER = True
CREDD_CACHE_LOCALLY = True
ALLOW_CONFIG = root@$(CONDOR_HOST), *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
SEC_PASSWORD_FILE = /usr/local/condor/etc/pool_password
-------- condor_config.local on Windows:
--------
CREDD_HOST = xx.xxx.55.10
STARTER_ALLOW_RUNAS_OWNER = True
CREDD_CACHE_LOCALLY = True
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
ALLOW_CONFIG = *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
------- condor_config on Windows
------- I made this low security just to try to get it working:
-------
ALLOW_WRITE = *
ALLOW_READ = *
#... not sure what else you need to see
LOG FILES:
--------- CredLog - on windows
--------- this is after turning MAC & WIN firewalls off - not a perm solution, but not working anyway:
---------
08/18/11 14:42:18 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:42:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:47:18 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:47:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:47:18 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:47:18 SECMAN: required authentication with <xxx.xxx.1.21:9618> failed, so aborting command UPDATE_AD_GENERIC.
08/18/11 14:47:18 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|AUTHENTICATE:1003:Failed to authenticate with any method
08/18/11 14:47:18 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:47:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:52:39 attempt to connect to <xxx.xxx.1.21:9618> failed: timed out after 20 seconds.
08/18/11 14:52:39 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:52:39 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|SECMAN:2003:TCP connection to <xxx.xxx.1.21:9618> failed.
08/18/11 14:52:39 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:52:39 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
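The last entries in this log show a plain TCP timeout (SECMAN:2003) rather than the earlier authentication failure (AUTHENTICATE:1003), so the two may be separate problems. A minimal Python sketch for checking basic reachability of the collector port from the Windows side, independent of Condor, might look like the following. The function name can_connect is made up for illustration, and the host would be the real collector address (masked above); port 9618 and the 20 second timeout are taken from the log lines.

```python
import socket


def can_connect(host, port, timeout=20.0):
    """Return True if a plain TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does no Condor authentication,
    it only checks that the TCP handshake completes.
    """
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unknown hosts.
        return False
```

Calling can_connect with the collector's address and 9618 would tell whether the timeout is a network problem; if it returns True while the Condor daemons still fail, the authentication settings are the more likely culprit.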
--------- MasterLog - on windows
---------
---------
08/18/11 14:51:50 condor_read(): timeout reading 21 bytes from <10.211.55.10:53043>.
08/18/11 14:51:50 IO: Failed to read packet header
08/18/11 14:51:50 store_pool_cred: failed to receive all parameters
COMMAND LINE OUTPUT:
---------- condor_status - on windows
---------- The manual says to run this when you are done; it doesn't mention that the command
---------- only works on the Windows side:
C:\Users\Administrator>condor_status -f "%s\t" Name -f "%s\n" ifThenElse(isUndefined(LocalCredd),\"UNDEF"\",LocalCredd)
slot1@JASONHERMANB752 UNDEF
slot1@xxxxxxxxxxxxxxxx UNDEF
slot2@JASONHERMANB752 UNDEF
slot2@xxxxxxxxxxxxxxxx UNDEF
slot3@xxxxxxxxxxxxxxxx UNDEF
slot4@xxxxxxxxxxxxxxxx UNDEF
slot5@xxxxxxxxxxxxxxxx UNDEF
slot6@xxxxxxxxxxxxxxxx UNDEF
slot7@xxxxxxxxxxxxxxxx UNDEF
slot8@xxxxxxxxxxxxxxxx UNDEF
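Every slot above prints UNDEF, meaning the LocalCredd attribute is undefined in each slot's ad. To make that check repeatable after each configuration change, the output could be filtered with a small Python sketch like this one. The function name slots_without_credd is hypothetical, and it assumes the two-column "name, value" output produced by the condor_status command shown above.

```python
def slots_without_credd(status_lines):
    """Return the slot names whose LocalCredd column printed as UNDEF.

    Assumes each line holds exactly two whitespace-separated fields,
    as printed by the condor_status format string used above.
    """
    missing = []
    for line in status_lines:
        fields = line.split()
        # Skip blank lines and anything that is not "name value".
        if len(fields) == 2 and fields[1] == "UNDEF":
            missing.append(fields[0])
    return missing
```

Feeding it the saved output line by line gives the list of slots that still have no LocalCredd; an empty list would suggest the credd is finally being advertised.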
------- condor_status - MAC (identical on windows)
-------
-------
jimi:log root# condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.210 1024 0+19:09:01
slot2@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 1+11:24:12
slot3@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 1+03:18:37
slot4@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 0+23:14:03
slot5@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 0+15:05:52
slot6@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 0+11:04:54
slot7@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 0+06:59:54
slot8@xxxxxxxxxxxx OSX X86_64 Unclaimed Idle 0.000 1024 1+15:27:42
slot1@JASONHERMANB WINNT60 INTEL Unclaimed Idle 0.120 1023 0+00:00:04
slot2@JASONHERMANB WINNT60 INTEL Unclaimed Idle 0.100 1023 0+00:00:02
Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/WINNT60 2 0 0 2 0 0 0
X86_64/OSX 8 0 0 8 0 0 0
Total 10 0 0 10 0 0 0
-------- condor_store_cred on Windows:
--------
--------
C:\Users\Administrator>condor_store_cred -c add
Account: condor_pool@JASONHERMANB752
Enter password:
Operation failed.
Make sure you have CONFIG access to the target Master.
Thanks kindly for any assistance, Jason