
[Condor-users] credd issues: heterogenous system MAC-central; WIN-execute + EC2 (win) when this works



hi-

Here are the machines I'm setting up:

1) Mac (Intel OS X) - as Condor central server
2) Parallels VM running Windows within the Mac as execute machine
3) separate Windows desktop
4) after everything else works: EC2 Windows machines - I suppose running as a cluster that attaches as a flock (perhaps with cyclecomputing)

I have tried (for days):
* playing with various configurations of condor_config & condor_config.local on both machines.
* taking down firewalls on both sides.
* reading manuals, googling, etc.
* running condor_store_cred with various settings on both sides

STATUS:
So far I have Condor up and running on the MAC as an execute, submit, manage installation. I successfully ran a test job. The Windows execute node is up but I can't test it until I get credd security working properly (I think that's the problem). I can see the Windows and MAC slots from both sides (see below).

When I submit a job from the MAC that has Windows requirements, it doesn't run. Presently, condor_q -analyze says "not yet been considered by the matchmaker" and "match but reject the job for unknown reasons." Under a previously attempted configuration it said "reject your job because of their own requirements"; the Windows slot would go to 'Matched', but the job would stay Idle and the logs would suggest a security issue.

I can't even condor_rm the Idle jobs on the MAC side. I'm guessing that being matched to Windows took them out of my control:
------
jimi:~ root# condor_q


-- Submitter: jimi.westell.com : <169.254.177.117:49371> : jimi.westell.com
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
 11.0   Jason           8/17 22:10   0+01:46:05 I  0   0.0  sample-job 60     
 13.0   Jason           8/18 01:12   0+01:24:43 I  0   0.0  sample-job 60     
 14.0   Jason           8/18 01:24   0+00:02:49 I  0   0.0  sample-job 60     
 15.0   Jason           8/18 01:53   0+00:00:00 I  0   0.0  sample-job 60     

4 jobs; 4 idle, 0 running, 0 held

jimi:~ root# condor_rm 11.0
AUTHENTICATE:1003:Failed to authenticate with any method
No result found for job 11.0
------


CONFIGURATIONS:


-------- condor_config.local on MAC:
--------
  CREDD_HOST = 10.211.55.10
  STARTER_ALLOW_RUNAS_OWNER = True
  CREDD_CACHE_LOCALLY = True
  ALLOW_CONFIG = root@$(CONDOR_HOST), *
  SEC_CONFIG_NEGOTIATION = REQUIRED
  SEC_CONFIG_AUTHENTICATION = REQUIRED
  SEC_CONFIG_ENCRYPTION = REQUIRED
  SEC_CONFIG_INTEGRITY = REQUIRED
  SEC_PASSWORD_FILE = /usr/local/condor/etc/pool_password

-------- condor_config.local on Windows:
--------
CREDD_HOST = xx.xxx.55.10
  STARTER_ALLOW_RUNAS_OWNER = True
  CREDD_CACHE_LOCALLY = True
  SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
  ALLOW_CONFIG = *
  SEC_CONFIG_NEGOTIATION = REQUIRED
  SEC_CONFIG_AUTHENTICATION = REQUIRED
  SEC_CONFIG_ENCRYPTION = REQUIRED
  SEC_CONFIG_INTEGRITY = REQUIRED

------- condor_config on Windows
------- I made this low security just to try to get it working:
-------
ALLOW_WRITE = *
ALLOW_READ = *
#... not sure what else you need to see
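
To make the differences between the two condor_config.local files above easier to spot, here is a rough Python sketch that diffs them. It is only an illustration: parse_condor_config and diff_settings are made-up helper names, the settings are pasted in by hand from the excerpts above, and the parser only understands plain NAME = value lines (no macro expansion, no line continuations).

```python
def parse_condor_config(text):
    """Parse plain NAME = value lines; skip blanks, comments and '----' separators."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("-") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Subset of the settings quoted above (the Windows CREDD_HOST is redacted as posted).
mac = parse_condor_config("""
CREDD_HOST = 10.211.55.10
ALLOW_CONFIG = root@$(CONDOR_HOST), *
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_PASSWORD_FILE = /usr/local/condor/etc/pool_password
""")
win = parse_condor_config("""
CREDD_HOST = xx.xxx.55.10
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
ALLOW_CONFIG = *
SEC_CONFIG_AUTHENTICATION = REQUIRED
""")

for name, (mac_value, win_value) in diff_settings(mac, win).items():
    print(f"{name}: MAC={mac_value!r} WIN={win_value!r}")
```

On the excerpts above this flags, among others, that SEC_PASSWORD_FILE is set only on the MAC and SEC_CLIENT_AUTHENTICATION_METHODS only on Windows. Whether either of those is the cause is exactly what I don't know.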


LOG FILES:

--------- CredLog - on windows
--------- this is after turning MAC & WIN firewalls off - not a perm solution, but not working anyway:
---------
08/18/11 14:42:18 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:42:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:47:18 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:47:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:47:18 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:47:18 SECMAN: required authentication with <xxx.xxx.1.21:9618> failed, so aborting command UPDATE_AD_GENERIC.
08/18/11 14:47:18 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|AUTHENTICATE:1003:Failed to authenticate with any method
08/18/11 14:47:18 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:47:18 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
08/18/11 14:52:39 attempt to connect to <xxx.xxx.1.21:9618> failed: timed out after 20 seconds.
08/18/11 14:52:39 Calling Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
08/18/11 14:52:39 ERROR: SECMAN:2004:Failed to create security session to <xxx.xxx.1.21:9618> with TCP.
|SECMAN:2003:TCP connection to <xxx.xxx.1.21:9618> failed.
08/18/11 14:52:39 Failed to start non-blocking update to <xxx.xxx.1.21:9618>.
08/18/11 14:52:39 Return from Handler <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> 0.0000s
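
Since the last entries above show a plain TCP timeout to port 9618 rather than an authentication failure, a bare connectivity check from the Windows side might help separate network problems from security problems. A minimal Python sketch, assuming Python is available on that machine; can_connect is a made-up helper name, and the host has to be filled in with the real (redacted) address:

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call (9618 is the port shown in the CredLog above):
#   can_connect("<address of the MAC>", 9618)
```

A True result only says the port accepts TCP connections; it says nothing about whether Condor authentication will then succeed.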

--------- MasterLog - on windows
---------
---------
08/18/11 14:51:50 condor_read(): timeout reading 21 bytes from <10.211.55.10:53043>.
08/18/11 14:51:50 IO: Failed to read packet header
08/18/11 14:51:50 store_pool_cred: failed to receive all parameters


COMMAND LINE OUTPUT:

---------- condor_status - on windows
---------- The manual says to run this when you are done; it doesn't mention that the command
---------- only works on the Windows side:
C:\Users\Administrator>condor_status -f "%s\t" Name -f "%s\n" ifThenElse(isUndefined(LocalCredd),\"UNDEF"\",LocalCredd)
slot1@JASONHERMANB752   UNDEF
slot1@xxxxxxxxxxxxxxxx  UNDEF
slot2@JASONHERMANB752   UNDEF
slot2@xxxxxxxxxxxxxxxx  UNDEF
slot3@xxxxxxxxxxxxxxxx  UNDEF
slot4@xxxxxxxxxxxxxxxx  UNDEF
slot5@xxxxxxxxxxxxxxxx  UNDEF
slot6@xxxxxxxxxxxxxxxx  UNDEF
slot7@xxxxxxxxxxxxxxxx  UNDEF
slot8@xxxxxxxxxxxxxxxx  UNDEF
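
For re-checking this after each configuration change, the two-column output above can be filtered with a small Python sketch like the following. slots_without_credd is a made-up helper name, and it assumes the output keeps the "name, whitespace, value" layout shown above.

```python
def slots_without_credd(status_output):
    """Return the slot names whose second column is the literal UNDEF."""
    missing = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "UNDEF":
            missing.append(fields[0])
    return missing


# Two lines copied from the output above.
sample = """slot1@JASONHERMANB752   UNDEF
slot2@JASONHERMANB752   UNDEF
"""
print(slots_without_credd(sample))
```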


------- condor_status - MAC (identical on windows)
-------
-------
jimi:log root# condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.210  1024  0+19:09:01
slot2@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  1+11:24:12
slot3@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  1+03:18:37
slot4@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+23:14:03
slot5@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+15:05:52
slot6@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+11:04:54
slot7@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+06:59:54
slot8@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  1+15:27:42
slot1@JASONHERMANB WINNT60    INTEL  Unclaimed Idle     0.120  1023  0+00:00:04
slot2@JASONHERMANB WINNT60    INTEL  Unclaimed Idle     0.100  1023  0+00:00:02
                    Total Owner Claimed Unclaimed Matched Preempting Backfill

      INTEL/WINNT60     2     0       0         2       0          0        0
         X86_64/OSX     8     0       0         8       0          0        0

              Total    10     0       0        10       0          0        0


-------- condor_store_cred on Windows:
--------
--------
C:\Users\Administrator>condor_store_cred -c add
Account: condor_pool@JASONHERMANB752

Enter password:

Operation failed.
   Make sure you have CONFIG access to the target Master.


thanks kindly for any assistance, jason