Hi,
When I submit the following submit file through Condor, the job stays idle (condor_q -globus reports it as UNSUBMITTED), while submitting the same job with globus-job-submit works without any errors. The log on the remote host shows an authentication failure in the Condor-G case, but no failure when the job is submitted through Globus. Has anyone come across this problem, or does anyone know how to solve it? Any help will be appreciated.

I use Condor 7.6.6 and VDT 2.

Submission file and process:

[zhrani@CM Grid]$ cat hostname_submit.jcl
grid_resource = gt4 https://head.beng02.com:2119/wsrf/services/ManagedJobFactoryService PBS
Universe = grid
when_to_transfer_output = ON_EXIT
Executable = /bin/hostname
Arguments = -f
Output = cout.$(Cluster).$(Process)
Log =clog.$(Cluster).$(Process)
Queue

[zhrani@CM Grid]$ condor_submit hostname_submit.jcl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 1106.

[zhrani@CM Grid]$ condor_q -globus

-- Submitter: CM.CHPC.hud.ac.uk : <192.168.0.10:21871> : CM.CHPC.hud.ac.uk
 ID      OWNER     STATUS       MANAGER  HOST             EXECUTABLE
1106.0   zhrani    UNSUBMITTED  PBS      head.beng02.com  /bin/hostname

[zhrani@CM Grid]$ condor_rm zhrani
User zhrani's job(s) have been marked for removal.

[zhrani@CM Grid]$ globus-job-submit head.beng02.com /bin/hostname -f
https://head.beng02.com:37308/6261/1335746926/
[zhrani@CM Grid]$ globus-job-status https://head.beng02.com:37308/6261/1335746926/
DONE
[zhrani@CM Grid]$ globus-job-get-output https://head.beng02.com:37308/6261/1335746926/
head.beng02.com

Gridmanager LOG:

04/30/12 01:46:29 [25065] resource https://head.beng02.com:2119/wsrf/services/ManagedJobFactoryService is now up
04/30/12 01:46:29 [25065] *** checkDelegation()
04/30/12 01:46:29 [25065] (1106.0) doEvaluateState called: gmState GM_UNSUBMITTED, globusState
04/30/12 01:47:19 [25065] Received CHECK_LEASES signal
04/30/12 01:47:19 [25065] in doContactSchedd()
04/30/12 01:47:19 [25065] querying for renewed leases
04/30/12 01:47:19 [25065] querying for removed/held jobs
04/30/12 01:47:19 [25065] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 01:47:19 [25065] Fetched 0 job ads from schedd
04/30/12 01:47:19 [25065] leaving doContactSchedd()
04/30/12 01:47:22 [25065] GridftpServer: Submitting job for proxy '/O=Grid/OU=GlobusTest/OU=simpleCA-head.beng02.com/OU=beng02.com/CN=zahrani'
04/30/12 01:47:22 [25065] entering FileTransfer::SimpleInit
04/30/12 01:47:22 [25065] Input files: /tmp/condor_g_scratch.0x19360fd0.25029/grid-mapfile
04/30/12 01:47:22 [25065] entering FileTransfer::UploadFiles (final_transfer=0)
04/30/12 01:47:22 [25065] entering FileTransfer::Upload
04/30/12 01:47:22 [25065] entering FileTransfer::DoUpload
04/30/12 01:47:22 [25065] DoUpload: sending file /tmp/condor_g_scratch.0x19360fd0.25029/master_proxy.2
04/30/12 01:47:22 [25065] FILETRANSFER: outgoing file_command is 4 for /tmp/condor_g_scratch.0x19360fd0.25029/master_proxy.2
04/30/12 01:47:22 [25065] Received GoAhead from peer to send /tmp/condor_g_scratch.0x19360fd0.25029/master_proxy.2 and all further files.
04/30/12 01:47:22 [25065] Sending GoAhead for 192.168.0.10 to receive /tmp/condor_g_scratch.0x19360fd0.25029/master_proxy.2 and all further files.
04/30/12 01:47:22 [25065] DoUpload: put_x509_delegation() returned 0
04/30/12 01:47:22 [25065] DoUpload: sending file /tmp/condor_g_scratch.0x19360fd0.25029/grid-mapfile
04/30/12 01:47:22 [25065] FILETRANSFER: outgoing file_command is 1 for /tmp/condor_g_scratch.0x19360fd0.25029/grid-mapfile
04/30/12 01:47:22 [25065] ReliSock::put_file_with_permissions(): going to send permissions 100644
04/30/12 01:47:22 [25065] put_file: going to send from filename /tmp/condor_g_scratch.0x19360fd0.25029/grid-mapfile
04/30/12 01:47:22 [25065] put_file: Found file size 84
04/30/12 01:47:22 [25065] put_file: sending 84 bytes
04/30/12 01:47:22 [25065] ReliSock: put_file: sent 84 bytes
04/30/12 01:47:22 [25065] DoUpload: sending file /usr/libexec/condor/gridftp_wrapper.sh
04/30/12 01:47:22 [25065] FILETRANSFER: outgoing file_command is 1 for /usr/libexec/condor/gridftp_wrapper.sh
04/30/12 01:47:22 [25065] ReliSock::put_file_with_permissions(): going to send permissions 100755
04/30/12 01:47:22 [25065] put_file: going to send from filename /usr/libexec/condor/gridftp_wrapper.sh
04/30/12 01:47:22 [25065] put_file: Found file size 1057
04/30/12 01:47:22 [25065] put_file: sending 1057 bytes
04/30/12 01:47:22 [25065] ReliSock: put_file: sent 1057 bytes
04/30/12 01:47:22 [25065] DoUpload: exiting at 3003
04/30/12 01:47:25 [25065] GAHP[25071] <- 'RESULTS'
04/30/12 01:47:25 [25065] GAHP[25071] -> 'S' '0'
04/30/12 01:47:25 [25065] in doContactSchedd()
04/30/12 01:47:25 [25065] querying for removed/held jobs
04/30/12 01:47:25 [25065] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 01:47:25 [25065] Fetched 0 job ads from schedd
04/30/12 01:47:25 [25065] 1108.0 job status: 4
04/30/12 01:47:25 [25065] leaving doContactSchedd()
04/30/12 01:47:26 [25065] Evaluating staleness of remote job statuses.
04/30/12 01:47:42 [25065] Received REMOVE_JOBS signal
04/30/12 01:47:42 [25065] in doContactSchedd()
04/30/12 01:47:42 [25065] querying for new jobs
04/30/12 01:47:42 [25065] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && (Managed =!= "ScheddDone") && (Matched =!= FALSE) && (JobStatus != 5) && (Managed =!= "External")
04/30/12 01:47:42 [25065] Fetched 0 new job ads from schedd
04/30/12 01:47:42 [25065] querying for removed/held jobs
04/30/12 01:47:42 [25065] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 01:47:42 [25065] Fetched 1 job ads from schedd
04/30/12 01:47:42 [25065] leaving doContactSchedd()
04/30/12 01:47:42 [25065] (1106.0) doEvaluateState called: gmState GM_UNSUBMITTED, globusState
04/30/12 01:47:42 [25065] (1106.0) gm state change: GM_UNSUBMITTED -> GM_DELETE
04/30/12 01:47:42 [25065] directory_util::rec_touch_file: Creating directory /tmp
04/30/12 01:47:42 [25065] directory_util::rec_touch_file: Creating directory /tmp/condorLocks
04/30/12 01:47:42 [25065] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13
04/30/12 01:47:42 [25065] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/73
04/30/12 01:47:42 [25065] FileLock object is updating timestamp on: /tmp/condorLocks/13/73/8341789162039746.lockc
04/30/12 01:47:42 [25065] (1106.0) Writing abort record to user logfile
04/30/12 01:47:42 [25065] FileLock::obtain(1) - @1335746862.880224 lock on /tmp/condorLocks/13/73/8341789162039746.lockc now WRITE
04/30/12 01:47:42 [25065] FileLock::obtain(2) - @1335746862.882102 lock on /tmp/condorLocks/13/73/8341789162039746.lockc now UNLOCKED
04/30/12 01:47:42 [25065] FileLock::obtain(1) - @1335746862.882247 lock on /tmp/condorLocks/13/73/8341789162039746.lockc now WRITE
04/30/12 01:47:42 [25065] directory_util::rec_clean_up: file /tmp/condorLocks/13/73/8341789162039746.lockc has been deleted.
04/30/12 01:47:42 [25065] Lock file /tmp/condorLocks/13/73/8341789162039746.lockc has been deleted.
04/30/12 01:47:42 [25065] FileLock::obtain(2) - @1335746862.882583 lock on /tmp/condorLocks/13/73/8341789162039746.lockc now UNLOCKED
04/30/12 01:47:47 [25065] in doContactSchedd()
04/30/12 01:47:47 [25065] querying for removed/held jobs
04/30/12 01:47:47 [25065] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 01:47:47 [25065] Fetched 1 job ads from schedd
04/30/12 01:47:47 [25065] Updating classad values for 1106.0:
04/30/12 01:47:47 [25065] Managed = "ScheddDone"
04/30/12 01:47:47 [25065] Deleting job 1106.0 from schedd
04/30/12 01:47:47 [25065] GAHP[25071] <- 'UNCACHE_PROXY 1'
04/30/12 01:47:47 [25065] GAHP[25071] -> 'S'
04/30/12 01:47:47 [25065] No jobs left, shutting down
04/30/12 01:47:47 [25065] leaving doContactSchedd()
04/30/12 01:47:47 [25065] Got SIGTERM. Performing graceful shutdown.
04/30/12 01:47:47 [25065] Started timer to call main_shutdown_fast in 1800 seconds
04/30/12 01:47:47 [25065] **** condor_gridmanager (condor_GRIDMANAGER) pid 25065 EXITING WITH STATUS 0

Remote Host Log including condor-G submit and globus submit:

TIME: Mon Apr 30 01:46:26 2012 PID: 6255 -- Notice: 6: globus-gatekeeper pid=6255 starting at Mon Apr 30 01:46:26 2012
TIME: Mon Apr 30 01:46:26 2012 PID: 6255 -- Notice: 6: Got connection 10.71.88.93 at Mon Apr 30 01:46:26 2012
GSS authentication failure
GSS Major Status: General failure
GSS Minor Status Error Chain:
globus_gsi_gssapi: Error during delegation: Delegation protocol violation
Failure: GSS failed Major:000d0000 Minor:00000002 Token:00000000
TIME: Mon Apr 30 01:46:26 2012 PID: 6255 -- Failure: GSS failed Major:000d0000 Minor:00000002 Token:00000000
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 6: globus-gatekeeper pid=6260 starting at Mon Apr 30 01:48:46 2012
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 6: Got connection 10.71.88.93 at Mon Apr 30 01:48:46 2012
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 5: Authenticated globus user: /O=Grid/OU=GlobusTest/OU=simpleCA-head.beng02.com/OU=beng02.com/CN=zahrani
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 5: Requested service: jobmanager
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 5: Authorized as local user: zhrani
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 5: Authorized as local uid: 516
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 5: and local gid: 516
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 0: executing /usr/local/globus-4.2.0/libexec/globus-job-manager
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=9
TIME: Mon Apr 30 01:48:46 2012 PID: 6260 -- Notice: 0: Child 6261 started
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 6: globus-gatekeeper pid=6275 starting at Mon Apr 30 01:49:21 2012
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 6: Got connection 10.71.88.93 at Mon Apr 30 01:49:21 2012
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 5: Authenticated globus user: /O=Grid/OU=GlobusTest/OU=simpleCA-head.beng02.com/OU=beng02.com/CN=zahrani
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 5: Requested service: jobmanager
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 5: Authorized as local user: zhrani
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 5: Authorized as local uid: 516
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 5: and local gid: 516
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 0: executing /usr/local/globus-4.2.0/libexec/globus-job-manager
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=9
TIME: Mon Apr 30 01:49:21 2012 PID: 6275 -- Notice: 0: Child 6276 started

Regards,