When I submit a Condor-G job, its status stays "idle" in the output of "condor_q" and "PENDING" in the output of "condor_q -globus". Is there a configuration setting I am missing that is needed to submit Condor-G jobs successfully?
I use Condor 7.6.6 and VDT 2.

Submission file and process:

[zhrani@CM Grid]$ cat hostname_submit.jcl
grid_resource = gt2 head.beng02.com/jobmanager-pbs
Universe = grid
when_to_transfer_output = ON_EXIT
Executable = /bin/hostname
Arguments = -f
Output = cout.$(Cluster).$(Process)
Log =clog.$(Cluster).$(Process)
Queue

[zhrani@CM Grid]$ condor_submit hostname_submit.jcl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 1111.

[zhrani@CM Grid]$ condor_q

-- Submitter: CM.CHPC.hud.ac.uk : <192.168.0.10:21871> : CM.CHPC.hud.ac.uk
ID       OWNER    SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
1111.0   zhrani   4/30 11:07   0+00:00:00  I   0    0.0   hostname -f

1 jobs; 1 idle, 0 running, 0 held

[zhrani@CM Grid]$ condor_q -globus

-- Submitter: CM.CHPC.hud.ac.uk : <192.168.0.10:21871> : CM.CHPC.hud.ac.uk
ID       OWNER    STATUS   MANAGER  HOST             EXECUTABLE
1111.0   zhrani   PENDING  pbs      head.beng02.com  /bin/hostname

[zhrani@CM Grid]$ cat clog.1111.0
000 (1111.000.000) 04/30 11:07:24 Job submitted from host: <192.168.0.10:21871>
...
017 (1111.000.000) 04/30 11:07:34 Job submitted to Globus
    RM-Contact: head.beng02.com/jobmanager-pbs
    JM-Contact: https://head.beng02.com:53994/13404/1335780447/
    Can-Restart-JM: 1
...
027 (1111.000.000) 04/30 11:07:34 Job submitted to grid resource
    GridResource: gt2 head.beng02.com/jobmanager-pbs
    GridJobId: gt2 head.beng02.com/jobmanager-pbs https://head.beng02.com:53994/13404/1335780447/
...
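
The gridmanager log reports GRAM job states as numbers in its "gram callback" lines (state 64, state 1, state 4). As a reading aid, here is a minimal Python sketch that pulls those lines out and names the states. The state table is an assumption based on the GRAM2 protocol job-state constants as I understand them; the log itself only confirms 32, 64 and 1 through its "globus state change" lines. The function name decode_callbacks is my own and is not part of Condor or Globus.

```python
import re

# Assumed GRAM2 job-state codes. Only 32 (UNSUBMITTED), 64 (STAGE_IN) and
# 1 (PENDING) are confirmed by the "globus state change" lines in the log.
GRAM_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}

# Matches per-job lines such as:
#   04/30/12 11:07:34 [31322] (1111.0) gram callback: state 64, errorcode 0
CALLBACK_RE = re.compile(
    r"\((\d+\.\d+)\) gram callback: state (\d+), errorcode (\d+)"
)


def decode_callbacks(log_text):
    """Return a list of (job_id, state_name, errorcode) for each per-job GRAM callback."""
    results = []
    for match in CALLBACK_RE.finditer(log_text):
        job_id = match.group(1)
        state = int(match.group(2))
        errorcode = int(match.group(3))
        # Fall back to a labelled placeholder for codes not in the table.
        name = GRAM_STATES.get(state, "UNKNOWN({})".format(state))
        results.append((job_id, name, errorcode))
    return results


if __name__ == "__main__":
    sample = (
        "04/30/12 11:07:34 [31322] (1111.0) gram callback: state 64, errorcode 0\n"
        "04/30/12 11:07:34 [31322] (1111.0) gram callback: state 1, errorcode 0\n"
        "04/30/12 11:08:12 [31322] (1111.0) gram callback: state 4, errorcode 130\n"
    )
    for job_id, name, errorcode in decode_callbacks(sample):
        print(job_id, name, errorcode)
```

Run against the gridmanager log, this lists the three callbacks for job 1111.0 in order. It deliberately ignores the grid_monitor callbacks, which use a different line format (status=64 errorcode=0).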

Gridmanager LOG:

04/30/12 11:07:34 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:07:34 [31322] GAHP[31326] -> 'R'
04/30/12 11:07:34 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:07:34 [31322] GAHP[31326] -> '2' 'https://head.beng02.com:53994/13404/1335780447/' '64' '0'
04/30/12 11:07:34 [31322] (1111.0) gram callback: state 64, errorcode 0
04/30/12 11:07:34 [31322] (1111.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 32
04/30/12 11:07:34 [31322] (1111.0) globus state change: UNSUBMITTED -> STAGE_IN
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp/condorLocks
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/73
04/30/12 11:07:34 [31322] FileLock object is updating timestamp on: /tmp/condorLocks/13/73/8624055152012540.lockc
04/30/12 11:07:34 [31322] (1111.0) Writing globus submit record to user logfile
04/30/12 11:07:34 [31322] FileLock::obtain(1) - @1335780454.150935 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now WRITE
04/30/12 11:07:34 [31322] FileLock::obtain(2) - @1335780454.154117 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now UNLOCKED
04/30/12 11:07:34 [31322] FileLock::obtain(1) - @1335780454.154250 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now WRITE
04/30/12 11:07:34 [31322] directory_util::rec_clean_up: file /tmp/condorLocks/13/73/8624055152012540.lockc has been deleted.
04/30/12 11:07:34 [31322] Lock file /tmp/condorLocks/13/73/8624055152012540.lockc has been deleted.
04/30/12 11:07:34 [31322] FileLock::obtain(2) - @1335780454.154591 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now UNLOCKED
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp/condorLocks
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13
04/30/12 11:07:34 [31322] directory_util::rec_touch_file: Creating directory /tmp/condorLocks/13/73
04/30/12 11:07:34 [31322] FileLock object is updating timestamp on: /tmp/condorLocks/13/73/8624055152012540.lockc
04/30/12 11:07:34 [31322] (1111.0) Writing grid submit record to user logfile
04/30/12 11:07:34 [31322] FileLock::obtain(1) - @1335780454.155638 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now WRITE
04/30/12 11:07:34 [31322] FileLock::obtain(2) - @1335780454.157136 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now UNLOCKED
04/30/12 11:07:34 [31322] FileLock::obtain(1) - @1335780454.157265 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now WRITE
04/30/12 11:07:34 [31322] directory_util::rec_clean_up: file /tmp/condorLocks/13/73/8624055152012540.lockc has been deleted.
04/30/12 11:07:34 [31322] Lock file /tmp/condorLocks/13/73/8624055152012540.lockc has been deleted.
04/30/12 11:07:34 [31322] FileLock::obtain(2) - @1335780454.157598 lock on /tmp/condorLocks/13/73/8624055152012540.lockc now UNLOCKED
04/30/12 11:07:34 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:07:34 [31322] GAHP[31326] -> 'R'
04/30/12 11:07:34 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:07:34 [31322] GAHP[31326] -> '2' 'https://head.beng02.com:53994/13404/1335780447/' '1' '0'
04/30/12 11:07:34 [31322] (1111.0) gram callback: state 1, errorcode 0
04/30/12 11:07:34 [31322] (1111.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 64
04/30/12 11:07:34 [31322] (1111.0) globus state change: STAGE_IN -> PENDING
04/30/12 11:07:38 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:07:38 [31322] GAHP[31326] <- 'GRAM_JOB_REQUEST 7 head.beng02.com:2119/jobmanager-fork https://cm.chpc.hud.ac.uk:24383/ 1 &(executable=https://cm.chpc.hud.ac.uk:20886/usr/sbin/grid_monitor.sh)(stdout=https://cm.chpc.hud.ac.uk:20886/tmp/condor_g_scratch.0x19390fd0.25029/grid-monitor.head.beng02.com:2119.1/grid-monitor-log)(arguments='--dest-url="">
04/30/12 11:07:38 [31322] GAHP[31326] -> 'S'
04/30/12 11:07:39 [31322] in doContactSchedd()
04/30/12 11:07:39 [31322] querying for removed/held jobs
04/30/12 11:07:39 [31322] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 11:07:39 [31322] Fetched 0 job ads from schedd
04/30/12 11:07:39 [31322] Updating classad values for 1111.0:
04/30/12 11:07:39 [31322] GlobusStatus = 1
04/30/12 11:07:39 [31322] GridJobStatus = "PENDING"
04/30/12 11:07:39 [31322] LastRemoteStatusUpdate = 1335780454
04/30/12 11:07:39 [31322] NumGlobusSubmits = 1
04/30/12 11:07:39 [31322] leaving doContactSchedd()
04/30/12 11:07:42 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:07:42 [31322] GAHP[31326] -> 'R'
04/30/12 11:07:42 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:07:42 [31322] GAHP[31326] -> '7' '0' 'https://head.beng02.com:60336/13434/1335780456/'
04/30/12 11:07:42 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:07:42 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:07:42 [31322] GAHP[31326] -> 'R'
04/30/12 11:07:42 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:07:42 [31322] GAHP[31326] -> '2' 'https://head.beng02.com:60336/13434/1335780456/' '64' '0'
04/30/12 11:07:42 [31322] grid_monitor for head.beng02.com:2119: gram callback status=64 errorcode=0
04/30/12 11:07:43 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:07:43 [31322] GAHP[31326] -> 'R'
04/30/12 11:07:43 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:07:43 [31322] GAHP[31326] -> '2' 'https://head.beng02.com:60336/13434/1335780456/' '2' '0'
04/30/12 11:07:43 [31322] grid_monitor for head.beng02.com:2119: gram callback status=2 errorcode=0
04/30/12 11:08:12 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:08:12 [31322] grid_monitor job status for head.beng02.com:2119 file has been refreshed.
04/30/12 11:08:12 [31322] Read full grid_monitor status file for head.beng02.com:2119: scan start=1335780406, scan finish=1335780406, job count=0
04/30/12 11:08:12 [31322] Read grid_monitor status file for head.beng02.com:2119 successfully
04/30/12 11:08:12 [31322] grid_monitor log file for head.beng02.com:2119 updated.
04/30/12 11:08:12 [31322] grid_monitor log file for head.beng02.com:2119 looks normal
04/30/12 11:08:12 [31322] Successfully started grid_monitor for head.beng02.com:2119
04/30/12 11:08:12 [31322] (1111.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 1
04/30/12 11:08:12 [31322] (1111.0) gm state change: GM_SUBMITTED -> GM_PUT_TO_SLEEP
04/30/12 11:08:12 [31322] GAHP[31326] <- 'GRAM_JOB_SIGNAL 8 https://head.beng02.com:53994/13404/1335780447/ 9 NULL'
04/30/12 11:08:12 [31322] GAHP[31326] -> 'S'
04/30/12 11:08:12 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:08:12 [31322] GAHP[31326] -> 'R'
04/30/12 11:08:12 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:08:12 [31322] GAHP[31326] -> '8' '0' '0' '1'
04/30/12 11:08:12 [31322] (1111.0) doEvaluateState called: gmState GM_PUT_TO_SLEEP, globusState 1
04/30/12 11:08:12 [31322] (1111.0) gm state change: GM_PUT_TO_SLEEP -> GM_JOBMANAGER_ASLEEP
04/30/12 11:08:12 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:08:12 [31322] GAHP[31326] -> 'R'
04/30/12 11:08:12 [31322] GAHP[31326] -> 'S' '1'
04/30/12 11:08:12 [31322] GAHP[31326] -> '2' 'https://head.beng02.com:53994/13404/1335780447/' '4' '130'
04/30/12 11:08:12 [31322] (1111.0) gram callback: state 4, errorcode 130
04/30/12 11:08:12 [31322] (1111.0) doEvaluateState called: gmState GM_JOBMANAGER_ASLEEP, globusState 1
04/30/12 11:08:25 [31322] Received CHECK_LEASES signal
04/30/12 11:08:25 [31322] in doContactSchedd()
04/30/12 11:08:25 [31322] querying for renewed leases
04/30/12 11:08:25 [31322] querying for removed/held jobs
04/30/12 11:08:25 [31322] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 11:08:25 [31322] Fetched 0 job ads from schedd
04/30/12 11:08:25 [31322] leaving doContactSchedd()
04/30/12 11:08:28 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:08:28 [31322] GAHP[31326] -> 'S' '0'
04/30/12 11:08:30 [31322] Evaluating staleness of remote job statuses.
04/30/12 11:08:42 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:09:12 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:09:12 [31322] grid_monitor job status for head.beng02.com:2119 file has been refreshed.
04/30/12 11:09:12 [31322] Read full grid_monitor status file for head.beng02.com:2119: scan start=1335780466, scan finish=1335780466, job count=1
04/30/12 11:09:12 [31322] Read grid_monitor status file for head.beng02.com:2119 successfully
04/30/12 11:09:12 [31322] grid_monitor log file for head.beng02.com:2119 updated.
04/30/12 11:09:12 [31322] grid_monitor log file for head.beng02.com:2119 looks normal
04/30/12 11:09:12 [31322] in doContactSchedd()
04/30/12 11:09:12 [31322] querying for removed/held jobs
04/30/12 11:09:12 [31322] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 11:09:12 [31322] Fetched 0 job ads from schedd
04/30/12 11:09:12 [31322] Updating classad values for 1111.0:
04/30/12 11:09:12 [31322] LastRemoteStatusUpdate = 1335780552
04/30/12 11:09:12 [31322] leaving doContactSchedd()
04/30/12 11:09:25 [31322] Received CHECK_LEASES signal
04/30/12 11:09:25 [31322] in doContactSchedd()
04/30/12 11:09:25 [31322] querying for renewed leases
04/30/12 11:09:25 [31322] querying for removed/held jobs
04/30/12 11:09:25 [31322] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 11:09:25 [31322] Fetched 0 job ads from schedd
04/30/12 11:09:25 [31322] leaving doContactSchedd()
04/30/12 11:09:28 [31322] GAHP[31326] <- 'RESULTS'
04/30/12 11:09:28 [31322] GAHP[31326] -> 'S' '0'
04/30/12 11:09:30 [31322] Evaluating staleness of remote job statuses.
04/30/12 11:09:42 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:10:12 [31322] grid_monitor for head.beng02.com:2119 entering CheckMonitor
04/30/12 11:10:12 [31322] grid_monitor job status for head.beng02.com:2119 file has been refreshed.
04/30/12 11:10:12 [31322] Read full grid_monitor status file for head.beng02.com:2119: scan start=1335780526, scan finish=1335780526, job count=1
04/30/12 11:10:12 [31322] Read grid_monitor status file for head.beng02.com:2119 successfully
04/30/12 11:10:12 [31322] grid_monitor log file for head.beng02.com:2119 updated.
04/30/12 11:10:12 [31322] grid_monitor log file for head.beng02.com:2119 looks normal
04/30/12 11:10:12 [31322] in doContactSchedd()
04/30/12 11:10:12 [31322] querying for removed/held jobs
04/30/12 11:10:12 [31322] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 11:10:12 [31322] Fetched 0 job ads from schedd
04/30/12 11:10:12 [31322] Updating classad values for 1111.0:
04/30/12 11:10:12 [31322] LastRemoteStatusUpdate = 1335780612
04/30/12 11:10:12 [31322] leaving doContactSchedd()
04/30/12 11:10:25 [31322] Received CHECK_LEASES signal
04/30/12 11:10:25 [31322] in doContactSchedd()
04/30/12 11:10:25 [31322] querying for renewed leases
04/30/12 11:10:25 [31322] querying for removed/held jobs
04/30/12 11:10:25 [31322] Using constraint ((Owner=?="zhrani"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
04/30/12 11:10:25 [31322] Fetched 0 job ads from schedd
04/30/12 11:10:25 [31322] leaving doContactSchedd()

Remote Host Log:

TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 6: globus-gatekeeper pid=13401 starting at Mon Apr 30 11:07:27 2012
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 6: Got connection 10.71.88.93 at Mon Apr 30 11:07:27 2012
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 5: Authenticated globus user: /O=Grid/OU=GlobusTest/OU=simpleCA-head.beng02.com/OU=beng02.com/CN=zahrani
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 5: Requested service: jobmanager
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 5: Authorized as local user: zhrani
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 5: Authorized as local uid: 516
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 5: and local gid: 516
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 0: executing /usr/local/globus-4.2.0/libexec/globus-job-manager
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=9
TIME: Mon Apr 30 11:07:27 2012 PID: 13401 -- Notice: 0: Child 13402 started
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 6: globus-gatekeeper pid=13403 starting at Mon Apr 30 11:07:27 2012
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 6: Got connection 10.71.88.93 at Mon Apr 30 11:07:27 2012
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 5: Authenticated globus user: /O=Grid/OU=GlobusTest/OU=simpleCA-head.beng02.com/OU=beng02.com/CN=zahrani
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 5: Requested service: jobmanager-pbs
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 5: Authorized as local user: zhrani
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 5: Authorized as local uid: 516
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 5: and local gid: 516
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 0: executing /usr/local/globus-4.2.0/libexec/globus-job-manager
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=9
TIME: Mon Apr 30 11:07:27 2012 PID: 13403 -- Notice: 0: Child 13404 started
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 6: globus-gatekeeper pid=13433 starting at Mon Apr 30 11:07:36 2012
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 6: Got connection 10.71.88.93 at Mon Apr 30 11:07:36 2012
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 5: Authenticated globus user: /O=Grid/OU=GlobusTest/OU=simpleCA-head.beng02.com/OU=beng02.com/CN=zahrani
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 5: Requested service: jobmanager-fork
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 5: Authorized as local user: zhrani
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 5: Authorized as local uid: 516
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 5: and local gid: 516
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 0: executing /usr/local/globus-4.2.0/libexec/globus-job-manager
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=9
TIME: Mon Apr 30 11:07:36 2012 PID: 13433 -- Notice: 0: Child 13434 started

Regards,