Hi,

I have submitted some 80 jobs with Condor-G to Globus WS-GRAM, which uses Condor as the local job manager. I am using these revisions:

>>$CondorVersion: 6.7.19 May 10 2006 $ $CondorPlatform: I386-LINUX_RH9 $
>>globus WS-GRAM 4.0.3

One of the 80 jobs somehow produced:

>>4/21 12:42:00 [27020] gt4GramCallbackHandler: Can't find record for globus job with contact
>>https://160.103.6.172:8443/wsrf/services/ManagedExecutableJobService?8e67fd40-eff4-11db-9cfc-fbda6cfa7cba on globus state StageIn, ignoring

As a follow-up, this job always stays in condor_q state 'I', and the gridmanager logs:

>>4/21 12:47:16 [27020] (3904.0) gmState GM_PROBE_JOBMANAGER, globusState 32: gt4_gram_client_job_status() failed

As a further follow-up, no Condor-G job submission is possible anymore. The gridmanager logging stops, and all newly submitted jobs stay in condor_q 'I' state. After a condor_restart the jobs still hang in 'I' state, and the gridmanager logging stops with this last message:

>>4/21 13:19:21 [4403] (3904.0) gmState GM_REGISTER, globusState 32: gt4_gram_client_job_callback_register() failed

What do I have to do to get the system running correctly again (without restarting Globus and Condor, or even rebooting)?

Regards,
--
Dr.W-D Klotz - Europ. Synch. Rad. Facility (ESRF) - 6 r Jules Horowitz, BP 220, 38043 Grenoble, FRANCE
work: +33(0)4.76.88.29.21 fax:...24.27 mobile: +33(0)6.87.38.59.27
mail: wdklotz@xxxxxxxxx or klotz@xxxxxxx chat: skype
Please avoid sending me Word(.doc) or PowerPoint(.ppt) attachments.