Hi,
I'm trying to set up Condor as an alternative interface
to our PBS cluster.
This is my setup so far:
- I've installed Condor (BoSCO) on our PBS
login/submit node.
- I've enabled MASTER, COLLECTOR, NEGOTIATOR, and
SCHEDD.
- I've set GLITE_LOCATION and PBS_GAHP in
condor_config.
- I've set pbs_binpath and pbs_spoolpath in
GLITE_LOCATION/etc/batch_gahp.config.
With this setup, I can submit jobs to our PBS cluster
using `condor_submit`. However, Condor is unable to
find/track the submitted jobs. While the actual PBS jobs
keep running (and eventually terminate), the corresponding
Condor "meta jobs" remain IDLE for a few minutes and then
change their status to HELD.
Do you have an idea what might cause this behavior or
how to debug it?
Cheers,
Lukas
User LOG:
027 (001.000.000) 03/02 18:47:52 Job submitted to grid resource
GridResource: batch pbs
GridJobId: batch pbs acsrvcl02.gi.rwth-aachen.de_9618_acsrvcl02.gi.rwth-aachen.de#1.0#1583171263 pbs/20200302/10044
...
012 (001.000.000) 03/02 18:53:01 Job was held.
Error parsing classad or job not found
Code 0 Subcode 0
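
For cross-checking against PBS, the GridJobId from the user log can be split into its parts. The following is only a minimal sketch: it assumes the GridJobId has four whitespace-separated fields and that the last field has the form `<lrms>/<date>/<local id>`, which is what the value above looks like. The function name `split_grid_job_id` is made up for this illustration.

```python
def split_grid_job_id(grid_job_id: str) -> dict:
    """Split a 'batch' GridJobId into its fields (layout is assumed)."""
    fields = grid_job_id.split()
    if len(fields) != 4 or fields[0] != "batch":
        raise ValueError(f"unexpected GridJobId layout: {grid_job_id!r}")
    # Assumption: the last field looks like '<lrms>/<date>/<local id>'.
    lrms, date, local_id = fields[3].split("/")
    return {
        "batch_type": fields[1],   # 'pbs'
        "unique_id": fields[2],    # the long middle token from the log
        "blah_job_id": fields[3],  # the id that is polled via BLAH_JOB_STATUS
        "lrms": lrms,
        "date": date,
        "local_id": local_id,      # number to compare with PBS's own job list
    }


grid_job_id = (
    "batch pbs "
    "acsrvcl02.gi.rwth-aachen.de_9618_acsrvcl02.gi.rwth-aachen.de#1.0#1583171263 "
    "pbs/20200302/10044"
)
parts = split_grid_job_id(grid_job_id)
print(parts["blah_job_id"], parts["local_id"])
```

The `local_id` value is the number one would expect to see in the PBS job id reported by PBS itself (for example by `qstat`) while the job is running.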
GridmanagerLog.lukask (D_FULLDEBUG):
03/02/20 18:50:43 [2578688] Received CHECK_LEASES signal
03/02/20 18:50:43 [2578688] in doContactSchedd()
03/02/20 18:50:43 [2578688] querying for renewed leases
03/02/20 18:50:43 [2578688] querying for removed/held jobs
03/02/20 18:50:43 [2578688] Using constraint ((Owner=?="lukask"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
03/02/20 18:50:43 [2578688] Fetched 0 job ads from schedd
03/02/20 18:50:43 [2578688] leaving doContactSchedd()
03/02/20 18:50:45 [2578688] Evaluating periodic job policy expressions.
03/02/20 18:50:46 [2578688] GAHP[2578692] <- 'RESULTS'
03/02/20 18:50:46 [2578688] GAHP[2578692] -> 'S' '0'
03/02/20 18:50:48 [2578688] Evaluating staleness of remote job statuses.
03/02/20 18:50:58 [2578688] (1.0) doEvaluateState called: gmState GM_SUBMITTED, remoteState 0
03/02/20 18:50:58 [2578688] (1.0) gm state change: GM_SUBMITTED -> GM_POLL_ACTIVE
03/02/20 18:50:58 [2578688] GAHP[2578692] <- 'BLAH_JOB_STATUS 5 pbs/20200302/10044'
03/02/20 18:50:58 [2578688] GAHP[2578692] -> 'S'
03/02/20 18:50:59 [2578688] GAHP[2578692] <- 'RESULTS'
03/02/20 18:50:59 [2578688] GAHP[2578692] -> 'R'
03/02/20 18:50:59 [2578688] GAHP[2578692] -> 'S' '1'
03/02/20 18:50:59 [2578688] GAHP[2578692] -> '5' '1' 'Error parsing classad or job not found' '0' 'N/A'
03/02/20 18:50:59 [2578688] (1.0) doEvaluateState called: gmState GM_POLL_ACTIVE, remoteState 0
03/02/20 18:50:59 [2578688] (1.0) gm state change: GM_POLL_ACTIVE -> GM_SUBMITTED
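
To find the failing status polls in a longer GridmanagerLog, the GAHP traffic can be filtered with a small script. This is only a rough sketch under some assumptions read off the excerpt above: `<-` marks a command sent to the GAHP, `->` marks its reply, and a result line has the fields request id, error code, error message (followed by further fields). The helper names `gahp_messages` and `failed_status_polls` are hypothetical, and each log entry is expected on a single line.

```python
import re

# Matches the GAHP part of a Gridmanager log line, e.g.
#   GAHP[2578692] <- 'RESULTS'
GAHP_LINE = re.compile(r"GAHP\[(\d+)\] (<-|->) (.*)$")
QUOTED = re.compile(r"'([^']*)'")


def gahp_messages(log_text):
    """Yield (direction, fields) for every GAHP line in the log text."""
    for line in log_text.splitlines():
        match = GAHP_LINE.search(line)
        if match:
            direction = "request" if match.group(2) == "<-" else "reply"
            yield direction, QUOTED.findall(match.group(3))


def failed_status_polls(log_text):
    """Return (job id, error message) for BLAH_JOB_STATUS polls that failed."""
    pending = {}   # request id -> job id that was polled
    failures = []
    for direction, fields in gahp_messages(log_text):
        if direction == "request" and fields:
            parts = fields[0].split()
            if parts[0] == "BLAH_JOB_STATUS" and len(parts) >= 3:
                pending[parts[1]] = parts[2]
        elif direction == "reply" and len(fields) >= 3 and fields[0] in pending:
            job_id = pending.pop(fields[0])
            # Assumption: the second field is an error code, '0' meaning success.
            if fields[1] != "0":
                failures.append((job_id, fields[2]))
    return failures


SAMPLE = """\
03/02/20 18:50:58 [2578688] GAHP[2578692] <- 'BLAH_JOB_STATUS 5 pbs/20200302/10044'
03/02/20 18:50:58 [2578688] GAHP[2578692] -> 'S'
03/02/20 18:50:59 [2578688] GAHP[2578692] <- 'RESULTS'
03/02/20 18:50:59 [2578688] GAHP[2578692] -> 'R'
03/02/20 18:50:59 [2578688] GAHP[2578692] -> 'S' '1'
03/02/20 18:50:59 [2578688] GAHP[2578692] -> '5' '1' 'Error parsing classad or job not found' '0' 'N/A'
"""

failures = failed_status_polls(SAMPLE)
for job_id, message in failures:
    print(f"{job_id}: {message}")
```

On the excerpt above this reports the single poll for `pbs/20200302/10044`, which is the request that returns "Error parsing classad or job not found" about three minutes after submission.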