Subject: [Condor-users] Job executes, but status of job cannot be changed
I am trying to understand why our jobs do not leave the queue after they
complete successfully. It seems to be related to the ad owner and the
queue submit owner being different. Can anyone shed some light on why
this is occurring? Both accounts have write permission.
If I run the job with the account igskbacb-condoradmin, the job exits
the queue. igskbacb-condoradmin is the service account that the daemons
run under. Our pool is Windows XP only and we are using Condor 7.4.2.
Thanks,
Mike
SchedLog:
05/12 07:51:02 (pid:2348) ad owner: odonnellm, queue submit owner: igskbacb-condoradmin
05/12 07:51:02 (pid:2348) OwnerCheck(igskbacb-condoradmin) failed in SetAttribute for job 291.0
05/12 07:51:02 (pid:2348) condor_write(fd=1232 <IP:1901>,,size=37,timeout=20,flags=0)
05/12 07:51:02 (pid:2348) condor_read(fd=1232 <IP:1901>,,size=21,timeout=20,flags=0)
05/12 07:51:02 (pid:2348) condor_read(): fd=1232
05/12 07:51:02 (pid:2348) condor_read(): select returned 1
05/12 07:51:02 (pid:2348) condor_read(fd=1232 <IP:1901>,,size=71,timeout=20,flags=0)
05/12 07:51:02 (pid:2348) condor_read(): fd=1232
05/12 07:51:02 (pid:2348) condor_read(): select returned 1
05/12 07:51:02 (pid:2348) PERMISSION GRANTED to igskbacb-condoradmin@gs from host 159.189.162.39 for queue management, access level WRITE: reason: cached result for WRITE; see first case for the full reason
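
For anyone who wants to scan a larger SchedLog for the same pattern, here
is a minimal sketch in Python. It assumes the "ad owner: ..., queue submit
owner: ..." wording shown in the excerpt above; the function name
find_owner_mismatches is hypothetical and not part of Condor, and the
comparison is a plain string comparison, which may not match how the
schedd itself compares owners.

```python
import re

# Matches the owner pair the schedd prints before an OwnerCheck failure.
# The pattern is based only on the log wording in this post (an assumption).
OWNER_RE = re.compile(
    r"ad owner:\s*(?P<ad>\S+?),\s*queue submit owner:\s*(?P<submit>\S+)"
)


def find_owner_mismatches(log_text):
    """Return (ad_owner, submit_owner) pairs whose two names differ."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (m.group("ad"), m.group("submit"))
        for m in OWNER_RE.finditer(flat)
        if m.group("ad") != m.group("submit")
    ]


sample = """05/12 07:51:02 (pid:2348) ad owner:
odonnellm, queue submit owner: igskbacb-condoradmin
05/12 07:51:02 (pid:2348) OwnerCheck(igskbacb-condoradmin)
failed in SetAttribute for job 291.0"""

for ad_owner, submit_owner in find_owner_mismatches(sample):
    print(f"ad owner {ad_owner!r} != queue submit owner {submit_owner!r}")
```

To run it against a real log, read the file contents into a string and
pass that to find_owner_mismatches instead of the sample text.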
ShadowLog (submit machine):
05/10 15:35:35 (248.0) (2088): condor_write(fd=1672 schedd at <IP:4905>,,size=595,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): SECMAN: resume, other side is $CondorVersion: 7.4.0 Oct 31 2009 BuildID: 193173 $, NOT reauthenticating.
05/10 15:35:35 (248.0) (2088): SECMAN: about to enable message authenticator.
05/10 15:35:35 (248.0) (2088): SECMAN: successfully enabled message authenticator!
05/10 15:35:35 (248.0) (2088): SECMAN: about to enable encryption.
05/10 15:35:35 (248.0) (2088): SECMAN: successfully enabled encryption!
05/10 15:35:35 (248.0) (2088): SECMAN: startCommand succeeded.
05/10 15:35:35 (248.0) (2088): Authorizing server '*/IP'.
05/10 15:35:35 (248.0) (2088): condor_write(fd=1672 schedd at <IP:4905>,,size=76,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): condor_read(fd=1672 schedd at <IP:4905>,,size=21,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): condor_read(): fd=1672
05/10 15:35:35 (248.0) (2088): condor_read(): select returned 1
05/10 15:35:35 (248.0) (2088): condor_read(fd=1672 schedd at <IP:4905>,,size=16,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): condor_read(): fd=1672
05/10 15:35:35 (248.0) (2088): condor_read(): select returned 1
05/10 15:35:35 (248.0) (2088): updateExprTree: Failed SetAttribute(NumJobStarts, 1)
05/10 15:35:35 (248.0) (2088): condor_write(fd=1672 schedd at <IP:4905>,,size=92,timeout=300,flags=0)
... Removed 5 additional tries
05/10 15:35:35 (248.0) (2088): Failed to perform final update to job queue!
05/10 15:35:35 (248.0) (2088): Maximum number of job cleanup retry attempts (SHADOW_MAX_JOB_CLEANUP_RETRIES=5) reached; Forcing job requeue!
05/10 15:35:35 (248.0) (2088): KEYCACHEENTRY: deleted: 00D18648
05/10 15:35:35 (248.0) (2088): KEYCACHEENTRY: deleted: 00D33CD0
05/10 15:35:35 (248.0) (2088): KEYCACHEENTRY: deleted: 00D217A8
05/10 15:35:35 (248.0) (2088): KEYCACHE: deleted: 00B7B6F0
05/10 15:35:35 (248.0) (2088): CLOSE <IP:2880> fd=1716
05/10 15:35:35 (248.0) (2088): CLOSE <127.0.0.1:2881> fd=1248
05/10 15:35:35 (248.0) (2088): CLOSE <127.0.0.1:2882> fd=1732
05/10 15:35:35 (248.0) (2088): **** condor_shadow (condor_SHADOW) pid 2088 EXITING WITH STATUS 107
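
The ShadowLog above is noisy, so the following rough Python sketch pulls
out just the two facts that matter here: which attribute updates failed
and what status the shadow exited with. It assumes the "Failed
SetAttribute(...)" and "EXITING WITH STATUS ..." wording shown in this
excerpt; summarize_shadow_log is a hypothetical helper name, not a Condor
tool.

```python
import re

# Both patterns are taken from the log wording in this post (an assumption).
FAILED_ATTR_RE = re.compile(r"Failed SetAttribute\((?P<name>\w+),\s*[^)]*\)")
EXIT_RE = re.compile(r"EXITING WITH STATUS (?P<status>\d+)")


def summarize_shadow_log(log_text):
    """Return (failed_attribute_names, exit_status or None)."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    failed = [m.group("name") for m in FAILED_ATTR_RE.finditer(flat)]
    exit_match = EXIT_RE.search(flat)
    status = int(exit_match.group("status")) if exit_match else None
    return failed, status


shadow_sample = """05/10 15:35:35 (248.0) (2088): updateExprTree:
Failed SetAttribute(NumJobStarts, 1)
05/10 15:35:35 (248.0) (2088): Failed
to perform final update to job queue!
05/10 15:35:35 (248.0) (2088): ****
condor_shadow (condor_SHADOW) pid 2088 EXITING WITH STATUS 107"""

failed_attrs, exit_status = summarize_shadow_log(shadow_sample)
print("failed attributes:", failed_attrs)
print("shadow exit status:", exit_status)
```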