Hi,
I just found something strange; maybe this is the reason the jobs are
being held. The version of Condor used is 6.7.8:
[dietz@hydra Run1]$ condor_version
$CondorVersion: 6.7.8 Jun 9 2005 $
$CondorPlatform: I386-LINUX_RH9 $
But in the ShadowLog it says something about version 6.7.12:
12/28 12:28:57 (601662.0) (9946):Read: User Job - $CondorPlatform: I386-LINUX_RH9 $
12/28 12:28:57 (601662.0) (9946):Read: User Job - $CondorVersion: 6.7.12 Sep 24 2005 $
12/28 12:28:57 (601662.0) (9946):ERROR: User job is NOT compatible with this shadow version
Maybe a different version is installed on the nodes than on the head
node, or something like that?
Paul: Can you check this, please?
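
For anyone who wants to make the same comparison without reading the logs by eye, here is a minimal Python sketch. It only extracts the version numbers from a `$CondorVersion: ... $` string and reports whether they differ. The function names are made up for illustration, and the check is plain inequality; it does not reproduce whatever rule the shadow itself applies when deciding compatibility.

```python
import re

# Matches strings such as "$CondorVersion: 6.7.8 Jun 9 2005 $"
VERSION_RE = re.compile(r"\$CondorVersion:\s*(\d+)\.(\d+)\.(\d+)")


def parse_condor_version(text):
    """Return (major, minor, patch) from the first $CondorVersion string, or None."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def versions_differ(local_text, job_text):
    """True if both strings carry a version and the two versions are not equal.

    Assumption: any difference is worth reporting. This is not Condor's own
    compatibility test, only a hint about where to look.
    """
    local = parse_condor_version(local_text)
    job = parse_condor_version(job_text)
    return local is not None and job is not None and local != job


if __name__ == "__main__":
    # Output of condor_version on the submit machine (from this thread)
    local = "$CondorVersion: 6.7.8 Jun 9 2005 $"
    # Line from the ShadowLog (from this thread)
    job = ("12/28 12:28:57 (601662.0) (9946):Read: User Job - "
           "$CondorVersion: 6.7.12 Sep 24 2005 $")
    print(parse_condor_version(local), parse_condor_version(job))
    print("versions differ:", versions_differ(local, job))
```

With the two strings quoted above, the sketch reports 6.7.8 on the submit side and 6.7.12 in the job.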
Regards
Alex
dietz@xxxxxxxxxxxx wrote:
Hi,
the issue with Condor jobs being held for an unknown reason is still not
resolved. I just checked the executables and they are condor_compiled.
I still do not know why they get held.
Regards
Alex
On Wed, 21 Dec 2005 14:46:40 -0600, Alexander Dietz wrote
> Erik Paulson wrote:
>
On Wed, Dec 21, 2005 at 02:31:37PM -0600, Alexander Dietz wrote:
Hi,
in the ShadowLog it says:
12/21 14:18:42 (601662.0) (8918):ERROR: User job is NOT compatible with this shadow version
What does this mean? I ran very similar jobs on the same cluster some
zillion times before, and in most cases it worked out. Any ideas?
Are you running a standard universe job?
Yes, it's the standard universe.
>
Did you use Condor 6.7 for the condor_compile step, but submit from a
machine running Condor 6.6? Condor can't do that, because the older 6.6
shadow may not know how to handle the system calls a 6.7 job would make.
> According to 'condor_version' it's version 6.7.8
>
-Erik
Alex
Matt Hope wrote:
you look in the SchedLog for entries about 475473.0? The schedd will log
when it puts jobs on hold, even if it doesn't update the job.
where are the SchedLog's?
On your submit machine run
condor_config_val LOG
This will output the path to the daemon logs. Look in SchedLog and go
with what Erik said by looking for any mention relating to those jobs.
You may also wish to look in the ShadowLog just in case.
An exit status of 112 from the shadow indicates that the schedd should
put the job on hold (which it is doing) so there might be something in
there.
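
The log search described above could be scripted roughly as follows. This is a sketch under assumptions: it takes the directory printed by `condor_config_val LOG`, assumes the files in it are named SchedLog and ShadowLog as in this thread, and does a plain substring match on the job id. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def condor_log_dir():
    """Ask Condor for the daemon log directory by running `condor_config_val LOG`."""
    result = subprocess.run(
        ["condor_config_val", "LOG"],
        capture_output=True, text=True, check=True,
    )
    return Path(result.stdout.strip())


def lines_mentioning(lines, job_id):
    """Return the lines that contain job_id, e.g. '601662.0'.

    Plain substring match, so '601662.0' would also match '1601662.0'.
    """
    return [line.rstrip("\n") for line in lines if job_id in line]


def scan_logs(log_dir, job_id, names=("SchedLog", "ShadowLog")):
    """Collect matching lines per log file; files that do not exist are skipped."""
    hits = {}
    for name in names:
        path = Path(log_dir) / name
        if path.exists():
            with path.open(errors="replace") as handle:
                hits[name] = lines_mentioning(handle, job_id)
    return hits


if __name__ == "__main__":
    # Needs a machine with Condor installed; the job id is the one from this thread.
    for log_name, matched in scan_logs(condor_log_dir(), "601662.0").items():
        print(f"== {log_name}: {len(matched)} matching line(s)")
        for line in matched:
            print(line)
```

Searching both files in one pass keeps the schedd's hold message and the shadow's error for the same job next to each other.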
Also supply the submit script text just in case there are any periodic
expressions that might indicate it
Matt
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users