On Thu, Jan 08, 2004 at 06:26:04PM +0000, Alexander Klyubin wrote:
Hello!
I'm seeing strange behavior with my Java jobs running on Condor. The
jobs run in a different pool via flocking. When a job finishes within
several minutes, everything is fine. When a job runs longer, it still
completes, but no output files are transferred back, even though
Condor reports that everything went fine.
okay.
The submit machine runs Condor 6.5.5, whereas the execute machine and
its central manager run Condor 6.6.0. All machines run Linux.
Can this strange behavior be caused by the fact that the execute
machine's local time is one hour ahead of the submit machine's?
most likely you've hit the nail right on the head. after the job finishes,
it connects back to the submit side to transfer files, and it tries to reuse
the same session it had when it was spawned. in 6.5.5 the default session
duration was 1 hour; in 6.6.0 it was raised to 100 days. due to your clock
skew, the session probably expired on one side or the other before the job
finished.
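a toy model may help picture why only the longer jobs lose their output.
this is a simplification, not Condor's session code: it assumes each side
applies its own default lifetime counted from when the job was spawned, it
ignores the clock skew entirely, and the function names are made up.

```python
# Hypothetical toy model, not Condor source: each daemon tracks the shared
# session with its own default lifetime in seconds; clock skew is ignored.

HOUR = 3600
DAY = 24 * HOUR

DURATION_6_5_5 = 1 * HOUR    # 6.5.5 default quoted in this thread
DURATION_6_6_0 = 100 * DAY   # 6.6.0 default quoted in this thread (8640000 s)


def session_alive(duration_s: int, elapsed_s: int) -> bool:
    """True if a session with this lifetime is still valid after elapsed_s."""
    return elapsed_s < duration_s


def can_reuse_session(job_runtime_s: int,
                      submit_duration_s: int,
                      execute_duration_s: int) -> bool:
    """The job reconnects when it finishes; both ends must still accept it."""
    return (session_alive(submit_duration_s, job_runtime_s)
            and session_alive(execute_duration_s, job_runtime_s))


if __name__ == "__main__":
    for minutes in (10, 90):
        runtime = minutes * 60
        before = can_reuse_session(runtime, DURATION_6_5_5, DURATION_6_6_0)
        after = can_reuse_session(runtime, 100 * DAY, DURATION_6_6_0)
        print(f"{minutes:>3} min job: 6.5.5 default -> {before}, "
              f"with 8640000 -> {after}")
```

in this model the 10 minute job reconnects either way, while the 90 minute
job only reconnects once the 6.5.5 side uses the longer lifetime, which
matches the "short jobs fine, long jobs lose files" pattern.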
to work around it, add this to the condor_config on your 6.5.5 machine:
SEC_DEFAULT_SESSION_DURATION = 8640000
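the value is in seconds: 100 days x 86400 s/day = 8640000, the same 100 day
lifetime 6.6.0 uses by default. if you want a different lifetime, a
throwaway helper like this one (hypothetical, just the arithmetic) prints
the line to paste:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def session_duration_line(days: int) -> str:
    """Return a condor_config line setting the session lifetime to `days` days."""
    return f"SEC_DEFAULT_SESSION_DURATION = {days * SECONDS_PER_DAY}"


print(session_duration_line(100))  # SEC_DEFAULT_SESSION_DURATION = 8640000
```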
all 6.5.X users should actually make this change, especially those with
long-running jobs. it is not needed in your 6.6.X config files, but it
won't cause any harm there either.
    1/7 11:24:46 (2765.0) (5383): DC_AUTHENTICATE: attempt to open invalid session klyubin:5383:1073470798:0, failing.
    1/7 11:24:46 (2765.0) (5383): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100
if you're curious, the DC_AUTHENTICATE "invalid session" line is what clued me in.
cheers,
-zach