Hi,

I'm running Condor on Linux, with a total of 200 slots in my pool. From time to time, my users would like to interact with a running job.

For example, if they look at the job's output file (stdout) and see an error, they would like to ssh to the job and make some changes to the future input files (in the execute dir).

I managed to ssh to the job, and I even get a welcome screen that points me to the slot the job is running on. I also get the PID of the process, but I don't know how to attach to the process.

If the executable in my job.sub is a Perl script that takes different arguments and calls different tools (MATLAB, gcc, etc.), how can I get into a mode that looks as if I had run the command from my console, where I can see the tail of stdout on screen and press CTRL+C to terminate the job, the same as I do in a non-Condor environment?

The thing is that if one of the tools hits an error, it drops into its own shell, as MATLAB does, for example, where I can provide or change some parameters and resume the run. Under Condor, however, the tool just drops into its shell and I cannot attach to it. From Condor's perspective the job is running, but in fact it is just idle, waiting for input at the shell (MATLAB in my case, but there are other tools as well).

I tried to use gdb, but that seems to have stalled my job. The minute I did that, the job log file appeared to stop updating. Until then it had printed a lot of info (I use the stream option), but once I used gdb there was no more activity on the running machine.

I know the job gets into shell mode because there is some error. If there is no error, the job completes successfully, but my users would really like to debug the job when it gets into this mode, without having to rerun it from the beginning or outside Condor.

Can someone please provide an example, or some feedback?

Thanks
Sassu