So, if a user submits a job with request_memory=1000 and I have a policy where the job gets held if the user goes over 1000 megabytes: my job's process is at 900 megabytes, I run something stupid in the ssh session that uses another 100 megabytes, and the job gets held? Is that the correct behavior?
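(For reference, the kind of policy being described might be written in the pool's condor_config along these lines; the exact expression and attribute names are an illustration, not the poster's actual configuration:

    # hold any job whose measured memory use exceeds what it requested (both in MB)
    SYSTEM_PERIODIC_HOLD = (MemoryUsage > RequestMemory)
    SYSTEM_PERIODIC_HOLD_REASON = "Job exceeded its requested memory"

Since the ssh session's processes are counted as part of the job, their memory would be included in the job's measured usage and could trigger such a hold.)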
On Wed, Jul 27, 2011 at 11:33 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
For better or worse, all processes run by the user on the execute node are run and monitored as though they were part of the job. Therefore, policies relating to cpu affinity, cpu usage, memory usage, and so on should all be applied to the ssh session, just like any other processes run by the user in a job.
--Dan
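(Concretely, anything started inside a condor_ssh_to_job session is tracked as part of the job; with a made-up job id and tool name:

    condor_ssh_to_job 1234.0      # shell on the execute node, inside the job's sandbox
    ./some_memory_hungry_tool     # its cpu and memory are charged to job 1234.0

which is why the policy question above matters.)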
On 7/27/11 5:36 AM, Rita wrote:
Can people take advantage of condor_ssh_to_job? Can't they log in to the box and run something else which will take additional resources? Or is there a mechanism which will prevent that?
On Tue, Jul 19, 2011 at 4:47 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:
Hi Dan,
Well, I tried to play a little with the screen command, and found out there are many issues with using it. I guess it is not a good solution, as you pointed out.
I still think, however, that it is an important feature to have control over your running process: not only killing, holding, etc., but also the ability to attach it to your own console.
I know I can get all the files and logs the process created, and I know I can also inspect them on the execute machine. What I do not know is how I can interact with the running process controlled by condor.
I will send a new email asking whether it might be possible to change the directory where the process runs. So when condor runs the job on the execute machine, the run directory would not be created under /var/lib/condor/execute (the default in the RedHat RPMs) but, let's say, in the user's home directory.
For example: if user foo.bar submits a job, the run directory of the job would be located on an NFS shared volume accessible to all execute machines, instead of on the limited local disk at /var/lib/condor.
Thanks
Sassy
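(For what it's worth, the location of the execute sandbox Sassy asks about is a per-machine configuration setting; something along these lines in the execute node's condor_config would move it off /var/lib/condor, though whether an NFS-mounted path is advisable is a separate question:

    # hypothetical path; must be writable by the condor daemons
    EXECUTE = /shared/condor/execute

The per-job scratch subdirectory would then be created under that path instead of under /var/lib/condor/execute.)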
On Tue, Jul 19, 2011 at 1:01 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
Sassy,
condor_ssh_to_job gives you an interactive shell on the same machine and in the same environment as the running job. This allows you to inspect the process in many ways, but it does not attach the i/o streams of the job to your terminal as though you had run the job by hand. The i/o streams of the job are directed to files or streams and cannot be easily redirected to something else unless you go through some extra effort when running the job in order to make this possible. I haven't tried it myself, but I imagine it would be possible to use the unix 'screen' utility to make this possible.

However, I would recommend getting more familiar with standard batch-job debugging techniques before trying something exotic like running every job under screen. Being able to type commands into running jobs is a nifty thought, but batch jobs should be designed to run without interactive input, so it doesn't sound that useful in practice to me.
--Dan
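(Dan says he hasn't tried the 'screen' idea, and it is untested here too, but a rough sketch of the wrapper approach, with a made-up wrapper script, session name, and job script, might look like:

    #!/bin/sh
    # wrapper.sh - start the real job inside a detached screen session,
    # without forking, so condor still sees a running process
    screen -D -m -S condorjob ./real_job.pl "$@"

Submit wrapper.sh as the executable, then from a condor_ssh_to_job shell run 'screen -r condorjob' to attach to the job's terminal. Note that stdout would then be captured by screen rather than by the output file named in the submit description, which may be part of the issues Sassy mentions running into.)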
On 7/18/11 4:44 PM, Sassy
Natan wrote:
While googling the different results for condor_ssh_to_job, I found some interesting examples on this page https://twiki.grid.iu.edu/bin/view/Engagement/HtmlVersion (see 9 Appendix: Monitoring a running job). The example shows two interesting commands: glidein_ls and glidein_interactive.

This is very cool, but as far as I can tell from a quick reading it is part of the glideinWMS project. Is there anything like this in condor? I guess I could look at the command files (which are python based) to understand how this works in glideinWMS and maybe try to convert them. But if someone has a different idea, please be my guest :-)

I have the feeling condor already has this, I just don't know how yet :-)
Sassy
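(The closest built-in analogue to glidein_ls and glidein_interactive is condor_ssh_to_job itself, which Dan describes above; besides the interactive shell, it can run a single remote command in the job's sandbox. With a made-up job id:

    condor_ssh_to_job 1234.0 ls -l        # roughly what glidein_ls does
    condor_ssh_to_job 1234.0              # interactive shell in the sandbox
)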
On Mon, Jul 18, 2011 at 8:10 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:
Hi,
I'm running condor on Linux, with a total of 200 slots in my pool.

When running a job, my users would like from time to time to interact with the running job. So if, for example, they look in the job output file (stdout) and see some error, they would like to ssh to the job and make some changes to the future input files (in the execute dir).

I managed to ssh to the job, and even got a welcome screen that points me to the slot the job is running on. I also get the PID of the process, but I don't know how to bind to the process.

If my executable in the job.sub is a perl script, taking different args and also calling different tools (like matlab, gcc etc...), how can I get into a mode that looks like I ran the command from my console? Where I can see the stdout tail on screen, and I can do CTRL+C to terminate the job, the same as I would in a non-condor env?
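(There is no direct way to get the job's terminal back, but the usual non-interactive equivalents, with a made-up job id, would be something along these lines:

    condor_ssh_to_job 1234.0          # shell in the job's execute dir
    tail -f <output file>             # watch stdout as it is written
    condor_rm 1234.0                  # the batch equivalent of CTRL+C

The name and location of the output file in the scratch directory depend on how the job was submitted.)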
The thing is that if one of the tools hits an error, it drops into its own shell, like for example in matlab, where I can provide or change some parameters and resume the run. However, under condor, it just drops into that shell and I cannot bind to it. The job is running from a condor perspective, but as a matter of fact it's just idle, waiting for some input on the shell (in my case matlab, but there are some other tools as well).
I tried to use gdb, but that seems to get my job stuck. The minute I did that, the job log file seemed to hang. Until then it printed a lot of info (I use the stream option), but once I used gdb there was no more activity on the running machine.
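(One likely explanation, offered as a guess rather than something stated in the thread: attaching gdb stops the target process until the debugger is told to continue, so the job looks hung while gdb holds it. A rough sketch, with a made-up job id:

    condor_ssh_to_job 1234.0
    gdb -p <pid of the job's process>
    (gdb) bt                # inspect the stopped process
    (gdb) continue          # or 'detach'; the job resumes and output flows again
)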
I know the job is dropping into a shell mode, since there is some error. If there is no error the job completes successfully, but my users would really like to debug the job when it gets into this mode, rather than having to rerun it from the beginning or outside condor.
Can someone please provide an example, or some feedback?