[Condor-users] Issues with checkpointing
- Date: Fri, 19 Oct 2007 10:43:59 -0400 (EDT)
- From: "Brian Dandurand" <bdandur@xxxxxxxxxxx>
- Subject: [Condor-users] Issues with checkpointing
This is again related to the problem with jobs not checkpointing when
evicted. If anyone has any insight, I would appreciate it.
The executable is weiweicase10. I get the following messages when I run the
program from a terminal on a local workstation:
Condor: Notice: Will checkpoint to weiweicase10.ckpt
Condor: Notice: Remote system calls disabled.
...
<program runs a while>
<I press CONTROL-Z to suspend the job>
^ZKilled
unixlab03%
--------------------
and the job is killed. I'm wondering whether the job is supposed to be
suspended rather than killed in order to be able to checkpoint. This
executable was compiled from a Fortran 90 program.
If it is supposed to be suspended, is there something we need to do to make
the executable suspendable?
Also, in which directory would the checkpoint files be created?
----------------------------------------
Brian C. Dandurand
Clemson University
Department of Mathematical Sciences
Ph.D. Student
Office: Martin Hall E-6
Office Phone: (864)656-4749
----------------------------------------