Re: [Condor-users] Issues with checkpointing
- Date: Fri, 19 Oct 2007 10:38:34 -0500
- From: Daniel Forrest <forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Issues with checkpointing
Brian,
> This is again related to the problem with jobs not checkpointing when
> evicted. If anyone has any insight, I would appreciate it.
>
> The executable is weiweicase10. I get the following message when I
> run the program on a local station from a terminal:
>
> Condor: Notice: Will checkpoint to weiweicase10.ckpt
> Condor: Notice: Remote system calls disabled.
> ...
> <program runs a while>
> <I press CONTROL-Z to suspend the job>
>
> ^ZKilled
> unixlab03%
> --------------------
> and it's killed. I'm wondering if the job is supposed to be suspended
> rather than killed in order to be able to checkpoint. This executable
> was compiled from a Fortran 90 program.
>
> In that case, is there something we are supposed to do to make the
> executable suspendable?
>
> Where would the checkpoints be created, and in which directory?

Checkpoints should be created in the current directory.

To get some debugging output, try running it like this:

  weiweicase10 -_condor_D_ALL [any other args]

Your operating system and Condor version would be helpful too.

--
Daniel K. Forrest       Laboratory for Molecular and
forrest@xxxxxxxxxxxxx   Computational Genomics
(608) 262 - 9479        University of Wisconsin, Madison