Mailing List Archives
[Condor-users] How to migrate running jobs with local checkpoints?
- Date: Thu, 11 Sep 2008 11:59:33 +0200
- From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
- Subject: [Condor-users] How to migrate running jobs with local checkpoints?
Hi all,
I have the following problem:
All 4 slots of a machine are currently occupied by user jobs (all standard
universe jobs). However, the hard disk on the system reported that it
might fail very soon. Thus I would like to migrate the jobs to another
machine without losing their 20h+ of run time.
But since local checkpointing is in effect, I don't know how to proceed.
Is it possible to just issue
condor_off -startd -peaceful n0066
and then somehow copy the checkpoint file over to another node? How
would Condor recognize this and use that particular node for the jobs?
Sorry if this is a dumb question.
Cheers
Carsten
--
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31