Mailing List Archives
Re: [HTCondor-users] Problems with power outage etc
- Date: Thu, 6 Jun 2013 08:45:11 +0000 (UTC)
- From: Romain <nuelromain@xxxxxxxxx>
- Subject: Re: [HTCondor-users] Problems with power outage etc
Peter Ellevseth <Peter.Ellevseth@...> writes:
>
> Hello all
>
> We have had a few incidents with power outages etc. What then happens is
> that our jobs are usually restarted. This is not something we generally
> want. Our jobs usually run for weeks, and we would rather have the job exit
> than restart, as all result files are usually overwritten in such an
> event. What is the best approach to avoid this?
>
> This morning we also had a problem when a domain controller went down for
> a while and the starter wasn't able to see the schedd even though they
> were both alive. At some point the lease then expired and the job
> restarted. We want to avoid this as well.
>
> From my standpoint it would be better if the jobs would just keep running
> even though the schedd is out of reach. Our cluster is sufficiently small
> that if a couple of errant jobs keep on running we can fix that manually.
>
> Regards Peter
>
>
Sorry, but what is the link with my post? I think your post isn't in the
right place.