Re: [Condor-users] CM Failover with submits from CM
- Date: Tue, 14 Jul 2009 10:39:48 -0500
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] CM Failover with submits from CM
Dan Bradley wrote:
>
> Janzen Brewer wrote:
>> Dan Bradley wrote:
>>
>>> Condor supports fail-over of the submit node.
>>>
>>>
>> I understand that the submit node can be failed over, but I'm curious as
>> to what happens to the output of a completed job if the submit node from
>> which it was submitted failed during its execution. Does the execute
>> node keep the output until the secondary submit node undergoes failback?
>> Or does it attempt to write it to the same directory on the secondary
>> submit node?
>>
>
> I don't know much about schedd failover.
>
> I think the directories where output is to be stored would all need to
> be on a shared disk accessible to both submit nodes. Jobs that are
> running when the primary submit node fails will wait for up to the job
> lease duration (default 20 minutes) for the secondary submit node to
> take over. When the job finishes, whether it finishes during that time
> or after that time, the output would get copied back to the functioning
> submit node onto the shared disk.
>
> Of course, if you do all this only to make the shared filesystem into a
> single point of failure, you've probably only made things slightly worse.
>
> --Dan
That's accurate.
The SPOOL and all state are shared between the HA Schedds.
Best,
matt
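
The timing Dan describes above can be sketched as a small model. This is only an illustration of the reasoning, not Condor code: the function name, its parameters, and the assumption that the lease clock starts at the last renewal before the primary failed are all hypothetical. The only number taken from the thread is the 20 minute default job lease duration.

```python
# Illustrative model only; none of these names come from Condor itself.
DEFAULT_JOB_LEASE_SECONDS = 20 * 60  # default lease quoted in the thread


def job_survives_failover(last_renewal, takeover, lease=DEFAULT_JOB_LEASE_SECONDS):
    """Return True if the secondary schedd takes over before the lease expires.

    Both times are seconds on a common clock. last_renewal is assumed to be
    the moment the lease was last renewed before the primary submit node failed.
    """
    if takeover < last_renewal:
        raise ValueError("takeover cannot precede the last lease renewal")
    return takeover - last_renewal <= lease


if __name__ == "__main__":
    print(job_survives_failover(0, 5 * 60))   # takeover after 5 minutes: True
    print(job_survives_failover(0, 25 * 60))  # takeover after 25 minutes: False
```

In this model a job keeps running only while the takeover lands inside the lease window, which is why the secondary needs to reach the shared SPOOL and output directories promptly.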