Re: [Condor-users] Matching to not responding machines
- Date: Wed, 28 Mar 2012 13:13:39 +0200
- From: Hermann Fuchs <hermann.fuchs@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Matching to not responding machines
Thank you for this idea.
I'll have a look into it.
Regards,
Hermann
On Wed, 2012-03-28 at 12:14 +0200, Rob de Graaf wrote:
> Hi Hermann,
>
> On 03/28/2012 11:32 AM, Hermann Fuchs wrote:
> > However, I would like to implement some kind of failure detection for
> > the running grid, as network problems do and will occur.
> > Is there a query that is answered only when the machines are actually
> > communicating?
> > condor_status seems to be misleading: machines listed there that have
> > stopped communicating remain listed in some cases (e.g. the case
> > mentioned).
>
> You could use INVALIDATE_STARTD_ADS (man condor_advertise) to make the
> collector forget about specific machines. You would need to know which
> machines to invalidate. The only way I can think of right now is to ask
> them directly (condor_status -direct or maybe condor_config_val) and
> check the exit status of those commands. The downside of this approach
> is that you will have to wait out a timeout for every machine that has
> the problem. With hundreds or thousands of machines, it quickly becomes
> infeasible.
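
A rough sketch of that probing idea, in case it helps. It assumes that
"condor_status -direct <host>" exits non-zero (or hangs) when the machine
does not answer, as the advice above implies; please verify that on your
version. The function names, the 10 second timeout and the worker count are
my own placeholders. Probing in parallel means the timeouts overlap instead
of adding up, which softens the scaling problem somewhat.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def is_responsive(host, timeout=10.0, command=("condor_status", "-direct")):
    """Return True if the probe command for `host` exits with status 0 in time.

    Assumption: a non-zero exit status or a timeout means "not responding".
    """
    try:
        result = subprocess.run(
            [*command, host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the probe binary could not be started at all.
        return False
    return result.returncode == 0


def find_unresponsive(hosts, timeout=10.0, workers=32,
                      command=("condor_status", "-direct")):
    """Probe all hosts concurrently; return those that failed, in input order."""
    hosts = list(hosts)
    if not hosts:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(hosts))) as pool:
        alive = list(pool.map(
            lambda h: is_responsive(h, timeout, command), hosts))
    return [h for h, ok in zip(hosts, alive) if not ok]
```

The hosts returned by find_unresponsive() would then be the candidates for
INVALIDATE_STARTD_ADS via condor_advertise; that step is left out here
because it needs a hand-written ClassAd file.
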
>
> Alternatively, you could tweak CLASSAD_LIFETIME on the collector to make
> it forget about unresponsive machines more quickly, but it might also
> accidentally invalidate working machines if any updates get lost on the
> network. See:
> http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#SECTION004316000000000000000
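
To get a feeling for that trade-off, here is a small back-of-the-envelope
helper. It assumes a simplified model: the startd sends one update every
UPDATE_INTERVAL seconds and the collector drops an ad CLASSAD_LIFETIME
seconds after the last update it received. The values 900 and 300 used below
are what I believe the defaults to be; please check them against the manual
for your version. The function name is my own.

```python
def updates_before_expiry(lifetime_s, interval_s):
    """Number of scheduled updates that fall strictly before the ad expires.

    If all of them are lost on the network, a healthy machine is dropped
    from the collector. A result of 0 means the ad expires at the very
    moment the next update is due, so there is no safety margin at all.
    """
    if lifetime_s <= 0 or interval_s <= 0:
        raise ValueError("lifetime and interval must be positive")
    # Updates arrive at interval, 2*interval, ...; count those < lifetime.
    return (lifetime_s - 1) // interval_s


# Assumed defaults: 900 s lifetime, 300 s update interval.
print(updates_before_expiry(900, 300))
# Shorter lifetime: faster detection, but less tolerance for lost updates.
print(updates_before_expiry(600, 300))
```

So under this model, lowering the lifetime towards the update interval makes
dead machines disappear sooner, but leaves fewer chances for a working
machine to get an update through before its ad is discarded.
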
>
> Regards,
>
> Rob
--
-------------
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien
Tel. + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx