Re: [Condor-users] VMs being cleaned up/removed
- Date: Tue, 26 Apr 2005 11:06:54 -0400 (EDT)
- From: Leslie Groer <groer@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] VMs being cleaned up/removed
Hi Alain
Thanks for the fast response. Yes, 6.7.3 or 6.7.6 sounds like it might
have the fix we need. I am reluctant to go the TCP update route: this is
currently only a test configuration, and soon we will have 220 nodes
behind the CM, so I guess that is getting towards a large pool.
Leslie
On Tue, 26 Apr 2005, Alain Roy wrote:
>
> >I am running Condor version 6.7.2 on Scientific Linux 3.0.3
> >with 11 dual-cpu worker nodes with 4 VMs each. There are three schedulers
> >and the CM is using kerberos authentication.
> >
> >I notice that fairly often, VMs will be "cleaned up" during housecleaning
>
> Try a newer version of Condor.
>
> Condor 6.7.3 has:
>
> >This release contains all the bug fixes from the 6.6 stable series up to
> >and including version 6.6.7, and some of the fixes that will be included
> >in version 6.6.8. The bug fixes in version 6.6.8 that were not included in
> >version 6.7.3 are listed in a separate section of the 6.6.8 version history.
>
>
> Condor 6.6.8 has:
> >Fixed issues that would cause condor_startd to "disappear" from the
> >pool because of dropped machine ad updates. This fix applies to all
> >platforms, but the symptoms were exhibited predominantly on Windows machines.
>
> And this is one of the bug fixes included in 6.7.3.
>
> So there is a decent shot that this problem will be fixed by upgrading to
> Condor 6.7.6, which is the most recent Condor release in the 6.7.x series.
>
> The condor_startd advertises each virtual machine by sending a UDP update
> to the collector. In some busy networks, these updates can be lost. If
> upgrading doesn't work for you, you can tell Condor to use TCP instead. We
> don't use this as a default in order to avoid having hundreds of
> simultaneous open TCP connections on large pools, but it's certainly
> reasonable for your small pool. You can learn how to configure this in the
> manual:
>
> http://www.cs.wisc.edu/condor/manual/v6.7/3_11Setting_Up.html#sec:tcp-collector-update
>
> Basically, you do "UPDATE_COLLECTOR_WITH_TCP = TRUE" in your config file.
>
> I hope this helps. If it doesn't, please do let us know. It's not a feature
> that machines disappear from your pool!
>
> -alain
>
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
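For reference, the TCP option described in the quoted message comes down to a single configuration setting. A minimal sketch follows. The setting name is taken from the message above. Placing it in the local configuration file on each machine that runs a condor_startd is an assumption, and the manual section linked above is the authoritative source.

```
# Send machine ad updates to the collector over TCP instead of UDP.
# The setting name comes from the quoted message; putting it in the
# local config file on each startd host is an assumption.
UPDATE_COLLECTOR_WITH_TCP = TRUE
```

The daemons have to re-read their configuration before the change takes effect, for example via condor_reconfig or a restart. Which of the two is needed for this setting is not stated in the thread.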
--
,-~~-.___. ________________________________________________
/ | ' \ groer@xxxxxxxxxxxxxxxxxxx Department of Physics
( ) 0 Tel: +1-416-978-2959 University of Toronto
\_/-, ,----' Fax: +1-416-978-8221 60 St. George Street
==== // Toronto, ON M5S 1A7
/ \-'~; /~~~(O) Canada
/ __/~| / | Office: McLennan Physics Lab Room 911
=( _____| (_________| http://home.fnal.gov/~groer
Leslie S. Groer