Dear Christoph,

we do not use condor_drain to drain nodes but instead set nodes unhealthy manually. According to [0] you also seem to be using some automatic node health-checking. One of the checks our health-checking script performs is whether the node has been manually set unhealthy. Technically, we use the health-checking feature built into the Puppet module available at [1] (though with a customised health-checking script) and then set individual hosts or complete host groups unhealthy in Foreman. Maybe this approach is a way for you to avoid the interference you describe.

Peter

[0] https://lists.cs.wisc.edu/archive/htcondor-users/2016-May/msg00058.shtml
[1] https://github.com/HEP-Puppet/htcondor

On 01.03.19 15:34, Beyer, Christoph wrote:
> Hi,
>
> we do not use the defrag daemon at the moment, but it seems like it would be a desirable thing, as the pool is very well used and multicore jobs are rather hard to get through.
>
> What we already do is use the condor_drain command, wrapped inside a custom tool/script, to drain worker nodes that are scheduled for maintenance, reinstallation, etc.
>
> When I tested the defrag daemon, it looked to me as if the ClassAds condor_drain uses to tag the hosts are the same ones the daemon uses, which leads to unwanted behaviour, as admin-scheduled tasks interfere with the regular draining actions of the defrag daemon.
>
> Hence I am looking for an elegant way to use the condor_drain command for administration purposes while at the same time having the defrag daemon do its job in the background (e.g. always drain 5 nodes at a time down to 8 free cores) without it touching the nodes that are currently administrated by hand.
>
> Maybe I got this all wrong, or maybe someone has a nifty solution for it?
>
> Best
> Christoph