Re: [HTCondor-devel] Getting Classads from multiple child collectors


Date: Tue, 8 Jun 2021 14:55:27 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] Getting Classads from multiple child collectors
On 6/8/2021 2:36 PM, Matthew Ens via HTCondor-devel wrote:

Hello,


I am trying to configure HTCondor (version 8.8) to run with multiple child collectors by following the instructions from the wiki.

Almost everything works well: my worker machines connect to my central server and my jobs run.

If I run condor_status I can see my worker machines, but I would like to use the Python API to query the main collector for the Master ClassAds of my worker machines, and I cannot see them. Instead, I need to query the socket of the specific child collector to see the ClassAd of the worker.

I was under the impression that setting CONDOR_VIEW_HOST=127.0.0.1 would forward all the ClassAds from the child collectors to the main collector, but does this include the Python API? Is there some configuration I am missing?

Thanks for the help,

Matt



Hi Matt,

First off, be aware of config knob CONDOR_VIEW_CLASSAD_TYPES.  I cut-n-pasted the entry on this at the bottom of this email.  You may want to customize this knob on your central manager to be something like:
   
    CONDOR_VIEW_CLASSAD_TYPES=Machine,Submitter,Master

If you are still having problems after changing the above (and doing a condor_reconfig): On the same machine where you are trying to run your Python code, does "condor_status -master" work to display the master ClassAds? If that condor_status command does not work either, what does the output of "condor_config_val COLLECTOR_HOST" say -- does it point to your 'top level' collector? I just want to discover whether your trouble is with the Python bindings, or whether the master ads themselves are not in the top-level collector.
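
For the Python side of that check, a minimal sketch along these lines should be roughly equivalent to "condor_status -master". It assumes the htcondor Python bindings are installed and that COLLECTOR_HOST in the local config points at the top-level collector; the helper names (master_names, query_master_ads) and the projection list are just illustrative, not anything from your setup.

```python
# Sketch: ask a collector for Master ClassAds via the HTCondor Python bindings.
# The import is guarded so the helper below can be tried without the bindings.
try:
    import htcondor
except ImportError:  # bindings not installed on this machine
    htcondor = None


def master_names(ads):
    """Return the sorted Name attribute of each ad (illustrative helper)."""
    return sorted(ad.get("Name", "<unnamed>") for ad in ads)


def query_master_ads(pool=None):
    """Query a collector for Master ads.

    With pool=None the bindings use COLLECTOR_HOST from the local config;
    pass e.g. "host:port" to target one specific (child) collector instead.
    """
    if htcondor is None:
        raise RuntimeError("htcondor Python bindings are not available")
    collector = htcondor.Collector(pool) if pool else htcondor.Collector()
    return collector.query(htcondor.AdTypes.Master,
                           projection=["Name", "Machine"])


if __name__ == "__main__":
    if htcondor is None:
        print("htcondor bindings not installed; nothing to query")
    else:
        for name in master_names(query_master_ads()):
            print(name)
```

If this prints nothing against the top-level collector but does print names when given a child collector's address, the master ads are most likely not being forwarded, rather than the bindings misbehaving.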

Note that going to the trouble of setting up a tree of child collectors is typically only needed for very large pools (i.e. several tens of thousands of slots) where the worker nodes regularly join and leave the pool (aka pilot or glidein pools)...


From the Manual:

CONDOR_VIEW_CLASSAD_TYPES

Provides the ClassAd types that will be forwarded to the CONDOR_VIEW_HOST. The ClassAd types can be found with condor_status -any. The default forwarding behavior of the condor_collector is equivalent to

CONDOR_VIEW_CLASSAD_TYPES=Machine,Submitter
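
As a quick way to see what a given configuration would forward, here is a small sketch that parses the knob's value. The function names are made up for illustration, the fallback of "Machine,Submitter" is taken from the manual text above, and the exact-match comparison on "Master" is an assumption on my part. It accepts any dict-like object, so it can be tried with a plain dict or, where the bindings are installed, with htcondor.param.

```python
# Sketch: check whether Master ads are in the CONDOR_VIEW_CLASSAD_TYPES list.
# `param` is any dict-like config source (a plain dict, or htcondor.param).

def forwarded_ad_types(param):
    """Return the list of ClassAd types named in CONDOR_VIEW_CLASSAD_TYPES."""
    # Fallback mirrors the default forwarding behavior quoted from the manual.
    raw = param.get("CONDOR_VIEW_CLASSAD_TYPES", "Machine,Submitter")
    # Accept commas and/or whitespace as separators.
    return [t for t in raw.replace(",", " ").split() if t]


def forwards_master_ads(param):
    """True if 'Master' appears in the forwarded types (exact match assumed)."""
    return "Master" in forwarded_ad_types(param)


if __name__ == "__main__":
    try:
        import htcondor
        print(forwards_master_ads(htcondor.param))
    except ImportError:
        # Without the bindings, demonstrate with a hand-written config dict.
        demo = {"CONDOR_VIEW_CLASSAD_TYPES": "Machine,Submitter,Master"}
        print(forwards_master_ads(demo))
```

Note that this only reflects the configuration visible on the machine where it runs, so it is meaningful when run on the host where the forwarding collectors are configured.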



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685 