
Re: [HTCondor-users] condor_adstash unable to fetch startd histrories - Failed to authenticate with any method



Hi Alec,

First, a clarification so we're on the same page: the ads that adstash gets from the startds are job history ads, not startd machine ads. Adstash is doing the equivalent of "condor_history -startd". If you want to stash machine ads (which you can get from the collector rather than going to the startd directly), you will need to use or develop a different tool. (condor_status does have a -json flag that could be helpful here.)
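As a rough starting point for such a tool, here is a minimal Python sketch that shells out to "condor_status -json" and parses the result. The function names (parse_machine_ads, fetch_machine_ads, project) are made up for illustration, and the handling of empty output is an assumption rather than documented behavior.

```python
import json
import subprocess


def parse_machine_ads(json_text):
    """Parse the text printed by `condor_status -json` into a list of dicts."""
    text = json_text.strip()
    if not text:
        # Assumption: treat empty output as "no ads" instead of failing.
        return []
    return json.loads(text)


def fetch_machine_ads():
    """Ask the collector for machine ads through the condor_status CLI."""
    result = subprocess.run(
        ["condor_status", "-json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
    )
    return parse_machine_ads(result.stdout)


def project(ads, attrs):
    """Keep only the listed attributes from each ad before stashing it."""
    return [{a: ad[a] for a in attrs if a in ad} for ad in ads]
```

Something like project(fetch_machine_ads(), ["Name", "State"]) would then give a trimmed list that could be shipped to whatever store you use.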

When adstash gets job history ads from a startd, it is indeed accessing the startd directly. The reason adstash (like the Python and CLI tools generally) first talks to the collector is to get the current address of the startd; it then connects to the startd directly using the returned address.
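The two-step pattern can be sketched with the Python bindings roughly as follows. This is an illustration, not adstash's actual code: fetch_history_from_startds and query_pool are hypothetical names, and the history() keyword arguments mirror the call shown in the traceback quoted below. The connect argument is injected so the loop can be exercised without a pool.

```python
def fetch_history_from_startds(startd_ads, connect, max_ads=10):
    """Query each startd directly for job history ads.

    startd_ads: location ads returned by the collector.
    connect:    callable turning one ad into a handle with .history(),
                e.g. htcondor.Startd.
    Returns {startd name: list of history ads, or the exception raised}.
    """
    results = {}
    for ad in startd_ads:
        name = ad.get("Name", "unknown")
        try:
            # Step 2: talk to the startd itself (needs READ-level access).
            startd = connect(ad)
            results[name] = list(
                startd.history(requirements=True, projection=[], match=max_ads)
            )
        except Exception as err:
            # Authentication or network failures show up here, per startd.
            results[name] = err
    return results


def query_pool(max_ads=10):
    """Run the pattern against a real pool (requires the htcondor module)."""
    import htcondor  # imported lazily so the helper above works without it

    # Step 1: ask the collector where the startds currently are.
    startd_ads = htcondor.Collector().locateAll(htcondor.DaemonTypes.Startd)
    return fetch_history_from_startds(startd_ads, htcondor.Startd, max_ads)
```

Recording the exception per startd, rather than letting the first failure abort the loop, makes it easier to see which startds reject the connection and which are simply unreachable.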

Whether adstash can read the job history from startds inside glideins depends on both the READ-level authentication settings of the startds (as you've already found out) and the network situation of the worker node where the glidein is running. I'm not sure what the default authentication settings are for startds running under glideinWMS glideins, but setting up IDTOKEN authentication is problematic because it would involve shipping a private key with all of the glideins. Your best bet would be to open READ access to the world (ALLOW_READ = *) if your project can accept that. Regardless, if a glidein's worker node is behind a NAT, you're probably not going to be able to connect to the startd anyway. Fetching history ads from EPs running in glideins is not something we have explored ourselves yet, mostly because we don't expect every site to have its worker nodes on public interfaces.
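To get a quick feel for how many of your startds are likely stuck behind a NAT, one rough heuristic is to look at the MyAddress attribute of each startd ad and check whether the advertised IP is in a private range. The sketch below is only that, a heuristic: parse_sinful and looks_private are hypothetical helper names, the regular expression only handles the leading "<ip:port" part of the address string, and it ignores anything like CCB that might make a private address reachable after all.

```python
import ipaddress
import re

# Matches the start of an address such as "<192.168.1.5:9618?...>" or
# "<[fd00::1]:9618?...>"; everything after the port is ignored.
_SINFUL = re.compile(
    r"^<(?:\[(?P<v6>[^\]]+)\]|(?P<v4>[^:>\[\]]+)):(?P<port>\d+)"
)


def parse_sinful(my_address):
    """Return (ip, port) from a MyAddress-style string."""
    m = _SINFUL.match(my_address)
    if m is None:
        raise ValueError("unrecognized address: %r" % my_address)
    return m.group("v6") or m.group("v4"), int(m.group("port"))


def looks_private(my_address):
    """True if the advertised IP is in a private (non-routable) range."""
    ip, _port = parse_sinful(my_address)
    return ipaddress.ip_address(ip).is_private
```

A startd for which looks_private() returns True is unlikely to accept a direct connection from outside its site, whatever the ALLOW_READ setting is.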

Hopefully this provides some guidance and clarification.

Jason

On Fri, Dec 13, 2024 at 11:20 AM Alec Sheperd <alec.sheperd@xxxxxxxxxxxxxxxx> wrote:
Hi,

I've been trying to test out using condor_adstash to dump startd metrics to our Elasticsearch, but have been getting authentication errors when the script tries to query the startds themselves. We are currently using HTCondor 10.0.9.

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
 File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
  result = (True, func(*args, **kwds))
 File "/usr/libexec/condor/adstash/adstash.py", line 198, in startd_history_processor
  ads = src.fetch_ads(startd_ad, max_ads=args.startd_history_max_ads)
 File "/usr/libexec/condor/adstash/ad_sources/startd_history.py", line 42, in fetch_ads
  return startd.history(requirements=True, projection=[], **history_kwargs)
 File "/usr/lib64/python3.6/site-packages/htcondor/_lock.py", line 70, in wrapper
  rv = func(*args, **kwargs)
htcondor.HTCondorIOError: Failed to authenticate with any method

Our pool primarily runs as glideins on grid resources or by flocking to other pools. It seems that querying the collector for the startd ads works fine, since the collector can access those daemons' ads, but the attempt to access the startds directly fails to authenticate. Is there a mechanism or way to query the startd ads directly from the collector in condor_adstash? Or is there perhaps some other configuration I can set to access the startds directly? Our primary authentication method is IDTOKEN, so perhaps there is a way to configure that properly for adstash?

Alec