
[HTCondor-users] refreshGSIProxy stopped working



Dear HTCondor developers,

We have a critical problem with CRAB. We just noticed that all calls to the
Python API refreshGSIProxy(), which were of course working "before", now fail,
raising htcondor.HTCondorIOError.

Due to holidays we only noticed today.

The call is made in a Docker image built with HTCondor 24.7.3 using the v1 API,
which had been working fine for us for years.

We do not have logs going back more than a month on that server, but the error has been there since at least Dec 12, and we suspect that it is due to updating the APs
from 24.7 to 25.0.3, which we did around that date.

Does this sound plausible to you?

We tried to use the v2 API in the code which calls refreshGSIProxy(), but

1. there appear to be bugs in the code [1]
2. after fixing those, the call raises an htcondor2_impl.HTCondorException with no further details
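
A minimal sketch of how the failing call could be wrapped to capture whatever detail the exception does carry. It assumes only the refreshGSIProxy(cluster, proc, proxy_filename, lifetime) call discussed above; the helper name refresh_with_diagnostics and the report layout are hypothetical, and the schedd object is passed in so the same wrapper can be tried with either the v1 or the v2 bindings.

```python
import traceback


def refresh_with_diagnostics(schedd, cluster, proc, proxy_filename, lifetime=-1):
    """Call schedd.refreshGSIProxy() and collect exception details on failure.

    Returns (result, None) on success and (None, report) on failure.
    The helper name and the report keys are illustrative, not HTCondor API.
    """
    try:
        result = schedd.refreshGSIProxy(cluster, proc, proxy_filename, lifetime)
        return result, None
    except Exception as exc:
        report = {
            # Fully qualified exception class, e.g. to tell v1 from v2 errors.
            "type": f"{type(exc).__module__}.{type(exc).__name__}",
            # Positional arguments stored on the exception (may be empty).
            "args": exc.args,
            # Chained exception, if the bindings set one.
            "cause": repr(exc.__cause__),
            # Full traceback text for the ticket or mailing-list post.
            "traceback": traceback.format_exc(),
        }
        return None, report
```

Even when the exception message is empty, the class name and traceback show which layer of the bindings raised it.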

This means that our server is not able to refresh proxies on the APs. The initial proxy is valid for 7 days, and during the holidays most (all?) tasks completed in less than that, so
the problem was not noticed. But clearly we cannot operate like this.

What suggestions do you have to get out of this situation?
IIUC we cannot roll back the HTCondor version on the APs live, like we do for updates; we would need to fully drain the scheduler and re-install. Fully draining usually takes 1 month, but only 1 week now, since we can't renew the X509 proxy. Still, we would rather avoid that!

Please... help!

Stefano

[1] https://github.com/htcondor/htcondor/blob/3ef80065b75a4a70a09e274c37cc1d27e5fb1e50/bindings/python/htcondor2/_schedd.py#L914-L926 the initial self is missing from the argument list in line 914,
and in line 926 int(proxy) should be int(proc)
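
To illustrate why those two defects break the call, here is a toy sketch in plain Python. It is not the HTCondor source: the class names and method bodies are made up, and only the method name and the two mistakes described in [1] are taken from above.

```python
class MissingSelfSchedd:
    # 'self' is missing, so the instance is bound to 'cluster' and every
    # argument supplied by the caller shifts by one position.
    def refreshGSIProxy(cluster, proc, proxy_filename, lifetime=-1):
        return int(cluster), int(proc)


class WrongNameSchedd:
    def refreshGSIProxy(self, cluster, proc, proxy_filename, lifetime=-1):
        # 'proxy' is not defined in this toy, so this line raises NameError.
        return int(cluster), int(proxy)


class FixedSchedd:
    def refreshGSIProxy(self, cluster, proc, proxy_filename, lifetime=-1):
        # Both fixes applied: 'self' is present and int(proc) is used.
        return int(cluster), int(proc)


def try_call(schedd):
    """Return the call's result, or the exception rendered as text."""
    try:
        return schedd.refreshGSIProxy(1234, 0, "/tmp/x509up", 3600)
    except (TypeError, NameError) as exc:
        return f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for cls in (MissingSelfSchedd, WrongNameSchedd, FixedSchedd):
        print(cls.__name__, "->", try_call(cls()))
```

In this toy, the first variant fails with a TypeError before the body runs, the second fails with a NameError inside the body, and only the third returns the converted IDs.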