Re: [HTCondor-users] refreshGSIProxy stopped working



The important versions are the OpenSSL version of the sender and the HTCondor version of the receiver.
The receiver creates a Certificate Signing Request (CSR) that the sender signs with the proxy's key. Older HTCondor versions set the parameters of the CSR in a way that newer OpenSSL versions refuse to sign. So Stefano is correct that this bug shouldn't trigger if the submitter has an older OpenSSL.

If your submitter has an older OpenSSL, that suggests there may be another bug that we need to investigate once we fix the bugs you encountered in the v2 bindings.

 - Jaime

On Jan 14, 2026, at 11:09 AM, Krittin Phornsiricharoenphant <krittin.phornsiricharoenphant@xxxxxxx> wrote:

Thank you so much Jaime. I had read the threads on OpenSSL but likely got confused.

Stefano thought that we were unaffected since the submitter has an old OpenSSL.
Our AP's now have OpenSSL 3.5.1 

We went for the second solution as an immediate fix. It is easier and safer until we rebuild our server image with a newer HTCondor
and fully test/validate it. Thanks again for the timely help.

Krittin

On 14 Jan 2026, at 17:21, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:

The initial failure was most likely triggered by an update of OpenSSL to version 3.4.0 or later. Details on that can be found here: https://opensciencegrid.atlassian.net/browse/HTCONDOR-2904

We included a fix in the following versions:
  23.0.24
  23.10.24
  24.0.10
  24.7.3
All versions 25.0 and beyond include the fix.
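
As a rough illustration of the version list above, the following Python sketch checks whether a given sender/receiver pair is likely to hit this particular CSR bug. The function names are hypothetical, and the series rule (an "x.0.z" release is compared against the x.0 fix, any other "x.y.z" against the feature-series fix, and anything older than 23 is treated as unfixed) is an assumption on my part, not an official rule. It only covers the bug tracked in HTCONDOR-2904.

```python
import re

# Fix releases listed above, keyed by (major, is_x_0_series).
# Assumption: "x.0.z" releases form one series and "x.y.z" (y >= 1) another.
FIXED_IN = {
    (23, True): (23, 0, 24),
    (23, False): (23, 10, 24),
    (24, True): (24, 0, 10),
    (24, False): (24, 7, 3),
}


def parse_version(text):
    """Return (major, minor, patch) from a dotted version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def htcondor_has_fix(version):
    """True if this HTCondor version is at or past the fix for its series."""
    major, minor, patch = parse_version(version)
    if major >= 25:
        return True
    fixed = FIXED_IN.get((major, minor == 0))
    if fixed is None:
        # Older than 23: no fix release is listed (assumed unfixed).
        return False
    return (major, minor, patch) >= fixed


def delegation_likely_affected(sender_openssl, receiver_htcondor):
    """Sender OpenSSL >= 3.4.0 combined with a receiver lacking the fix."""
    new_openssl = parse_version(sender_openssl) >= (3, 4, 0)
    return new_openssl and not htcondor_has_fix(receiver_htcondor)
```

For example, `delegation_likely_affected("3.5.1", "24.0.6")` is true, while the same OpenSSL against a 25.0.3 receiver is not flagged.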

We can reproduce some of the failures you're seeing in the v2 python bindings and are investigating.

In the meantime, here are two possible workarounds:
* Downgrade the HTCondor version used by the CRAB server and revert to the v1 bindings. They can talk to newer HTCondor APs.

* Set DELEGATE_JOB_GSI_CREDENTIALS=False in the HTCondor configuration of your CRAB server. This will do a regular file copy of the proxy (encrypted) when sending to the AP, instead of a delegation.


On Jan 14, 2026, at 8:42 AM, Stefano Belforte via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

Let me add:

Back in December we updated our AP's from 24.0.6 to 25.0.3
because all of a sudden all remote submissions were failing
with an error in initial proxy delegation [1] when an AP was rebooted.
Most likely some system library involved in delegation was updated.
We could not figure out the details, but Marco Mascheroni (rightly)
suggested that I reproduce it on the latest HTC before asking for help.
Since v25.0.3 fixed the submission problem, we updated all AP's,
but at the time did not check the proxy renewal :-(

So rolling back is not an option. I tried on an empty test AP
and got the same error as [1]

Stefano

[1] details in https://github.com/dmwm/CRABServer/issues/9245

On 14/01/2026 15:07, Stefano Belforte wrote:
Dear HTCondor developers,

we have a critical problem with CRAB. We just noticed that all calls to the
Python API refreshGSIProxy(), which were of course working "before", now fail,
raising htcondor.HTCondorIOError.

Due to holidays we only noticed today.

The call is made in a Docker image built with HTCondor 24.7.3 using the v1 API, which
had been working fine for us for years.

We do not have logs going back more than a month on that server, but the error
has been there since at least Dec 12, and we suspect that it is due to updating the AP's
from 24.7 to 25.0.3, which we did at around that date.

Does this sound like a possibility to you?

We tried to use the v2 API in the code which calls refreshGSIProxy(), but

1. there appear to be bugs in the code [1]
2. after fixing those, the call raises an htcondor2_impl.HTCondorException with no further details

This means that our server is not able to refresh proxies on the AP's. The initial proxy is valid
for 7 days, and during the holidays most (all?) tasks completed in less than that, so
the problem was not noticed. But clearly we cannot operate like this.

What suggestions do you have to get out of this situation?
IIUC we cannot roll back the HTC version on the AP's live, like we do for updates;
we'd need to fully drain the scheduler and re-install. Fully draining usually takes 1 month,
but only 1 week now that we can't renew the X509 proxy. Still, we would rather avoid that!

Please.. help !

Stefano

[1] https://github.com/htcondor/htcondor/blob/3ef80065b75a4a70a09e274c37cc1d27e5fb1e50/bindings/python/htcondor2/_schedd.py#L914-L926
the initial self is missing from the argument list in line 914,
and in line 926 int(proxy) should be int(proc)
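
To make the two reported problems concrete, here is a small Python sketch of the same kinds of mistake. The classes and the `refresh` method are hypothetical stand-ins, not the actual HTCondor source: one omits `self` from the argument list, the other refers to an undefined name `proxy` where `proc` was meant.

```python
class MissingSelf:
    # Bug: no 'self', so the instance is bound to 'cluster' and the
    # call below passes one positional argument too many.
    def refresh(cluster, proc, proxy_filename):
        return int(cluster), int(proc), proxy_filename


class WrongName:
    def refresh(self, cluster, proc, proxy_filename):
        # Bug: 'proxy' is not defined anywhere; 'proc' was intended.
        return int(cluster), int(proxy), proxy_filename


class Fixed:
    def refresh(self, cluster, proc, proxy_filename):
        return int(cluster), int(proc), proxy_filename


def error_name(obj):
    """Call obj.refresh() and return the raised exception's class name."""
    try:
        obj.refresh(1, 0, "x509.pem")
    except Exception as exc:
        return type(exc).__name__
    return None
```

In this sketch the first class fails with a TypeError at call time, and the second fails with a NameError only once the body runs, which is why the second kind of bug surfaces only after the first is fixed.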




