
Re: [HTCondor-users] X509 error: "unsupported version" when submitting jobs with a token and a proxy: v9 to v24



Hi Alexandre,
perhaps the issue comes from this line:

universe = vanilla

ALICE jobs have:

universe = grid

They are submitted to a local HTCondor cluster, which then takes
care of submitting the job to the CE that was indicated in the JDL:

grid_resource = condor ce.some.domain ce.some.domain:9619
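For illustration, here is a minimal Python sketch that assembles the routing lines of such a grid-universe submit description. The helper name `grid_submit_lines` is hypothetical and `ce.some.domain` is the placeholder host from the reply. It only shows the shape of `grid_resource` for the `condor` grid type: the remote schedd name first, then the remote collector `host:port`.

```python
def grid_submit_lines(ce_host: str, port: int = 9619) -> list[str]:
    """Build the routing lines of a grid-universe submit description.

    Hypothetical helper: the local schedd keeps the job and forwards it to
    the remote CE named in grid_resource. For the 'condor' grid type the two
    arguments are the remote schedd name and the remote collector host:port.
    """
    return [
        "universe = grid",
        f"grid_resource = condor {ce_host} {ce_host}:{port}",
        "queue",
    ]


# Print the fragment for the placeholder CE used in the reply above
print("\n".join(grid_submit_lines("ce.some.domain")))
```

The remaining commands (executable, credentials, outputs) would be added as in a vanilla-universe file. Only the universe and the routing line differ.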



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Alexandre Boyer via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Friday, March 7, 2025 11:45 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Alexandre Franck Boyer <alexandre.franck.boyer@xxxxxxx>
Subject: [HTCondor-users] X509 error: "unsupported version" when submitting jobs with a token and a proxy: v9 to v24
 
Dear HTCondor experts,

I hope you are doing well!

Context:
========

I have the following submission script:

```
# Environment
# -----------
universe = vanilla

# Inputs/Outputs
# --------------
# Inputs: executable to submit
executable = /tmp/script.sh

# Directory that will contain the outputs
initialdir = /tmp/initial_dir

# Outputs: stdout, stderr, log
output = $(Cluster).$(Process).out
error = $(Cluster).$(Process).err
log = $(Cluster).$(Process).log

# No other files are to be transferred
transfer_output_files = ""

# Transfer outputs, even if the job is failed
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT

# Environment variables to pass to the job
environment = "PILOT_STAMP=$(stamp) HTCONDOR_JOBID=$(Cluster).$(Process)"

# Credentials
# -----------
use_x509userproxy = true
use_scitokens = true
scitokens_file = /tmp/token.token

# Requirements
# ------------
request_cpus = 1

# Exit options
# ------------
# Specify the signal sent to the job when HTCondor needs to vacate the worker node
kill_sig=SIGTERM
# By default, HTCondor marks jobs as completed regardless of their exit status
# This option marks jobs as Held if they don't finish successfully
on_exit_hold = ExitCode =!= 0
# A subcode of our choice to identify who put the job on hold
# Jobs are then deleted from the system after N days if they are not idle or running
periodic_remove = (JobStatus != 1) && (JobStatus != 2) && ((time() - EnteredCurrentStatus) > (1 * 24 * 3600))

Queue stamp in d962af4e2da4895439e94e5c01a1a305
```
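To make the `periodic_remove` expression in the submit file above explicit, here is a minimal Python sketch of the same logic. It assumes the standard HTCondor `JobStatus` codes (1 = Idle, 2 = Running) and ignores ClassAd UNDEFINED semantics. The function name `should_remove` is hypothetical.

```python
import time
from typing import Optional

# HTCondor JobStatus codes used by the expression (1 = Idle, 2 = Running)
IDLE, RUNNING = 1, 2
SECONDS_PER_DAY = 24 * 3600


def should_remove(job_status: int, entered_current_status: int,
                  now: Optional[int] = None, days: int = 1) -> bool:
    """Restate periodic_remove: True when the job is neither idle nor running
    and has been in its current status for strictly more than `days` days."""
    if now is None:
        now = int(time.time())
    return (job_status != IDLE
            and job_status != RUNNING
            and (now - entered_current_status) > days * SECONDS_PER_DAY)
```

With `days=1` this matches the `(1 * 24 * 3600)` term. A held job (status 5) is therefore removed once it has been held for more than a day, while idle and running jobs are never touched.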

I am submitting this JDL to various HTCondor instances with the
following environment variables:

```
X509_USER_PROXY=/tmp/tmpcir64bnk
_CONDOR_SEC_CLIENT_AUTHENTICATION_METHODS="SCITOKENS"
_CONDOR_SCITOKENS_FILE=/tmp/token.token
...
```

Authentication is done through SciTokens, but I still need to include
a proxy.
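As a sketch of how this environment could be reproduced from Python, the two `_CONDOR_` variables rely on HTCondor reading any configuration knob from an environment variable named `_CONDOR_<KNOB>`. They therefore override `SEC_CLIENT_AUTHENTICATION_METHODS` and `SCITOKENS_FILE` for this submission only. The helper names, paths and host below are hypothetical, and the elided `...` variables are not reproduced.

```python
import os


def submit_env(proxy: str, token: str) -> dict[str, str]:
    """Hypothetical helper: environment for condor_submit with SciTokens
    authentication plus an X.509 proxy, on top of the current environment."""
    env = dict(os.environ)
    env.update({
        "X509_USER_PROXY": proxy,
        # _CONDOR_<KNOB> overrides the HTCondor configuration knob <KNOB>
        "_CONDOR_SEC_CLIENT_AUTHENTICATION_METHODS": "SCITOKENS",
        "_CONDOR_SCITOKENS_FILE": token,
    })
    return env


def submit_cmd(ce: str, jdl: str, port: int = 9619) -> list[str]:
    """Hypothetical helper: -pool points at the CE collector, -remote names
    the CE schedd, mirroring the command shown below."""
    return ["condor_submit", "-terse", "-pool", f"{ce}:{port}", "-remote", ce, jdl]
```

Assuming `condor_submit` is on `PATH`, it could be run with `subprocess.run(submit_cmd("ce.some.domain", "job.sub"), env=submit_env(proxy, token), check=True)`.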

Problem:
========

I have been using HTCondor v9.0 for years and everything has been fine.
I recently upgraded to HTCondor v24, but I started getting the
following error when submitting jobs:

```
$ condor_submit -terse -pool <htcondor instance>:9619 -remote <htcondor instance> -debug
03/05/25 16:30:54 Delegation error: 4068D7FDB37F0000:error:05800091:x509 certificate routines:X509_REQ_verify_ex:unsupported version:crypto/x509/x_all.c:47:

03/05/25 16:30:54 Delegation error:
03/05/25 16:30:54 ReliSock::put_x509_delegation(): delegation failed: X509Credential::Delegate() failed
03/05/25 16:30:54 Transfer exit info: Success = False | Error[13.115] = '|Error: sending file /tmp/tmpcir64bnk' | Ack = DOWNLOAD | Line = 5482 | Files = 0 | Retry = True
03/05/25 16:30:54 DoUpload: SUBMIT at 188.185.73.26 failed to send file(s) to <htcondor_instance:9619>: |Error: sending file /tmp/tmpcir64bnk; SCHEDD at <htcondor_instance> - |Error: receiving file /var/lib/condor-ce/spool/4429/0/cluster2214429.proc0.subproc0.tmp/tmpcir64bnk

DCSchedd::spoolJobFiles:7002:File transfer failed for target job 2214429.0: SUBMIT at <address> failed to send file(s) to <htcondorinstance:9619>: |Error: sending file /tmp/tmpcir64bnk; SCHEDD at 131.169.223.136 - |Error: receiving file /var/lib/condor-ce/spool/4429/0/cluster2214429.proc0.subproc0.tmp/tmpcir64bnk

ERROR: Failed to spool job files.
```

Have you ever seen that?
In the changelog (https://htcondor.org/htcondor/release-highlights/), I
see an entry that seems related in the 9.2.0 release:

```
Fix problem where proxy delegation to older HTCondor versions failed
```

The error is triggered by the "use_x509userproxy" option.

Attempts to solve the issue:
============================

- Removing the "use_x509userproxy" option from the JDL:
   - "fixes" the issue, but sites still need to receive a proxy along with
the token, so I can't just drop it.
   - or at least they need the VOMS attributes from it, so maybe adding
the VOMS attributes to the JDL is an option, but this needs to be
discussed.

- Replacing the "use_x509userproxy" option with
"x509userproxy = /tmp/tmpcir64bnk":
   - leads to the same issue

- Checking the validity of the proxy:
   - `voms-proxy-info -all -file <proxy>` correctly prints the details of
the proxy
   - The version of the proxy seems fine: `Version: 3 (0x2)`
   - But the error does not seem to refer to the version of the proxy
but to that of a CSR (certificate signing request):
https://github.com/openssl/openssl/blob/master/crypto/x509/x_all.c#L43C1-L49C6
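PKCS#10 (RFC 2986) defines the request `version` as `INTEGER { v1(0) }`. The linked `X509_REQ_verify_ex` rejects any request whose version is not v1 with `unsupported version`. The stdlib-only Python sketch below reads that field from a DER-encoded request. The helper names are hypothetical, and the parser handles only what is needed to reach the version field. One possible explanation, not confirmed in this thread, is that the delegation request built by the remote side carries a non-zero version, which the newer OpenSSL on the submit side then refuses. Because that request is built in memory, this is a reasoning aid rather than a drop-in diagnostic.

```python
def _read_tlv(data: bytes, pos: int) -> tuple[int, int, int]:
    """Return (tag, content_start, content_end) of the DER element at pos."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    return tag, pos, pos + length


def csr_version(der: bytes) -> int:
    """Extract the version field of a DER-encoded PKCS#10 request."""
    tag, start, _ = _read_tlv(der, 0)        # CertificationRequest SEQUENCE
    if tag != 0x30:
        raise ValueError("not a DER SEQUENCE")
    tag, start, _ = _read_tlv(der, start)    # CertificationRequestInfo SEQUENCE
    if tag != 0x30:
        raise ValueError("missing CertificationRequestInfo")
    tag, start, end = _read_tlv(der, start)  # version INTEGER
    if tag != 0x02:
        raise ValueError("missing version INTEGER")
    return int.from_bytes(der[start:end], "big", signed=True)


def passes_version_check(der: bytes) -> bool:
    # Same condition as the linked OpenSSL source: only v1 (encoded as 0) passes
    return csr_version(der) == 0
```

For a request saved as PEM, `openssl req -in req.pem -outform DER -out req.der` produces the DER bytes this sketch expects.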


Thanks a lot for your support!
Should you need any further details, please let me know.

Best regards,
Alexandre Boyer

_______________________________________________
HTCondor-users mailing list

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/