Hi Alexandre,
perhaps the issue arises due to this line:
universe = vanilla
ALICE jobs have:
universe = grid
They are submitted to a local HTCondor cluster, which then takes
care of submitting the job to the CE that was indicated in the JDL:
grid_resource = condor ce.some.domain ce.some.domain:9619
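
For illustration, a minimal grid-universe variant of the submit file could look like the sketch below. It is an assumption rather than a tested fix: ce.some.domain is the placeholder from the line above, the executable, output and credential lines are simply carried over from the original script, and whether the proxy delegation then succeeds with v24 has not been verified.

```
# Sketch only (untested assumption): grid universe variant of the original JDL
universe = grid
# Placeholder CE: remote schedd name, then the CE collector host:port
grid_resource = condor ce.some.domain ce.some.domain:9619

executable = /tmp/script.sh
output = $(Cluster).$(Process).out
error = $(Cluster).$(Process).err
log = $(Cluster).$(Process).log

# Credentials carried over unchanged from the original script
use_x509userproxy = true
use_scitokens = true
scitokens_file = /tmp/token.token

queue
```

Such a file would be passed to condor_submit on the local cluster, without -pool or -remote, since the local HTCondor instance forwards the job to the CE.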
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Alexandre Boyer via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Friday, March 7, 2025 11:45 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Alexandre Franck Boyer <alexandre.franck.boyer@xxxxxxx>
Subject: [HTCondor-users] X509 error: "unsupported version" when submitting jobs with a token and a proxy: v9 to v24

Dear HTCondor experts,
I hope you are doing well!

Context:
========

I have the following submission script:

```
# Environment
# -----------
universe = vanilla

# Inputs/Outputs
# --------------
# Inputs: executable to submit
executable = /tmp/script.sh
# Directory that will contain the outputs
initialdir = /tmp/initial_dir
# Outputs: stdout, stderr, log
output = $(Cluster).$(Process).out
error = $(Cluster).$(Process).err
log = $(Cluster).$(Process).log
# No other files are to be transferred
transfer_output_files = ""
# Transfer outputs, even if the job is failed
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
# Environment variables to pass to the job
environment = "PILOT_STAMP=$(stamp) HTCONDOR_JOBID=$(Cluster).$(Process)"

# Credentials
# -----------
use_x509userproxy = true
use_scitokens = true
scitokens_file = /tmp/token.token

# Requirements
# ------------
request_cpus = 1

# Exit options
# ------------
# Specify the signal sent to the job when HTCondor needs to vacate the worker node
kill_sig=SIGTERM
# By default, HTCondor marked jobs as completed regardless of its status
# This option allows to mark jobs as Held if they don't finish successfully
=!= 0
# A subcode of our choice to identify who put the job on hold
>
# Jobs are then deleted from the system after N days if they are not idle or running
periodic_remove = (JobStatus != 1) && (JobStatus != 2) && ((time() - EnteredCurrentStatus) > (1 * 24 * 3600))

Queue stamp in d962af4e2da4895439e94e5c01a1a305
```

I am submitting this JDL to various HTCondor instances with the following environment variables:

```
X509_USER_PROXY=/tmp/tmpcir64bnk
_CONDOR_SEC_CLIENT_AUTHENTICATION_METHODS="SCITOKENS"
_CONDOR_SCITOKENS_FILE=/tmp/token.token
...
```

The authentication is done through SCITOKENS but I still need to include a proxy.

Problem:
========

I have been using condor v9.0 for years, everything has been fine.
I recently decided to upgrade to condor v24 but started to get the following error when submitting jobs:

```
$ condor_submit -terse -pool <htcondor instance>:9619 -remote <htcondor instance> -debug
03/05/25 16:30:54 Delegation error: 4068D7FDB37F0000:error:05800091:x509 certificate routines:X509_REQ_verify_ex:unsupported version:crypto/x509/x_all.c:47:
03/05/25 16:30:54 Delegation error:
03/05/25 16:30:54 ReliSock::put_x509_delegation(): delegation failed: X509Credential::Delegate() failed
03/05/25 16:30:54 Transfer exit info: Success = False | Error[13.115] = '|Error: sending file /tmp/tmpcir64bnk' | Ack = DOWNLOAD | Line = 5482 | Files = 0 | Retry = True
03/05/25 16:30:54 DoUpload: SUBMIT at 188.185.73.26 failed to send file(s) to <htcondor_instance:9619>: |Error: sending file /tmp/tmpcir64bnk; SCHEDD at <htcondor_instance> - |Error: receiving file /var/lib/condor-ce/spool/4429/0/cluster2214429.proc0.subproc0.tmp/tmpcir64bnk
DCSchedd::spoolJobFiles:7002:File transfer failed for target job 2214429.0: SUBMIT at <address> failed to send file(s) to <htcondorinstance:9619>: |Error: sending file /tmp/tmpcir64bnk; SCHEDD at 131.169.223.136 - |Error: receiving file /var/lib/condor-ce/spool/4429/0/cluster2214429.proc0.subproc0.tmp/tmpcir64bnk

ERROR: Failed to spool job files.
```

Have you ever seen that?

In the changelog (https://htcondor.org/htcondor/release-highlights/), I see an entry that seems related in the 9.2.0 release:

```
Fix problem where proxy delegation to older HTCondor versions failed
```

The error is triggered by the "use_x509userproxy" option.

Attempts to solve the issue:
============================

- Removing the "use_x509userproxy" option from the JDL:
  - "fixes" the issue, but sites still need to get the token along with a proxy, so I can't just drop it.
  - or at least they need to get VOMS attributes from it, so maybe adding the VOMS attributes to the JDL is a possibility, but this needs to be discussed.
- Replacing the "use_x509userproxy" option with "x509userproxy = /tmp/tmpcir64bnk":
  - leads to the same issue
- Checking the validity of the proxy used:
  - `voms-proxy-info -all -file <proxy>` gives me the details of the proxy correctly
  - The version of the proxy seems fine: `Version: 3 (0x2)`
  - But the error does not seem to refer to the version of the proxy but to that of a CSR: https://github.com/openssl/openssl/blob/master/crypto/x509/x_all.c#L43C1-L49C6

Thanks a lot for your support! Should you need any further details, please let me know.

Best regards,
Alexandre Boyer

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

Join us in June at Throughput Computing 25: https://osg-htc.org/htc25

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/